---------------------------------------------------------------------------
Citations

If you publish results obtained using MrBayes you may want to cite the
program. The appropriate citations are:

   Huelsenbeck, J. P. and F. Ronquist. 2001. MRBAYES: Bayesian
      inference of phylogeny. Bioinformatics 17:754-755.
   Ronquist, F. and J. P. Huelsenbeck. 2003. MRBAYES 3: Bayesian
      phylogenetic inference under mixed models. Bioinformatics
      19:1572-1574.

If you use the parallel abilities of the program, you may also want to
cite:

   Altekar, G., S. Dwarkadas, J. P. Huelsenbeck, and F. Ronquist. 2004.
      Parallel Metropolis-coupled Markov chain Monte Carlo for Bayesian
      phylogenetic inference. Bioinformatics 20:407-415.

You should also cite other papers for different ideas that are
implemented in the program. For example, the program performs Bayesian
inference of phylogeny, an idea that was first proposed in the following
papers:

   Larget, B., and D. Simon. 1999. Markov chain Monte Carlo algorithms
      for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol.
      16:750-759.
   Li, S. 1996. Phylogenetic tree construction using Markov chain Monte
      Carlo. Ph. D. dissertation, Ohio State University, Columbus.
   Mau, B. 1996. Bayesian phylogenetic inference via Markov chain Monte
      Carlo methods. Ph. D. dissertation, University of Wisconsin,
      Madison.
   Mau, B., and M. Newton. 1997. Phylogenetic inference for binary data
      on dendrograms using Markov chain Monte Carlo. Journal of
      Computational and Graphical Statistics 6:122-131.
   Mau, B., M. Newton, and B. Larget. 1999. Bayesian phylogenetic
      inference via Markov chain Monte Carlo methods. Biometrics
      55:1-12.
   Newton, M., B. Mau, and B. Larget. 1999. Markov chain Monte Carlo for
      the Bayesian analysis of evolutionary trees from aligned molecular
      sequences. In Statistics in molecular biology (F.
      Seillier-Moiseiwitsch, T. P. Speed, and M. Waterman, eds.).
      Monograph Series of the Institute of Mathematical Statistics.
   Rannala, B., and Z. Yang. 1996.
      Probability distribution of molecular evolutionary trees: a new
      method of phylogenetic inference. J. Mol. Evol. 43:304-311.
   Yang, Z., and B. Rannala. 1997. Bayesian phylogenetic inference using
      DNA sequences: a Markov chain Monte Carlo method. Molecular
      Biology and Evolution 14:717-724.

MrBayes uses Markov chain Monte Carlo (MCMC) to approximate the posterior
probability of trees. MCMC was developed in the following papers:

   Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and
      E. Teller. 1953. Equation of state calculations by fast computing
      machines. J. Chem. Phys. 21:1087-1091.
   Hastings, W. K. 1970. Monte Carlo sampling methods using Markov
      chains and their applications. Biometrika 57:97-109.

In particular, MrBayes implements a variant of MCMC that was described by
Charles Geyer:

   Geyer, C. J. 1991. Markov chain Monte Carlo maximum likelihood. Pages
      156-163 in Computing Science and Statistics: Proceedings of the
      23rd Symposium on the Interface. (E. M. Keramidas, ed.). Fairfax
      Station: Interface Foundation.

MrBayes implements a large number of DNA substitution models. These
models are of three different structures. The "4by4" models are the usual
flavor of phylogenetic models. The "Doublet" model was first proposed by:

   Schoniger, M., and A. von Haeseler. 1994. A stochastic model and the
      evolution of autocorrelated DNA sequences. Molecular Phylogenetics
      and Evolution 3:240-247.

The program also implements codon models. Two papers, published
back-to-back, were the first to implement a codon model of DNA
substitution in which the substitution process is modelled on the codon,
not on a site-by-site basis:

   Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide
      substitution for protein coding DNA sequences. Molecular Biology
      and Evolution 11:725-736.
   Muse, S., and B. Gaut. 1994. A likelihood approach for comparing
      synonymous and non-synonymous substitution rates, with application
      to the chloroplast genome. Molecular Biology and Evolution
      11:715-724.

The program can be used to detect positively selected amino-acid sites
using a full hierarchical Bayes analysis. The method is based on the
excellent paper by Nielsen and Yang:

   Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting
      positively selected amino acid sites and applications to the HIV-1
      envelope gene. Genetics 148:929-936.

The previous four papers describe three different structures for the
nucleotide models implemented in MrBayes: the four-by-four models, the
16-by-16 (doublet) models and the 64-by-64 (codon) models. The program
implements three different substitution models within each model
structure. These include the nst=1 models:

   Jukes, T., and C. Cantor. 1969. Evolution of protein molecules. Pages
      21-132 in Mammalian Protein Metabolism. (H. Munro, ed.). Academic
      Press, New York.
   Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A
      maximum likelihood approach. Journal of Molecular Evolution
      17:368-376.

the nst=2 models:

   Kimura, M. 1980. A simple method for estimating evolutionary rates of
      base substitutions through comparative studies of nucleotide
      sequences. Journal of Molecular Evolution 16:111-120.
   Hasegawa, M., T. Yano, and H. Kishino. 1984. A new molecular clock of
      mitochondrial DNA and the evolution of Hominoids. Proc. Japan
      Acad. Ser. B 60:95-98.
   Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape
      split by a molecular clock of mitochondrial DNA. Journal of
      Molecular Evolution 22:160-174.

and the nst=6 models:

   Tavare, S. 1986. Some probabilistic and statistical problems in the
      analysis of DNA sequences. Lect. Math. Life Sci. 17:57-86.

MrBayes implements a large number of amino-acid models. These include:

   Poisson --
   Bishop, M. J., and A. E. Friday. 1987. Tetrapod relationships: the
      molecular evidence. Pp. 123-139 in Molecules and morphology in
      evolution: conflict or compromise? (C. Patterson, ed.). Cambridge
      University Press, Cambridge, England.

   Jones --
   Jones, D. T., W. R.
      Taylor, and J. M. Thornton. 1992. The rapid generation of mutation
      data matrices from protein sequences. Comput. Appl. Biosci.
      8:275-282.

   Dayhoff --
   Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of
      evolutionary change in proteins. Pp. 345-352 in Atlas of protein
      sequence and structure. Vol. 5, Suppl. 3. National Biomedical
      Research Foundation, Washington, D.C.

   Mtrev --
   Adachi, J. and M. Hasegawa. 1996. MOLPHY version 2.3: programs for
      molecular phylogenetics based on maximum likelihood. Computer
      Science Monographs of Institute of Statistical Mathematics
      28:1-150.

   Mtmam --
   Cao, Y., A. Janke, P. J. Waddell, M. Westerman, O. Takenaka, S.
      Murata, N. Okada, S. Paabo, and M. Hasegawa. 1998. Conflict
      amongst individual mitochondrial proteins in resolving the
      phylogeny of eutherian orders. Journal of Molecular Evolution.
   Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid
      substitution and applications to mitochondrial protein evolution.
      Molecular Biology and Evolution 15:1600-1611.

   WAG --
   Whelan, S. and N. Goldman. 2001. A general empirical model of protein
      evolution derived from multiple protein families using a
      maximum-likelihood approach. Molecular Biology and Evolution
      18:691-699.

   Rtrev --
   Dimmic, M. W., J. S. Rest, D. P. Mindell, and D. Goldstein. 2002.
      rtREV: An amino acid substitution matrix for inference of
      retrovirus and reverse transcriptase phylogeny. Journal of
      Molecular Evolution 55:65-73.

   Cprev --
   Adachi, J., P. Waddell, W. Martin, and M. Hasegawa. 2000. Plastid
      genome phylogeny and a model of amino acid substitution for
      proteins encoded by chloroplast DNA. Journal of Molecular
      Evolution 50:348-358.

   Blosum --
   Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution
      matrices from protein blocks. Proc. Natl. Acad. Sci., U.S.A.
      89:10915-10919. The matrix implemented in MrBayes is Blosum62.

   Vt --
   Muller, T., and M. Vingron. 2000. Modeling amino acid replacement.
      Journal of Computational Biology 7:761-776.

MrBayes implements a simple Jukes-Cantor-like model for restriction sites
and other binary data. A problem with some of these data is that there is
a coding bias, such that certain characters are missing from any
observable data matrix. It is impossible, for instance, to observe
restriction sites that are absent in all the studied taxa. However,
MrBayes corrects for this coding bias according to an idea described in:

   Felsenstein, J. 1992. Phylogenies from restriction sites: A
      maximum-likelihood approach. Evolution 46:159-173.

The model used by MrBayes for 'standard' or morphological data is based
on the ideas originally presented by:

   Lewis, P. O. 2001. A likelihood approach to estimating phylogeny from
      discrete morphological character data. Systematic Biology
      50:913-925.

For both DNA sequence and amino-acid data, the program allows rates to
change under a covarion-like model, first described by Tuffley and Steel:

   Tuffley, C., and M. Steel. 1998. Modeling the covarion hypothesis of
      nucleotide substitution. Mathematical Biosciences 147:63-91.

and implemented by Huelsenbeck (2002):

   Huelsenbeck, J. P. 2002. Testing a covariotide model of DNA
      substitution. Molecular Biology and Evolution 19(5):698-707.

Galtier (2001) implements a different variant of the covarion model in a
paper that is worth reading:

   Galtier, N. 2001. Maximum-likelihood phylogenetic analysis under a
      covarion-like model. Mol. Biol. Evol. 18:866-873.

A number of models are available that allow rates to vary across the
characters. The program implements the proportion of invariable sites
model and two variants of gamma-distributed rate variation. Yang's (1993)
paper is a good one to cite for implementing a gamma-distributed rates
model. In the 1994 paper he provides a way to approximate the continuous
gamma distribution:

   Yang, Z. 1993. Maximum likelihood estimation of phylogeny from DNA
      sequences when substitution rates differ over sites. Molecular
      Biology and Evolution 10:1396-1401.
   Yang, Z. 1994.
      Maximum likelihood phylogenetic estimation from DNA sequences with
      variable rates over sites: Approximate methods. Journal of
      Molecular Evolution 39:306-314.

The program also implements Yang's autocorrelated gamma model. In this
model, the rate at one site depends to some extent on the rate at an
adjacent site. The appropriate citation for this model is:

   Yang, Z. 1995. A space-time process model for the evolution of DNA
      sequences. Genetics 139:993-1005.

Ancestral state reconstruction. The following two papers show how
ancestral states on a tree can be reconstructed. The Yang et al. paper
implements an empirical Bayes approach. The Huelsenbeck and Bollback
paper implements a hierarchical Bayes approach. The method implemented in
MrBayes is hierarchical Bayes, as it integrates over uncertainty in model
parameters:

   Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of
      ancestral nucleotide and amino acid sequences. Genetics
      141:1641-1650.
   Huelsenbeck, J. P., and J. P. Bollback. 2001. Empirical and
      hierarchical Bayesian estimation of ancestral states. Systematic
      Biology 50:351-366.

You may also want to consult a more recent review of Bayesian
reconstruction of ancestral states and character evolution:

   Ronquist, F. 2004. Bayesian inference of character evolution. Trends
      in Ecology and Evolution 19:475-481.

The program implements an extremely parameter-rich model, first described
by Tuffley and Steel (1997), that orders trees in the same way as the
so-called parsimony method of phylogenetic inference. The appropriate
citation is:

   Tuffley, C., and M. Steel. 1997. Links between maximum likelihood and
      maximum parsimony under a simple model of site substitution. Bull.
      Math. Bio. 59:581-607.
---------------------------------------------------------------------------