While all of these five mutations have in fact been observed in our full database of 78,590 samples, no combination of any two of them has appeared in any sample. However, since these six samples may not represent the full diversity of Neanderthal lineages, we have also investigated separately the level of divergence they show from our entire database. No sample in our database is as divergent as these Neanderthal samples, in terms of its distance from its nearest neighbor outside its own Hg, or its distance from the rCRS, which we take to represent a “random” modern human mtDNA (Table S11). We also observe that the most divergent samples in our database all carry well-known HVS-I motifs characteristic of African Hg L branches. While it is difficult to translate these findings into probabilities, it is clear that our results do not support the existence of mtDNA samples of Neanderthal (or other archaic Homo) origin in our database.
PLoS Genetics
The Genographic Project Public Participation Mitochondrial DNA Database
Doron M. Behar et al.
The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor–based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
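The idea of nearest neighbor haplogroup prediction from HVS-I motifs can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each sample is represented as a set of HVS-I differences from the rCRS, uses the size of the symmetric difference between two motifs as the distance, and breaks ties by a simple majority vote among the closest reference samples. The function names (`motif_distance`, `predict_haplogroup`), the mutation labels, and the haplogroup labels `HgA` and `HgB` are all hypothetical placeholders.

```python
from collections import Counter


def motif_distance(a, b):
    """Number of HVS-I differences found in one motif but not the other.

    Assumption: a motif is a set of mutation labels relative to the rCRS.
    """
    return len(a ^ b)


def predict_haplogroup(query, reference):
    """Return (predicted haplogroup, distance to the nearest reference motif).

    `reference` is a list of (motif, haplogroup) pairs whose haplogroups are
    assumed to be already known, for example from coding-region SNPs.
    """
    best = min(motif_distance(query, motif) for motif, _ in reference)
    # Majority vote among all reference samples at the minimum distance.
    votes = Counter(
        hg for motif, hg in reference if motif_distance(query, motif) == best
    )
    return votes.most_common(1)[0][0], best


# Toy reference set; the positions and labels are invented for illustration.
reference = [
    (frozenset({"16100A", "16200T"}), "HgA"),
    (frozenset({"16100A", "16200T", "16300C"}), "HgA"),
    (frozenset({"16150G"}), "HgB"),
]

query = frozenset({"16100A", "16200T", "16250C"})
print(predict_haplogroup(query, reference))
```

In this toy case the query differs from the first reference motif at a single position, so it is assigned that sample's haplogroup. A rule-based approach would instead require the query to match a predefined motif, which is the kind of brittleness a large reference database lets a nearest neighbor method avoid.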