July 12, 2015

Phylogeographic refinement of haplogroup E

Genome Biol Evol (2015) 7 (7): 1940-1950.

Phylogeographic Refinement and Large Scale Genotyping of Human Y Chromosome Haplogroup E Provide New Insights into the Dispersal of Early Pastoralists in the African Continent

Beniamino Trombetta et al.

Haplogroup E, defined by mutation M40, is the most common human Y chromosome clade within Africa. To increase the level of resolution of haplogroup E, we disclosed the phylogenetic relationships among 729 mutations found in 33 haplogroup DE Y-chromosomes sequenced at high coverage in previous studies. Additionally, we dissected the E-M35 subclade by genotyping 62 informative markers in 5,222 samples from 118 worldwide populations. The phylogeny of haplogroup E showed novel features compared with the previous topology, including a new basal dichotomy. Within haplogroup E-M35, we resolved all the previously known polytomies and assigned all the E-M35* chromosomes to five new different clades, all belonging to a newly identified subhaplogroup (E-V1515), which accounts for almost half of the E-M35 chromosomes from the Horn of Africa. Moreover, using a Bayesian phylogeographic analysis and a single nucleotide polymorphism-based approach we localized and dated the origin of this new lineage in the northern part of the Horn, about 12 ka. Time frames, phylogenetic structuring, and sociogeographic distribution of E-V1515 and its subclades are consistent with a multistep demic spread of pastoralism within north-eastern Africa and its subsequent diffusion to subequatorial areas. In addition, our results increase the discriminative power of the E-M35 haplogroup for use in forensic genetics through the identification of new ancestry-informative markers.


3 comments:

Matty K said...

"All the chromosomes previously referred to as paragroup E-M35*(×V92, V42, V6, M123, V68, M293, and V257) are now assigned to five different branches all belonging to haplogroup E-V1515"

M123? Surely not. Not according to figure 2, anyway ....

z1wv1 said...

Is the M40 mutation required for a chromosome to belong to haplogroup E?

eurologist said...

"our phylogeographic analysis, based on thousands of samples worldwide, suggests that the radiation of haplogroup E started about 58 ka, somewhere in sub-Saharan Africa."

I have a hard time deducing that from the data selected and the data presented.

"One of the most interesting findings of our phylogeographic refinement is the identification of a new clade (E-V1515), which originated about 12 ka (95% CI: 8.6–16.4) in eastern Africa (posterior probability = 0.99) where it is currently mainly distributed. This clade includes all the sub-Saharan chromosomes belonging to the former paragroup E-M35*(×V92, V42, V6, M123, V68, M293, and V257), as well as all the sub-Saharan haplogroups (E-V42, E-M293, E-V92, and E-V6) reported as E-M35 basal clades in a previous phylogeny."

That I have no problem with.

Of course, it is great that they used a mutation rate based on very ancient, dated specimens - something everyone should be doing. Even then there are still the questions of correctly modelling the population size and of taking into account that many lineages are dead-ends or almost dead-ends, due to the accumulation of deleterious mutations that eventually require very specific additional mutations (if not back-mutations) to be overcome. For example, if the average mutation rate since 50,000 ya is 0.7 × 10^−9/site/year, the effective (i.e., surviving-lineage) mutation rate back then might have been only 0.35 × 10^−9/site/year, while in the past 10,000 years it was perhaps closer to 1.05 × 10^−9/site/year. This means that recently dated (e.g., Neolithic, Bronze Age) divergences might actually be slightly more recent, while ancient (e.g., LGM and before) nodes might actually be up to twice as ancient (for example) as dated.

Note that a correction factor of two at "method-dated" 50,000 ya is likely an exaggeration - more likely is +80% at method-dated 70,000 ya, considering archaeological and climate data.

An effective mutation rate that varies linearly with time is the best first approximation (in the sense of a Taylor series), but a mathematical analysis is needed to confirm this (there are processes that are intrinsically non-linear at small scales). More importantly, both the linearity and the rate are testable hypotheses: date trees based on a succession of ever-older ancient specimens (e.g., 2 ky, 4 ky, 8 ky, 16 ky, 32 ky, 64 ky or similar). The computed dates on the older nodes should then systematically become older, and from the sequence one can calculate an approximately correct asymptotic value of the mutation rate offset and rate of change (if indeed linear).
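To make the proposed test concrete, here is a rough sketch in Python of how the offset and rate of change could be recovered from a series of calibrations, assuming the effective rate really is linear in time, r(t) = r0 + k·t. Everything in it is illustrative: the function names are hypothetical, the "apparent rates" are synthetic values generated from the model itself (not measurements from any study), and reading 1.05 × 10^−9 as the present-day rate and 0.35 × 10^−9 as the rate at 50,000 ya is only one interpretation of the example figures above. Under this model, a tree calibrated on a specimen of age T sees the rate averaged over [0, T], which is r0 + k·T/2, so a straight-line fit of apparent rate against specimen age gives r0 as the intercept and k/2 as the slope.

```python
def mean_rate(age, r0, k):
    """Average of the linear rate r(t) = r0 + k*t over the interval [0, age]."""
    return r0 + k * age / 2.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed model parameters (illustrative only): 1.05e-9 per site per year
# today, falling linearly to 0.35e-9 per site per year at 50,000 years ago.
R0 = 1.05e-9
K = (0.35e-9 - 1.05e-9) / 50_000

# Hypothetical calibration specimens of ever-older age, in years.
ages = [2_000, 4_000, 8_000, 16_000, 32_000, 64_000]

# Synthetic apparent rates: what each calibration would yield if the model held.
apparent = [mean_rate(a, R0, K) for a in ages]

intercept, slope = fit_line(ages, apparent)
r0_est = intercept   # estimated present-day rate (the "offset")
k_est = 2.0 * slope  # estimated rate of change per year

print(f"r0 ~ {r0_est:.3e} /site/year, k ~ {k_est:.3e} /site/year^2")
print(f"mean rate over 50 ky ~ {mean_rate(50_000, r0_est, k_est):.3e} /site/year")
```

With real data the apparent rates would be noisy, and a systematic departure from a straight line would be the sign that the linear approximation is not adequate.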