The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish


The pace of the evolutionary change depends on the existence of genetic variation, population size and intensity of the selection. While environmental change very often exceeds the rate of evolution for many species, killifish (Fundulus heteroclitus), living in U.S Atlantic coast estuaries turn out to be remarkably resilient. They have adapted to survive levels of toxic industrial pollutants, tolerating concentrations up to 8000 times higher than sensitive fish. In this interesting study, Reid et al. use population genomic and transcriptomic analyses to reveal complex genetic basis of rapid adaptation in killifish to dramatic, human-induced, environmental change.


Four pairs of sensitive and tolerant populations were compared. Based on comparative trancriptomics and analysis of 384 whole genome sequences few candidate regions are identified to underlay tolerance to complex mixtures of polycyclic and halogenated aromatic hydrocarbons. Interestingly, they are shared among four tolerant populations and are highly ranked. This suggests that the most important targets of selection have evolved in parallel across polluted sites.

Within shared outliers are genes involved in aryl hydrocarbon receptor (AHR) signalling pathway. Role of this pathway is to mediate toxicity. Experiments showed that tolerant populations exhibit reduced inducibility of AHR regulated genes while sensitive populations showed up to 70 upregulated genes in response to pollutant. At the genetic level, the tolerant populations evolved in highly similar ways indicating constrained phenotypic variation. It seems that selection acts only on few genes.

Processes involved in the adaptation of killifish to lethal levels of environmental pollution are complex. AHR pathway is a key target of natural selection but potentially negative effects of its desensitisation lead to compensatory adaptations in genes responsible for estrogen and hypoxia signalling regulation of cell cycle or immune system function. Authors identified CYP1A dosage- compensating adaptation through gene duplications for impaired AHR signalling pathway. In northern tolerant populations CYP1A duplications have swept to high frequencies. Some individuals have up to eight copies of this gene. Other selective targets include genes outside AHR signalling pathway such as KCNB2 and KCNC3 genes whose products form conductance pore of the voltage-gated potassium channel. It seems to be very common that compensatory changes go along with rapid adaptive evolution.


This study underlies the role of high nucleotide diversity and extensive pre-existing genetic variation as crucial for selective sweeps and evolutionary rescue. Also, number of evolutionary solutions to this kind of pollution is limited. Even though this study showed that some species have the capacity to overcome severe environmental changes due to natural richness of their genetic pool, most of the species, unfortunately, are not able to adapt such rapid changes due to low level of genetic variation.

Reid, N. M., Proestou, D. A., Clark, B. W., Warren, W. C., Colbourne, J. K., Shaw, J. R., et al. (2016). The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science (New York, N.Y.), 354(6317), 1305–1308.



Posted in Uncategorized | Leave a comment

The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish.


Environmental pollution is a widespread problem that living organisms have to contend with on a global scale. In contaminated sites especially, wild populations undergo intense selective pressure that may result in phenotypic adaptations to pollutants (Hendry et al., 2008). The scientific article (Reid et al., 2016) discussed in this blogpost explores the genetic mechanisms that have allowed the rapid adaptation to industrial pollutants in wild Atlantic killifish populations.


The genomic landscape of the killifish populations

Atlantic killifish (Fundulus heteroclitus) are non-migratory fish that are abundant along the US east coastline (Fig. 1A). Some killifish populations show inherited resistance to lethal levels of industrial pollutants in sites that have been contaminated for decades. For instance, the authors show that the percentage of larva that survive in increasing concentrations of a highly toxic pollutant called PCB 126, is higher in tolerant populations compared to the sensitive populations (Fig. 1B). To understand the genetic adaptations underlying the rapid adaptation to polluted sites in killifish populations, the authors sequenced the complete genomes from eight populations. Four tolerant populations that reside in highly polluted sites were sampled. Each one was paired with a sensitive population from a nearby site (Fig. 1A). The authors combined these genomic data with corresponding RNA sequencing (RNA-seq) to identify unique and shared pathways among tolerant populations as well as to uncover adaptive evidence in the populations.

The genomes from 43-50 individuals from each population were sequenced. One pair of tolerant and sensitive populations (T1 and S1) were sequenced to 7-fold coverage, while the remaining populations, to 0.6-fold coverage. These data indicate that the populations’ genetic variation is strongly by their geographical locations. Meanwhile, all tolerant and sensitive pairs of populations share the most similar genomic backgrounds and have low Fst values between them (0.01 – 0.08). Additionally, tolerant populations show a lower genome-wide nucleotide diversity (?) along with a positive-shifted Tajima’s D. Thus, the authors conclude that tolerant populations have recently and independently diverged from local ancestral populations.

Figure 1. Atlantic killifish populations. (A) Locations of pollution-tolerant and sensitive populations studied (“T”, filled circles; “S”, open circles respectively). (B) Larval survival (linear regression of logit survival to 7 days post hatch) when challenged with increasing concentrations of the pollutant PCB 126.

Signatures of convergent evolution in tolerant killifish populations

To identify genomic regions responsible for conferring pollution tolerance in killifish, the authors scanned the populations’ genomes looking for signals of selective sweeps in 5 kb sliding windows. Candidate regions were defined as those showing low values of genetic diversity and a skewed allele frequency spectrum (0.1% tails of ? and Tajima’s D, respectively) as well as high allele frequency differentiation (99.9% tails for Fst). Each tolerant population showed prevalent selection signatures compared to their sensitive counterparts (as seen by ? and Fst). Most of these outlier regions are small (52 – 69 kb, up to ~1.8 Mb) and specific to each tolerant population. Nevertheless, the highest ranked outlier regions are shared between tolerant populations (Fig. 2A). The shared outlier regions harbour genes involved in the aryl hydrocarbon receptor (AHR) signaling pathway (AHR2a, AHR1a, AIP, and CYP1A) (Fig. 2B). These results suggest repeated convergent evolution of pollutant tolerance in the sampled killifish populations.

The authors then tested whether the genes located in outlier regions showed distinct expression profiles in tolerant killifish. Individuals from sensitive and tolerant populations were reared in a common, clean environment for two generations. Following this, embryos were challenged with the toxic pollutant PCB 126 and RNA was collected ~10 days post fertilization. Indeed, AHR-regulated genes were less induced in individuals from tolerant populations (Fig. 2C). Concomitantly, AHR-regulated genes were enriched (P < 0.0001) in the set of genes that were up-regulated in response to PCB 126 treatment in sensitive populations exclusively. Notably, some of the dominant pollutants at the sampled “T” sites bind AHR. Also, aberrant AHR signalling leads to embryo and larval lethality (Pohjanvirta, 2011). The authors thus conclude that the AHR signalling pathway is a key and repeated target of natural selection in polluted sites given the multiple, independent “desensitizing” events in tolerant killifish populations.

Fig. 2. Structural and functional genomic signals of adaptation to pollutants. Adapted from (Reid et al., 2016). (A) Allele frequency differentiation (Fst, top) and nucleotide diversity (pi, bottom) difference (tolerant pi – sensitive pi) for each population pair studied for top- ranking outlier regions (including the top two per pair). Colored panels span the outlier region of each respective population comparison where number indicates outlier rank for each tolerant-sensitive pair. Red dashed lines indicate outlier thresholds. Each tick on x axis is at the 500-kb position on the scaffold, and each candidate gene name is indicated (top) for each outlier region. (B) Model of key molecules in the AHR signaling pathway, including regulatory genes and transcriptional targets (AHR gene battery). Boxes next to genes are color-coded by population pair; filled boxes indicate the gene is within a top-ranking outlier region for that pair, and number indicates ranking of the outlier region as in (A). (C) Gene-expression (of developing embryos) heat map shows up-regulated genes in response to PCB 126 exposure (“PCB”; 200 ng/liter) compared with control exposure (“Con”) for sensitive populations, most of which are unresponsive in tolerant populations. The bottom panel highlights genes characterized as transcriptionally activated by ligand-bound AHR.

It is important to note that genome sequencing coverage seems to have an effect on the ranking of outlier regions. For instance, the regions that contain key AHR-signalling genes (AIP, CYP1A, AHR1a/2a, and ARNT) are very highly ranked in low-coverage populations whereas they are lowly ranked in the high-coverage population pair (T1-S1). Given that outlier regions are ranked based on Fst and nucleotide diversity, these measures must be impacted by low genome sequencing coverage. It would be interesting to determine the ranking of the outlier regions if the other populations were sequenced to higher coverage. However, despite being lowly ranked, these regions are classified as outliers in all four population pairs, giving strength to the argument that impaired AHR-signalling is key to pollution tolerance.

In-depth analysis of genetic variants in tolerant populations

There is evidence for selection of AHR pathway genes in tolerant killifish populations. Tolerant populations harbour distinct deletions spanning AHR2a and AHR1a. On the contrary, individuals from their sensitive counterparts are almost completely devoid of such mutations. Furthermore, RNA-seq data revealed the expression of a chimeric transcript, part AHR2a, part AHR1a in T4 individuals. Meanwhile, AIP (a regulator of AHR stability and cellular localization) is found within a region showing the strongest signals of selection that is shared between all tolerant populations. CYP1A (a transcriptional target of AHR) is also in located in top-ranking outlier regions in all tolerant populations (except for T1 where the region is ranked #401). Interestingly, CYP1A duplications are found in high frequencies in tolerant populations, without a concomitant increase in expression. The authors hypothesize that CYP1A duplications may function as a dosage-compensation mechanism in tolerant populations with impaired AHR signalling because it has been reported that AHR knockout decreases CYP1A expression in rodents (Schmidt et al., 1996). Finally, other AHR-related genes lie within population-specific outlier regions such as the tandem paralogs AHR1b and AHR2b in T3 and T4 and five other AHR pathway genes in T4. Together these observations led the authors to conclude that AHR pathway genes are indeed common and repeated targets of selection, a clear example of convergent evolution.

Genes outside of the AHR signalling pathway are also targets of selection. For example, two genes that are implicated in AHR-independent cardiotoxicity (KCNB2 and KCNC3) are within outlier regions in T4, where such cardiotoxic pollutants are abundant. Additionally, the authors found adaptations that may compensate for the potential costs of pollutant tolerance. AHR signalling is interconnected with multiple other pathways, such as estrogen and hypoxia signalling, as well as cell cycle and immune system regulation (Beischlag et al., 2008). Consequently, estrogen receptor 2b lies within an outlier region in T2, while estrogen receptor-regulated genes are enriched in the gene set of the outlier regions in all tolerant populations (P < 0.001). Furthermore, the estrogen receptor is inferred as an upstream regulator of differentially expressed genes between tolerant and sensitive killifish (Fig. 2C). Alternatively, the hypoxia-inducible factor 2? is in an outlier window in T3, and interleukin and cytokine receptors are in outlier regions in T4. Thus, the authors highlighted the possibility that compensatory adaptation selection may be common following rapid adaptive evolution.


In this article, we can appreciate that genetic adaptations to pollution in wild killifish populations are complex. The authors attribute this to two main factors. Firstly, sites are contaminated with a complex mix of pollutants. This may affect how the AHR- and other pathways are impacted. Therefore, adaptations in multiple pathways, at different genetic levels, may be necessary for tolerance to diverse pollutant mixes to arise. Second, AHR pathway genes are interconnected with other gene-regulatory pathways, thus these genes’ functions may be impaired upon aberrant AHR signalling, It follows that adaptations that compensate these genes’ functions may also be selected for in tolerant populations.

The authors argue that their data clearly reveal signals of convergent evolution. The AHR pathway genes are shown to be repeated targets of selection in distinct pollutant-tolerant killifish populations. This also suggests molecular constraints in the adaptation to pollution. However, in spite of this, multiple variants were favoured in different tolerant populations. The authors say that their data show evidence of selection of preexisting common variants in multiple tolerant populations. In other words, it seems that soft sweeps have been important for the emergence of pollutant tolerance in killifish. This conclusion is supported by several lines of evidence: 1) the sensitive-tolerant populations are genetically close, which suggests that the selected variants were part of the standing variation, 2) sensitive populations have some of the variants that tolerant populations have, and finally 3) these fish are low dispersal. Interestingly, the authors point out that Atlantic killifish have large population size and a wide range of standing genetic variation. These little critters were not only the first space-going-fish, they are one of the most genetically diverse vertebrates, which positioned them well to evolve pollutant-tolerance.

However, it is important to realize that not all species are as well poised to adapt to ever-changing, increasing pollution in their habitats. Research like this gives us key information on how natural populations are dealing with our pollution. The best chance we can give all life forms on earth is to curb or rates of worldwide pollution. Luckily, it is on our power to do so!


  1. Beischlag, T. V., Morales, J. L., Hollingshead, B. D., & Perdew, G. H. (2008). The aryl hydrocarbon receptor complex and the control of gene expression. Critical Reviews in Eukaryotic Gene Expression, 18(3), 207–250.
  2. Hendry, A. P., Farrugia, T. J., & Kinnison, M. T. (2008). Human influences on rates of phenotypic change in wild animal populations. Molecular Ecology, 17(1), 20–29.
  3. Pohjanvirta, R. (2011). The AH Receptor in Biology and Toxicology. Wiley, Hoboken, NJ.
  4. Reid, N. M., Proestou, D. A., Clark, B. W., Warren, W. C., Colbourne, J. K., Shaw, J. R., et al. (2016). The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science (New York, N.Y.), 354(6317), 1305–1308.
  5. Schmidt, J. V., Su, G., Reddy, J. K., Simon, M. C., & Bradfield, C. A. (1996). Characterization of a murine Ahr null allele: Involvement of the Ah receptor in hepatic growth and development. Proceedings of the National Academy of Sciences of the United States of America, 93(13), 6731–6736.
Posted in evolution, fish, genomics | Leave a comment

How the Galapagos cormorant lost its ability to fly


Novel traits play a key role in evolution by facilitating the access to new ecological niches. Novelty is often recognized at a phenotypic level and usually related to gain of new function. But can nature innovate through the loss of the function? Wing reduction and loss of flight in birds occurred several times in evolutionary history. It is found among 26 families of birds. However, it is difficult to determine genetic basis underlying this change.

In this interesting study Burga et al. are using flightless Galapagos cormorant (Phalacrocorax harrisi) as an interesting model to study evolution of recent loss of flight. Namely, P.harrisi diverged from its flighted relatives within the past 2 million years and represents the only flightless cormorant among 40 existing species. The entire population (approximately 1500 individuals) is distributed along the coastlines of Isabela and Fernandina islands in the Galapagos archipelago.

There are two evolutionary paths that could possibly explain the loss of flight. Flightlessness could be positively selected if it helps birds to develop alternative ability to escape from predators and to survive (like swimming). Alternatively, if flying was not essential for surviving (no need to escape from predators) the mutations that obstruct flight might accumulate in the gene pool. These two scenarios are not necessarily mutually exclusive, meaning that passive loss of flight might be followed by positive selection that will keep reducing wings.


In this study authors showed that comparative genomics is a powerful tool for disentangling evolutionary history and for understanding molecular mechanisms behind evolutionary changes. They sequenced, and de novo assembled the complete genomes of the Galapagos flightless cormorant and three flighted relatives.

Initially authors identified highly conserved regions of non-coding DNA in attempt to find potential candidates of the wing shortening but eventually they focused their attention on coding variants. Among coding variants, many genes were identified exclusively in the Galapagos cormorant genome. Variants related with dysfunction of the primary cilium (key role in mediating hedgehog signaling pathway during development) were selected as candidates linked to reduction of the wings. Interestingly, impaired function of the same genes in humans lead to bone development disorders described as skeletal ciliopathies.

To confirm hypotheses about effects of mutations in cilia-related genes, authors combine in vivo experiments in C. elegans and in vitro experiments in mouse chondrogenic cell lines. Experiments in C. elegans confirmed that a missense variant present in Galapagos cormorant IFT122 protein is sufficient to affect cilia function in vivo. Nevertheless, it would be interesting to see if knock-in of some other functional gene from Galapagos cormorant would lead to same behavior as wild type gene from C. elegans (used as positive control in this experiment).

Further analyses were focused on gene called CUX1. This gene is linked to shortened wings in chickens. Results from these experiments suggest that cilia and hedgehog signaling pathway related genes are likely transcriptional targets of CUX1 in chondrocytes. In normal functioning of hedgehog signaling pathway, CUX1 regulates expression of cilia related genes and promotes chondrogenesis. Since Galapagos cormorants carry different variant of the CUX1 gene, this possibly modify gene’s function, influencing both cilia formation and their functioning. All this reflects on hedgehog pathway activity, resulting in impaired bone growth.


This exhaustive study underlined importance of sophisticated genetic tools such as genomic analyses in explaining molecular mechanisms responsible for the changes observable at phenotypic level. Experiments revealed polygenic basis of the flightlessness. Even more, they select ciliary disfunction as a likely contributor to the evolution of loss of flight. Most valuable aspect of this study is its approach that can be used for identification of the other variants responsible for evolutionary innovation by analyzing genomes of closely related species.


Burga et al. (2017) A genetic signature of the evolution of loss of flight in the Galapagos cormorant. Science 356 (6341), eaal3345.

Posted in Uncategorized | Leave a comment

Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes

A biochemical story on convergent evolution


Convergent evolution is the process by which similar traits evolve independently in distantly related organisms, such as wings in bats and birds. This can target orthologous or unrelated genes, which gives a different view on the concept of convergent evolution : how much it is constrained to some pathways, or, reversely, how diverse the path to the same function can be.

For convergent evolution to arise, different proteins must be assembled into an ordered, functional pathway. Currently, Three hypotheses shed light on the matter. Under the cumulative hypothesis, enzymes catalyzing the earlier reactions of a pathway must evolved first, because, otherwise, enzymes that perform the following steps would have no substrate to react with. Later steps would arise by duplication of the first enzyme. This suppose that intermediates are advantageous. Reversely, under the retrograde hypothesis, enzymes catalazing the later steps of a pathway would evolved first, and then gene duplication would give rise to the enzymes catalysing earlier steps. This suppose that intermediates could be produced non-enzymatically but doesn’t assume anything on their potential effect. Finally, the patchwork hypothesis states that a novel pathway will arise by the recruitment and rerouting of an alternative, preexisting pathway – we talk about ‘exapted’, or ‘co-opted’ enzymes. This suppose that the older, recruited enzyme was catalazing a promiscous reaction.

In plants, one of the most studied example of convergent evolution is caffein biosynthesis, which seems to have independently appeared at least five times during flowering plant history : only a few representatives of each clade display caffein biosynthesis, wich means, under the parcimony rule, that rather to be a trait ancestrally shared, it is more likely to have independently and repeatedely emerged. For the past 30 years, only one biosynthetic path among the several possible was shown to have convergently eveloved (Fig. 1), though with paralougous enzymes in both Coffea (XMT) and Camillia (CS) from the SABATH family. Most of those enzymes actually catalyze O-methylation, but were recently co-opted in those species to catalyze N-methylation.

In their article, Huang et al. [1] shed light in how caffeine convergence occured in 5 different species : Theobroma, Paullinia, Citrus, Coffea and Camillia. They were able to uncover different biosynthetic paths, thereby contradicting the idea that convergence in caffeine biosynthesis was constrained to one and only path, and to reconstruct ancestral enzyme activities, thereby illustrating to a molecular level how a new function can arise.


Different biosynthetic pathways to caffeine exist in Theobroma, Paullinia, and Citrus

To uncover genes and pathways used in plants to produce caffeine, Huang et al. first identify SABATH enzymes from each species studied and mapped them to the EST database to uncover the ones expressed in the caffeine producing tissues. Next they conducted enzyme assays to identify substracts with which they were able to react and mass spectroscopy scans to identify products, reconstructing full pathways to caffeine in those distantly related species (Fig. 2A).

Indeed, they could reveal that Theobroma and Paullinia express CS-type enzymes orthologous to the ones expressed in Camellia in their caffeine producing tissues. Surprisingly, they catalyze a different biosynthetic path (Fig. 1). Importantly, the enzymes catalyzing the first steps, TcCS1 and PcCS1 for Theobroma and Paullinia respectively, and the second step, TcCS2 and PcCS2 are respectively more distantly related than are TcCS1 and TcCS2, and PcCS1 and 2, which was unexpected considering their catalytic similarity. Therefore, it represents a strong evidence towards convergent repeated duplication of those enzymes in each lineage, rather than ancestrally duplicated enzymes that would have been lost in other non-caffeine producing lineages. However, tough Huang et al. provide a phylogenetic tree of SABATH enzymes with bootstrap values for major nodes, the ones providing that statement are not supported, which would have give further credit to it.

Because of the phylogenetic proximity of Paullinia and Citrus, that are both part of the Sapindales family, one could expect that they share the same enzyme type involved in caffeine production. Nevertheless, Citrus do not express CS-type enzymes in sites of caffein prduction but rather express two recently duplicated XMT-type enzymes ortologous to the ones found in Coffea, but they are specialized in another biosynthetic path to caffeine (Fig. 1).

Therefore, contrary to what has been believed for more than 30 years, plants have a much broader biosynthetic repertoire than previously known, with at least three different paths leading to caffeine biosynthesis that convergently emerged. However, this is unclear which proteins were exapted and what function they previously served, allowing them to be preserved along million of years of evolution.

Ancestral XMT enzymes displayed O-methylation

Coffea and Citrus XMT enzymes ancestors needed to be maintained for more than 100 My from their common ancestor, to then independently give rise to N-methylating enzymes involved in caffeine production. To understand what allow them to be maintained, Huang et al. used a method allowing to ‘resurect’ ancient protein [2]. This consist in inferring ancestral sequences based on to-day descendant protein alignments and to synthetize them to characterize their function.

Using that method, Huang et al. ressurect the 100 My old XMT enzyme ancestor to Rosids and Asterids, hereby called RAAncXMT (Fig. 2A), and its descendant CisAncXMT1, at the node giving rise to the citrus lineage (Fig. 2B). They both exhibit high O-methylation activty (Fig. 2C), which explains why they would have been maintained over such a long time, but no N-methylation activity. It is still the case of one of its to-day descendants in Mangifera, whereas they have specialized in N-methylation in Citrus. Today, Citrus possess a SAMT enzyme capable of both methylations, which could account for the loss of that function in XMT enzymes.

They also resurrected CisAncXMT2, at the node giving rise to both to-day CisXMT1 and 2, responsible for caffeine production. Interrestingly, CisAncXMT2, while still maintaining small O-methylation activity, display N-methylation activity, including almost all of the activities of both to-day enzymes, reconstituting together the two last steps needed for caffeine production, tough it is still unclear how it was recruited to form a functional pathway.

Ancestral citrus XMT enzymes were only a few steps away from to-day caffeine production function

To understand how nowadays Citrus XMT enzymes arised from CisAncXMT2, Huang et al. mapped it against CisXMT1 and 2 and identified key mutations. They then mutagenized the resurrected enzyme. In the lineage leading to CisXMT2, they identified one key mutation, P25S, that was sufficient to reproduce qualitatively the activity of CisXMT2 (Fig. 2C). Similarly, in the lineage leading to CisXMT2, they identified H150N as the mutation sufficient to reproduce roughly today’s activity. Altough other mutations could have shifted the ancestral enzyme activity, this shows that, after duplication, from those 2 single mutations alone, a complete pathway to caffeine would have emerge.

Two very interesting points sould be noted here. First, that very few mutations are sufficient to shift one enzyme substrate preference, which may have been a more widespread fact during evolution. Second, that contrary to the very linear biosynthetic vision one may have, several activities can emerge at the same time, and, more importantly, while maintaining the original activity, as it was the case for CisAncXMT1, thereby reconciling several hypothesis.



Using the very concrete exemple of caffeine Biosynthesis, Huang et al. were able to nicely illustrate the mechanisms of convergent evolution, unveiling much more diversity than previously thougth in the biosynthetic path, and to give us a view on the transition from the ancestral enzyme to the nowaday ones, demonstrating that the hypotheses running in the field were not mutually exculsive since biological pathways are not as linear as one may think.

Altough enzymatic data were quite strong and the whole story quite convincing, the phylogenetic analysis leading to the resurrection of enzymes would have benefit from the authors sharing statistical confidence on the alignment, and especially in the sites they mutagenised thereafter. Nevertheless, the study Huang et al. conducted was well constructed and easily understandable form people outside of the field, and we hope to learn more about the other caffeine producing plants, such as Guayusa, that contains much more caffein that coffea itself.


[1] Huang R, O’Donnell AJ, Barboline JJ, Barkman TJ (2016) Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes. Proc Natl Acad Sci USA 113:10613–10618

[2] Thornton JW (2004) Resurrecting ancient genes: Experimental analysis of extinct molecules. Nat Rev Genet 5(5):366–375

Posted in Uncategorized | Leave a comment

The parallel evolution in amniotes seen through the eye of functional nodal mutations


In this article the authors describe an evolutionary convergence in mammals, birds, and reptiles, based on genomic data from NCBI. The evolution of different species and lineages is due to mutations that can appear and accumulate in organisms over time. Those mutations need a high functional potential and have to be conserved in time in order to form new species. The conservation of mutations can occur via selection pressure, mutational compensation, and/or by the separation of members from the same species by geological and environmental events.

In this comprehensive study, the authors describe, a genomic landscape of the parallel evolution by analysing functional nodal mutations (fNMs) by using different types of DNA (mitochondrial and nucleic), the thermostability of mtDNA encoding RNA genes, and the structural proximity of proteins, using the available 3D structures from PDB database. Functional nodal mutations (fNMs) can be separated in single nodal (fSNMs), recurrent nodal mutations (fRNMs), occured independently in unrelated lineages and recurrent combinations of nodal mutations (fRCNMs) recurred independently along with other nodal mutations in combinations in more than a single lineage. The recurrent ones can be taken in consideration the most when we are talking about the convergent adaptive responses, that means the parallel evolution of different species. In this study, one of the aim is to find the best candidate for this adaptive mutations that was present in the evolution of the amniotes. The compensated ones are used to identify the adaptive mutations. The main explanation for the convergent evolution is the presence of the recurrent nodal mutations. Many fNMs are in combination with potential compensatory mutations in RNA and protein-coding genes. The compensation of a functional mutation is the co-occurrence with additional mutations that are “affecting” the original function.


In the article it is claimed that the evidence for parallel evolution is mainly due to the presence of a high number of uncompensated reccurent fNMs. The best candidate to show the parallel evolution is the emergence of body thermoregulation in mammals and birds, that seems to be independent.

The mtDNA, the maternal genetic information was used to identify the fNMs in the amniotes. The study is based on mtDNA from 1003 species and nDNA from 91 species. The mtDNA was used for the structure-base alignment for 24 mtDNA-encoded RNA genes (tRNA and rRNa) and 13 protein-coding gene. To this, they added 4 more mtDNA proteins with the 3D structure: CO1-3 and Cytb, as the cytochromes are highly conserved proteins across various species. The mtDNA genes are usually the same, but what seems to be different it is the order of the genes, that are changed by evolutionary rearrangements. Because of this, they first aligned the genes individually and after this, they concatenated the 37 proteins to the human mtDNA gene order.

The sequence alignment revealed a number of 25234 nodal non-synonimous and RNA gene mutations. To see the potential of this mutations, there were calculating a score that include: evolutionary conservation, physical-properties (of non-synonymous changes) and the molecular thermostability (the free estimated energy (?G) for the two RNA sequences was calculated before and after the mutational event). The score, from 1 to 9 is depending to the level of conservation and physico-chemical properties of the tested amino acid.After calculating the potential function score of all the nodal mutations, there were 3262 non-synonimous fNMs, mainly in RNA genes with mutations related to disease-causing.

The next step was to identify the best candidate for adaptive fNMs by studying the compensated and non-compensated mutations, but the approach chosen by the authors cannot reveal the exact order of compensation process. Meanwhile, there are some compensatory mutations that could gain lower functionality scores than the co-occurring fNMs. In the Figure 1, we can see a demonstration of the potential compensation and a possible adaptation in a protein-coding gene (COX2) through different species. The panel b shows the locations of the fNMs (S155T) and different other co-occurring compensatory mutations. The S155T mutation appears as independently re-occurrent as well as compensatory co-occurring mutations. As we can see, this approach is pure theoretical, because cannot show all the compensations, only the best ones, that got fixed in evolution. The Figure 2 shows the prevalence of different types of mutations that could be compensated or not. The predictive results reveal a high probability of fRCNMs to be compensated for RNA and protein-coding genes. Here are introduced also the information from the nDNA, that is compared with mtDNA in term of prevalence of the compensatory and non-compensatory mutations. Because there was a big difference of the number of species involved in this approach, the evolutionary resolution was reduced. So, the authors decided to analyze the same 91 species for mtDNA and nDNA and reducing the bias. Because of the reduction in the resolution, they redid the analysis by using the most ancient mutations, that occurs in deeper nodes in the case of mtDNA, but this revealed almost the same proccent as they were working with the 91 species (37% for the ancient mutations and 34% by including the younger ones) (Figure 2e & Supplementary 5b,c). So, the older mutations appear to be less compensated and this give more uncompensated mutations that are best candidates in the ancient adaptative mutations. In the supplementary Figures, the authors are using the OXPHOS complexes to compare the fNMs in mtDNA and nDNA by using 91 species. For the intra-mtDNA the albeit is less prominent (31%).

For the nDNA data is used the whole genome of the species. So, the information is much more comprehensive by the presence of a higher number of genes. In comparison with the mtDNA, the compensation prevalence is lower, having a difference of 10%, but in both case the proccent of possible compensation is higher than can be explained by the mutation rate or the chance.

In the end, to determine the best adaptive mutations over the evolution, they used the fRNMs from mtDNA, but maybe because of the low number of the samples, the result did not show any proof of the impact of non-compensated fRNMs in being the main reason for the convergent evolution. Instead, the nDNA revealed a significant pattern with highest number of potential non-compensated fRNMs shared between birds and mammals (N=51). The best candidates resulted by being the mutations in the genes related to the thermoregulation in the birds and mammals.


In this comprehensive study, the authors merged several information, including different types of DNA, from many species, with various physico-chemical parameters. The results of this work reveal, that the ancient functional mutation are the best for being studied, because of their possibility to overcome negative selective. The best candidates for the adaptive nodal mutations are in the end the non-compensated fNMs, that are in a higher presence in the case of old fNM. This seems to be the main helper for the evolution of the thermoregulation in birds and mammals. The protein analysis reinforces the main conclusion: for enriching the adaptative mutations, the non-compensated mutations are the best candidates.

Taken together this study provides new insights into how different lineages and species might have developed over time. It also shows a new way how to combine data from different sources. However, the authors fail in giving an adequate explanation for the fNMs, together with the fact that they lack references that describe this term makes the article difficult to understand, especially for people that are not from the field and this is in fact the contrary of how scientific writing should be done.


Levin & Mishmar, 2017, The genomic landscape of evolutionary convergence in mammals, birds and reptiles. Nature Ecology & Evolution 1: 0041



Posted in adaptation, conservation, evolution, genomics | Leave a comment

Reconstructing prehistoric African population structure


The highest genetic diversity in humans is found in Africa, in line with Africa being the cradle of humanity. While the three articles we discussed previously during this tutorial (1,2,3) mainly focused on determining the most parsimonious “out-of-Africa” scenarios based on genetic diversity data, this article (Skoglund et al. 2017 4) investigates the population structure of Africa prior to the expansion of food producers (i.e. herders and farmers). In order to reconstruct the prehistoric population structure, the authors analyzed the genomes from 16 ancient African individuals who lived up to 8100 years ago (including 15 newly sequenced genomes), as well as SNP genotypes from 584 present-day Africans, and 300 high coverage genomes from 142 worldwide populations. This is the first study to gather and analyze such a high number of ancient genomes, thereby providing an unpreceded insight into the prehistoric human population structure.


An ancient cline of southern and eastern African hunter-gatherers

The authors used principal component analysis (PCA) and automated clustering in order to relate the 16 ancient individuals to present-day sub-Saharan Africans. This reveals that while the two ancient South African individuals share ancestry with present-day South Africans (Khoe-San), 11 of the 12 ancient individuals living in eastern and south-central Africa between ?8100 and ?400 years BP form a gradient of relatedness to the eastern African Hadza on one extremity and to Khoe-San on the other. This genetic cline is also correlated with geography along a North-South axis. Another pattern which emerged from this analysis is the lack of heterogeneity between the seven ancient individuals from Malawi, indicating a long-standing and distinctive population in ancient Malawi which persisted for at least 5000 years but which is extinct today.

Subsequently, the authors built a model where ancient and present-day African population trace their ancestry to a putative set of nine ancestral populations. They then used data from both ancient and present-day populations showing substantial ancestry to major lineages present in Africa today as proxies for these ancestral populations. These proxy populations consisted of three ancient Near Eastern populations representative of Anatolia, the Levant and Irak, respectively, and six African populations representative of different components of ancestry (western African, southern African before agriculture, northeastern African before agriculture, central African rainforest hunter-gatherer, eastern African early pastoralist context and distinctive ancestry found in Nilotic speakers today). By using qpAdm (a generalization of f4 symmetry statistics), they tested for 1-, 2- or 3-source models and admixture proportions for all other ancient and present-day African populations, with a set of 10 non-African populations as outgroups. We note that the f4 statistics are poorly explained in this article, making it hard for a non-initiated reader to grasp its meaning and the relevance of the results. The main finding from this analysis is that ancestry closely related to the ancient southern Africans was present much farther north and east in the past than is apparent today.

Displacement of forager populations in eastern Africa

Unsupervised clustering and formal ancestry estimation both indicate that present-day Hadza in Tanzania can be modeled as deriving all their ancestry from a lineage related to ancient eastern Africans such as Ethiopia_4500BP. However the contribution of this lineage to present-day Bantu speakers in eastern Africans is small, who instead trace their ancestry to a lineage related to present-day western Africans and additional ancestry components. In present-day Malawians, population replacement by incoming food producers seems to have been almost complete as witnessed by a near absence of ancestry from the ancient individuals sampled, and by most of their ancestry coming from the Bantu expansion of western African origin.

Importantly, of all ancient individuals analyzed, only a 600 BP individual from Zanzibar has a genetic profile similar to present-day Bantu speakers, with even more western African ancestry. Using linkage disequilibrium, the authors estimate that the admixture between western- and eastern-African-related lineages occurred 800-400 years ago. This indicates that there was genetic isolation between early farmers and previously established foragers during the Bantu expansion into eastern Africa, and that this barrier disappeared over time as mixture occurred. However this delayed admixture did not occur in all African populations, as shown in present-day Malawians who display no signs of admixture from previously established hunter-gatherers.

Early Levantine farmer-related admixture in a ?3100-year-old pastoralist from Tanzania

The authors compared estimated the ancestry component from a 3100 BP individual from Tanzania and found that 38% of her ancestry was related to the pre-pottery farmers of the Levant (10000 BP), indicating a critical contribution of Levant-Neolithic-related populations to present-day eastern Africans. The best fitting ancestry component model in Somali indicates that they have ancestry from the 3100 BP Tanzanian individual but also Dinka-related ancestry as well as 16% ancestry related to Iranian-Neolithic-related ancestry. This suggests that ancestry related to the Iranian Neolithic appeared in eastern Africa after an earlier gene flow related to Levant Neolithic populations.

Direct evidence of migration bringing pastoralism to eastern and southern Africa

All three ancient southern Africans show affinities to the ancestry predominant in present-day Tuu speakers in the southern Kalahari. Among them, the 1200 BP sample from western Cape found in a pastoralist context has a similar ancestry composition as present-day pastoralists like the Nama, with affinity to three groups: Khoe-San, western Eurasians and eastern Africans. This is in line with the hypothesis of a non-Bantu-related population transporting eastern African and Levantine ancestry to southern Africa by at least 1200 BP. Using their model to determine the proportions of different ancestries present in western cape 1200 BP, they find mainly a mixture of non-southern African population. This is consistent with the hypothesis that the Savanna Pastoral Neolithic archaeological tradition in eastern Africa is a possible source for the spread of herding to southern Africa.

The earliest divergences among modern human populations

Previous studies indicate that the primary ancestry in the San population (southern Africa) comes from a lineage that separated from all other lineages present in modern humans, before separation of the different modern human lineages. While Skoglund et al. obtain a similar model in absence of admixture, the tree-like representation is a poor fit since ancient southern Africans (2000 BP) were not strictly an outgroup of all other African populations and several examples also show inconsistencies with this model. In order to find models that fit the data, the authors performed admixture graph modeling of the allele frequency correlations and found two parsimonious models. In the first one, present-day western Africans have ancestry from a basal African lineage that contributed more to the Mende than in did to the Yoruba, with the other source of western African ancestry being related to eastern Africans and non-Africans. In the second model, gene flow over long periods of time and over long distances has connected southern and eastern Africa to other groups in western Africa.

A selective sweep targeting a taste receptor locus in southern Africa

The authors then searched for the genomic signature of natural selection in ancient genomes, by searching for regions of greater allele frequency differentiation between ancient and present-day populations than predicted by the genome-wide background. To do this, the researchers compared the two ancient southern African genomes (2000 BP) to six present-day San genomes with minimal recent mixture. Since the small number of ancient genomes does not allow to infer changing allele frequencies at single loci, a scan for high allele frequency differentiation was conducted in 500 kb windows using 10kb steps. This led to the identification of the most differentiated locus which overlapped a cluster of eight taste-receptor genes. Although it is reported that taste receptors have already been identified as targets of natural selection as they affect the ability to detect poisonous compounds in plants, we must be wary that any analysis is bound to find something with such huge datasets, and that the biological interpretation of such finding may not be as straight-forward.

Polygenic adaptation

Skoglund et al. tested for evidence of selection on specific functional gene categories between present-day San and the two ancient genomes from southern Africa using allele frequency differentiation estimation. The functional category with the most extreme allele frequency differentiation between present-day San and the ancient southern Africans corresponded to “response to radiation”. In order to control that this was not a general inflated allele frequency differentiation, the same statistics were used using the Mbuti central African rainforest hunter-gatherer for which no enrichment for “response to radiation” was found. Instead, the top category for Mbutis was “response to growth”. Based on this, the authors speculate that the small stature of hunter-gatherer populations may be an acquired adaptation.



This study brings a first and unique view on the genetic makeup of prehistoric Africans. It is indeed a feat realized by 44 authors from institutions in 11 countries, which take advantage of 15 newly sequenced ancient genomes in addition to the only one that was previously available. The results indicate that an ancient lineage related to the San had a wider distribution in the past, depict two plausible scenarios of gene flow that led to the earliest divergences among modern populations and give new insights into the spread of herding and farming within Africa . As a side note, we noticed that all ancient individuals come from eastern or southern Africa, probably because this is where conditions were most favorable for the conservation of these ancient remains, although this could also introduce some biases, it seems to be the only possible way to go.


  1. Pagani et al 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242 (corresponding blog post)
  2. Mallick et al. 2016 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538: 201–206 (corresponding blog post)
  3. Malaspinas et al 2016 A genomic history of Aboriginal Australia. Nature 538: 207–214 (corresponding blog post)
  4. Skoglund et al 2017 (and references therein) Reconstructing Prehistoric African Population Structure. Cell 171: 59–71.e21
Posted in genomics, human | Leave a comment

A genomic history of Aboriginal Australia;

Blogpost on:

Malaspinas et al 2016 A genomic history of Aboriginal Australia. Nature 538: 207–214.


Prior to the publication of Malaspinas et al. 2016, investigation of Aboriginal Australian genome sequences had been quite limited. In fact, only 3 whole genome sequences from Aboriginals had been analyzed, 2 of these obtained with limited information concerning their place of origin (Rasmussen et al. 2011).

Malaspinas et al. 2016 is the first comprehensive study aimed at uncovering how the settlement of Australia occurred. The study combines genomic, linguistic and archeological studies in order to obtain more detailed information on how the settlement of Australia occurred.

For the largest part of the past 100000 years, Tasmania, New Guinea and Australia were part of the same continent known as Sahul. This continent was detached from mainland Asia, and its settlement process by human populations still remains poorly understood.

Previous archaeological evidence has led to the hypothesis that the settlement of Australia occurred from an African emigration wave, which predates the African emigration wave that settled in Eurasia (Lahr, M. et al. 1994). This has been coined the 2 Out of Africa event hypothesis (2OoA). Yet, other genetic studies support the notion that one major migration out of Africa (OoA) followed by 1 or 2 independent migratory waves led to the settlement of the modern Eurasian and Oceanic continents respectively.

The authors find that the data collected in the study more closely fits a model of single out of Africa dispersal (OoA), followed by divergence of Eurasians from Australo-Papuans. Finally the divergence of Aboriginals and Papuans from their common ancestral population ensued between 25000 and 40000 years ago.


The study is based on 108 newly-sequenced Aboriginal and Papuan genomes (83 Aboriginals and 25 Papuans) and genotype data for 45 additional Papuans. Moreover, SNP genotype data on Aboriginal Australians from Arnhem Land and from the European Collection of Cell Cultures Panel defined in previous studies was taken advantage of for admixture studies.

Colonization of Sahul

The authors use sparse non-negative matrix factorization (sNMF) on the combined datasets in order to determine the genomic ancestry proportions of Papuans and Aboriginals (Frichot, E. et al. 2014). The authors find that Aboriginals are mainly a mix of European, East Asian, New Guinean and Aboriginal ancestry. The most significantly contributing ancestry proportions stemming from Europeans and from Aboriginal ancestry. As expected, individuals from the Australian coastline displayed higher proportion of European ancestry compared to individuals from the desertic Australian inland.

Papuans instead display a majority of genomic ancestry stemming from New Guineans and East Asians. The proportion of New/Guinean ancestry in Aboriginal Australians is related to the distance from Papua, with Northeastern Australians containing a significantly higher proportion of Papuan ancestry compared to Southwestern Australians (Fig. 2a).

Based on f3 statistics, multidimensional scaling analyses (MDS) and genomic ancestry proportion inference, the authors show that Australians and New-Guineans are more similar to each other than to the other populations analyzed in the study (Fig. 2b,c) This favors the hypothesis that they share a common ancestral population which settled the continent of Sahul.

For the subsequent analyses the authors mask data stemming from non-Aboriginal ancestry or select samples based on their Aboriginal Ancestry. Specifically, the authors filter the information from the ancestry proportions and maintain only loci in which both loci show Aboriginal ancestry (Suppl Inf S06).

In order to shed some light on whether the settlement of Australia proceed through one or 2 separate founding waves, the authors use a simulation-based framework initially presented in (Excoffier et al. 2013). Specifically, this composite likelihood method compares the observed joint site frequency spectrum (SFS) to the expected one, allowing inference of coalescence based on SNPs (Excoffier et al. 2013).

The SFS approximation and the MDS analysis results both suggests that, a one wave founding model followed by divergence of a common ancestor into Papuan and Aboriginal populations fits the data more closely (Fig. 2a,b, Fig.3).

Fig.1 (Malaspinas et al. 2016) Describes locations for analyzed Australo-Papuans datasets

Fig.2 (Malaspinas et al. 2016) Australian Aboriginal Ancestry. A)Analysis of admixture in Australo-Papuans by sNMF. B-C) MDS analysis and f3-statistics to assess relationships within the Aboriginal population and between Australo-Papuans




Archaic Admixture

Next, the authors focus on characterizing the extent of archaic (Neanderthal and Denisovan) admixture contributing to the Australian and Papuan genomes. They do so based on the previously described SFS modeling-based approach, a D-statistics based on goodness-of-fit analysis (Green et al. 2010) and a putative archaic haplotype derivation method (Suppl Inf. Section 10).

D-statistics test was initially used in genetics (Green et al. 2010) in order to determine the extent of admixture between 3 populations. It compares which pair in a trio of tested populations is more closely related based on SNPs. The archaic haplotype derivation method used instead, is based on enhanced D-statistics (Meyer, M. et al. 2012) and linkage disequilibrium approaches (Wall, JD. et al. 2013)

Based on the 3 approaches the authors report that Aboriginal and Papuan genomes display an accumulation of Denisovan introgressed genes compared to non-Africans, and have the highest proportion of putatively Denisovan derived haplotypes compared to non-Africans. Additionally, they show that the estimated number of Denisovan derived haplotypes correlates with the proportion of Australo-Papuan ancestry across individuals (Ext. Fig. 3a,b,c). In summary the evidence indicates that Denisovan admixture predates the split of Australo-Papuans and the widespread Eurasian admixture into Aboriginal Australians.

Ext. Fig 3 (Malaspinas et al. 2016) Archaic (Denisovan and Neanderthal) genome introgressed haplotypes. A) Analysis of putative introgressed Denisovan and Neanderthal sites in European, East Asian, Australo-Papuan and South American Populations. B-C) Analysis of estimated archaic haplotype number in world populations. D-E) metrics of archaic derived haplotypes in populations of study.


Out of Africa

In order to determine whether the OoA or 2OoA wave scenario is more likely, the authors peform D-statistic on the following trios: Aboriginals and Eurasians compared to Africans VS Aboriginals and Eurasians compared to Ust’-Ishim (proxy for modern human from Asian (Fu, Q. et al. 2014)).

The authors find that if not accounting for Denisovan admixture, Africans and Ust’-Ishim are closer to Eurasians than to Aboriginal Australians, supporting a 2OoA model. Yet when accounting for the previously identified Denisovan admixture events, the test results indicate that Aboriginals and Eurasians are equally related to Ust’-Ishim favouring a OoA wave model. The same is seen when taking into account Denisovan admixture and considering populations across the whole world (Ext. Data Fig. 4a,b).

Implementation of the SFS-analysis and accounting for moderate Denisovan admixture also shows a more accurate fit of the data to the OoA model. The most accurately fitting model of the SFS-analysis shows that, first, Australo-Papuan divergence from Eurasians most likely occurred about 58000 years ago, and European divergence from East Asians occurred about 42000 years ago (Fig. 4).

Multiple Sequential Markovian Coalescence analysis also supports a model in which Australo-Papuans and Eurasians split from one ancestral population (Ext. Fig. 4, Ext. Fig. 6).


Fig. 3 (Malaspinas et al. 2016) SFS based modelling approximation of Australia settlement founding waves







Fig. 4 (Malaspinas et al. 2016) SFS based modelling approximation of most likely OoA migration


Genetic Structure of Aboriginal Australians

Subsequent investigation of mitochondrial DNA (mtDNA) and Y chromosome between-group variation shows that male-mediated migration was a driving factor in the substructuring of Aboriginals. MDS analyses performed on Aboriginals masked and non-masked for non-Aboriginal ancestry and geographic location of samples (Ext. Fig. 7a,b)suggests a population separation between Southwestern and Northeastern groups. This is in line with the model proposed by the SFS analysis of Australian continent settlement.

Usage of an ulterior modelling approach based on a three layer neural network (Bishop, CM. 1996,; Heaton, J. 2011), reveals that a majority of the gene flow took place along the Australian coasts (Ext. Fig. 7e-g). This result is consistent with the hypothesis that desertic internal Australian regions formed a natural barrier to gene flow.

Implementation of Bayesian statistics approaches for European, East Asian and Papuan admixture among Aboriginals reveals that Papuan admixture predated both East Asian and European admixture (Ext. Fig. 8a). Additionally, local ancestry inference based on tract length also underlines that Papuan gene flow into Aboriginals occurred before European and East Asian gene influx.

Ext. Fig. 4 A)D-statistics based proposed model for African emigration models. B) Sum of squared errors for the possible odels predicted by D-statistics. C)MSMC derived crosscoalescence rates determined by analysis of pairs of individuals. D) Assessment of archaic admixture on cross coalescence presented in C by modelling


Pama-Nyungan languages and genetic structure

Next, the study investigates how closely linguistic demographics could reflect genetic relationships by comparing phylogenetic trees obtained based on linguistics and trees based on Fixation index masking for Eurasian tracts (Ext. Fig. 7). Analysis of the common patterns identified in the two trees by distance-matrices and correlation analysis of linguistics and genetics reveal an initial divergence of populations beginning about 30000 years ago, followed by population size changes and highly reduced gene flow from Northeastern to Southwestern Australia, due in great part to the desertic geographic barrier.


Ext. fig. 7 A-B) MDS analysis performed when including all genome sequences (A) and when masking non-boriginal variants. C-D) Comparison of phylogenetic trees computed based on genetic and linguistic analysis respectively.








Selection in Aboriginal Australians

Finally, the authors perform scanning analyses to identify Aboriginal genomic regions which diverge highly in allele frequency since the split from Papuans (about 10000-30000 years ago), followed by identification of genomic regions which diverge highly based on geo-ecological location. Based on this analysis the authors identify two mutations which may have played a role in Aboriginal adaptation to the arid and desert Australian interior.


The authors analyses supports a model by which a OoA movement, followed by the split of an ancestral population which first colonized Australia and predated the physical separation of Sahul into Mainland Australia from the Papuan/New Guinea islands.



Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011)

Davidson, I. The colonization of Australia and its adjacent islands and the evolution of modern cognition. Curr. Anthropol. 51, S177–S189 (2010)

Lahr, M. M. & Foley, R. Multiple dispersals and modern human origins. Evol. Anthropol. Issues News Rev . 3, 48–60 (1994)

Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G. & François, O. Fast and efficient estimation of individual ancestry coefficients. Genetics 196, 973–983 (2014)

Excoffier, L. Dupanloup, I. Huerta-Sanchez, E. Sousa, VC. Foll ,M. et al. Robust Demographic Inference from Genomic and SNP data. Plos Genetics 10, 1-17(2013)

Green RE, Krause J, Briggs AW, et al. A draft sequence of the Neandertal genome. Science. (56 co-authors). 2010;328(5979):710–722.

Meyer M, et al. A high-coverage genome sequence from an archaic denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344


Wall JD, et al. Higher levels of Neanderthal ancestry in East Asians than in Europeans. Genetics. 2013;194:199–209


Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PL, Aximu-Petri A, Prüfer K, de Filippo C, et al. 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514: 445–449

Bishop, CM 1996. Neural Networks For Pattern Recognition. 1 edition. Oxford New York: Clarendon Press

Heaton, J. 2011: Programming neural networks with Encog3 in Java. 2 edition. Heaton Research, Inc.

Posted in Uncategorized | Leave a comment

A journey through the The Simons Genome Diversity Project: more genomes sequenced, more diverse populations


Since the first genome of Bacteriophage MS21 was completely sequenced, in 1976, until 2001 when the first draft of human genome2 was released, a lot of work was done to improve and to make accessible different methods to get inside of the genetics of various organisms. For human genome, this step was a very important one and the Human Genome Project was declared complete in 20033. During the last years, more and more projects are involved in deciphering the human wanderlust. To all of previous studies, we can add The Simons Genome Diversity Project, that brought us more information by sequencing 300 new genomes from 142 diverse populations. One of the aim was to chose populations that differ in genetics, language and culture. The study shows that some of the populations separated 100000 years ago and reveals more information about the ancestors of Australian, New Guinean and Andamanese people.


One of the most important thing in discovering the real human peopling of the Earth is to sequence as many as possible genomes, but from individuals coming from diverse populations, that could differ in many aspects. In this study, the 300 samples were prepared by using PCR-free library, through Illumina Ltd. method and the median coverage it was 42-fold (Figure S1.1; Supplementary Data Table 1). The method is using an improved genome coverage to identify the greatest number of variants with some of them previously reported. The single-sample genotypes was made by using the reference-bias free modification of GATK, but the some preprocessing was conducted for eliminating some adapter sequences. For increasing the data accuracy, it was used a filtering system, highly specific to the SGDP dataset. The levels are from 0 to 9 for each sample as a single character and the first level is the best for having a good balance between sensitivity and low error rate, but level 9 is good to be used when there is needed to low the errors rates (Figure S2.1).

The first part of the study is offering us more information about the time needed for the worldwide populations subjected to the study to get separated. The pairwise sequential Markovian coalescent (PSMC) and multiple sequentially Markovian coalescent (MSMC) was used to interpret the changes in size of the populations and the split time, the phased haplotypes of split time estimation were made by using the SHAPEIT and IMPUTE2. The filter used was the level 1. From the Figure 2a we can see evidence about the ancestors of some present populations that were isolated by at least 100kya, that could be an obstacle of certain mutations across the ancestors of all populations. The gene flow continued until around 50kya among the great majority of ancestral populations. The graphs show the moments when the substructure of different populations starts: in the Figure 2a, we can see that the substructure between french and africans start around 200 kya. In the next ones there is a comparison between only africans (the Yoruba separated from KhoeSan 87kya, from Mbuti 56kya and from the Dinka 19kya) or only non-Africans (the oldest substructure is from 50kya, taking part during or shortly after the deepest part of the shared non-African bottleneck 40-60kya). For the Figure 2d-f, it was used the PSMC and PS1 that show the effective population sizes inferred and the cross-coalescence rates inferred.

By using the neighbours-joining tree (pairwise divergence per nucleotide) and FST, Mallick et. al could reconfirm the previous studies regarding the fact that the deepest splits happened among the Africans. Previous studies showed that all non-Africans today possess Neanderthal ancestry and Figure 1c shows that the higher proportion of Neanderthal ancestry we can find it in East Asians. If we compare the EuroAsians between them, the South Asians have highest Denisovan ancestry (heatmap from Figure 1d). Another result is that there are more Denisovan ancestry in eastern than in western EuroAsians. If we take Australia, New Guinea or Oceania we can see that the results from other studies are confirmed by having more ancestry than in mainland Eurasians. In the Figure 3 the deeper the split is, the more divergent is the early dispersal ancestry. By using the cross-population coalescence pattern and allele frequency correlations, the best model is that the Australian, New-Guinean and Andamanese history doesn’t involve ancestry from an early- diverging source. In this study there is no archeological data taken in consideration regarding southern Asia or Australia. So, by using only the data from this study, it is released that the Australians, New Guineans and Andamanese are lacking in an analogous deep ancestry component. All the data referring to Australians seems to be consistent with descending
from a common homogeneous population since separation from New Guineans. Also, New Guineans, Australians and Andamanese appear as part of an eastern clade together with mainland EastAsians.

The 3P-CLR was used to scan the genome for positive selection. In the end, 38 of the largest peaks emerged for selection in the common ancestors of all modern humans. These peaks are the sweeps at the time that the archeological data shows an accelerated evidence of behavioral modernity. This data does not search for the sweeps on chromosome X or in repetitive or difficult-to-analyze sections of the genome.

For the rate of mutation accumulation between the non-Africans (grouped in America, CentralAsiaSiberia, EastAsia, WestEurasia, Oceania) and sub-Saharan Africans (grouped in Pygmy, Khoesan and Africa) it was supposed to be quite equal, but this study revealed an significant average of 0,5% difference. For this part, they used a highly restriction to the samples, by choosing only the samples processed in the same way and the highest level of filtering, pooling the samples from the same regions together. The one strength of this experiment is the fact that they avoid the bias due to different heterozygosity level in different populations (the heterozygosity is higher in Africans), by using only the chromosome X for males. Although, they map everything to chimpanzee, which is equally distant to all present populations. There are differences in observations related to other studies, by having a different rate of CCT>CTT mutation, that is close to Africans in Europeans, but not in East Asians. This could be explained by the decrease in generation interval in non-Africans since separation. Previous studies5 showed a higher X-to-autosome heterozygosity ratio in sub-Saharan Africans than in non-Africans. Mallick et al. confirmed this results by adding more populations to be analyzed: Khoesan for sud-Saharan Africans and New Guineans, Australians, Native Americans, Near Easternes and indigenous Siberians for the non-Africans. The only one exception, that showed a lower X-to-autosome heterozygosity ratio in sud-Saharan African than in non-Africans is in Pygmies (eastern Mbuti and western Biaka). In the Figure 1b through a scatterplot we can observe the two primary clusters: sud-Saharan Africans and all other populations, but without a big difference among the groups, except of the Pygmies with a high autosomal heterozygosity. If we compare the two Pygmies populations with a lower X-to-autosome ratio, we can see that the Mbuti are closer to non-Africans than to Africans, even if in the Neighbour-joining tree based on pairwise divergence, they are integrated to the Africans. The reduction of the X-to-autosome ratio in the non-African compared to African populations could be explained by the repeated waves of male mixture in already mixed population, but in the Pygmy populations, the strongest argument is the sex-biased gene flow supported by the anthropological data.

In the last part, Mallick et al. shows that the non-Africans are presenting a higher accumulation of mutations. This can be explained in two ways: the rate of mutations in non-Africans is increasing by acceleration of it or by a deceleration within Africans. The Extended Data Table 1, shows that none of the populations with strong signals of non-Africans could be in fact a deceleration of Africans. The acceleration in non-Africans could be caused by many possibilities: the life history traits (eg. generation interval) could change after the dispersal of modern humans outside of Africa, increasing the latitudes conquered by the humans or the colder climates, the gene conversion (GC to A or T alleles) was more effective in Africans or a Neanderthal admixture into the ancestors of non-Africans, that could accumulate more mutations than in the modern humans after separation (but there are not clear evidence about this fact).


The Simons Genome Diversity Project is bringing more information by studying 300 new genomes, from 142 diverse populations, that shows an acceleration of accumulation of mutations in non-Africans compared to Africans. Also, the Pygmies seem to be the only African group with a low X-to-Autosome diversity ratio. Regarding the ancestors, the highest proportion of Neanderthal it was present in EastAsians and an excess of Denisovan in some SouthAsians compared to other Euroasians.


  1. Min Jou, W., Haegeman, G., Ysebaert, M., Fiers, W., Nucleotide Sequence of the Gene Coding for Bacteriophage MS2 Coat Protein, Nat., 237, 5350, pp. 82-88, 1972
  3. International HapMap Consortium. The International HapMap Project. Nat., 426,789–796, 2003.
  4. Keinan, A., Mullikin, J. C., Patterson, N. & Reich, D., Accelerated genetic drift on chromosome X during the human dispersal out of Africa, Nat Genet 41, 66­?70, doi:10.1038/ng.303, 2009.
  5. Mallick, S., Li, H., Lipson, M., Mathieson, I., Gymrek, M., Racimo, F., Zhao, M., Chennagiri, N., Nordenfelt, S., Tandon, A. et al., The Simons Genome Diversity Project: 300 genomes from 142 diverse populations, Nat., 538: 201–206, 2016.
Posted in genomics, human | Tagged , , | Leave a comment

Genomic analyses inform on migration events during the peopling of Eurasia


In the past two decades, considerable research effort has been made to sequence the human genome and subsequently trying to unveil the demographic history underlying the genetic patterns of diversity we observe today across the globe. Here we discuss a recent research article by Pagani et al. 1 that addresses genomic diversity and historic migration patterns of human populations in Eurasia. The first human genome was sequenced in 2003 by the Human Genome Project2 and larger projects rapidly followed, such as HAPMAP3 and the 1000 Genomes Project4, largely due to the considerable technological improvement of sequencing technologies. Despite being extremely useful tools for a number of studies, these genome databases have some important sampling caveats that limit their use to address some particular topics. Indeed, HAPMAP sampled a reduced number of populations whereas the 1000 Genomes sampled a large number of populations but did not attempt to sample individuals of “pure” ancestry. For instance, the sampling in North America focused considerably on city-based individuals that were found to have a very diverse recent ancestry thus blurring the signal of ancient colonisation history. Importantly, in the studied paper, a considerable effort was made on sampling a broad panel of 447 unrelated individuals of pure ancestry from 148 distinct populations, particularly including previously unstudied regions like Siberia and western Asia.

One of the main topics of the demographic history of humans that has long been of interest to researchers is the Out of Africa (OoA) of Anatomically Modern Humans (AMH) – a turning point in which humans dispersed from Africa and colonised Eurasia and ultimately Oceania and the Americas. Among other aspects, the number of OoA events has been the focus of discussion from which two major hypotheses emerged. The first, arguably the most wide-accepted, advocates for a single OoA event estimated at around 40 to 80 kya which gave origin to all extant non-african populations. The second hypothesis, dubbed the multiple-dispersal model5, considers multiple migration waves, more or less successful in settling in new continents, and possible admixture events between them at various points in time, which appears to be supported by previously described fossil evidence6,7. Interestingly, Tucci & Akey8 argue that these theories are not necessarily mutually exclusive but rather complementary as there could have been several failed or low-success OoA events followed by a major one that effectively colonised and subsisted in most continents.

In this study, Pagani et al. argue in favour of a multiple-dispersal scenario based on small remaining genetic contributions in the genomes of extant Papuans from an extinct lineage of AMH OoA earlier than the main OoA 75 kya.

Genetic structure and barriers across space

To obtain the first insight on the genetic structure among the sampled genomes, Pagani et al. employed two different approaches: first, treating SNP as independent markers (with ADMIXTURE9) and second taking into account linkage blocks (with fineSTRUCTURE10). Both strategies identified the major biogeographic groups of populations despite differences in resolution, defining 14 main genetic clusters across the globe (Extended Data Figure 1C). The detailed output from fineSTRUCTURE was interestingly used for a range of analyses from spatial patterns of genetic differentiation (Figure 1), co-ancestry (Extended Data Figure 3) and demographic history reconstruction (Extended Data Figure 7).

Taking advantage of their detailed sampling from Eurasia to Sahul, the authors employed a spatially explicit framework to study genetic differences and gene flow between populations as well as their association with environmental/geographic features at a large scale. Figure 1 illustrates this by representing the magnitude of the gradient of allele frequencies from SNPs across space, allowing to pinpoint the regions of major genetic gradients, i.e. potential barriers to gene flow, specifically mountain ranges, deserts and large water masses. These were consistent in broad strokes among the different analyses with the fineSTRUCTURE output (Figure S2.2.2-I) as well as the complementary migration-based EEMS (Estimating Effective Migration Surfaces; Extended Data Figure 5H). Importantly, the authors tested whether the geographic gaps in their sampling could bias the interpolation of barriers and showed their model remained robust in the face of new gaps (Extended Data Figure 5E-G).

On a second stage, Pagani et al. measured the association between the gradients of allele frequencies (termed as SNPs in Figure 1) and fineSTRUCTURE with three environmental barriers – elevation, temperature and precipitation – to determine the relative importance of the role each played in shaping the genetic patterns observed today. As one can see in the inset of Figure 1, SNPs indicated that elevation and precipitation had a strong spatial correlation with genetic differences whereas fineSTRUCTURE gave higher support to precipitation and temperature. This dissimilarity is likely due to the fact that the latter, as explained above, is dependent on linkage patterns. Linkage blocks are physical associations of loci that recombination renders temporary, unless they are specifically maintained by selection. Thus, current neutral linkage patterns reflect relatively recent demographic history, whereas the bulk of raw allelic frequencies reveal older patterns that influenced the majority of the genome. In the same sense, when taking into account only the rare variants (i.e. more recent), the association of SNPs with elevation was reduced (Figure S2.2.2-II).

The authors conclude these observations by suggesting that elevation contributed to shaping old migration routes (as confirmed by patterns of isolation by distance; Extended Data Figure 5A-C) but has not recently impeded the persistence of human populations. On the other hand, precipitation seems to be of paramount importance as populations continue to this day to avoid inhabiting low-precipitation regions such as deserts.

Despite the credibility of the conclusions, we raised some important questions on the analysis that could bias the interpretation. First, the authors did not address the innate correlation between the environmental variables (ex.: elevation and temperature) nor how or whether it was taken into account. Additionally, it is unclear which time period was used for temperature and precipitation as the study spans 120 thousand years of demographic history. Both these points could change the relative importance of a given variable, and should therefore have been specified clearly in the main text.

Selection screening

The authors scanned the genomes for evidence of purifying and positive selection through a series of different approaches and identified multiple candidate loci, some of which had been identified as targets of positive selection in previous studies. Additionally, the authors highlighted different levels of inter-population purifying selection, such as on olfactory receptor genes in Asians. Interestingly, they identified significantly stronger purifying selection in pigmentation and immune response genes in Africans than in the remaining populations, with the single exception of Papuans for the pigmentation genes (Extended Data Figure 6B). However, the authors did not discuss the possible factors behind such selective forces nor how this section on selection contributed to the main storyline and conclusions of the study.

Demographic history of Papuans

The results of fineSTRUCTURE were summarised with ChromoPainter and revealed very interesting patterns of haplotype co-ancestry and length as well as proportion of shared genome between populations. Leading is the observation that African populations display the highest co-ancestry (Extended Data Figure 3) and the shortest haplotypes (Figure S2.2.1-III), confirming their status as the oldest and most diverse populations. Short haplotypes reflect multiple recombination events through time indicating older ancestry. Thus, the most surprising observation was that Papuans have the shortest average haplotype length of all non-African populations (Figure S2.2.1-III), as well as the shortest African-inherited haplotypes (Extended Data Figure 7), which suggests an older ancestry with Africans than that of the remaining populations.

To investigate this further, the authors used multiple sequential Markovian coalescent (MSMC) to determine mean split times between genomes of Papuans and other populations, and it is represented in Figure 2A. This figure depicts the proportion of genome coalescing between populations over time (in logarithmic axis). However, it is important to take into account that for these calculations they used a generation time of 30 years, whereas the selection scans were done with a 25 years’ generation time. The latter is the most commonly used in the literature and no justification is given for this change. This analysis revealed an old split between the Papuan and African at about 90 kya (represented as Koinanbe in Figure 2A, red line), predating the split between Eurasian and African estimated at 75 kya (black line) and between Papuan and Eurasian at 40 kya (blue line). Despite the possible fluctuation in the absolute split times due to the chosen generation time, the relative differences between them is in line with Papuans harboring high amounts of short haplotypes, all suggesting an older population split than previously thought.

To explain the demographic history behind the observed patterns, the authors propose that a previously unknown admixture event took place in Sahul with either an archaic non-AMH (different from Denisovan and Neanderthal) or with a AMH resulted from an extinct OoA (xOoA). The latter hypothesis, which fits into the multiple-dispersal model explained earlier, would have taken place after the split of AMH with Neanderthal but before the main OoA.

Using coalescent simulations, the authors tried to replicate the split times by adding varying amounts of admixture with a non-AMH or with an AMH from a xOoA. There was no plausible scenario simulated of archaic admixture with non-AMH that could mirror the observed data. On the other hand, including in Papuans a genomic component that diverged from the main human lineage prior to the main OoA, replicated somewhat similar population split times. It is noteworthy that the main text indicates the “observed shift in the African-Papuan MSMC split curve can be qualitatively reproduced” under these conditions. In detail, it obtained a 3ky difference between the Papuan-African and Papuan-Eurasian splits (Figure S2.2.8-III) whereas the observed time-gap between the two is actually 15 kya (Figure 2A). The authors suggest that they may not be able to reach a comparable gap due to higher complexities of the demographic model that were not simulated within this study, such as population expansion and bottlenecks. Although this explanation appears reasonable, we believe it ought to have been made clear in the main text of the article.

To discern the weight of admixture with non-AMH, the authors masked putatively introgressed Denisovan haplotypes in Papuan genomes, which did not change the split times estimated between Papuans and the other populations (dashed lines in Figure 2A). Furthermore, the authors confirmed that MSMC behaved linearly through multiple events of admixture by studying populations with known admixture proportions in time (African Americans and Central and East Asians; Extended Data Figure 8), which allowed the calculation that the hypothesized xOoA would have split from most Africans around 120 kya (Supplementary Information 2.2.4).

On a supplementary line of examination, Pagani et al. looked at the age of African haplotypes in Papuans not present in other Eurasian populations by accessing the density of non-African alleles (nAAs) within them. The rationale behind this lies on the assumption that the rate of accumulation of nAAs, i.e. alleles not found among African genomes, within a haplotype of determined African origin in a non-African genome is proportional to the split date of that given population with Africans. First, this analysis revealed that Papuans had an overall higher amount of nAAs within African haplotypes along the genome than Eurasians (Figure 2B), indicating an older coalescent time with the Africans. Further, the proportions of nAAs within African haplotypes in Papuans were modeled under demographic scenarios of single and multiple-dispersal. The results showed that a xOoA of AMH that split around 120 kya from Africans was necessary to explain the constant elevated proportions of nAAs in Papuans (Figure 2D).

Combining results from the different approaches, the authors support an xOoA that split from Africans around 120 kya, and conclude by estimating it contributes to approximately 2% in contemporary Papuan genomes.


In this wide-ranging study, Pagani et al. discussed three main topics of human evolutionary biology in Eurasia using their extensive sampling: i) detect main geographic barriers to gene flow, ii) identify loci and ultimately pathways under selective pressure and iii) propose an extinct Out of Africa event earlier than 75 kya.

The latter was arguably the most important finding of this study with, as described above, the description of a 2% contribution in the genome of Papuans from an early xOoA. The authors provided multiple lines of compelling evidence pointing to an extinct Out of Africa expansion around 120 kya from Africans that admixed with the main OoA later in Sahul. The complete scenario is described in Extended Data Figure 10.

Nevertheless, the results presented in this paper and their associated methods are consistently poorly detailed and/or not self-explanatory. Such a paper covering a trendy topic in a high impact journal should be less indigestible for neophytes or even to fellow evolutionary biologists. Furthermore, the connection between the three main sets of analyses of the study (geographic barriers to gene flow, selection screening and the possibility of an xOoA) seems to be lacking as there is no global discussion bringing all points together.

Written by Ana Paula Machado and Clément Train.

Studied papers
Tucci & Akey 2016 Population genetics: A map of human wanderlust. Nature 538: 179–180
Pagani et al 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242


  1. Pagani, L. et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538, 238–242 (2016).
  2. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
  3. International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
  4. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  5. Lahr, M. M. & Foley, R. Multiple dispersals and modern human origins. Evolutionary Anthropology: Issues, News, and Reviews 3, 48–60 (2005).
  6. Groucutt, H. S. et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. 24, 149–164 (2015).
  7. Liu, W. et al. The earliest unequivocally modern humans in southern China. Nature 526, 696–699 (2015).
  8. Tucci, S. & Akey, J. M. Population genetics: A map of human wanderlust. Nature 538, 179–180 (2016).
  9. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
  10. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
Posted in genomics, human | 2 Comments

Papers to discuss Autumn 2017: Human demography and Convergent evolution

This Autumn, we will discuss two series of papers:

Series 1: Human demography

News & Views: Tucci & Akey 2016 Population genetics: A map of human wanderlust. Nature 538: 179–180

6 Oct 2017: Pagani et al 2016 Genomic analyses inform on migration events during the peopling of Eurasia. Nature 538: 238–242

2 7 Oct 2017: Mallick et al. 2016 The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538: 201–206

3 Nov 2017: Malaspinas et al 2016 A genomic history of Aboriginal Australia. Nature 538: 207–214

10 Nov 2017: Skoglund et al 2017 Reconstructing Prehistoric African Population Structure. Cell 171: 59–71.e21

Series 2: Convergent evolution

17 Nov 2017: Levin & Mishmar 2017 The genomic landscape of evolutionary convergence in mammals, birds and reptiles. Nature Ecology & Evolution 1: 0041

24 Nov 2017: Burga et al 2017 A genetic signature of the evolution of loss of flight in the Galapagos cormorant. Science 356: eaal3345 (also see preprint comment by Berger and Bejerano)

8 Dec 2017: Reid et al 2017 The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science 354: 1305-1308

15 Dec 2017: Huang et al 2016 Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes. PNAS 113: 10613–10618

Posted in paper list | Leave a comment