The genetic sex-determination system predicts adult sex ratios in tetrapods

Genetic sex determination, i.e. the determination of sexual phenotype by sex-determining genes, is found in the majority of vertebrates. Sex-determining genes have evolved independently multiple times and can be located on different chromosomes. Depending on whether the presence of the sex-determining region (SDR) determines female or male sex, genetic sex-determination systems are called ZW or XY systems, respectively, and the sex that is heterozygous for the SDR is called the heterogametic sex. Lower fitness of the heterogametic sex in interspecific hybrids has long been observed in a wide range of animal and even plant species, a pattern known as Haldane’s rule. In this paper the authors find a similar pattern in (non-hybrid) tetrapod species: comparing the adult sex ratio (ASR) of species with XY and ZW systems across 344 tetrapod species, they find that the ASR is skewed towards the homogametic sex (towards females in XY systems and towards males in ZW systems).

This observation is based on a dataset of known genetic sex-determination systems and adult sex ratios (ASRs) of species across the vertebrate phylogeny. Within amphibians and reptiles (in which both XY and ZW systems are found), the authors show that ASRs in ZW systems are significantly more male-biased than in XY systems and that the proportion of species with male-biased ASRs is greater in ZW than in XY systems. Furthermore, these observations hold for the combined dataset of amphibians, reptiles, mammals (which have a conserved XY system and male-biased ASRs) and birds (which have a conserved ZW system and female-biased ASRs).

It is important to test whether these observations are actually caused by the genetic sex-determination system (GSD) or by other factors that could systematically influence ASRs:

– ASRs could be influenced by body size and breeding latitude through correlated life history traits like development, growth and reproductive ecology.

– Differences in body size and dispersal between sexes can lead to differences in mortality which influence ASRs.

The authors account for potential effects of sex-biased dispersal, body size, breeding latitude and sexual size dimorphism in a phylogenetically corrected multi-predictor analysis. Although they do find a significant correlation between sexual size dimorphism and ASR as well as between sex-biased dispersal and ASR, the effect of the GSD remains significant in all cases. Because the dataset for sex-biased dispersal is limited to 32 species in total, which is less than 10% of the number of species in the complete dataset, it is not included in the main multi-predictor model.

Another important factor is phylogenetic relatedness between species: the GSDs and ASRs of closely related species are more likely to be correlated because of shared genetic and phenotypic traits, so these species cannot be treated as fully independent observations.

To account for this, the authors apply phylogenetic corrections based on composite phylogenies of different tetrapod groups. As these composite phylogenies do not include branch length information, different methods are used to assign arbitrary branch lengths, which has surprisingly little effect on the results. Two different methods are applied to account for phylogenetic relatedness across samples: phylogenetic generalized least squares (PGLS) models to test for differences in ASRs between XY and ZW taxa, and Pagel’s discrete method (PDM) to test the fit of dependent and independent models of transitions in ASR bias and GSD. As the second method implies, the number of transitions between GSDs should matter more than the phylogenetic relatedness between species. The authors claim to take this into account by rerunning their analyses with three large groups that share a known sex-determination system (mammals, birds and snakes) each reduced to a single datapoint; the differences in ASRs between GSDs remain significant.
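To make the PGLS idea concrete, here is a minimal sketch in Python/NumPy. It assumes a Brownian-motion model of trait evolution, in which the covariance between two species is proportional to the branch length they share from the root. The four-species tree, the ASR values and the function names `pgls_fit` and `pagel_lr_test` are hypothetical and only illustrate the mechanics; this is not the authors' code or data. The second function covers only the final likelihood-ratio step of Pagel's method (a dependent model with 8 transition rates against an independent model with 4, hence 4 degrees of freedom); fitting the two Markov models themselves is not shown.

```python
import math
import numpy as np

def pgls_fit(y, X, V):
    """Generalized least squares with a phylogenetic covariance matrix V.

    Returns coefficient estimates and their standard errors.
    """
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    cov_unscaled = np.linalg.inv(XtVi @ X)
    beta = cov_unscaled @ (XtVi @ y)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = float(resid @ V_inv @ resid) / (n - k)  # residual variance on the tree
    se = np.sqrt(np.diag(sigma2 * cov_unscaled))
    return beta, se

def pagel_lr_test(loglik_independent, loglik_dependent):
    """Likelihood-ratio test: dependent (8 rates) vs independent (4 rates) model."""
    lr = 2.0 * (loglik_dependent - loglik_independent)
    # Chi-square survival function for 4 degrees of freedom (closed form for even df).
    p_value = math.exp(-lr / 2.0) * (1.0 + lr / 2.0)
    return lr, p_value

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1): under Brownian motion,
# V[i, j] is the branch length shared by species i and j from the root.
V = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
y = np.array([0.45, 0.47, 0.55, 0.57])           # hypothetical ASRs (proportion of males)
X = np.column_stack([np.ones(4), [0, 0, 1, 1]])  # intercept + ZW indicator
beta, se = pgls_fit(y, X, V)
print(beta)  # intercept ≈ 0.46 (XY mean), ZW coefficient ≈ 0.10
```

Replacing V with an identity matrix turns the estimate into ordinary least squares, which is a convenient sanity check and shows exactly where the phylogeny enters the model.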

I wonder whether it would also make a difference to reduce further groups that share a non-independent origin of their SDRs to single datapoints. For example, the dataset includes five lizard species from the family Lacertidae, which are assumed to share a conserved GSD (Rovatsos et al. 2016), and nine lizard species of the genus Anolis, which are likely to share a common sex chromosome system (Gamble et al. 2014). Furthermore, for many amphibians and reptiles nothing is known about the synteny of sex chromosomes among species, and a rigorous reduction of GSDs with common ancestry to single datapoints would likely reduce the number of independent observations and thus statistical power.
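Such a reduction can be sketched as a simple grouping step: species thought to share a single origin of their sex chromosomes are averaged into one observation before the comparison. The records, origin labels and ASR values below are hypothetical placeholders, and `collapse_by_origin` is an illustrative helper, not part of the authors' analysis.

```python
from collections import defaultdict
from statistics import mean

def collapse_by_origin(records):
    """Collapse species sharing one sex-chromosome origin into a single datapoint.

    records: iterable of (species, gsd, origin_id, asr) tuples. Species with the
    same origin_id are assumed to share both the GSD and its evolutionary origin.
    Returns a list of (origin_id, gsd, mean_asr, n_species).
    """
    asrs_by_origin = defaultdict(list)
    gsd_by_origin = {}
    for species, gsd, origin, asr in records:
        # setdefault stores the first GSD seen for an origin; later ones must match
        if gsd_by_origin.setdefault(origin, gsd) != gsd:
            raise ValueError(f"{species}: GSD conflicts with origin {origin}")
        asrs_by_origin[origin].append(asr)
    return [(origin, gsd_by_origin[origin], mean(asrs), len(asrs))
            for origin, asrs in asrs_by_origin.items()]

# Hypothetical records: three species share one ZW origin (e.g. a single family).
records = [
    ("species_1", "ZW", "origin_A", 0.55),
    ("species_2", "ZW", "origin_A", 0.57),
    ("species_3", "ZW", "origin_A", 0.59),
    ("species_4", "XY", "origin_B", 0.44),
    ("species_5", "XY", "origin_C", 0.48),
]
collapsed = collapse_by_origin(records)
print(len(records), "species ->", len(collapsed), "independent datapoints")
```

In this toy example five species shrink to three independent observations, which shows how quickly sample size, and with it statistical power, can drop once shared origins are taken seriously.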

However, the number of relevant datapoints in amphibians is fairly limited anyway: amphibian species with an XY sex-determination system show no significant ASR bias (or even a slight male bias after phylogenetic correction). Thus the observed effect within amphibians relies on data for only 11 species with a ZW system.

There are good reasons to be careful when drawing general conclusions from this dataset:

Sex reversal is common in some amphibian species, which could bias the observed ASRs. Furthermore, although the authors claim to have included only species with known GSDs, the GSD of amphibians with homomorphic (microscopically indistinguishable) sex chromosomes is difficult to determine and a frequent subject of scientific dissent.

One example of this is Bufo viridis. The ASR of B. viridis is strongly male-biased (0.70), and its GSD is assumed to be a ZW system based on the entry from […]. However, the claim that B. viridis is female heterogametic rests on a single study, which found that all seven females examined in a single Moldavian population were heterozygous for a chromosomal inversion. Such a pattern has never been found in any other green toad population; instead, multiple sex-linked genetic markers have been developed that show male-heterogametic segregation patterns in crosses from different B. viridis populations as well as in the closely related species B. siculus, B. balearicus and B. variabilis (Stöck et al. 2011). In my opinion, it would be more appropriate to assign B. viridis to the species with an XY system, which would reduce the overall difference in ASRs between the two groups.

Possible reasons for the effect of the sex-determination system on adult sex ratios

In general, a skewed adult sex ratio can have two different causes: a skewed gametic sex ratio, or higher mortality of one sex, which leaves unequal numbers of adults. In more detail, the authors propose and discuss six potential, not mutually exclusive explanations of how the GSD could bias adult sex ratios:

– Sexual selection in males could increase mortality.

This would be expected to result in a female bias in both XY and ZW systems and therefore cannot explain male-biased ASRs in ZW systems.

– Recessive deleterious mutations on X/Z chromosomes or Y/W specific deleterious mutations.

Recombination suppression on sex chromosomes leads to degeneration of the sex-linked region on Y/W chromosomes, which can reduce fitness through either deleterious mutations on the Y/W or deleterious recessive mutations in the hemizygous part of the X/Z chromosome.

Based on a population-genetic model they develop, the authors claim that the accumulation of deleterious mutations may not be sufficient to cause the observed adult sex-ratio bias. However, they admit that many of their parameter estimates are very crude and that results may vary when other factors are taken into account, such as large differences in the rate of deleterious mutations.

The number of deleterious mutations is expected to increase with increasing sex chromosome differentiation and degeneration. Sex chromosome differentiation in tetrapods spans a wide range, from completely homomorphic sex chromosomes in many lizards and amphibians (but also in some families of snakes and birds) to complete loss of the Y chromosome in some mammals. It would thus be interesting to test whether variation in sex chromosome degeneration is associated with skews in the ASR within groups that share homologous sex chromosomes.
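To get a feeling for how exposure of recessive mutations in the hemizygous sex could skew the ASR, here is a deliberately simple sketch; it is not the authors' population-genetic model. It assumes a 1:1 sex ratio at conception, independent loci, fully recessive deleterious alleles at a fixed frequency q with selection coefficient s, Hardy-Weinberg genotype frequencies in the homogametic sex, and it ignores Y/W-linked mutations. The function names and all parameter values are hypothetical.

```python
def survival_by_sex(q, s, n_loci):
    """Survival of the homogametic and heterogametic sex (hypothetical model).

    Each of n_loci independent loci in the hemizygous X/Z region carries a fully
    recessive deleterious allele at frequency q with selection coefficient s.
    """
    surv_homogametic = (1.0 - s * q ** 2) ** n_loci  # affected only when homozygous
    surv_heterogametic = (1.0 - s * q) ** n_loci     # single copy is always expressed
    return surv_homogametic, surv_heterogametic

def homogametic_share_of_adults(q, s, n_loci):
    """Adult share of the homogametic sex, assuming a 1:1 sex ratio at conception."""
    homo, hetero = survival_by_sex(q, s, n_loci)
    return homo / (homo + hetero)

for q in (0.0, 1e-4, 1e-3):
    print(q, round(homogametic_share_of_adults(q, s=0.5, n_loci=1000), 4))
```

Because the homogametic sex is affected at frequency q² while the heterogametic sex is affected at frequency q, the size of the skew depends strongly on q, s and the number of exposed loci, which is one reason why crude parameter estimates make such models hard to interpret.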

– Imperfect dosage compensation.

In the heterogametic sex, genes located in the hemizygous region of the X/Z chromosome are present in only one functional copy. To reach expression levels similar to those in the homogametic sex, the expression of these genes has to be increased. However, research has shown that not all genes are upregulated to the same extent, and as a result many sex-linked genes have lower expression levels in the heterogametic than in the homogametic sex.

This explanation is unlikely to produce a general pattern across tetrapods, because vertebrates differ in their mechanisms of dosage compensation: mammals inactivate one X chromosome in females to compensate for gene loss on the Y chromosome, while birds show incomplete dosage compensation on a gene-by-gene basis. Since one X is inactivated in the homogametic sex in mammals, we would expect sex-specific fitness differences due to dosage compensation only in non-mammals.

– Meiotic drive:

Meiotic drive systems are genetic variants that favor their own transmission by distorting sex ratios at meiosis. The authors point out that the observed skews in ASR are unlikely to be caused by meiotic drive, because the sex ratio at birth does not predict the adult sex ratio in mammals and birds. However, there is little information on sex ratios at birth in reptiles or amphibians. Furthermore, the gametic sex ratio would be a better measure of the effect of meiotic drive, since the sex ratio may already be skewed at birth by sex-specific differences in embryonic mortality.

– More rapid degeneration of Y and W chromosomes during lifetime:

The authors propose that the Y/W may be more affected by further degeneration during lifetime (for example through increased telomere shortening or loss of epigenetic marks). To my knowledge this is rather speculative, as I am not aware of any results supporting this hypothesis.

– Sexually antagonistic selection:

Loci that are beneficial to one sex but may be detrimental to the other are expected to accumulate on sex chromosomes. In an XY system, male-beneficial loci are expected to be found in linkage disequilibrium with the SDR, which ensures that they are transmitted exclusively to males (and, analogously, female-beneficial loci on the W in ZW systems). The positive fitness effects of these Y/W-linked sexually antagonistic mutations would thus skew the ASR towards the heterogametic sex (although the evolution of recombination suppression may cause further degeneration of the Y/W chromosome, which can be detrimental). Furthermore, the authors develop a model for sexually antagonistic selection at loci located on X/Z chromosomes and conclude that there are no robust generalizations about the direction of the ASR skew resulting from these loci.

The authors point out that there is no clear support for any of these hypotheses. Further research could test the assumptions of some of them: recessive deleterious mutations on X/Z chromosomes or Y/W-specific deleterious mutations, imperfect dosage compensation and sexually antagonistic selection are all related to sex chromosome degeneration and recombination suppression. Although it is difficult to quantify sex chromosome degeneration comparatively across species, more high-quality sex chromosome sequences are becoming available, and it may soon be possible to link sex chromosome degeneration at the gene level to sex-specific fitness differences. A very crude proxy would be to include in the analysis whether sex chromosomes are microscopically distinguishable (heteromorphic) or indistinguishable (homomorphic), and to test whether this explains significant variance in ASRs. Further research could also clarify whether ASR is connected to the sex ratio at birth, or better still to the gametic sex ratio, in amphibians or reptiles, which could be indicative of meiotic drive.


Overall, I am skeptical that treating sexual systems as a simple binary character (male or female heterogametic) adequately represents the diversity of tetrapod sex chromosome systems, and I expect fitness differences to be more closely related to sex chromosome degeneration than to the GSD itself. Although the GSD explains a significant proportion of the interspecific variation in ASRs in groups with variable sex-determination systems, there are multiple possible confounding factors (such as sex reversal, difficulties in determining GSDs, and uncertainty about the common ancestry of GSDs) that could easily bias the relatively small number of observations in these groups.


Gamble T, Geneva AJ, Glor RE, Zarkower D (2014). Anolis sex chromosomes are derived from a single ancestral pair. Evolution 68(4): 1027-1041.

Rovatsos M, Jasna V, Altmanova M, Johnson Pokorna M (2016). Conservation of sex chromosomes in lacertid lizards. Molecular Ecology.

Stöck M, Croll D, Dumas Z, Biollay S, Wang J, Perrin N (2011). A cryptic heterogametic transition revealed by sex-linked DNA markers in Palearctic green toads. Journal of Evolutionary Biology 24: 1064-1070.

Posted in Uncategorized

Identification of a large set of rare complete human knockouts

High-throughput genotyping and sequencing have led to the discovery of numerous sequence variants associated with human traits and diseases. An important class of such variants are loss-of-function (LoF) mutations (frameshift indels, stop-gain and essential splice-site variants), which are predicted to completely disrupt the function of protein-coding genes. For a Mendelian recessive disease to occur, the LoF variants must be biallelic, i.e. affect both copies of a gene. The affected gene is then described as a “knockout”.

By studying the Icelandic population, the authors aim to identify rare LoF mutations (minor allele frequency, MAF < 2%) carried by individuals participating in various disease projects. They then investigate how frequently these LoF mutations occur in a homozygous (i.e. knockout) state in the germline genomes of the population.

The Icelandic population

Iceland is well suited to genetic studies for three main reasons. First, the island was colonized around the 9th century by 8-20 thousand settlers, and the population has since grown to around 320’000 inhabitants; the initial founder effect and rare subsequent admixture make the Icelandic population a genetic isolate. Second, Iceland benefits from a genealogical database containing family histories reaching centuries back in time. Third, there is broad access to nationwide healthcare information.

These characteristics led to the development of large-scale genomic studies of Icelanders by deCODE Genetics. This biopharmaceutical company has published various studies, including this paper, related to genetic variants and diseases in Icelanders.

Loss-of-function mutations and rare complete knockouts

The authors sequenced the whole genomes of 2’626 Icelanders participating in various disease projects and identified variants in protein-coding genes. Each variant was annotated with its predicted impact on the gene: LoF, moderate or low. A total of 6’795 LoF mutations in 4’924 genes were identified, most of which (6’285) were rare (MAF < 2%).

The identified LoF variants were then imputed into an additional 101’584 chip-genotyped and phased Icelanders, allowing the authors to count knockout genes in the population. They found that 1’485 of the previously identified rare LoF mutations (MAF < 2%) contribute to the knockout of 1’171 genes, and that 8’041 individuals carry at least one of these knockout genes. Of these 1’171 genes, 88 had previously been linked to conditions with a recessive mode of inheritance.

Double transmission deficit of LoF variants

Because knocking out a gene is expected to be deleterious for the organism, we expect a deficit of homozygotes for LoF variants in the population due to embryonic/fetal, perinatal or juvenile lethality. To investigate whether such a deficit was present, the authors calculated the probability that LoF variants are transmitted from parents to their offspring.

Under Mendelian inheritance, a child of two heterozygous carriers inherits the LoF allele from both parents (double transmission) with probability 0.5 × 0.5 = 25%. However, the results show a statistically significant deficit in double transmission, with an observed double transmission probability of 23.6%.
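The logic of such a test can be sketched as a one-sided comparison of an observed proportion with the Mendelian expectation of 0.25. The counts below are hypothetical, `double_transmission_deficit` is an illustrative name, and the normal approximation to the binomial is an assumption made for simplicity; the paper's own statistical procedure may differ.

```python
import math

def double_transmission_deficit(n_offspring, n_double, expected=0.5 * 0.5):
    """One-sided test for fewer double transmissions than Mendelian expectation.

    Uses a normal approximation to the binomial (an assumption for simplicity).
    Returns the observed proportion, the z score and P(Z <= z).
    """
    observed = n_double / n_offspring
    se = math.sqrt(expected * (1.0 - expected) / n_offspring)
    z = (observed - expected) / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return observed, z, p_value

# Hypothetical counts chosen to give an observed proportion of 23.6%
obs, z, p = double_transmission_deficit(n_offspring=10_000, n_double=2_360)
print(f"observed={obs:.3f}, z={z:.2f}, one-sided p={p:.2g}")
```

With a much smaller number of informative offspring the same 1.4-percentage-point deficit would not be significant, which illustrates why the very large sample matters here.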

The rare LoF mutations were grouped by the Residual Variation Intolerance Score (RVIS) percentile and the essentiality score percentile of the affected genes. Both measures attempt to rank genes by their tolerance to functional variation, with the lowest percentiles corresponding to the genes most sensitive to mutations. As expected, the lowest double transmission rate was found for the most sensitive genes (first percentile), suggesting that homozygous LoF mutations in these genes are deleterious.

Tissue-specific expression of knockout genes

The authors investigated whether genes expressed in specific tissues were more or less likely to be knocked out. Using information from previous studies on the genes that are highly expressed in one or more, but not all, of 27 tissues, they calculated for each tissue the fraction of these genes that were knocked out. Brain and placenta had the lowest fractions of knockout genes (3.1% and 3.9%, respectively), whereas testis, small intestine and duodenum had the highest fractions of biallelic LoF mutations (5.8%, 6.4% and 6.9%, respectively).

Conclusion and Comments

The characteristics of the Icelandic population and the very large sample size (~1/3 of the total population) allowed the authors to identify a large number of new and rare LoF mutations. Some of these mutations were shown to knock out an unexpectedly large number of genes in an unexpectedly large number of people. This study is the first to shed light on the astonishing number of knockouts present in human populations. In addition, by investigating transmission probabilities, the authors identified a deficit of homozygous loss-of-function offspring, especially when the LoF mutations affected essential genes. This result was expected given the predicted deleterious effect of biallelic LoF mutations.

Besides the interesting results mentioned above, some aspects of the paper were slightly disappointing. First, I expected the authors to focus more on genotype-phenotype relationships. Although they pinpoint a deficit in double transmission, suggesting deleterious consequences for the organism, the authors did not discuss the function of the identified knockout genes or their effect on the phenotype. Second, the paper was not an easy read. Many results were only mentioned without additional information on the methods or data used, and it was sometimes difficult to link them to the main aim of the study. Additionally, some figures were misleading because of differing axis scales or incomplete legends.

Finally, the authors suggested that important tissues, such as the brain, have fewer knockout genes than other tissues, writing that “genes that are highly expressed in the brain are less often completely knocked out than other genes”. However, this result is questionable, as we do not have any measure of the number of knockout genes expected to be expressed in each tissue by chance alone. In other words, the brain could show fewer expressed knockout genes than other tissues simply because fewer genes are highly expressed in the brain. Therefore, we do not know whether the lower number of knockout genes in the brain is due to chance or to biological reasons.
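One way to supply the missing null expectation is to compare each tissue with the genome-wide knockout fraction: if a tissue has n highly expressed genes and the background fraction of knocked-out genes is p, about n·p knockouts are expected by chance, and a binomial tail probability indicates how surprising the observed count is. This is only a sketch under strong assumptions (genes treated as independent, a single background fraction for all genes); the counts, the background fraction and the helper names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing pmf terms computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def tissue_depletion(n_tissue_genes, n_knockout_in_tissue, background_fraction):
    """Expected knockout count and lower-tail p-value for one tissue."""
    expected = n_tissue_genes * background_fraction
    p_lower = binom_cdf(n_knockout_in_tissue, n_tissue_genes, background_fraction)
    return expected, p_lower

# Hypothetical tissue: 1'000 highly expressed genes, 31 knocked out (3.1%),
# compared with a hypothetical genome-wide knockout fraction of 5%.
expected, p_lower = tissue_depletion(1000, 31, 0.05)
print(f"expected={expected:.0f}, observed=31, P(X <= 31)={p_lower:.3g}")
```

Because the test conditions on the number of genes highly expressed in each tissue, a tissue with few such genes is not penalized simply for being small.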

Nevertheless, this study opens the door to understanding how many knockout genes occur without phenotypic consequences in humans, what the functions and essentiality of these genes are, and what role the environment plays in shaping the phenotype. The classical search for genetic variants associated with a phenotype, as in GWAS, could be reversed by first identifying individuals carrying the same genetic variants and then phenotyping them precisely.

Sulem, P., Helgason, H., Oddson, A., Stefansson, H., Gudjonsson, S., Zink, F., Hjartarson, E., Sigurdsson, G., Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Magnusson, O., Kong, A., Helgason, A., Holm, H., Thorsteinsdottir, U., Masson, G., Gudbjartsson, D., & Stefansson, K. (2015). Identification of a large set of rare complete human knockouts Nature Genetics, 47 (5), 448-452 DOI: 10.1038/ng.3243

Posted in genomics, human

Supergenes and social organization in a bird species




Cindy Dupuis, Xinji Li, Casper van der Kooi


The development of new molecular methods and next-generation sequencing techniques has advanced our knowledge of the genetic basis of phenotypic polymorphism. Over the course of recent years, studies have documented large genomic regions with drastic phenotypic effects, the so-called supergenes. A supergene is a set of genes on the same chromosome that are tightly linked and are thus inherited as a single unit.

The evolution of a supergene requires that multiple loci with complementary effects become linked (i.e. they are genetically clustered and recombination between them is suppressed) and that optimal alleles at the linked loci are combined. Genetic clustering of different loci can occur when, via mutation, an adaptive interaction between two closely placed loci is created. In addition, gene duplications or translocations that generate a series of (novel) complementary genes can give rise to supergenes. The probability of recombination between loci depends on various factors. Recombination between two loci is rare when they are located close together, because the probability of a crossover between two loci generally increases with the physical distance between them. Given the large size of supergenes, however, additional mechanisms appear to be important: recombination suppression can, for instance, be maintained by structural differences, such as inversions, between the supergene and its homologous chromosomal region.
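The relationship between distance and recombination can be illustrated with Haldane's map function, r = 0.5 * (1 - exp(-2d)), which converts a map distance d (in Morgans) into a recombination fraction r under the assumption of no crossover interference. The uniform recombination rate per megabase and the all-or-nothing treatment of inversions below are simplifying assumptions for illustration (real inversion heterozygotes can still show rare double crossovers or gene conversion), and the function names are hypothetical.

```python
import math

def haldane_r(d_morgans):
    """Recombination fraction from map distance (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def recombination_between(distance_mb, cm_per_mb=1.0, inversion_heterozygote=False):
    """Recombination fraction between two loci distance_mb apart.

    cm_per_mb is a hypothetical, uniform recombination rate. If both loci lie
    inside an inversion carried in heterozygous state, recombinant gametes are
    treated as absent (a simplification).
    """
    if inversion_heterozygote:
        return 0.0
    d_morgans = distance_mb * cm_per_mb / 100.0
    return haldane_r(d_morgans)

for mb in (0.01, 0.1, 1.0, 4.5, 50.0):
    print(mb, round(recombination_between(mb), 4))
```

Without an inversion, loci a few megabases apart already recombine in a few percent of meioses, which is why tight physical clustering alone is unlikely to hold a multi-megabase supergene together.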

An interesting example of a supergene in an invertebrate was documented by Purcell et al. (2014), who found a large, nonrecombining region associated with social organisation in an ant species. The nonrecombining region was found to make up most of one chromosome and was hence aptly called the ‘social chromosome’. They found a structurally similar region with similar effects in another ant species; however, the two regions show no homology, suggesting parallel evolution of the social chromosome. To our knowledge, examples of vertebrate social systems determined by supergenes were previously unknown.

Two recent articles (Küpper et al., 2016; Lamichhaney et al., 2016) revealed a single supergene controlling alternative male mating tactics in the ruff (Philomachus pugnax). The studies were carried out independently by two research groups but reach almost the same conclusions. The ruff is a lekking wader known for its great diversity in male plumage color and its behavioral polymorphism. Three types of males can be distinguished, characterized by differences in territoriality and behavior that are highly correlated with differences in nuptial plumage and body size. Predominantly dark-colored Independent males are the most common (80-95% of males) and defend small territories on a lek. Smaller, lighter-colored Satellite males (5-20%) are non-territorial and less faithful to a particular lek; they use the territories of Independent males, where they are largely tolerated. The third type are the Faeder males, which are very rare (<1% of males). Faeder males lack male display, are small and resemble the unornamented females; however, they have disproportionately large testes.

Previous studies using pedigrees of large captive populations showed that the reproductive polymorphism follows a single-locus autosomal pattern of inheritance (Lank et al., 1995; Lank et al., 2013). The dominant Faeder allele controls development into Faeder males, whereas the Satellite allele (which is dominant to the Independent allele) controls development into Satellite or Independent males. Ekblom et al. (2012) studied nucleotide sequence variation and gene expression in ornamental feathers of 5 Independent and 6 Satellite males using transcriptome sequencing. They found no significant expression divergence in pre-identified coloration candidate genes, but many genetic markers showed nucleotide differentiation between the two morphs. Later, Farrell et al. (2013) used linkage analysis and comparative mapping to locate the Faeder locus and found linkage to microsatellite markers in a region of avian chromosome 11 that includes the melanocortin-1 receptor (MC1R) gene, a strong candidate for alternative male morph determination because of its role in plumage coloration.

Using the previously phenotyped captive population, Küpper et al. set out to determine the genomic basis of morph divergence in P. pugnax. They first generated and annotated a full genome assembly from one Independent male. Subsequently, they identified SNPs in the population using RAD sequencing. More than one million SNPs were detected, and the Faeder and Satellite loci could be placed on a genetic map based on 3’948 SNPs. Interestingly, both morphs mapped to the same region on chromosome 11 but showed clear structural differences. This was corroborated by a GWAS of 41 unrelated Satellite, Independent and Faeder males from a natural population.


To characterize the genomic region more precisely, they sequenced the whole genomes of a small set of Independent, Satellite and Faeder males. They showed that the region on chromosome 11 was highly differentiated between Satellite and Faeder morphs and contained greater nucleotide variation than the adjacent regions. Using read-pair orientation, they found clear evidence for an inversion of the chromosomal region between the morphs. Interestingly, one breakpoint lies within an essential gene, CENPN (encoding centromere protein N; recessive lethal), which implies that individuals homozygous for the inversion are not viable, an observation confirmed by breeding experiments. The authors also suggested that a recombination event or gene conversion has occurred between the Satellite and Independent alleles.


By comparing gene sequences among morphs, the authors found that 78% of the gene sequences differed between morphs, with differences that could potentially change the encoded protein. Among the divergent genes, some were found to be involved in hormone metabolism, such as HSD17B2, which encodes an enzyme that inactivates testosterone and estradiol. Because it differs between morphs, this enzyme may alter steroid metabolism and partly explain why plumage patterns and behavior differ between morphs. The MC1R gene was also found within the altered genomic region. This gene is considered an important locus controlling color polymorphism and could underlie the reduced melanin levels in Satellites. The PLCG2 gene, which is rearranged in Faeders, was proposed as a candidate gene for the rather feminine appearance and non-aggressive behavior of Faeders. Presumably, this gene is part of a cascade leading to the development of the impressive plumage of the other male morphs.


In the second article, Lamichhaney et al. (2016) studied a natural ruff population using whole-genome sequencing. They first established a high-quality reference genome assembly from an Independent male and performed functional annotation based on both evidence data and de novo gene predictions. They then performed whole-genome resequencing and SNP calling for 15 Independent, 9 Satellite and 1 Faeder males. Their genome-wide screen of genetic differentiation (FST) between male morphs identified a 4.5-Mb region in which Independents and Satellites clustered phylogenetically as distinct groups. A screen for structural variants identified a 4.5-Mb inversion in Satellites that perfectly overlapped with the differentiated region. In addition, PCR-based sequencing confirmed the positions of the proximal and distal breakpoints and identified a 2,108-bp insertion of a repetitive sequence at the distal breakpoint. Diagnostic tests showed that Satellite males were heterozygous (S/I), while most Independent males were homozygous (I/I). The authors suggested that the Independent allele represents the ancestral state, consistent with the conserved synteny of this region among birds.
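A genome scan of this kind can be sketched with Hudson's FST estimator, computed as a ratio of summed numerator and denominator terms in non-overlapping windows along a chromosome; windows with elevated FST flag differentiated regions. The positions, allele frequencies, sample sizes, window size and the helper name are hypothetical, and this is not the pipeline used in the study.

```python
import numpy as np

def hudson_fst_windows(positions, p1, p2, n1, n2, window_size):
    """Hudson's FST per non-overlapping window, as a ratio of summed terms.

    p1, p2: allele frequencies in two groups; n1, n2: sampled chromosomes per group.
    Returns {window_start: FST}.
    """
    positions = np.asarray(positions)
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator corrects the squared frequency difference for sampling noise
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    window_ids = positions // window_size
    fst = {}
    for w in np.unique(window_ids):
        in_window = window_ids == w
        den_sum = den[in_window].sum()
        fst[int(w) * window_size] = (float(num[in_window].sum() / den_sum)
                                     if den_sum > 0 else float("nan"))
    return fst

# Hypothetical SNPs: the second window is strongly differentiated between groups.
fst = hudson_fst_windows(positions=[100, 200, 1_100, 1_200],
                         p1=[0.5, 0.4, 1.0, 0.95], p2=[0.5, 0.45, 0.0, 0.05],
                         n1=30, n2=30, window_size=1_000)
print(fst)
```

Summing numerators and denominators within a window (rather than averaging per-SNP ratios) keeps low-diversity SNPs from dominating the estimate; slightly negative values in undifferentiated windows are expected because of the sampling correction.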

The comparison between Faeder and Independent males showed equally strong genetic differentiation across the same region, creating a mirror image of the differentiation pattern between Satellites and Independents. Accordingly, the region could be subdivided into two parts: region A, where Satellite and Faeder chromosomes were closely related to each other and less closely related to Independent chromosomes, and region B, where Satellite and Independent chromosomes were more closely related to each other and divergent from Faeder chromosomes. Since an inversion is expected to suppress recombination within the region between the wild-type (I) and mutant (S or F) alleles, this disruption of the differentiation pattern might be the result of one or two recombination events between an Independent and a Faeder-like chromosome. Using nucleotide divergence and estimated mutation rates for birds, the divergence time between the Independent allele and the Satellite or Faeder alleles was estimated at approximately 4 million years. The last recombination event was estimated to have occurred 520,000 ± 20,000 years ago.
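The molecular-clock reasoning behind such estimates can be written as T = d / (2μ), where d is the nucleotide divergence per site between two alleles and μ is the mutation rate per site per year; the factor 2 reflects that both lineages accumulate mutations after the split. The inputs below are illustrative round numbers, not values taken from the paper, the function names are hypothetical, and the sketch ignores ancestral polymorphism and uncertainty in generation time.

```python
def divergence_time_years(divergence_per_site, mutation_rate_per_site_per_year):
    """Molecular-clock estimate T = d / (2 * mu)."""
    return divergence_per_site / (2.0 * mutation_rate_per_site_per_year)

def per_year_rate(rate_per_generation, generation_time_years):
    """Convert a per-generation mutation rate into a per-year rate."""
    return rate_per_generation / generation_time_years

# Illustrative numbers only: 0.6% divergence, 3e-9 mutations/site/generation,
# and a hypothetical 2-year generation time.
mu_year = per_year_rate(3e-9, 2.0)
print(f"{divergence_time_years(0.006, mu_year):,.0f} years")
```

Since T scales inversely with μ, any uncertainty in the assumed mutation rate or generation time carries over proportionally to the divergence estimate.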

To better understand the genetic consequences of the inversion and relate it to the phenotypic variation in male ruffs, the authors searched for candidate mutations among the genes in the inverted region. Mutations in several genes with important functions were found on Satellite and Faeder chromosomes, including the above-mentioned CENPN, HSD17B2 and MC1R genes, as well as SDR42E1 (which is important for the metabolism of sex hormones). Missense mutations in the derived MC1R were found to be associated with the Satellite and Faeder alleles, hinting at a potential mechanism for the male plumage polymorphism during the breeding season.

In conclusion, these two studies demonstrated the presence of a genomic inversion that led to the evolution of a supergene, which determines the complex phenotypic variation in male ruffs. Together, the two papers contribute to our understanding of supergenes, complex phenotypes and social organization.


Küpper C, Stocks M, Risse JE, Dos Remedios N, Farrell LL, McRae SB, Morgan TC, Karlionova N, Pinchuk P, Verkuil YI, Kitaysky AS, Wingfield JC, Piersma T, Zeng K, Slate J, Blaxter M, Lank DB, & Burke T (2016). A supergene determines highly divergent male reproductive morphs in the ruff. Nature genetics, 48 (1), 79-83 PMID: 26569125

Posted in evolution, genomics, Uncategorized

Reconstructing human population history: ancestry and admixture

Understanding the evolutionary history of our own species, including how migration and mixture of ancestral populations have shaped modern human populations, is a key question in evolutionary biology. Here we present three articles on this topic, the first two dealing with India and the third focusing on a single Ethiopian group:

1) Moorjani et al 2013 Genetic Evidence for Recent Population Mixture in India AJHG 93: 422–438

2) Basu et al 2016 Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure PNAS online before print

3) Van Dorp et al 2016 Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference PLOS Genetics 11(8): e1005397

All three use genome-wide data from microarrays. After a brief summary of each paper, highlighting their similarities and differences, we discuss their methodological approaches.

Ancestral populations of India

The aim of the first two articles is to understand the history of the populations of the Indian subcontinent. The first (Moorjani et al 2013) reports data on more than 570 individuals sampled from 73 groups living in India. The authors filtered the data by removing all individuals with evidence of recent admixture or recent ancestry from outside India. The populations included in the analysis fall into two linguistic categories: speakers of Indo-European languages and speakers of Dravidian languages.

Figure 1: Map of sampled populations (A) and PCA of 70 Indian groups and some non-Indian groups, highlighting the “Indian cline” (B)

Previous genetic evidence indicates that most groups in India descend from a mixture of two distinct ancestral populations: Ancestral North Indians (ANI) and Ancestral South Indians (ASI). Three different hypotheses exist for the date of mixture of these two populations:

1) ANI arrived through a migration that predates agriculture, about 30,000-40,000 years ago

2) ANI arrived with the spread of agriculture, which probably began between 8,000 and 9,000 years ago

3) ANI arrived very recently (3,000-4,000 years ago), when Indo-European languages presumably began to be spoken in India.

To demonstrate the admixed origin of Indian groups and estimate the proportion of each ancestry in each population, they use PCA and a statistic called the f4 ratio, which infers mixture proportions from correlations in allele frequency differences between groups. They showed that all populations are admixed and lie along an “Indian cline”, a gradient ranging from 17% to 71% ANI ancestry. These proportions correlate well with geography and language, with the northern Indo-European-speaking populations having more ANI ancestry than the southern Dravidian-speaking ones. They then used linkage disequilibrium (LD) to date the admixture: admixture-induced LD blocks are longer when admixture is more recent, because recombination has had fewer generations to break them down. By fitting an exponential function to the decay of LD with genetic distance (the pattern expected when admixture stops abruptly), they estimated that admixture occurred between 1,856 and 4,176 years ago, supporting the third hypothesis. These results coincide with demographic and cultural changes in India, notably the establishment of the caste system, whose strong endogamy rapidly brought admixture to a halt. Moreover, they found that Indo-European groups have more recent admixture dates, which could be explained by multiple waves of mixture in these populations. Another finding of this paper is that the aboriginal Andaman Islanders (Onge) form a sister group of ASI.
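A minimal sketch of the f4 ratio, assuming its standard definition: for a test population X modelled as a mixture of populations related to B and C, with outgroups A and O (A sharing drift with B but not with C), the B-related proportion is α = f4(A, O; X, C) / f4(A, O; B, C). The allele frequencies below are simulated and the population labels are placeholders; the published analysis uses specific reference populations and a block jackknife for standard errors, neither of which is reproduced here.

```python
import random

def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), using allele frequencies."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(terms) / len(terms)

def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Estimated proportion of B-related ancestry in X: f4(A, O; X, C) / f4(A, O; B, C).

    A and O are outgroups; A should share drift with B but not with C.
    """
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Toy allele frequencies (hypothetical). B and C stand in for the two ancestral
# sources; X is an exact 40:60 mixture of them, A is a relative of B, O an outgroup.
rng = random.Random(42)
n_snps = 5000
p_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
p_c = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
p_o = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
p_a = [min(1.0, max(0.0, b + rng.gauss(0, 0.05))) for b in p_b]
true_alpha = 0.4
p_x = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(p_b, p_c)]

print(f"Estimated B-related ancestry in X: {f4_ratio(p_a, p_o, p_x, p_b, p_c):.3f}")
```

Because X is built as an exact mixture, the ratio recovers the mixing proportion; with real data, B and C are only proxies for the unsampled ancestral groups, which is why the choice of references matters.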

The second article (Basu et al 2016) focuses on the same region and uses the same basic dataset, except that the authors kept all populations in the analyses, including Austro-Asiatic (AA) and Tibeto-Burman (TB) speakers. They first ran ADMIXTURE on all populations and showed that island and mainland populations have distinct ancestral components (islanders share ancestry with Oceanic peoples such as Papuans). They then ran the same analysis on mainland populations only (thus excluding populations from the Andaman and Nicobar Islands). The best model comprised four ancestral components, ANI, ASI, ancestral AA and ancestral TB, and they found that several present-day populations are almost pure representatives of these ancestral components (figure 2).

Fig. 2: PCA of the 18 mainland Indian populations, with the four clusters identified by the authors circled (A). ADMIXTURE plot of mainland Indian populations with four ancestral components (K = 4, the most parsimonious model) (B).

They further estimated the time and extent of admixture from the degree to which haplotype blocks originating in a donor population have been fragmented by recombination in the recipient population. In each population, the distribution was again well fitted by an exponential curve. They showed that admixture came to an abrupt end about 1,575 years ago in upper-caste populations, most likely due to the establishment of endogamy, while tribal populations seem to have continued admixing until 1,500-1,000 years ago.
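Both papers date admixture by fitting an exponential decay: admixture LD (or the length of donor haplotype segments) decays with genetic distance at a rate set by the number of generations since admixture. A minimal sketch, assuming a single admixture pulse, noise-free data and a 29-year generation time (an assumption made here, not a value taken from either paper). Real analyses fit a nonlinear model with a constant term to noisy, weighted LD curves using dedicated software, which this simple log-linear fit does not attempt.

```python
import math

def fit_exponential_decay(distances_morgans, ld_values):
    """Fit ld(d) = A * exp(-n * d) by ordinary least squares on log(ld).

    Returns (A, n). Under a single pulse of admixture n generations ago,
    admixture LD decays with genetic distance d (in Morgans) at rate n.
    Requires strictly positive ld values.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def generations_to_years(n_generations, years_per_generation=29):
    """Convert generations to years; the 29-year generation time is an assumption."""
    return n_generations * years_per_generation


# Noise-free toy curve: admixture 100 generations ago (hypothetical numbers)
distances = [0.005 * i for i in range(1, 41)]       # 0.5 cM to 20 cM
ld = [0.002 * math.exp(-100 * d) for d in distances]
amplitude, n_gen = fit_exponential_decay(distances, ld)
print(f"~{n_gen:.0f} generations, ~{generations_to_years(n_gen):.0f} years ago")
```

The log transform keeps the example dependency-free, but it breaks down as soon as noisy LD values approach or drop below zero, which is one reason published tools fit the curve directly.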

In short, although they share a common topic, these two papers propose divergent versions of the history of Indian populations: the first assumes a priori that Austro-Asiatic and Tibeto-Burman speakers are not components of the ancestral populations of India and focuses only on the mixture between the ANI and ASI components, whereas the second claims that the genetic structure of Indian populations results from admixture events between four ancestral components. However, both views converge on the idea that admixture was common in India and ceased rapidly with the establishment of the caste system, which enforced endogamy.

Common origin of two subgroups of Ari people

The third paper investigates the history of human populations at a smaller scale, focusing on a single ethnic group, the Ari people of Ethiopia. The Ari comprise two socially and genetically distinct subgroups: the cultivators (Aric) and the blacksmiths (Arib). Anthropologists have proposed two alternative hypotheses to explain this division: under the remnant (RN) hypothesis, the blacksmiths are the remnants of an indigenous group that was assimilated by the more recently arrived cultivators, whereas the marginalization (MA) hypothesis proposes that the two groups share a common ancestry but the blacksmiths were recently marginalized because of their occupation. While anthropologists traditionally favour the MA hypothesis, recent genetic studies have provided support for the RN hypothesis. In this article, the authors apply a new methodology to the same genetic dataset and find evidence for the MA hypothesis. They show that when ADMIXTURE, fineSTRUCTURE or CHROMOPAINTER analyses are run on a complete dataset of 237 samples from 12 Ethiopian and neighbouring populations, the Arib form a single homogeneous cluster distinct from the Aric. But when patterns of haplotype sharing are inferred by modelling each Ari group as a genetic mixture of all other groups except the Ari themselves, the genetic differences between Arib and Aric disappear. In fact, their analyses reveal that the two Ari groups share the same admixture events with non-Ari populations (figure 3).

Fig. 3: Top: inferred ancestry composition of recipient groups when forming each group as a mixture of (a) all sampled groups, (b) all sampled groups except the Ari. Bottom: TVD_XY values comparing the painting profiles for all pairwise comparisons of groups X and Y under each analysis, with scale at far right. Ari groups (ARIb/ARIc) are highlighted with black outlines in each plot.
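The total variation distance (TVD) in the caption is a standard distance between two probability vectors. A minimal sketch, assuming each group's painting profile is given as the proportion of its genome copied from each donor group (as produced by a CHROMOPAINTER-style analysis); the donor names and numbers are made up.

```python
def total_variation_distance(profile_x, profile_y):
    """TVD = 0.5 * sum over donors k of |x_k - y_k|.

    Each profile maps donor-group names to the fraction of a recipient group's
    genome copied from that donor, so values are non-negative and sum to 1.
    TVD is 0 for identical profiles and 1 for profiles with no donor in common.
    """
    donors = set(profile_x) | set(profile_y)
    return 0.5 * sum(abs(profile_x.get(k, 0.0) - profile_y.get(k, 0.0)) for k in donors)


# Hypothetical painting profiles over three placeholder donor groups
group_x = {"donor1": 0.5, "donor2": 0.3, "donor3": 0.2}
group_y = {"donor1": 0.4, "donor2": 0.4, "donor3": 0.2}
print(f"TVD(X, Y) = {total_variation_distance(group_x, group_y):.2f}")
```

A low TVD between Arib and Aric in analysis (b) is what "the genetic differences disappear" means quantitatively.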

To explain this pattern, they propose that the genetic differentiation of the blacksmiths is due to a bottleneck. This hypothesis is supported by the fact that identity-by-descent (IBD) sharing is higher among blacksmiths than among cultivators, which is consistent with reduced genetic diversity in the blacksmiths. Using the D-statistic, they also show that the Arib and Aric are more closely related to each other than either is to any other Ethiopian group. They therefore conclude that the observed genetic differentiation between the Arib and Aric does not reflect separate ancestry but rather strong genetic drift caused by a bottleneck induced by the social marginalization of the blacksmiths.
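The link between a bottleneck, elevated IBD and reduced diversity follows from classical drift theory. A minimal sketch, assuming an idealised population of constant effective size N_e with no mutation or migration (strong simplifications); the sizes and durations are hypothetical and are not estimates for the Ari.

```python
def drift_after_bottleneck(h0, n_e, generations):
    """Return (H_t, F_t) for an idealised Wright-Fisher population.

    H_t = H_0 * (1 - 1/(2*N_e))**t is the expected heterozygosity after t generations;
    F_t = 1 - (1 - 1/(2*N_e))**t is the probability that two alleles are identical by
    descent relative to the starting generation. Assumes constant N_e, no mutation,
    no migration.
    """
    retained = (1 - 1 / (2 * n_e)) ** generations
    return h0 * retained, 1 - retained


# Hypothetical comparison: same starting diversity, 100 generations of drift
for label, n_e in [("large group", 5000), ("bottlenecked group", 50)]:
    h_t, f_t = drift_after_bottleneck(0.30, n_e, 100)
    print(f"{label:>18}: H = {h_t:.3f}, F = {f_t:.3f}")
```

The same quantity, (1 - 1/(2N_e))^t, drives both curves, which is why higher IBD and lower diversity are expected together in a small, endogamous group.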

Methodological discussion

What stands out from reading these three articles is that selecting an appropriate methodology is crucial within a hypothesis-testing framework. Although the two articles on Indian populations use the same initial dataset, the way they filter and analyse it leads to very different conclusions. The inclusion or exclusion of particular populations in an admixture analysis, or the choice of outgroups for an f4 ratio estimate, directly affects the output of these analyses and can lead authors to tell very different stories. Before dismissing or putting forward a hypothesis, it is important to be aware of the limitations of the method used to produce the results. For example, the authors of the second paper on India’s ancestral populations claim to demonstrate a more complex history than the one shown in the first paper, but their result rests solely on a clustering analysis (implemented in software such as STRUCTURE or ADMIXTURE).

The basic principle of STRUCTURE/ADMIXTURE-like programs is to identify the K most distinct groups in the dataset, treat them as pure ancestral groups, and model all other individuals as combinations of those. This means that the results depend on which populations are included and on the number of clusters K specified. There are different methods for determining which K best fits the data (cross-validation error, delta K …), but in many cases the inferred mixture proportions are wrong. Only in very simple cases, like the genetic history of African Americans (well explained in Daniel Falush’s blog), which involves three clearly defined and highly differentiated ancestral populations (West Africans, Europeans and Native Americans), can we be confident in the results of a clustering analysis.


Fig. 4: Admixture plot of the African American population (ASW) with its three ancestral populations, West Africans (YRI), Europeans (CEU) and Native Americans (MEX). Source: Daniel Falush’s blog

But in many cases the history is more complex, and no current population actually corresponds to a pure ancestral population because of multiple waves of admixture. In such cases the most differentiated groups are merely the most extreme ones, which does not mean that they are pure or ancestral. Razib Khan’s blog explains this well with the simple example of Uygurs and Europeans: the Uygurs are known to be a recently mixed group (of European and Asian ancestry), but if only Uygurs and Europeans are analysed with K fixed to 2, STRUCTURE will assign each group 100% to its own cluster. This is why, in the second paper, the apparently pure AAA, ATB, ASI and ANI populations, and all the implications drawn from the clustering, are probably meaningless. In fact, when the f4 ratio is used (as in the first paper), all groups are found to be admixed to some extent (with the lowest ANI proportion being 17%).
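The Uygur example can be mimicked with a toy calculation. This is not the STRUCTURE or ADMIXTURE likelihood model: it is a simple least-squares projection of a group's allele frequencies onto two chosen reference groups, assuming made-up frequencies in which the "Uygur" group is an exact 50:50 mixture of two sources. It only illustrates the logical point that a group used as one of the K extremes will, by construction, appear unadmixed.

```python
import random

def best_mixture_proportion(target, source1, source2):
    """Least-squares alpha such that target ≈ alpha*source1 + (1 - alpha)*source2.

    Inputs are allele frequencies per SNP; alpha is clipped to [0, 1].
    """
    num = sum((t - s2) * (s1 - s2) for t, s1, s2 in zip(target, source1, source2))
    den = sum((s1 - s2) ** 2 for s1, s2 in zip(source1, source2))
    return min(1.0, max(0.0, num / den))


rng = random.Random(7)
n_snps = 2000
european = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
east_asian = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
# Toy "Uygur" frequencies: an exact 50:50 mixture of the two source groups
uygur = [0.5 * e + 0.5 * a for e, a in zip(european, east_asian)]

# With the true sources as references, the mixture is recovered
alpha_true_sources = best_mixture_proportion(uygur, european, east_asian)
# With only Europeans and Uygurs as the two "ancestral" references (the K = 2 situation),
# the Uygurs look 100% "Uygur" (0% European), i.e. apparently unadmixed
alpha_k2 = best_mixture_proportion(uygur, european, uygur)
print(f"European share, true sources: {alpha_true_sources:.2f}")
print(f"European share, K = 2 with Uygurs as an extreme: {alpha_k2:.2f}")
```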

This critique of clustering analysis is a key element of the study on the Ari people, whose authors point out that results from such methods should not be taken for granted but interpreted with caution. Indeed, this kind of method cannot discriminate between recent mixture of separate populations and shared ancestry followed by population divergence. Support for one of these hypotheses should therefore rely on additional tests. Instead of directly accepting the story suggested by a clustering analysis, a more reasonable workflow is to use other methods to test the specific implications of each hypothesis. This is exactly what is done in the third article where, as explained above, the authors constrain the mixture analysis by excluding the two Ari groups as donors, which removes the confounding effect of a recent bottleneck. In such complex cases, combining PCA and STRUCTURE-like analyses with F-statistics and simulations allows more robust conclusions to be drawn. For instance, statistics such as Fst or Dxy, which estimate the genetic differentiation between two populations, can be simulated under alternative scenarios representing competing hypotheses (figure 5). These simulated statistics can then be compared with those estimated from real data to favour one hypothesis over another. Simulations can also show how difficult it is to discriminate between the different hypotheses, which helps avoid overinterpreting the results. In the second paper, where the authors put forward a new hypothesis radically different from the classical hypotheses of anthropology and other genetic studies, additional tests like these seem necessary to strengthen their conclusions.

Fig. 5: Differences in inferred ancestry between analyses A and B, measured with F_XY, for real data (top) and for data simulated under the MA and RN hypotheses (bottom). Here the MA hypothesis is clearly the closer to the real data.
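The simulation logic can be sketched with a toy Wright-Fisher model. This is a deliberately simplified illustration, not the simulation design used by van Dorp et al.: two groups split from one ancestral population and drift independently for 20 generations, and Fst is compared between a scenario where both stay large and one where one group is bottlenecked after the split. All population sizes and durations are hypothetical. The point is only that drift in a bottlenecked group can, by itself, produce strong differentiation between groups that share recent common ancestry.

```python
import random

def drift(p, n_e, generations, rng):
    """Wright-Fisher drift of one allele frequency in a population of n_e diploids."""
    copies = 2 * n_e
    for _ in range(generations):
        p = sum(1 for _ in range(copies) if rng.random() < p) / copies
    return p

def fst(freqs1, freqs2):
    """Ratio-of-averages Fst = sum(H_T - H_S) / sum(H_T) over loci (Nei-style)."""
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = p1 * (1 - p1) + p2 * (1 - p2)   # mean of 2p(1-p) over the two groups
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0

def simulate_split(n_e1, n_e2, generations=20, n_loci=150, seed=1):
    """Two groups split from one ancestral population and drift independently."""
    rng = random.Random(seed)
    f1, f2 = [], []
    for _ in range(n_loci):
        p0 = rng.uniform(0.1, 0.9)          # ancestral allele frequency
        f1.append(drift(p0, n_e1, generations, rng))
        f2.append(drift(p0, n_e2, generations, rng))
    return fst(f1, f2)


# Hypothetical scenarios: both groups large vs one group bottlenecked after the split
fst_no_bottleneck = simulate_split(200, 200)
fst_bottleneck = simulate_split(200, 10)
print(f"Fst, no bottleneck: {fst_no_bottleneck:.3f}")
print(f"Fst, one group bottlenecked: {fst_bottleneck:.3f}")
```

In a real study, each competing hypothesis (here, MA versus RN) would be encoded as its own demographic model, and the statistic computed on simulated data would be compared with the value observed in the real dataset.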

Although it was not mentioned in any of the articles, the quality of the data and the way they were obtained, i.e. the genotyping technology, also call for caution. All three studies use microarrays designed from European populations. These microarrays consist of thousands of DNA spots, each carrying a predefined probe sequence for a site known to be polymorphic in Europeans; only DNA with the complementary sequence can hybridize to a spot and be genotyped. Using these microarrays to study the history of non-European populations may therefore be problematic, as only SNPs that are variable in Europeans are targeted, which probably leads to the exclusion of information that is meaningful for non-European populations. Today, Next-Generation Sequencing (NGS) offers many alternatives, such as RAD sequencing or whole-genome sequencing, that can genotype tens of thousands of SNPs without defining them in advance.
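Ascertainment bias can be illustrated with a toy calculation. A minimal sketch, assuming a hypothetical design panel and target population with made-up allele frequencies, and assuming (as a simplification) that a SNP is put on the array only if its minor allele frequency in the panel is at least 5%; real array designs use more complex selection criteria.

```python
def on_array(snp, panel="panel", min_maf=0.05):
    """Keep a SNP only if its minor allele frequency in the design panel reaches min_maf."""
    p = snp[panel]
    return min(p, 1 - p) >= min_maf

def fraction_of_target_variation_captured(snps, target="target", **kwargs):
    """Share of SNPs polymorphic in the target population that survive array ascertainment."""
    polymorphic = [s for s in snps if 0 < s[target] < 1]
    kept = [s for s in polymorphic if on_array(s, **kwargs)]
    return len(kept) / len(polymorphic)


# Hypothetical allele frequencies in a design panel and in a target population
snps = [
    {"panel": 0.30, "target": 0.25},
    {"panel": 0.50, "target": 0.60},
    {"panel": 0.10, "target": 0.05},
    {"panel": 0.00, "target": 0.40},   # variant absent from the panel
    {"panel": 0.00, "target": 0.20},   # variant absent from the panel
    {"panel": 0.02, "target": 0.35},   # too rare in the panel to be selected
]
print(f"Target variation captured: {fraction_of_target_variation_captured(snps):.0%}")
```

Variants that are common in the target population but rare or absent in the panel never reach the array, so diversity-based statistics computed in the target are systematically biased.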


To conclude, the take-home messages from these three articles are:

– Social systems leading to endogamy can rapidly and dramatically modify the genetic structure of human populations.

– It is difficult to reconstruct the ancestry of human populations, especially when their history involves multiple waves of admixture.

– Clustering methods are designed to find structure in a genetic dataset, but the clusters do not necessarily reflect real shared ancestry. Further tests using other methods are required to robustly support one hypothesis.

Posted in genomics, human, PNAS

Papers to discuss Spring 2016

This Spring, we have decided to discuss a few series of papers on related topics, one paper per week:

Series 1 (see also discussion on Razib Khan’s blog):

  1. Moorjani et al 2013 Genetic Evidence for Recent Population Mixture in India AJHG 93: 422–438
  2. Basu et al 2016 Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure PNAS online before print
  3. van Dorp et al 2016 Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference PLOS Genetics 11(8): e1005397

Series 2 (see also perspective in Science Magazine):

  1. Küpper et al 2016 A supergene determines highly divergent male reproductive morphs in the ruff Nature Genetics 48: 79–83
  2. Lamichhaney et al 2016 Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax) Nature Genetics 48: 84–88

The following papers form less of a series (although two are sex-related):

  1. Pipoly et al 2015 The genetic sex-determination system predicts adult sex ratios in tetrapods Nature 527: 91–94
  2. Barson et al 2015 Sex-dependent dominance at a single locus maintains variation in age at maturity in salmon Nature 528: 405–408
  3. Sulem et al 2015 Identification of a large set of rare complete human knockouts Nature Genetics 47: 448–452


Posted in paper list

Evolution of Darwin’s finches and their beaks revealed by genome sequencing

The recent formation and habitat diversity of the Galápagos archipelago, in conjunction with its relative isolation from the mainland, has helped the islands become rich in endemic species that have much to offer for the study of evolutionary biology.

As a result of their volcanic origin and fluctuating climates, the islands of the Galápagos archipelago vary in age, size, topography and vegetation. Together with the islands' isolation from the mainland, this diversity of relatively new environments, both within and between islands, provides ideal conditions for speciation. The finches of the Galápagos archipelago and Cocos Island are the product of a fascinating adaptive radiation that started only about 1.5 million years ago, following the arrival of a common ancestor from South America. These finches are most notable for their diversity in beak morphology, which reflects their respective adaptations to exploiting different food resources. Charles Darwin’s observations of this diversity in beak morphology played an important role in the development of his theory of natural selection.

“Seeing this gradation and diversity of structure in one small, intimately related group of birds, one might really fancy that from an original paucity of birds in this archipelago, one species had been taken and modified for different ends” – Charles Darwin

There has since been great interest in Darwin’s finches, and much research has been devoted to resolving their phylogenetic history and elucidating the mechanisms that drive their variation. In the paper reviewed here, the authors took the extraordinary step of sequencing the whole genomes of 120 individuals, representing all 15 Darwin’s finch species across the Galápagos and Cocos Island as well as two close relatives (Tiaris bicolor and Loxigilla noctis) from Barbados. Analyzing this rich dataset, they find important deviations from previous taxonomies and identify several genomic regions associated with beak shape.


Figure 1: (a) Sample locations and (b) phylogeny based on all autosomal sites


Species tree from Farrington et al. (2014)

After sequencing and assembly, they generated four phylogenies based on (i) autosomal DNA (see Figure 1 above), (ii) mitochondrial DNA and (iii, iv) sequences linked to the Z and W sex chromosomes. Their phylogenies largely supported previous taxonomies (compare Figure 1 with the species tree from Farrington et al. (2014), which was generated using 14 nuclear introns and two short sequences of mitochondrial DNA). However, the new genome-based phylogeny also showed some important differences. For one, as we can see in Figure 1 above, the species classified as G. difficilis actually forms three distinct groups, which cluster geographically by the islands of (1) Pinta, Santiago and Fernandina, (2) Wolf and Darwin and (3) Genovesa. Apparently this is consistent with taxonomies proposed in two studies published in 1931 and 1945, but it is unclear to me why these groups remained classified as a single species until now.

Similarly, they found that G. conirostris is likely also paraphyletic, given that G. conirostris on Española was most similar to G. magnirostris, and G. conirostris on Genovesa was most similar to G. scandens. Following these findings, the authors of this study recommend that the taxonomy for G. difficilis and G. conirostris be revised to reflect their paraphyly, according to the new genome-based phylogeny.

Gene flow

Evidence for introgression was found by comparing the autosomal phylogenetic tree with those of the sex-linked loci and mtDNA, and through ABBA-BABA tests. They found that there has been extensive gene flow and hybridization between the species throughout the radiation, which likely contributed to their rapid evolution.
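The ABBA-BABA test counts two kinds of discordant site patterns across three ingroup taxa and an outgroup. A minimal sketch using counts of individual sites, assuming a hypothetical set of already-polarized biallelic sites (0 = ancestral state, as in the outgroup). Published analyses typically work with allele frequencies and assess significance with a block jackknife, which is omitted here; P1, P2 and P3 are placeholder names for finch populations.

```python
def d_statistic(site_patterns):
    """Patterson's D from biallelic site patterns for (P1, P2, P3, Outgroup).

    Each pattern is a tuple of 0/1 alleles, with 0 = ancestral (outgroup) state.
    ABBA: P2 and P3 share the derived allele; BABA: P1 and P3 share it.
    D > 0 suggests excess allele sharing between P2 and P3 (e.g. gene flow).
    """
    abba = sum(1 for s in site_patterns if tuple(s) == (0, 1, 1, 0))
    baba = sum(1 for s in site_patterns if tuple(s) == (1, 0, 1, 0))
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


# Hypothetical site patterns; only ABBA and BABA sites contribute to D
sites = ([(0, 1, 1, 0)] * 30 + [(1, 0, 1, 0)] * 10
         + [(1, 1, 0, 0)] * 50 + [(0, 0, 1, 0)] * 20)
print(f"D = {d_statistic(sites):.2f}")
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so a D significantly different from zero points to gene flow between P3 and one of the other two taxa.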

While this is certainly a very interesting finding, it is perhaps unsurprising, given the proximity of the islands and the relative ease with which individuals can fly from one island to another. It does offer a nice comparison with the adaptive radiations of cichlid fishes, which occur in geographically isolated lakes without the opportunity for gene flow between them (see blog post).

Genetic basis of beak shape

Network tree of Darwin’s finches with images showing diversity of beak shape

Now that they had this large genomic dataset, they wanted to address how molecular differentiation contributes to beak morphology. They did this by choosing four closely related populations that differed in beak shape (two blunt and two pointed), and then scanning the whole genomes to identify regions with high genetic differentiation (Z-transformed FST, ZFST) between the two phenotypes. In figure 3a below, we can see that they have marked the 15 regions with the highest ZFST values, along with the genes identified within them.
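ZFST is simply FST standardized across all windows, so that peaks stand out relative to the genome-wide distribution. A minimal sketch with made-up FST values for six hypothetical windows and an arbitrary threshold of 1.5; the authors' window size, FST estimator and significance threshold are not reproduced here.

```python
import statistics

def z_transform(values):
    """Standardize values: (x - mean) / sd, using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def outlier_windows(fst_by_window, threshold):
    """Return {window: ZFST} for windows whose Z-transformed FST reaches the threshold."""
    names = list(fst_by_window)
    z_scores = z_transform([fst_by_window[w] for w in names])
    return {w: z for w, z in zip(names, z_scores) if z >= threshold}


# Hypothetical FST values for six genomic windows (blunt vs pointed beak groups)
fst_by_window = {"w1": 0.05, "w2": 0.04, "w3": 0.06, "w4": 0.05, "w5": 0.45, "w6": 0.03}
for window, z in outlier_windows(fst_by_window, threshold=1.5).items():
    print(f"{window}: ZFST = {z:.2f}")
```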

Of those genes, they found 6 that were previously reported to be involved in craniofacial/beak development in mammals and birds. Interestingly, they did not find high genetic differentiation at bone morphogenetic protein 4 (BMP4), a gene previously reported to show differential expression between beak types. This may be because BMP4 differs between beak types in its expression rather than in its sequence; it’s a pity this study does not include any RNA-seq work to complement its huge genomic dataset, but perhaps the authors are saving that for another Nature paper.

Haplotype tree of the ALX1 region

The highest ZFST peak contained the gene ALX1, which is involved in craniofacial development and whose loss in humans can even cause severe facial clefting. The authors found two haplotypes of ALX1, each of which remarkably corresponds to one of two categories of beak shape: blunt and pointed. A phylogenetic tree constructed from this region (figure 3c, left) shows that the blunt haplotype was an early adaptation that seems to have been quite favorable; the short branch lengths among the blunt haplotypes (red) are indicative of a selective sweep, which is further supported by the low nucleotide diversity shown in figure 3b below (although G. difficilis from Wolf may also show low nucleotide diversity in part of this region, possibly as a result of introgression from G. magnirostris?).
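Low nucleotide diversity is the sweep signature used here. A minimal sketch of nucleotide diversity (π) as the average proportion of differing sites over all pairs of sequences, assuming aligned haplotypes of equal length with no missing data; the sequences are invented and much shorter than any real genomic window.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes):
    """Nucleotide diversity (pi): mean proportion of differing sites over all haplotype pairs."""
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * length)


# Invented 10-bp haplotypes: a swept group (nearly identical) vs a diverse group
swept = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]
diverse = ["ACGTACGTAC", "ATGTACCTAC", "ACGAACGTTC", "GCGTACGAAC"]
print(f"pi (swept)   = {nucleotide_diversity(swept):.3f}")
print(f"pi (diverse) = {nucleotide_diversity(diverse):.3f}")
```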

Nucleotide diversity in the ALX1 region

G. fortis populations show substantial diversity in beak shape, so the authors genotyped an additional 62 birds of this species and found a textbook association between beak shape and ALX1 genotype (figure 3e below; BB is blunt-haplotype homozygote, PP is pointed-haplotype homozygote, and BP is heterozygote). While beak morphology certainly involves multiple genes, as evidenced by the 15 significant genomic regions, their work shows that ALX1 is one of the most important contributors, if not the most important.
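The genotype-phenotype association in G. fortis is essentially a linear regression of beak shape on ALX1 genotype. A minimal sketch, assuming an additive coding (number of pointed haplotypes: BB = 0, BP = 1, PP = 2) and made-up shape scores; the sign of the slope depends on how the shape score is defined, and the authors' actual scoring and model are not reproduced here.

```python
def additive_regression(genotypes, scores):
    """Ordinary least squares of score on genotype coded 0/1/2 (additive model).

    Returns (intercept, slope, r_squared).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, scores))
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(genotypes, scores))
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    return intercept, slope, 1 - ss_res / ss_tot


# Genotype coding assumed here: number of pointed (P) haplotypes, so BB = 0, BP = 1, PP = 2
coding = {"BB": 0, "BP": 1, "PP": 2}
# Hypothetical beak-shape scores, not the authors' data
birds = [("BB", -1.0), ("BB", -0.8), ("BB", -1.2),
         ("BP", 0.0), ("BP", 0.1), ("BP", -0.1),
         ("PP", 1.0), ("PP", 0.9), ("PP", 1.1)]
x = [coding[g] for g, _ in birds]
y = [s for _, s in birds]
intercept, slope, r2 = additive_regression(x, y)
print(f"intercept = {intercept:.2f}, slope per P haplotype = {slope:.2f}, R^2 = {r2:.2f}")
```

Heterozygotes falling midway between the two homozygote classes, as in this toy data, is what an additive effect of the haplotype would look like.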

Linear regression analysis of beak shape score by genotype


The authors put in a tremendous effort to sequence the genomes of all of these individuals, representing each of the 15 Darwin’s finch species, and once these data are made publicly accessible they will no doubt be a valuable resource for future studies of these species, and indeed for anyone interested in adaptive radiation. However, given such a large dataset, it would have been nice to see some additional work, e.g. an assessment of possible differential expression of the genes within the 15 ZFST peaks, or further analyses of some of the other genes found in those peaks. I also wonder whether they could apply the approach used by Zhan et al. (2014) to identify regions of the genome associated with migratory behavior in monarch butterflies. This, or a similar approach, might yield more information than scanning for high FST alone.

Update 21/10/2015: a previous version of this post stated incorrectly that the species tree from Farrington et al. (2014) was based largely on mitochondrial DNA.

Lamichhaney, Sangeet, Jonas Berglund, Markus Sällman Almén, Khurram Maqbool, Manfred Grabherr, Alvaro Martinez-Barrio, Marta Promerová, et al. “Evolution of Darwin’s Finches and Their Beaks Revealed by Genome Sequencing.” Nature 518 (2015): 371–375.

Posted in Uncategorized

Convergent evolution of the genomes of marine mammals


Convergent evolution is the independent evolution of similar features in species of different lineages. Marine mammals, which belong to different mammalian orders but share several phenotypic traits adapted to the aquatic environment, are a classic example of convergent evolution. Although there are potentially several genomic routes to the same phenotypic outcome, it has been suggested that the genomic changes underlying convergent evolution may to some extent be reproducible, and that convergent phenotypic traits may commonly arise from the same genetic changes. To investigate convergent evolution at the genomic level, the authors present high-coverage whole-genome sequences for four marine mammal species: the walrus (Odobenus rosmarus), the bottlenose dolphin (Tursiops truncatus), the killer whale (Orcinus orca) and the West Indian manatee (Trichechus manatus latirostris) (figure 1). Here are some interesting results of this paper.

Fig 1: Phylogeny of 20 eutherian mammalian genome sequences, rooted with a marsupial outgroup.

Detecting positively selected protein-coding genes

To study the molecular mechanism of convergent evolution, they first focused on detecting positively selected protein-coding genes in all three orders. The branch-site likelihood ratio test is a powerful phylogenetic method for detecting relatively ancient selection; it is useful for identifying positive selection that affects only a few sites in a protein along prespecified lineages. Using this method, they tested four different branch sets: the combined marine mammal branches, and each of the individual branches leading to the manatee, the walrus and the order containing the dolphin and killer whale (see the branches colored red in Fig. 1). They identified 191 genes under positive selection across the combined marine mammal branches, of which 5 remained significant after conservative correction for multiple testing.
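A branch-site test compares the log-likelihood of an alternative model that allows ω > 1 on the foreground branches with a null model in which ω is fixed to 1 on those branches (as in PAML's codeml branch-site model A). A minimal sketch of the final testing step, assuming the two log-likelihoods per gene have already been obtained: the numbers are made up, the chi-square distribution with 1 degree of freedom is the commonly used conservative reference, and Bonferroni is shown only as one conservative multiple-testing correction, since the post does not say which correction the authors applied.

```python
import math

def branch_site_lrt(lnl_alt, lnl_null):
    """Return (2*delta_lnL, p) using chi-square with 1 df as the reference distribution.

    For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)). Using chi2_1 (rather than a 50:50
    mixture of 0 and chi2_1) is the conservative choice for the branch-site test.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Made-up log-likelihoods for three hypothetical genes: (alternative model, null model)
genes = {"geneA": (-5210.4, -5221.9), "geneB": (-3102.7, -3105.0), "geneC": (-8801.2, -8801.3)}
results = {g: branch_site_lrt(alt, null) for g, (alt, null) in genes.items()}
flags = bonferroni_significant([p for _, p in results.values()])
for (g, (stat, p)), keep in zip(results.items(), flags):
    print(f"{g}: 2dlnL = {stat:.2f}, p = {p:.2g}, significant after correction: {keep}")
```

In this toy example geneB is nominally significant (p below 0.05) but does not survive correction, the same kind of attrition that takes 191 candidate genes down to 5.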

Identifying convergent amino acid substitutions in positively selected genes

Next, they focused on identifying convergent amino acid substitutions encoded within the positively selected genes found in the first step. They found that parallel nonsynonymous changes, mapping to the same amino acid site in more than one marine mammal lineage, were widespread across the genome. In total, they identified 44 parallel nonsynonymous amino acid substitutions that occurred along all three marine mammal lineages. More specifically, 15 of these 44 substitutions were encoded within genes evolving under positive selection in at least one lineage, and 8 of these genes were inferred to have evolved under positive selection in the test including all three marine mammal lineages (Fig. 2 and Table 1).
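Counting parallel substitutions amounts to finding the same ancestral-to-derived amino acid change at the same alignment site on independent branches. A minimal sketch, assuming the ancestral sequence at the start of each branch has already been reconstructed; the sequences and lineage labels are invented, and the authors' reconstruction and filtering steps are not reproduced.

```python
from collections import defaultdict

def parallel_substitutions(branches, min_lineages=2):
    """Find identical amino acid changes at the same site on independent branches.

    branches maps a lineage name to (ancestral_seq, descendant_seq): aligned protein
    sequences at the start and end of that branch. Returns a sorted list of
    (site, (from_aa, to_aa), lineages) for changes shared by >= min_lineages branches.
    """
    changes = defaultdict(list)   # (site, from_aa, to_aa) -> lineages with that change
    for lineage, (ancestor, descendant) in branches.items():
        for site, (a, d) in enumerate(zip(ancestor, descendant)):
            if a != d:
                changes[(site, a, d)].append(lineage)
    return sorted(
        (site, (a, d), sorted(lineages))
        for (site, a, d), lineages in changes.items()
        if len(lineages) >= min_lineages
    )


# Invented toy sequences; real analyses reconstruct each branch's ancestor on the phylogeny
branches = {
    "manatee":   ("MKTAYL", "MRTAYV"),
    "walrus":    ("MKTAYL", "MRTSYL"),
    "cetaceans": ("MKTAYL", "MRTAYV"),
}
for site, (a, d), lineages in parallel_substitutions(branches):
    print(f"site {site}: {a}->{d} in {', '.join(lineages)}")
```

Running the same count on terrestrial control branches is what reveals the background level of convergence against which the marine lineages have to be judged.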

Table 1 Positively selected genes that encode parallel substitution in all three marine mammal lineages

Figure 2 Genome scans for convergence. Marine mammal genomes showed a large number of parallel substitutions (blue) that occurred along the branches of at least two marine mammal lineages since they evolved from a terrestrial ancestor. Parallel substitutions that occurred in positively selected genes are shaded red.

Are the genes identified in this study associated with phenotypes? Indeed, several of the 15 genes under positive selection have known functional associations that suggest a role in the convergent phenotypic evolution of the marine mammal lineages. For example, S100A9 and MGP encode calcium-binding proteins that have a role in bone formation, SMPX has a role in hearing and inner ear formation, C7orf62 has known links to hyperthyroidism, MYH7B has a role in the formation of cardiac muscle, and SERPINC1 regulates blood coagulation. These genes could therefore be linked to convergent phenotypic traits such as changes in bone density (S100A9 and MGP), which is high in shallow-diving species such as the manatee and walrus to counteract buoyancy but low in deep-diving cetacean species, which collapse their lungs to counteract buoyancy.

For me, the most interesting result they found is an unexpectedly high level of convergence along the combined branches of the terrestrial sister taxa (cow, dog and elephant) to the marine mammals, for which there is no obvious phenotypic convergence. This finding suggests that the options for both adaptive and neutral substitutions in many genes may be limited, possibly because substitutions at alternative sites have pleiotropic and deleterious effects.


This paper nicely showed that convergent amino acid substitutions were widespread throughout the genome and that a subset of these substitutions were in genes evolving under positive selection and putatively associated with a marine phenotype. However, the authors also found higher levels of convergent amino acid substitutions in a control set of terrestrial sister taxa to the marine mammals. These results suggest that, whereas convergent molecular evolution is relatively common, adaptive molecular convergence linked to phenotypic convergence is comparatively rare.

Foote, A., Liu, Y., Thomas, G., Vinař, T., Alföldi, J., Deng, J., Dugan, S., van Elk, C., Hunter, M., Joshi, V., Khan, Z., Kovar, C., Lee, S., Lindblad-Toh, K., Mancia, A., Nielsen, R., Qin, X., Qu, J., Raney, B., Vijay, N., Wolf, J., Hahn, M., Muzny, D., Worley, K., Gilbert, M., & Gibbs, R. (2015). Convergent evolution of the genomes of marine mammals Nature Genetics, 47 (3), 272-275 DOI: 10.1038/ng.3198

Posted in Uncategorized

Evolution of Darwin’s finches and their beaks revealed by genome sequencing


Darwin’s finches from the Galapagos and Cocos Island are a classic example of a young adaptive radiation that remains entirely intact, as none of the species has become extinct as a result of human activity. They have diversified in beak size and shape, feeding habits and diet as they adapted to different food resources. The traditional taxonomy of Darwin’s finches is based on morphology and has been largely supported by observations of breeding birds. In this paper, the authors present the results of whole-genome re-sequencing of 120 individuals representing all of the Darwin’s finch species inhabiting the Galapagos archipelago (Fig. 1a) and two close relatives, analysing patterns of intra- and interspecific genome diversity and phylogenetic relationships among the species.

Figure 1a. Sample location of Darwin’s finches

Summary and comments of the paper

The authors analyzed the geographic distribution and phylogeny of Darwin’s finches and found widespread evidence of interspecific gene flow that may have enhanced evolutionary diversification throughout the phylogeny. They also report the discovery of a locus with a major effect on beak shape. They generated 10× sequence coverage per bird using 2×100 base-pair (bp) paired-end reads, and found evidence of introgression from three sources: ABBA-BABA tests, discrepancies between autosomal and sex-linked phylogenies, and discrepancies between autosomal and mtDNA phylogenies. Extensive sharing of genetic variation among populations was evident, particularly among ground and tree finches, with almost no fixed differences between species in each group. Their maximum-likelihood phylogenetic tree based on autosomal genome sequences is generally consistent with current taxonomy but shows several interesting deviations (Fig. 1b).
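The sequencing depth translates into read numbers through simple arithmetic: average coverage equals the number of bases sequenced divided by genome size. A minimal sketch, assuming a round genome size of 1 Gb (an illustrative assumption, not a figure from the post) and ignoring duplicate and unmappable reads.

```python
import math

def read_pairs_needed(coverage, genome_size_bp, read_length_bp=100, reads_per_pair=2):
    """Read pairs required so that total sequenced bases = coverage * genome size."""
    bases_needed = coverage * genome_size_bp
    return math.ceil(bases_needed / (read_length_bp * reads_per_pair))


genome_size = 1_000_000_000    # assumed round figure of ~1 Gb, for illustration only
pairs_per_bird = read_pairs_needed(10, genome_size)
print(f"{pairs_per_bird:,} read pairs per bird; {120 * pairs_per_bird:,} for 120 birds")
```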

Figure 1b. Phylogeny of Darwin’s finches


The revised and dated phylogeny of Darwin’s finches shows that the adaptive radiation took place within the past million years, with a rapid accumulation of species in recent times. Genomic characterization of the entire radiation revealed a striking connection between past and present evolution. Evidence of introgressive hybridization is found throughout the radiation, showing that hybridization has given rise to species of mixed ancestry; the paper describes these cases in detail (species and location). The most obvious morphological difference among Darwin’s finches concerns beak shape. The authors performed a genome-wide scan on populations that are closely related but differ in beak morphology. They inferred a polygenic basis for beak diversity, discovering 15 regions with strong genetic differentiation between groups of finches with blunt or pointed beaks. Their analysis revealed that ALX homeobox 1 (ALX1) is an excellent candidate for variation in beak morphology, because it encodes a paired-type homeodomain transcription factor that plays a crucial role in the development of structures derived from craniofacial mesenchyme, the first branchial arch and the limb bud, and influences the migration of cranial neural crest cells, which is highly relevant to beak development. They examined single nucleotide polymorphisms (SNPs) in the ALX1 gene of various finch species and concluded that the haplotype associated with blunt beaks has a long evolutionary history, because its origin predates the radiation of the vegetarian, tree and ground finches. This haplotype might have evolved by accumulating both coding and regulatory changes affecting ALX1 function. Natural selection and introgression affecting this locus have contributed to the diversification of beak shapes among Darwin’s finches and hence to their expanded use of food resources on the different Galapagos islands.

Lamichhaney, S., Berglund, J., Almén, M., Maqbool, K., Grabherr, M., Martinez-Barrio, A., Promerová, M., Rubin, C., Wang, C., Zamani, N., Grant, B., Grant, P., Webster, M., & Andersson, L. (2015). Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature, 518 (7539), 371-375. DOI: 10.1038/nature14181

Posted in adaptation, conservation, evolution, genomics

The genomic landscape underlying phenotypic integrity in the face of gene flow in crows

In this paper, the authors return to the question of the role of interspecific gene flow in evolution and species diversification. They studied the hybrid zone between two bird taxa, the all-black carrion crow (Corvus corone) and the gray-coated hooded crow (C. cornix). Their morphological hybrid zone in Europe makes it possible to study the effects of introgression during early species divergence. The authors identified genome-wide introgression and showed divergence between the two species in the expression levels of genes implicated in plumage coloration and of genes involved in visual perception; these genes could be important for maintaining the phenotypic differences and responsible for the heterogeneity of the introgression landscape.

Principal results

First, the authors assembled a high-quality reference genome from a single male hooded crow, aligned it to the chicken and zebra finch genomes and then annotated it using mRNA sequencing. This yielded a set of 20,794 protein-coding genes with open reading frames longer than 100 amino acids, and RNA-seq data were used to validate the in silico gene predictions. The authors then resequenced 60 genomes of unrelated birds from four populations of carrion and hooded crows and found 8.44 million single-nucleotide polymorphisms (SNPs) segregating across all investigated populations, of which 5.27 million were shared between carrion and hooded crows. They also discovered substantial genome-wide gene flow across the hybrid zone. The major axes of genetic variation corresponded to the hypothesized direction of spatial expansion out of Spain. Moreover, German carrion crows grouped more closely with both the Swedish and the Polish hooded crow populations than with Spanish carrion crows (Figure 1). Using multiple tests, namely the ABBA-BABA test, admixture analysis and coalescent-based parameter estimation under an isolation-with-migration model, the authors demonstrated extensive gene flow between hooded crows and the German carrion crow population.
Next, mRNA sequencing was performed on five tissues from 19 individuals to test for gene-expression divergence between the species across the hybrid zone. Only a low proportion (0.03%–0.41%) of genes was differentially expressed between carrion and hooded crows across tissues. Most of the differentially expressed genes were related to plumage coloration, and all of the overexpressed genes were implicated in the melanogenesis pigmentation pathway (Figure 3). Nineteen of these 20 melanogenesis genes were underexpressed in the gray hooded crows. All differentially expressed genes were found in growing feather follicles from the birds’ torso. The authors confirmed that this expression bias reflected broad down-regulation of genes in the melanogenesis pathway rather than a defect in melanin deposition caused by differences in melanocyte density (Figure 4).
The authors then investigated the landscape of genomic divergence with a 50-kb window-based approach that uses a clustering algorithm to reconstruct local genomic phylogenies without any a priori hypothesis. They showed that only 0.28% of the genome was divergent between carrion and hooded crows. They also found a 1.95-Mb region on chromosome 18 with strong genetic differentiation between the two species. This region contained 81 of the 82 fixed differences between carrion and hooded crows and 40 annotated protein-coding genes, and it was characterized by markedly reduced nucleotide diversity in all populations and by increased linkage disequilibrium (LD). The authors do not rule out the possibility of an inversion in this region. In Figure 2, the authors show a region with signs of recent positive selection in hooded crows. This region carried many fixed, hooded-crow-specific derived variants and had reduced values of Fu and Li’s D statistic (P < 0.05). It also contained members of the voltage-gated calcium channel subunit gene (CACNG) family, which encode transmembrane regulators of AMPA receptors. These proteins modulate the activity of the microphthalmia-associated transcription factor gene MITF, a principal regulator of melanogenesis (Figure 3C). The authors found 11 melanogenesis genes that are regulated by MITF and underexpressed in the feather follicles of gray hooded crows. The authors thus connect gene expression, color differences and the signature of local divergent selection, and postulate that a number of genes cause color divergence in crows. Further gene expression analysis revealed that regulator of G protein signaling 9 (RGS9), normally highly expressed in the eye, and members of the SLC24 gene family, which are involved in pigmentation, showed decreased expression in hooded crows.
To conclude, this paper underlines the significance of inversions for the evolutionary process and the role of sexual selection in phenotypic and genotypic differentiation.

Personal comment

This paper presents thorough and complete work that deepens our understanding of the role of interspecific gene flow in evolution and species diversification. However, in Figure 1 the authors show a map of the European distribution of carrion and hooded crows that is not linked to the principal components PC1 and PC2. In my opinion, it would be better to perform an analysis that links the PCA to geographical coordinates, for example a Procrustes analysis (a form of statistical shape analysis used to compare the distributions of sets of shapes).
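The Procrustes analysis suggested above could look roughly like the following minimal sketch. It assumes that each sampled crow has PC1/PC2 scores and sampling coordinates stored in the same row order, treats the coordinates as planar (a projected coordinate system would be more careful than raw longitude/latitude), and uses only NumPy. The function names and the toy data are hypothetical; the disparity computed here should match what `scipy.spatial.procrustes` reports, and the row-shuffling permutation test is one common way to judge significance, not something taken from the paper.

```python
import numpy as np


def procrustes_disparity(reference, target):
    """Procrustes disparity between two n x k point configurations.

    Both matrices are centred and scaled to unit Frobenius norm; the
    target is then optimally rotated/reflected and rescaled onto the
    reference. Returns the residual sum of squares, from 0 (perfect
    match) to 1 (worst possible fit).
    """
    x = np.asarray(reference, dtype=float)
    y = np.asarray(target, dtype=float)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)
    # The sum of singular values of y^T x is the best achievable fit
    s = np.linalg.svd(y.T @ x, compute_uv=False).sum()
    return max(0.0, 1.0 - s ** 2)


def procrustes_permutation_test(geo, pcs, n_perm=999, seed=0):
    """Observed disparity plus a permutation p-value.

    Rows of the PC matrix are shuffled so that individuals are paired
    with random sampling locations; a small p means geography and the
    PCA agree better than expected by chance.
    """
    rng = np.random.default_rng(seed)
    pcs = np.asarray(pcs, dtype=float)
    observed = procrustes_disparity(geo, pcs)
    hits = 0
    for _ in range(n_perm):
        shuffled = pcs[rng.permutation(len(pcs))]
        if procrustes_disparity(geo, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: sampling coordinates (x, y) for six birds and PC
# scores built as a rotated, rescaled and shifted copy of those coordinates
geo = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [0.0, 2.0], [3.0, 3.0], [1.0, 3.0]])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pcs = 0.5 * geo @ rot.T + np.array([2.0, -1.0])
disparity, p_value = procrustes_permutation_test(geo, pcs)
print(f"disparity={disparity:.3g}, p={p_value:.3f}")
```

Because the toy PC scores are an exact similarity transform of the coordinates, the disparity is essentially zero; with real data, a low disparity together with a small permutation p-value would indicate that the PCA mirrors geography.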

Poelstra, J., Vijay, N., Bossu, C., Lantz, H., Ryll, B., Muller, I., Baglione, V., Unneberg, P., Wikelski, M., Grabherr, M., & Wolf, J. (2014). The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science, 344 (6190), 1410-1414. DOI: 10.1126/science.1253226

Posted in Uncategorized

The genomic substrate for adaptive radiation in African cichlid fish

In African lakes, cichlid fishes are famous for large, diverse and replicated adaptive radiations. Nearly 1,500 new species of cichlid fish evolved within a few million years, when an environmentally determined opportunity for sexual selection and ecological niche expansion was met by an evolutionary lineage with an unusual potential to adapt, speciate and diversify. The phenotypic diversity encompasses variation in behaviour, body shape, coloration and ecological specialization. The frequent convergent evolution of similar ecotypes suggests a primary role for natural selection in shaping cichlid phenotypic diversity.

To identify the ecological and molecular basis of divergent evolution in the cichlid system, Brawand et al. [1] sequenced the genomes and transcriptomes of five lineages of African cichlids: Pundamilia nyererei (endemic to Lake Victoria); Neolamprologus brichardi (endemic to Lake Tanganyika); Metriaclima zebra (endemic to Lake Malawi); Oreochromis niloticus (from rivers across northern Africa); and Astatotilapia burtoni (from rivers connected to Lake Tanganyika). These five lineages diverged primarily through geographical isolation, and three of them subsequently underwent adaptive radiations in the three largest lakes of Africa. The authors comprehensively investigated these massive genomic data. Here are some interesting findings:

Accelerated gene evolution was assessed using the ratio of non-synonymous to synonymous substitution rates (dN/dS). Compared with stickleback, O. niloticus genes had significantly higher dN/dS ranks. Three genes in the BMP pathway, a ligand (bmp4), a receptor (bmpr1b) and an antagonist (nog2), all known to influence cichlid jaw morphology, show accelerated rates of protein evolution in haplochromine cichlids.

East African cichlids, including O. niloticus, possess an unexpectedly large number of gene duplicates. The authors found 280 duplication events in the lineage leading to the common ancestor of the radiations, a 4.5- to 6-fold increase in gene duplications relative to other clades after normalizing by branch length. However, as with the high-dN/dS genes, there is no significant enrichment for any particular gene pathway.

For transposable element (TE) insertions across the lineages, the authors report three waves of TE insertions, and TEs inserted near the 5’ UTR significantly increased gene expression. Surprisingly, none of the five cichlid genomes showed the expected deficit of sense-oriented LINE insertions among those dating to the burst of transposable element insertions in the common ancestor of the haplotilapiine cichlids; such a deficit would be expected if purifying selection removed these potentially harmful insertions. This suggests that ancestral East African cichlids went through an extended period of relaxed purifying selection.

For readers interested in small RNAs: the authors also found a surprising excess of novel microRNAs that emerged in cichlids, and wet-lab experiments supported the view that these novel miRNAs alter gene expression in multiple organs.

Last but not least, they carried out extensive population-genetic analyses of closely related species of the genera Pundamilia, Mbipia and Neochromis, all endemic to Lake Victoria. Lake Victoria was chosen because it is where the most recent radiation happened: several hundred endemic species emerged within the past 15,000–100,000 years. Their Fst comparisons suggest that (1) variation in coding sequence is most likely involved in the divergence of physiological and/or terminally differentiated traits such as color; and (2) regulatory variation is more important in morphological changes involving genes with pleiotropic effects in developmental networks.
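The Fst comparisons above rest on estimates of differentiation between populations. The summary does not say which estimator the authors used, so the sketch below assumes Hudson's Fst combined across SNPs as a ratio of averages, which is one common choice. The function name `hudson_fst`, the constant sample sizes and the toy frequencies are illustrative assumptions. Contrasting coding with regulatory variation would amount to calling the function separately on each class of SNPs.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst across SNPs, combined as a ratio of averages.

    p1, p2: allele frequencies per SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes per population (assumed
    constant across SNPs here for simplicity).
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Between-population heterozygosity
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Hypothetical example: two SNPs with fixed differences give Fst = 1
print(hudson_fst([0.0, 1.0], [1.0, 0.0], 20, 20))  # 1.0
```

Because of the sample-size correction, the estimate can dip slightly below zero when two populations barely differ, which is expected behaviour rather than an error.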


With so many interesting findings, it can be hard to get a simple answer to the ultimate question of why some lineages diversify so dramatically while others do not. This is the case for the cichlids, where the authors set out to ask what the genomic substrate for adaptive radiation is. Their conclusion is that neutral and adaptive processes both make important contributions to the genetic basis of cichlid radiations.


  1. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, Lim ZW, Bezault E, Turner-Maier J, Johnson J, Alcazar R, Noh HJ, Russell P, Aken B, Alföldi J, Amemiya C, Azzouzi N, Baroiller J-F, Barloy-Hubler F, Berlin A, Bloomquist R, Carleton KL, Conte MA, D’Cotta H, Eshel O, Gaffney L, Galibert F, Gante HF, et al.: The genomic substrate for adaptive radiation in African cichlid fish. Nature 2014.
Posted in adaptation, evolution