The Amborella Genome and the Evolution of Flowering Plants

Amborella trichopoda, a species endemic to New Caledonia, is the earliest-diverging lineage of flowering plants (angiosperms, Figure 1). Sequencing its genome was therefore of considerable interest for investigating the emergence and evolution of this highly diverse lineage, which comprises at least 350,000 species.
In this work, the Amborella Genome Project (http://www.amborella.org/) reports the draft genome sequence of A. trichopoda. Notably, the genome was used as a reference to reconstruct the genomic features and architecture of the most recent common ancestor of living angiosperms, to investigate gene families specific to flowering plants, and to investigate the population structure of Amborella.

Figure 1: Overview of the land plant phylogeny. Major hypothetical polyploidy events are indicated with stars; additional ones are indicated with ellipses. Events supported by synteny analysis are filled; other events are supported only by phylogenetic analysis of paralogous gene pairs.

Genome structure

The identification of frequent duplicated collinear genes (Figure 2a) within the A. trichopoda genome provides evidence of an ancient whole genome duplication (WGD). WGD is known to be a pervasive feature in the evolution of plants, with modern plants frequently presenting traces of multiple past duplication events. A comparison with Vitis vinifera (grape) showed that the genome of A. trichopoda is almost entirely covered by three syntenic grape regions (Figure 2b and 2c). This 1:3 relationship between the two genomes indicates that the WGD detected in A. trichopoda occurred in the common ancestor of the two species (an event referred to as Epsilon, Figure 1), and confirms that the divergence of A. trichopoda, at least 160 million years ago (Ma), predates the genome triplication observed in Vitis vinifera (referred to as Gamma, Figure 1). In addition, the A. trichopoda genome shows no evidence of a more recent duplication event.
A phylogenomic approach was then used to confirm the results of the synteny analysis. First, the phylogenies of 11,519 gene families were reconstructed, and they showed that duplicated genes specific to A. trichopoda are infrequent. Inference of duplication times significantly supported two events dated to 244 and 341 Ma, corresponding to the previously identified Epsilon and Zeta WGDs, respectively (Figure 1). The fact that the Zeta duplication was not supported by the synteny analysis is probably due to the extensive gene loss and rearrangements that have occurred since this ancient event. A second phylogenetic analysis of 155 syntenic gene pairs from six manually curated duplicated blocks also supports the conclusion that the Epsilon WGD predates the divergence of A. trichopoda.

Figure 2: Synteny analysis: a) Syntenic region of scaffolds 24 and 48 of the A. trichopoda draft genome. b) Top: Synteny pattern between Grape and Amborella: each A. trichopoda region matches up to three Grape regions, as a result of the Gamma hexaploidization, and A. trichopoda presents signal of the Zeta WGD, with numerous blocks of intragenomic synteny. Bottom: Detailed view of A. trichopoda scaffold 9. Coloured blocks represent genes oriented on the same strand (blue) or the reverse strand (green). c) Bottom: Alignments of the seven reconstructed ancestral eudicot chromosomes (blue) and the A. trichopoda scaffolds (green). Top: Alignment of the reconstructed ancestral eudicot chromosomes with the three copies present in the Peach, Cacao and Grape genomes.

The gene order of the eudicot ancestor was reconstructed based on three eudicot genomes with similar structures and clear patterns of paralogy among gene copies: grape (V. vinifera), peach (Prunus persica), and cacao (Theobroma cacao). A. trichopoda was used as an outgroup. Seven hypothetical ancestral chromosomes were reconstructed. This reconstruction will help to understand the evolution of eudicot lineages after the Gamma hexaploidy event. Figure 2c presents the alignment of one ancestral chromosome and the triplicated blocks of genes in the three rosid genomes.

Ancestral gene family content, origin and history of angiosperm genes

To investigate the ancestral gene content and the evolution of gene families in the different lineages of the land plant phylogeny, protein-coding genes from 22 sequenced land plants were clustered into 53,136 orthogroups. Subsequent clustering merged those orthogroups into 6054 super-orthogroups. Changes in gene family content are largest on the terminal branches and on the branch leading to all angiosperms. Additional analyses including the spruce genome and gymnosperm and basal angiosperm transcript assemblies led to the identification of 1179 orthogroups that seem to be specific to angiosperms.
We discussed the fact that this clustering method is based on BLASTp analyses. Amino acid sequences are not necessarily well conserved between distantly related proteins, and the cutoffs used for clustering influence the number of inferred unique gene families. Most gene lineages (70%) with functions linked to flowering were present in the most recent common ancestor of all seed plants, highlighting the fact that novelty is generally not linked to the emergence of drastically new gene families.
The authors further detail gene family expansions linked to flowering, such as the MADS-box transcription factors, seed storage globulins, and cell wall and lignin genes.

Transposable elements

One striking feature of the A. trichopoda genome is that the average age of identifiable transposable elements (TEs) is considerably older than in other angiosperm genomes. The terminal repeats of LTR retrotransposons show an extensive degree of divergence. Endogenous pararetroviruses (EPRVs) and TEs still account for 57.2% of the nonambiguous nucleotides in the A. trichopoda genome (668 Mb), but only a few TE families show signs of recent activity, with an estimated insertion date of more than 10 Ma for most TEs. The lack of recent transposon activity in the A. trichopoda genome may be due to very effective silencing or to the loss of active transposases.
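
To make the link between LTR divergence and insertion date concrete: the two terminal repeats of an LTR retrotransposon are identical at the time of insertion, so their divergence is commonly converted into an age with T = K / (2r). The following is only a minimal sketch of that general approach, not necessarily the procedure used in the paper. The Jukes-Cantor correction, the function names and the substitution rate are all assumptions made for illustration.

```python
import math

def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected distance between two aligned, equal-length LTRs."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def insertion_age_years(distance: float, rate_per_site_per_year: float) -> float:
    """T = K / (2r): both LTRs accumulate substitutions independently."""
    return distance / (2.0 * rate_per_site_per_year)

# Hypothetical rate, chosen only for illustration (not taken from the paper).
ASSUMED_RATE = 1.3e-8

k = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")  # toy LTR pair, 1 mismatch in 10
print(f"K = {k:.4f}, estimated age = {insertion_age_years(k, ASSUMED_RATE):.3g} years")
```

With any such rate, strongly diverged terminal repeats translate into old insertion dates, which is the pattern described above.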

Population genomics and conservation

Finally, the authors investigated the population history and structure of Amborella. The genomes of 12 individuals sampled from nearly all known populations in New Caledonia were resequenced. The Pairwise Sequentially Markovian Coalescent (PSMC) method, which uses Single Nucleotide Polymorphism (SNP) data to infer past changes in effective population size, was used to investigate the population history of the 12 Amborella populations (Figure 3). The coalescent time of the 12 genomes was estimated to be 9.0-2 Ma. They discuss evidence suggesting population bottlenecks, admixture between sub-lineages and a reduction in effective population size in the recent past. Nevertheless, given the bootstrap clouds visible in Figure 3, those results are not strongly supported by the data. The number of identified SNPs varied considerably depending on the parameters used for SNP calling: from 1,903,437 SNPs with stringent parameters to 5,131,595 SNPs with less stringent parameters. As the median read depth was quite low for some samples (6X for several samples), the less stringent analysis was retained for the subsequent PSMC and population structure analyses, which casts doubt on the value of those results.

Figure 3: PSMC results for the 12 Amborella populations (one color per population, right panel), with the bootstrap clouds in green. The vertical bar at about 325,000 years indicates the estimated timing of the species-wide decline in effective population size.

In conclusion, the analysis of the draft genome sequence of this basal angiosperm gives first hints about the genome architecture and gene content of ancestral flowering plants, and the genome promises to be a valuable tool for investigating the evolution of flowering plants.

Amborella Genome Project (2013). The Amborella Genome and the Evolution of Flowering Plants Science, 342 (6165), 1241089-1241089 DOI: 10.1126/science.1241089

Single and independent mutations lead to an adaptive and complex color phenotype in deer mice living on the light-colored soils of the Nebraska Sand Hills

Pleiotropy is often invoked to explain the genetic basis of complex phenotypes (i.e., those composed of multiple traits). But the genes or loci involved are rarely dissected, and it remains unclear whether single pleiotropic mutations or multiple mutations with independent effects are responsible for building complex phenotypes.

Linnen et al. are interested in the coloration of deer mice (Peromyscus maniculatus) living on the light-colored soils of the Nebraska Sand Hills. Adaptation for crypsis is the strongest hypothesis to explain the prevalence of the light morph over the dark morph, and they wanted to dissect the genetic basis of this adaptation. The study has two main parts: first, to evaluate the complexity of the coloration phenotype, and then to find the mutations responsible for variation in those traits and to determine which morph selection is acting on. First of all, they ran an experiment with plasticine models to count the number of attacks on each color morph. As expected, a statistical test reveals that dark models are attacked significantly more often than light models. Closer inspection reveals multiple pigmentation traits and patterns that differ between the light and dark morphs and together make up a complex coloration phenotype (particularly dorsal hue and brightness, ventral color, dorsal-ventral boundary and tail stripe). In a previous study, they found that the recent change in dorsal fur to a light color is mainly caused by a change at the Agouti locus.

Before looking at point mutations in this locus, they wanted to see whether the color and color-pattern traits are dependent on one another. Principal component analysis (PCA) reveals that the phenotypes in this wild population were largely independent, suggesting multiple independent genetic controls. To test this hypothesis, they used next-generation sequencing (NGS) to generate polymorphism data for ~2100 unlinked regions and a smaller region containing Agouti and all its known regulatory elements. Single-SNP linear regressions were first used to find which mutations are associated with the different color traits. Then, using the residuals of those regressions, multiple-SNP analyses were run with the other SNPs to test whether further mutations have effects beyond that of the first SNP. We must keep in mind that the choice of SNP for the first and the following regressions matters (see explanation in figure 1). Their results are really interesting: they find that most of the color traits are associated with a unique set of SNPs (except for one deletion associated with both ventral color and tail stripe), and that no single set of polymorphisms can account for variation across the five traits. Most interestingly, many of those SNPs fall in or near regions containing regulatory elements, suggesting that multiple molecular mechanisms are involved in color adaptation in these Sand Hills mice.
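
The two-step logic described above (regress the trait on one SNP, then regress the residuals on another SNP) can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the authors' actual pipeline. The genotype coding (0/1/2), the toy data and the function name are hypothetical, and the R² of each fit is used here as a stand-in for PVE.

```python
from statistics import mean

def simple_regression(genotypes, trait):
    """Least-squares fit of trait = intercept + slope * genotype.

    Returns (slope, intercept, residuals, r_squared).
    """
    gx, ty = mean(genotypes), mean(trait)
    sxx = sum((g - gx) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - gx) * (t - ty) for g, t in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = ty - slope * gx
    residuals = [t - (intercept + slope * g) for g, t in zip(genotypes, trait)]
    ss_tot = sum((t - ty) ** 2 for t in trait)
    ss_res = sum(r ** 2 for r in residuals)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, residuals, r_squared

# Hypothetical toy data: genotypes coded as 0/1/2 copies of an allele.
snp_a = [0, 0, 1, 1, 2, 2]
snp_b = [0, 1, 0, 1, 0, 1]
brightness = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Step 1: single-SNP regression of the trait on the first SNP.
slope_a, _, resid_a, pve_a = simple_regression(snp_a, brightness)
# Step 2: regress the residuals on a second SNP to look for an additional effect.
slope_b, _, _, pve_b = simple_regression(snp_b, resid_a)

print(f"SNP A: slope={slope_a:.2f}, R2={pve_a:.3f}")
print(f"SNP B on residuals: slope={slope_b:.2f}, R2={pve_b:.3f}")
```

Swapping the order of the two SNPs changes what each regression absorbs, which is why the choice of the first SNP matters.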

Fig. 1: Explanation of the single- and multiple-SNP analyses

One remark can be made about figure 2 panel C of the article: it is difficult to see the difference between the gray and white circles. Yet the distinction is important, as the gray circles represent SNPs that remain significant after false discovery correction. It is important to note that no gray circles are found for the ventral color trait, and that only one red circle (a SNP significant after both false discovery and Bonferroni correction) is found. Moreover, the PVE (percentage of trait variation explained by SNPs) is 16%, which is the smallest value of all traits. This could mean that variation in this trait is under the control of other genes that were not sequenced here.
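
To see why a SNP can pass a false discovery correction but not a Bonferroni correction, here is a small sketch comparing the two. The Benjamini-Hochberg procedure is used as a stand-in for "false discovery correction", since I am not certain which FDR method the authors applied. The p-values are invented for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests with p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values for four SNPs tested against one trait.
p_vals = [0.001, 0.02, 0.03, 0.5]
print(bonferroni_significant(p_vals))          # stricter threshold
print(benjamini_hochberg_significant(p_vals))  # more permissive threshold
```

Bonferroni is the more conservative of the two, so the SNPs that survive both corrections are a subset of those that survive the false discovery correction alone.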

Figure 2 from Linnen et al., 2013. DOI: 10.1126/science.1233213

This leads them to answer two questions:

  • Does a single mutation have pleiotropic effects?

The answer is mainly no.

  • Do mutations have small and independent effects?

The answer is yes, with SNPs falling in both coding and non-coding (regulatory element) regions.

It is then useful to test for positive selection on Agouti and its SNPs. To do so, they compare a neutral model to a model with selection (built after estimating a selection coefficient) using simulations and a likelihood ratio test. The neutral model is a demographic model that they previously built using dadi. In figure 3 panel A, the simulations are done using all haplotypes, but because the sweep (a recent mutation) is supposed to be associated with the light phenotype only, it is important to repeat the analysis after removing the dark haplotypes. In figure 3, the y-axis corresponds to values of the likelihood ratio (LR). The larger the LR (big peaks), the more the fits of the two models differ. Conversely, when the LR is around 0, the model with selection does not depart from the neutral model. As you can see in panels B and C, peaks are much more numerous and larger when the analysis is restricted to light alleles. Panel C is also interesting, as it zooms in on the most strongly associated polymorphism for each trait and helps to compare results from dark and light morphs (black line = dark haplotypes, colored line = light haplotypes). Peaks in panel B are found to be significant and clustered around the location of SNPs, which is consistent with recent selection acting on, or near, color-associated SNPs in light haplotypes. Results from a comparison between dark- and light-associated alleles also concur with multiple targets of selection among the light, but not the dark, alleles of Agouti. Last but not least, the analysis of the strength of selection reveals that selection coefficients (estimated using a maximum likelihood approach) are greater for light-associated alleles than for dark-associated alleles. For example, for the d-v boundary trait, the selection coefficient s is 0.42 for the light allele and 0.067 for the dark allele. Values of s for dark-associated alleles are really small compared to those for light-associated alleles.
Moreover, there is a positive correlation between PVE values (percentage of trait variation explained by a SNP) and the selection coefficient s across all light-associated SNPs.
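
For readers unfamiliar with the likelihood ratio plotted on the y-axis, the sketch below shows the generic form of the statistic and how neutral simulations can be used to judge whether a peak is significant. It is an assumption-laden illustration: the exact statistic and simulation scheme used by Linnen et al. may differ, and all numbers and function names are made up.

```python
def likelihood_ratio(loglik_neutral: float, loglik_selection: float) -> float:
    """LR = 2 * (ln L_selection - ln L_neutral); near 0 means no gain from selection."""
    return 2.0 * (loglik_selection - loglik_neutral)

def empirical_p_value(observed_lr: float, simulated_neutral_lrs) -> float:
    """Fraction of neutral simulations with LR >= observed (with a +1 correction)."""
    exceed = sum(lr >= observed_lr for lr in simulated_neutral_lrs)
    return (exceed + 1) / (len(simulated_neutral_lrs) + 1)

# Hypothetical log-likelihoods at one genomic position.
observed = likelihood_ratio(loglik_neutral=-105.0, loglik_selection=-100.0)

# Hypothetical LR values obtained by simulating data under the neutral model.
neutral_lrs = [1.0, 2.0, 3.0, 12.0]

print(f"LR = {observed:.1f}, empirical p = {empirical_p_value(observed, neutral_lrs):.2f}")
```

Calibrating against neutral simulations, rather than a textbook chi-square threshold, lets the demographic history of the population set the null distribution.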

Figure 3 from Linnen et al., 2013. DOI: 10.1126/science.1233213

To conclude, their results are consistent with Fisher's geometric model of adaptation. Mutations with small and independent effects can together give a gene a broader, seemingly pleiotropic effect (as here with the Agouti locus, which leads to complex coloration in the deer mice of Nebraska). Finally, Linnen et al. want us to remember that it is individual mutations, not genes, that bring a population closer to its phenotypic optimum.

From my point of view, this Science letter is well written, clear and concise, and it answers a question that has important implications for evolutionary biology. A deeper look allows the reader to appreciate the complexity of the issue and the good work done by those researchers.

Linnen, C., Poh, Y., Peterson, B., Barrett, R., Larson, J., Jensen, J., & Hoekstra, H. (2013). Adaptive Evolution of Multiple Traits Through Multiple Mutations at a Single Gene Science, 339 (6125), 1312-1316 DOI: 10.1126/science.1233213

Patterns of population epigenomic diversity

In my opinion, this paper is interesting because it falls within my field of interest, but it is very difficult to understand: the authors use a lot of technical terms without defining them, and they very often refer the reader to other references with phrases such as "as described in". The aim of the paper is also not very clear, and there is no real conclusion. I have the feeling that they do not know what they can conclude. But I will try to explain the paper in a few words…

About the introduction.

It is well known that natural epigenetic variation provides a source of phenotypic diversity, but it remains unclear how this epigenetic variation contributes to that diversity and how genetic variation and epigenetic mechanisms are related. Epigenetics is defined as the heritable modification of gene expression. These modifications can be inherited through meiosis and/or mitosis but do not involve any change in the DNA sequence. The main epigenetic modifications are cytosine methylation at the DNA level and histone methylation or acetylation.

In plants, including the model Arabidopsis thaliana, there are basically three methylation contexts: methylation of a cytosine in a CG context, in a CHG context and in a CHH context (where H can be A, C or T).
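
As a concrete illustration of the three contexts, the sketch below classifies each cytosine of a short sequence from the two bases that follow it. It only handles the forward strand (the reverse strand would be handled by reverse-complementing first). The function names and the example sequence are mine, not taken from the paper.

```python
from collections import Counter
from typing import Optional

def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at seq[pos] (forward strand).

    Returns None when the context cannot be determined (sequence end or ambiguous base).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[pos + 1: pos + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or any(base not in "ACGT" for base in nxt):
        return None
    # At this point the first following base is H (A, C or T).
    return "CHG" if nxt[1] == "G" else "CHH"

def count_contexts(seq: str) -> Counter:
    """Count the cytosines of a sequence by methylation context."""
    counts = Counter()
    for i, base in enumerate(seq.upper()):
        if base == "C":
            ctx = cytosine_context(seq, i)
            if ctx is not None:
                counts[ctx] += 1
    return counts

print(count_contexts("CCGCATCAG"))
```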

In order to understand the types and extent of natural DNA methylation variants in A. thaliana, the authors performed methyl-C sequencing, genomic DNA sequencing and RNA sequencing. Integrating all this information (genomic and epigenomic data) allowed them to investigate variable methylation states of both CG gene body methylation and loci targeted by RdDM (RNA-directed DNA methylation).

Results

First, they assessed the diversity of SMPs (single methylation polymorphisms) to understand their frequency among different populations (accessions). To do this, they compared the Col methylome (wild type) with the methylomes of different Arabidopsis populations. They found 23% of SMPs in the CG context, 13% in the CHG context and 64% in the CHH context. Next, they constructed an epigenome-based phylogenetic tree and compared these SMPs with SNPs, and found a correlation between CG-SMPs and SNPs. They then investigated the pattern of SMP diversity at chromosome-wide and genome-wide scales (figure 1).

The major conclusion from figure 1 is that SMPs in the CG and CHG contexts tend towards the methylated state in centromeric regions and in transposons, whereas CHH SMPs are mainly in the unmethylated state. RNA sequencing also showed that transcription is higher in single-copy genes with CG methylation. These features make sense because the centromeres contain a lot of repetitive sequences and transposable elements, and these sequences are not transcribed.

Then, in figure 2, they examine the population-wide variation of DMRs. Spontaneous formation of SMPs represents one form of natural epigenetic variation, and this variation also exists in the form of differentially methylated regions (DMRs). They found that CG-DMRs are enriched in gene bodies, while C-DMRs (composed of CHH-DMRs and CHG-DMRs) are enriched in transposable elements and mainly unmethylated. The pattern of CG-DMRs is the same as that of CG-SMPs, meaning that transposable elements are mainly methylated in the CHH and CHG contexts, which leads to silencing by the RdDM pathway.

They then looked at two different tissues, leaves and inflorescences, using two different data types: when they clustered samples using the methylation levels of CG-DMRs or C-DMRs, the samples clustered by genotype, and when the same analysis was performed using RNA sequencing, they clustered by tissue.

This means that DNA methylation is less dynamic than gene expression patterns and plays a role only during specific developmental stages or in specific cell types.

Next, it is important to understand the genetic linkage of methylation variants (figure 3). To do this, they use linkage disequilibrium (LD) decay. Basically, they wanted to know whether DMRs, SNPs and SMPs are linked to local genetic variants or not. The closer two features are on the chromosome, the more they are linked and segregate together.

If the value is around 0, the features are independent, and if the value is around 1, the association is mainly due to local genetic variants.

We can see that SMPs and CG-DMRs reach 0 faster, meaning that they are independent, whereas SNPs and C-DMRs reach 50% of their value after 2 kb, meaning that they are due to local genetic variants.
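
To give an idea of what is being plotted, here is a minimal sketch of the usual LD measure r² between two biallelic sites, and of how a "distance at which LD has dropped to 50%" can be read from binned values. I am assuming a standard r² on 0/1-coded states (an allele for a SNP, methylated/unmethylated for an SMP or DMR). The data and function names are hypothetical, and the authors' exact statistic may differ.

```python
def r_squared(site_a, site_b) -> float:
    """LD r^2 between two biallelic sites coded 0/1 across the same haplotypes."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    return d * d / denom

def decay_distance(distances, r2_values, fraction=0.5):
    """First distance at which r^2 falls to <= fraction of its value at the shortest distance."""
    threshold = fraction * r2_values[0]
    for dist, r2 in zip(distances, r2_values):
        if r2 <= threshold:
            return dist
    return None  # LD never decays below the threshold in the sampled range

# Perfectly linked sites versus independent sites (toy haplotypes).
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # linked
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent

# Hypothetical mean r^2 per distance bin (bp), sorted by increasing distance.
print(decay_distance([100, 500, 1000, 2000, 5000], [0.8, 0.7, 0.5, 0.3, 0.1]))
```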

So now we know that C-DMRs are due to local genetic variants, but which kind of genetic variants? To answer this, they perform association mapping of methylation variants, which reveals VIM3 and AGO2 as possible causal loci (AGO2 is known to act in the RdDM pathway).

To sum up this part, the mQTL analysis revealed an association between some genetic variants and DNA methylation variants, especially for C-DMRs.

Finally, in order to understand the role of regions of the epigenome that are less prone to natural epigenetic variation, they searched for loci that contained methylated alleles, and found that most of them are found in seeds and in pollen. This conclusion also makes sense because, by definition, epigenetic modifications can be inherited. In Arabidopsis, pollen and seeds carry the germ line, which is transmitted from one generation to the next. But there are also a lot of transposable elements in the germ line, so the plant cells have to silence them, which they can do using the RdDM silencing pathway. This silencing pathway in pollen and seeds ensures proper gametophytic and embryonic development.

To conclude, this paper provides evidence that RdDM-targeted genes may have co-opted this transposable element silencing mechanism to maintain their silenced state in vegetative tissues and across generations, in order to ensure proper expression during pollen, seed and germ line development.

Schmitz, R., Schultz, M., Urich, M., Nery, J., Pelizzola, M., Libiger, O., Alix, A., McCosh, R., Chen, H., Schork, N., & Ecker, J. (2013). Patterns of population epigenomic diversity Nature, 495 (7440), 193-198 DOI: 10.1038/nature11968
