Coev

Coev is software that predicts coevolving positions and their evolutionary profile from aligned sequences and a phylogenetic tree.

Overview:

Coev implements the model described in Dib et al. 2013: a substitution model that describes the coevolution process along the tree and reconstructs the ancestral states of coevolving pairs of positions. It is a probabilistic Markov model that not only identifies coevolving positions but also estimates the associated coevolving profile. Estimating the profile is necessary because the nucleotide alphabet alone induces 192 different coevolving profiles, and the amino acid alphabet induces many more. The Coev Markov model is based on a 16-state instantaneous rate matrix Q in which each state represents a combination of two nucleotides, one per position. Q contains 4 continuous parameters and one discrete parameter. The model is implemented in C (Dennis Ritchie, 1969) with a user-friendly interface. Given an alignment file, a phylogenetic tree and a pair of positions, the user can assess the coevolution score and estimate the profile within either a maximum likelihood or a Bayesian framework. The default values of the priors can also be changed from this interface.
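The figure of 192 nucleotide profiles can be reproduced with a short enumeration. The sketch below rests on an assumption that is not stated on this page: a profile is taken to be a set of 2 to 4 nucleotide pairs in which no nucleotide is reused at either position. The names STATES and enumerate_profiles are hypothetical and are not part of Coev.

```python
from itertools import combinations, permutations

NUCLEOTIDES = "ACGT"

# The 16 states of the model: every ordered pair of nucleotides (AA, AC, ..., TT).
STATES = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]


def enumerate_profiles():
    """Yield candidate profiles as frozensets of two-letter states.

    Assumption (not taken from the Coev documentation): a profile holds
    2 to 4 pairs and uses each nucleotide at most once per position.
    """
    for size in range(2, 5):
        # Choose which nucleotides appear at the first position ...
        for first in combinations(NUCLEOTIDES, size):
            # ... and pair them with distinct nucleotides at the second position.
            for second in permutations(NUCLEOTIDES, size):
                yield frozenset(a + b for a, b in zip(first, second))


profiles = set(enumerate_profiles())
print(len(STATES), len(profiles))
```

Under this assumption the enumeration yields 72 profiles of size 2, 96 of size 3 and 24 of size 4, which matches the total of 192 quoted above.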

Download:

To download, please click on the following link. A manual can be found here; it also explains how to use the Coev-web platform, which will be released soon.

System requirements: Coev relies on several external tools that must be installed. The implementation is optimized with LAPACK (Linear Algebra PACKage) and NLopt (a library for nonlinear optimization, downloadable at http://ab-initio.mit.edu/wiki/index.php/NLopt), and it has been tested on Linux and Mac operating systems. The GUI additionally requires Tcl/Tk and its wish shell.

Instructions for running Coev with a GUI:

Unzip Coev.zip in a directory of your choice.

Once all external software is installed, compile by running: make

Then, to start the GUI, run: wish Coev.tcl

Instructions for running Coev with a command line:

Unzip Coev.zip in a directory of your choice. Once all external software is installed, compile by running: make

Then to execute run:

./coev [OPTIONS]
  -method s:     Use method s (either "bayes" or "ml"). Default is "ml" (maximum likelihood).
  -IT n:         Run for n iterations. Default is n=10000.
  -sfreq n:      Write every n-th iteration to file. Default is n=1000.
  -pfreq n:      Print every n-th iteration on the screen. Default is n=1000.
  -burnin n:     Number of burn-in iterations. Default is n=0.
  -s v:          ??. Default is v=1.
  -d v:          ??. Default is v=1.
  -w1 v:         ??. Default is v=1.
  -w2 v:         ??. Default is v=1.
  -tree s:       The name of the file containing the input tree in Newick format. Default is s=treeInput.txt.
  -align s:      The name of the file containing the sequence alignment in FASTA format. Default is s=alignment.txt.
  -out s:        The name of the log file to write the results to. Default is s=output.log.
  -cols n1 n2:   The columns in the alignment to use for the analysis. Default is n1=1 and n2=2.
  -h:            Print this help screen and exit.
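When many pairs of columns have to be analysed, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the options listed above; the helper name build_coev_command, the column numbers and the iteration counts are illustrative assumptions, and the sketch presumes that ./coev has been compiled in the current directory.

```python
import shlex


def build_coev_command(method="ml", tree="treeInput.txt", align="alignment.txt",
                       out="output.log", cols=(1, 2), iterations=None, burnin=None):
    """Return the argument list for one Coev run (hypothetical helper)."""
    if method not in ("ml", "bayes"):
        raise ValueError("method must be 'ml' or 'bayes'")
    cmd = ["./coev", "-method", method, "-tree", tree, "-align", align,
           "-out", out, "-cols", str(cols[0]), str(cols[1])]
    # Only pass the optional settings when the caller overrides the defaults.
    if iterations is not None:
        cmd += ["-IT", str(iterations)]
    if burnin is not None:
        cmd += ["-burnin", str(burnin)]
    return cmd


# Illustrative values only: a Bayesian run on columns 12 and 57.
cmd = build_coev_command(method="bayes", cols=(12, 57),
                         iterations=100000, burnin=10000)
print(shlex.join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.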

Reading output:

To read the output, the user can look at either the log file or the console output. In the log file, each line corresponds to a profile.

In the case of a Maximum Likelihood framework:

Each line can be split into 9 values:

  • Value 1 corresponds to the profile set: it is a binary vector of length 16 where 1 marks a combination that belongs to the profile and 0 marks one that doesn't. The binary vector follows the order AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT. For example, the value 1000010000000000 means that the profile is {AA,CC}.
  • Value 2 corresponds to the estimated value of w1 in the Null model: it is a double, e.g. 6.928542e-02
  • Value 3 corresponds to the estimated value of w2 in the Null model: it is a double, e.g. 2.014202e-01
  • Value 4 corresponds to the estimated value of the Log Likelihood in the Null model: it is a double, e.g. -7.405995e+01
  • Value 5 corresponds to the estimated value of s in the Coev model: it is a double, e.g. 2.735160e-02
  • Value 6 corresponds to the estimated value of d in the Coev model: it is a double, e.g. -4.970079e-01
  • Value 7 corresponds to the estimated value of r1 in the Coev model: it is a double, e.g. 3.103843e-02
  • Value 8 corresponds to the estimated value of r2 in the Coev model: it is a double, e.g. 2.922180e-01
  • Value 9 corresponds to the estimated value of the Log Likelihood in the Coev model: it is a double, e.g. -6.814893e+01
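The nine values can be read back with a few lines of code. The sketch below is an illustration only: it assumes the values on a line are separated by whitespace, which this page does not state, and the function and field names are hypothetical. The example line is assembled from the example values quoted in the list above, which may not all come from the same run.

```python
# Order of the 16 nucleotide pairs in the binary profile vector.
PAIRS = [a + b for a in "ACGT" for b in "ACGT"]  # AA, AC, ..., TT

# Hypothetical field names for values 2 to 9 of a maximum likelihood line.
ML_FIELDS = ["w1_null", "w2_null", "lnL_null", "s", "d", "r1", "r2", "lnL_coev"]


def decode_profile(bits):
    """Turn a 16-character 0/1 string into the list of pairs it selects."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a binary vector of length 16")
    return [pair for pair, bit in zip(PAIRS, bits) if bit == "1"]


def parse_ml_line(line):
    """Parse one log line, assuming whitespace-separated values."""
    values = line.split()
    if len(values) != 9:
        raise ValueError("expected 9 values, got %d" % len(values))
    record = {"profile": decode_profile(values[0])}
    record.update(zip(ML_FIELDS, map(float, values[1:])))
    return record


example = ("1000010000000000 6.928542e-02 2.014202e-01 -7.405995e+01 "
           "2.735160e-02 -4.970079e-01 3.103843e-02 2.922180e-01 -6.814893e+01")
rec = parse_ml_line(example)
print(rec["profile"])
# Difference in log likelihood between the Coev model and the Null model.
print(rec["lnL_coev"] - rec["lnL_null"])
```

The log-likelihood difference is printed only as a convenient way to compare the two models on a line; how Coev itself defines its coevolution score is described in the manual and the reference, not here.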

In the case of a Bayesian framework:

The log file is designed to be read by Tracer. The first line is a header. Each of the other lines contains values where
  • Value 1 corresponds to the iteration identifier: it is an integer
  • Value 2 corresponds to the posterior probability: it is a double
  • Value 3 corresponds to the Log Likelihood value of Coev: it is a double
  • Value 4 corresponds to the prior value: it is a double
  • Value 5 corresponds to s: it is a double
  • Value 6 corresponds to d: it is a double
  • Value 7 corresponds to r1: it is a double
  • Value 8 corresponds to r2: it is a double
  • The rest of the values correspond to the profile vector, where 1 marks a combination that belongs to the profile and 0 marks one that doesn't. The binary vector follows the order AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT. For example, the value 1000010000000000 means that the profile is {AA,CC}.
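Besides loading the log in Tracer, the sampled profiles can be tallied directly. The sketch below makes several assumptions that should be checked against a real log file: values are separated by whitespace, the header occupies exactly one line, and the profile vector is written either as one 16-character string or as 16 separate 0/1 columns. The function names are hypothetical, and the three sample lines are made-up numbers used purely to exercise the code.

```python
from collections import Counter


def parse_bayes_log(lines, skip_rows=0):
    """Parse Bayesian log lines into dicts, dropping the first skip_rows samples."""
    rows = []
    data = iter(lines)
    next(data, None)  # skip the header line
    for line in data:
        values = line.split()
        if len(values) < 9:
            continue  # ignore blank or incomplete lines
        rows.append({
            "iteration": int(values[0]),
            "posterior": float(values[1]),
            "lnL": float(values[2]),
            "prior": float(values[3]),
            "s": float(values[4]),
            "d": float(values[5]),
            "r1": float(values[6]),
            "r2": float(values[7]),
            # Works for one 16-character string or 16 separate columns.
            "profile": "".join(values[8:]),
        })
    return rows[skip_rows:]


def profile_frequencies(rows):
    """Return the fraction of samples in which each profile vector appears."""
    counts = Counter(row["profile"] for row in rows)
    total = sum(counts.values())
    return {profile: n / total for profile, n in counts.items()}


# Made-up sample lines, for illustration only.
sample = [
    "it posterior lnL prior s d r1 r2 profile",
    "0 -80.0 -75.0 -5.0 1.0 1.0 1.0 1.0 1000010000000000",
    "1000 -74.0 -69.5 -4.5 0.5 2.0 0.3 0.2 1000010000000000",
    "2000 -73.0 -68.5 -4.5 0.4 2.5 0.3 0.2 0100100000000000",
]
freqs = profile_frequencies(parse_bayes_log(sample))
print(max(freqs, key=freqs.get))
```

If the run was started with -burnin 0, passing a non-zero skip_rows discards the early samples before the frequencies are computed.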

Contacts:

For questions, please send an email to wwwphylo [at] unil.ch

Reference:

When using this software, please cite:

L. Dib, D. Silvestro, N. Salamin. 2014. Evolutionary footprint of coevolving positions in genes. Bioinformatics. 30(9):1241-9.