TreeOfTrees

TreeOfTrees is a software package made of two C programs, GeneTrees.c and ConsPond.c. They can be adapted to different systems (Unix, Windows, MacOS9): before compiling, declare the system used in the Repertoire function, at the end of each of the two programs.

All input files must be in a folder at the same level as the programs. The output files, which have fixed names, are written to this same folder.

GeneTrees.c
This program asks for the name of the folder containing the data; this name is the first parameter. This folder must contain a "List.txt" file giving the names of the n gene tree files, together with those gene tree files, all on the same taxa set X and in Newick format. Taxon labels must be identical in all files; they are case sensitive, and the first tree of the first file provides the reference names. Labels are limited to 99 characters.

Each file must contain the same number of trees, NbArb. For each series k from 1 to NbArb, the n trees of rank k are used to compute n distances between taxa, one per gene. The taxa distance is chosen with the second parameter:
(0) the unitary distance: one unit for each edge, whatever its length (including lengths <= 0);
(1) the positive unitary distance: one unit for each edge longer than eps = 0.0001;
(2) the path length distance: the sum of the edge lengths along the path.
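As an illustration of the three taxa distances, here is a minimal C sketch; it is not taken from GeneTrees.c. It assumes a hypothetical tree representation: an array `parent[]` (with -1 for the root) and an array `len[]` holding the length of the edge above each node, with the tree rooted at an internal node of degree at least 3 so that each edge of the unrooted tree appears once. The function name `taxon_distance` is also hypothetical. It sums the contribution of every edge on the path between two taxa:

```c
#include <assert.h>

#define EPS 0.0001  /* threshold used by the positive unitary distance */

/* Hypothetical representation: parent[v] is the parent of node v (-1 for
   the root) and len[v] the length of the edge between v and parent[v]. */

/* Contribution of one edge under taxa distance mode 0, 1 or 2. */
static double edge_weight(double length, int mode) {
    assert(mode >= 0 && mode <= 2);
    if (mode == 0) return 1.0;                       /* unitary */
    if (mode == 1) return length > EPS ? 1.0 : 0.0;  /* positive unitary */
    return length;                                   /* path length */
}

/* Number of edges between node v and the root. */
static int depth(const int *parent, int v) {
    int d = 0;
    for (; parent[v] != -1; v = parent[v]) d++;
    return d;
}

/* Distance between nodes x and y: sum of edge weights along their path. */
double taxon_distance(const int *parent, const double *len,
                      int x, int y, int mode) {
    int dx = depth(parent, x), dy = depth(parent, y);
    double sum = 0.0;
    /* climb from the deeper node until both are at the same depth */
    for (; dx > dy; dx--) { sum += edge_weight(len[x], mode); x = parent[x]; }
    for (; dy > dx; dy--) { sum += edge_weight(len[y], mode); y = parent[y]; }
    /* climb both until they meet at their lowest common ancestor */
    while (x != y) {
        sum += edge_weight(len[x], mode) + edge_weight(len[y], mode);
        x = parent[x];
        y = parent[y];
    }
    return sum;
}
```

Walking up to the lowest common ancestor costs O(height) per pair of taxa; the actual program may well compute all pairwise distances in a single traversal instead.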

Then a distance between genes is computed from these n distance vectors, according to the third parameter:
(1) the Manhattan distance;
(2) the Euclidean distance;
(3) the Estabrook distance: the proportion of quadruples having two different resolved topologies;
(4) the order distance between two preorders (the preorders induced on taxon pairs by the distance values).
Note that for distances (3) and (4), the quadruples of taxa are examined and the computation is in O(m^4), where m is the number of taxa, instead of O(m^2) for the first two (see article).
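Distances (3) and (4) can be sketched as follows, under stated assumptions; this is an illustration, not the code of GeneTrees.c. Distances are stored as a condensed vector over the m taxa. The topology of a quadruple is read from the four-point condition: the smallest of the three sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) gives the resolved topology, and a tie on the minimum is treated as unresolved (exact floating-point comparison, no tolerance). The Estabrook rate is assumed to be taken over all C(m,4) quadruples, and the order distance is assumed to be the proportion of pairs of taxon pairs ranked in opposite order by the two genes, ties counting as agreement. The article may define ties and normalization differently.

```c
#include <assert.h>
#include <stddef.h>

/* Position of the pair {i, j}, i < j, in a condensed vector over m taxa. */
static size_t pair_index(size_t m, size_t i, size_t j) {
    assert(i < j && j < m);
    return i * m - i * (i + 1) / 2 + (j - i - 1);
}

/* Topology of the quadruple i<j<k<l by the four-point condition:
   1 = ij|kl, 2 = ik|jl, 3 = il|jk, 0 = unresolved (tie on the minimum). */
static int quartet(const double *d, size_t m,
                   size_t i, size_t j, size_t k, size_t l) {
    double s1 = d[pair_index(m, i, j)] + d[pair_index(m, k, l)];
    double s2 = d[pair_index(m, i, k)] + d[pair_index(m, j, l)];
    double s3 = d[pair_index(m, i, l)] + d[pair_index(m, j, k)];
    if (s1 < s2 && s1 < s3) return 1;
    if (s2 < s1 && s2 < s3) return 2;
    if (s3 < s1 && s3 < s2) return 3;
    return 0;
}

/* Share of the C(m,4) quadruples resolved differently by the two genes. */
double estabrook(const double *d1, const double *d2, size_t m) {
    size_t total = 0, differ = 0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = i + 1; j < m; j++)
            for (size_t k = j + 1; k < m; k++)
                for (size_t l = k + 1; l < m; l++) {
                    int t1 = quartet(d1, m, i, j, k, l);
                    int t2 = quartet(d2, m, i, j, k, l);
                    total++;
                    if (t1 != 0 && t2 != 0 && t1 != t2) differ++;
                }
    return total ? (double)differ / (double)total : 0.0;
}

/* Share of pairs of taxon pairs ordered in opposite ways by d1 and d2. */
double order_distance(const double *d1, const double *d2, size_t m) {
    size_t p = m * (m - 1) / 2, total = 0, discord = 0;
    for (size_t a = 0; a < p; a++)
        for (size_t b = a + 1; b < p; b++) {
            total++;
            if ((d1[a] - d1[b]) * (d2[a] - d2[b]) < 0.0) discord++;
        }
    return total ? (double)discord / (double)total : 0.0;
}
```

The four nested loops in `estabrook`, and the loop over pairs of pairs in `order_distance`, are where the O(m^4) cost comes from.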

Using the Neighbor-Joining (NJ) procedure, a phylogenetic tree on genes is computed and saved in the file "GeneTrees.tre". After the NbArb series, this file contains NbArb trees on genes. For each gene distance, two robustness coefficients, the arboricity and the well designed quadruple rate, are computed. They are saved in a "Quality.txt" file, which the ConsPond program can use as weights.
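For reference, the standard Neighbor-Joining agglomeration can be sketched in C as below. The fixed-size matrix `D`, the `Join` record and the `MAXN` limit are assumptions of this sketch, not of GeneTrees.c, and no Newick output is written: the sketch only records, at each step, which two nodes are joined and their two branch lengths.

```c
#include <assert.h>
#include <float.h>

#define MAXN 64  /* hypothetical limit on the number of genes */

/* One agglomeration step: nodes a and b are joined under node parent. */
typedef struct { int a, b, parent; double la, lb; } Join;

/* Working matrix: rows 0..n-1 hold the gene distances; rows n.. are
   filled in for the internal nodes created during the run. */
static double D[2 * MAXN][2 * MAXN];

/* Neighbor-Joining on the n x n matrix stored in D. Writes the n-1 joins
   into out (the last one links the two remaining nodes, parent = -1)
   and returns their count. */
int neighbor_joining(int n, Join *out) {
    assert(n >= 2 && n <= MAXN);
    int active[2 * MAXN];
    int r = n, next = n, k = 0;
    for (int i = 0; i < n; i++) active[i] = i;
    while (r > 2) {
        double S[2 * MAXN];  /* row sums over the active nodes */
        for (int i = 0; i < r; i++) {
            S[i] = 0.0;
            for (int j = 0; j < r; j++) S[i] += D[active[i]][active[j]];
        }
        /* pick the pair minimizing Q(i,j) = (r-2) d(i,j) - S_i - S_j */
        double best = DBL_MAX;
        int bi = 0, bj = 1;
        for (int i = 0; i < r; i++)
            for (int j = i + 1; j < r; j++) {
                double q = (r - 2) * D[active[i]][active[j]] - S[i] - S[j];
                if (q < best) { best = q; bi = i; bj = j; }
            }
        int a = active[bi], b = active[bj], u = next++;
        double dab = D[a][b];
        double la = 0.5 * dab + (S[bi] - S[bj]) / (2.0 * (r - 2));
        /* distances from the new node u to the other active nodes */
        for (int t = 0; t < r; t++) {
            int c = active[t];
            if (c != a && c != b)
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - dab);
        }
        D[u][u] = 0.0;
        out[k++] = (Join){a, b, u, la, dab - la};
        active[bi] = u;              /* u takes the slot of a */
        active[bj] = active[r - 1];  /* the slot of b gets the last node */
        r--;
    }
    out[k++] = (Join){active[0], active[1], -1, D[active[0]][active[1]], 0.0};
    return k;
}
```

With the additive distances of the tree ((A:2,B:3):1,C:4,D:5), the first join groups A and B with branch lengths 2 and 3, and the following joins recover the internal edge of length 1 and the branches of D and C.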

The three parameters (folder name, taxa distance and gene distance) can be given as command-line arguments. Otherwise, the program asks for them.
 
ConsPond.c
This program asks for the name of the folder containing the data; this name is the first parameter. It reads the "GeneTrees.tre" file generated by the GeneTrees program. Two kinds of consensus tree are computed: the median tree and the extended majority rule consensus tree.

They are computed with or without a robustness coefficient (RC), chosen with the second parameter:
(0) no coefficient is used;
(1) the arboricity coefficient is used;
(2) the well designed quadruple rate is used.
Both coefficients are read from the "Quality.txt" file.

For each tree, the program computes the set of bipartitions (splits) corresponding to its internal edges. Edges shorter than eps = 0.0001 are ignored and yield no bipartition. It then assigns weights to bipartitions and trees (see article).
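A minimal sketch of the bipartition extraction, not taken from ConsPond.c. It assumes a hypothetical tree representation: taxa are nodes 0..m-1, `parent[v]` is the parent of node v (-1 for the root), `len[v]` is the length of the edge above v, the root has degree at least 3, and there are at most 64 taxa so that a split fits in a `uint64_t` bit mask. Each kept split is stored as the side that does not contain taxon 0, so that the same bipartition found in different trees gets the same mask. The function name `tree_splits` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EPS 0.0001  /* internal edges shorter than this give no split */

/* Writes into out the splits of the internal edges of length >= EPS,
   each as a bit mask of taxa not containing taxon 0; returns their count. */
int tree_splits(const int *parent, const double *len,
                int nodes, int m, uint64_t *out) {
    assert(m >= 1 && m <= 64 && nodes <= 256);
    uint64_t below[256] = {0};  /* taxa in the subtree of each node */
    for (int x = 0; x < m; x++)
        for (int v = x; v != -1; v = parent[v])
            below[v] |= (uint64_t)1 << x;
    uint64_t all = (m == 64) ? ~(uint64_t)0 : (((uint64_t)1 << m) - 1);
    int k = 0;
    for (int v = m; v < nodes; v++) {          /* internal nodes only */
        if (parent[v] == -1 || len[v] < EPS)   /* root, or edge too short */
            continue;
        uint64_t s = below[v];
        if (s & 1) s = all & ~s;  /* keep the side without taxon 0 */
        out[k++] = s;
    }
    return k;
}
```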

The tree with the maximum weight in "GeneTrees.tre" is saved in Newick format as the "MedianTree.nw" file, with its own edge lengths and with bootstrap values equal to the weights of its bipartitions.

The extended majority rule consensus tree is then built. If there are enough compatible bipartitions, this tree is fully resolved. It is saved in Newick format as the "ConsTree.nw" file. Each edge length is the average length of that edge in the trees containing it, and the bootstrap values are the weights of its bipartitions.
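The extended majority rule can be read as a greedy procedure: bipartitions are taken by decreasing weight, and each one is kept if it is compatible with all bipartitions already kept. Bipartitions with weight above 50% always co-occur in at least one tree, so they are pairwise compatible and all kept first. A minimal sketch, assuming splits are bit masks of the side without taxon 0 (at most 64 taxa) and that the input lists distinct splits with their weights; the names `WSplit` and `extended_majority` are hypothetical, and computing edge lengths or writing ConsTree.nw is not shown:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t split; double weight; } WSplit;

/* qsort comparator: decreasing weight. */
static int by_weight_desc(const void *p, const void *q) {
    double a = ((const WSplit *)p)->weight, b = ((const WSplit *)q)->weight;
    return (a < b) - (a > b);
}

/* Two splits that both exclude taxon 0 are compatible iff they are
   disjoint or nested. */
static int compatible(uint64_t a, uint64_t b) {
    return (a & b) == 0 || (a & ~b) == 0 || (b & ~a) == 0;
}

/* Sorts s in place, keeps each split compatible with all kept ones,
   writes the kept splits into out and returns their count. */
int extended_majority(WSplit *s, int n, uint64_t *out) {
    assert(n >= 0);
    qsort(s, (size_t)n, sizeof *s, by_weight_desc);
    int k = 0;
    for (int i = 0; i < n; i++) {
        int ok = 1;
        for (int j = 0; j < k && ok; j++)
            ok = compatible(s[i].split, out[j]);
        if (ok) out[k++] = s[i].split;
    }
    return k;
}
```

Since `qsort` is not stable, the order among splits of equal weight is unspecified in this sketch; the actual program may break such ties differently.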

The two parameters, the folder name and the robustness coefficient, can be given as command-line arguments. Otherwise, the program asks for them.

For methodological questions: guenoche@iml.univ-mrs.fr

For programming questions: garreta@lif.univ-mrs.fr

For biological questions: darlu@vjf.inserm.fr