Introduction

NJTREE is a versatile program that builds and manipulates phylogenetic trees and infers evolutionary events from them. It can reconstruct trees by neighbour-joining (NJ) or extended maximum likelihood (ML), infer duplications/losses as well as orthologs/paralogs, merge trees with a tree-merge algorithm, reorder leaves, compare trees, export trees in EPS format, filter out low-quality parts of multialignments, and even perform several types of pairwise alignment. NJTREE also comes with a Graphical User Interface (GUI) built on the FLTK package, which is available for most modern operating systems, including Unix/Linux, Windows and Mac OS X.

NJTREE is the core engine of TreeFam. Its first piece of code was written in 2004 to build trees with a constrained NJ algorithm, which is why the program is called NJTREE. As TreeFam developed, more and more functions were added to meet its demanding requirements, and the program eventually became far more than an NJ tree builder. The history of NJTREE reflects the progress of TreeFam.

TreeFam is a database of animal gene families, and NJTREE is therefore best suited to manipulating the gene trees of a gene family. It is particularly useful in these cases:

  • Reconstruct a gene tree for a gene family with a known species phylogeny. NJTREE can also build species trees, or gene trees when the species phylogeny is unknown, but in those cases it does not perform better than traditional tree builders such as PHYLIP and PHYML.
  • Infer duplications/losses as well as orthologs/paralogs given known species phylogenies.
  • Display phylogenetic trees, or plot trees in EPS with a TreeFam-like look and feel.
  • Merge several gene trees built from the same multialignment with different algorithms into a better resultant tree.
  • Compare two trees and reorder leaves to make them look visually similar.
  • Filter out low-quality parts of a multialignment.
  • Given a protein alignment and a set of related nucleotide sequences, construct a protein-guided codon alignment.

NJTREE may also be useful for extracting subtrees, calculating the topological distance between trees, and estimating branch lengths for fixed topologies (with constrained NJ).


Get NJTREE

NJTREE is distributed under the GNU General Public License (GPL). All of its source code is freely available to both academic and commercial users. The latest version can be downloaded from the SourceForge.net download page. The source code is also available from the SourceForge.net Subversion server, which can be accessed with:

    svn co https://svn.sourceforge.net/svnroot/treesoft/trunk/njtree njtree
    


Install NJTREE

Compiling NJTREE requires the C/C++ compiler GCC, the Fast Lexical Analyzer FLEX, and Bison/byacc. All of them are standard components of a Linux development system and are also available for other Unix systems, Windows/MinGW and Mac OS X. Compiling the GUI version further requires FLTK-1.1, which is also available for all the operating systems mentioned above. Whereas compiling the command-line version is generally easy, compiling the GUI version may be hard for some users; in that case, downloading the pre-compiled binaries is recommended.

If you want to compile NJTREE yourself, first download the NJTREE source package njtree-x.y.z.tar.gz, then unpack and compile it with:

    gzip -dc njtree-x.y.z.tar.gz | tar xf -; cd njtree-x.y.z; make
    
If everything goes well, you can copy the resulting binary njtree wherever you want. No additional files are needed. To compile the GUI version, first install FLTK-1.1 on your system and then type:

    make flnjtree
    
Again, you can copy the single file flnjtree anywhere; it is standalone.


Get Started With NJTREE

This section introduces only the frequently used functions of NJTREE.

Prepare Multialignment

Unlike PHYLIP, which uses the PHYLIP alignment format, NJTREE takes multi-sequence FASTA as its default alignment format. You can convert between different alignment formats with sreformat from the HMMER package or seqret from the EMBOSS package.

When manipulating gene trees, one of NJTREE's greatest strengths is that it makes use of species evolution in tree building and inference. To let NJTREE know which species a sequence belongs to, append the species to the sequence name after an underscore "_". For example:

    >CCNE1_HUMAN
    ------------------------------------------------------------
    ----------------MSRRSSRLQAKQQPQPSQTESPQEAQ------------------
    ----IIQAKKRKTTQDVKKRREEV---------------------TKKHQYEIRNC---W
    PP---VLSGGISP---------------------------------------CIIIETPH
    KEIGT-------------------------------------------------------
    -----------------------------SDFSRFTN-----------------------
    ------------------------------YRFKNL-FINPS--------PLPDLSWGCS
    KEVWLNML--KKESRYVHDKHFEVLHSDLEPQMRSILLDWLLEVCEVYTLHRETFYLAQD
    FFDRFM-LTQKDINKNMLQLIGITSLFIASKLEEIYAPKLQEFAYVT-DGACSEEDILRM
    ELIILKALKWELCPVTIIS-WLNLFLQVDAL------------------KDAPK-VLL-P
    QYSQETFIQIAQLLDLCILAIDSLEFQYRILTAAALC-HFTSIEVVKKASGLEWDSISEC
    VDWMVPF----V-----NVVKSTSPVKLKTF-----------------KKI---PMEDRH
    NIQTHTNYLAMLEEVNYINTFRKGGQL------SPV------CNGGIMTPPKSTEKPPGK
    H-----------------------------------------------------------
    --------------------------
    >CycE-RA_DROME
    MGLNAKSVCSTSSTEPNGSIVTTAPSNGEVSSSIVVVVSSSSISSSSDSPIAILPHPDPI
    PSTSFSSASQRSEEELPGTSAASRTDEMCSCDSQNLAASTAAT-----------------
    ----SNGNKRKRRLSSDSNEDPELGFEP-----------PSA---KRQQRLPALYGSEQG
    NLSSVASSVYTSP-----VVSVDGQSTQELLSIRSSPAEDLSEAPHSPLPDSPDSPPSPD
    RGSKQTPVVVRYAAEQVVTSTVVTQKTEDDDLLDDSCEDYSYDEDDEDDVEEEDDDVEIY
    SSTISPASSGCSQQQAVNGERTPGL-PKHQEQIHHPVSDLM-------------------
    ------------------------------INMRTPMSPAVENGL--RQCPLPALAWANA
    ADVWRLMCHRDEQDSRLRSISMLEQHPGLQPRMRAILLDWLIEVCEVYKLHRETFYLAVD
    YLDRYL-HVAHKVQKTHLQLIGITCLFVAAKVEEIYPPKIGEFAYVT-DGACTERDILNH
    EKILLQALDWDISPITITG-WLGVYMQLNVNNRTPASFSQIGRQKS---AEADDAFIY-P
    QFSGFEFVQTSQLLDLCTLDVGMANYSYSVLAAAAIS-HTFSREMALRCSGLDWQVIQPC
    ARWMEPFFRVISQKAPYLQLNEQNEQVSNKF-----------GLGLICPNI---VTDDSH
    IIQTHTTTMDMYDEVLMAQDAAHAMRA--RIQASPATALR--APESLLTPPASSHKPDEY
    LGDEGDETGARSGISSTTTCCNTAASNKGGKSSSNNSVTSCSSRSNP-------------
    --------------------------
    
"HUMAN" and "DROME" indicate the species of the sequences. For the complete list of species recognized by NJTREE, you can check out the species tree by:

Examples, such as ex1.nucl.mfa, can be found in njtree-x.y.z/examples of the source code directory.

Specifying species names is optional for some functions, but it is required for duplication/loss inference and tree merge, and it benefits other functions such as "best" and "phyml".
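Before running a function that needs species information, it can help to check which species tags an alignment actually carries. The following is a minimal shell sketch and not part of NJTREE: the file name demo.mfa and its two shortened records are made up for illustration, and reading the tag as the text after the last underscore of the name is an assumption for names that contain more than one underscore.

```shell
# Write a tiny two-record alignment (hypothetical file, shortened sequences).
cat > demo.mfa <<'EOF'
>CCNE1_HUMAN
MSRRSSRLQAKQQ
>CycE-RA_DROME
MGLNAKSVCSTSS
EOF

# For every header line, take the name (first word without ">"),
# strip everything up to the last underscore (assumed tag position),
# and list each distinct species tag once.
species=$(awk '/^>/ { name = substr($1, 2); sub(/^.*_/, "", name); print name }' demo.mfa | sort -u)
echo "$species"
```

Any tag in this list that does not appear in the species tree known to NJTREE is worth fixing before tree building.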

Another strength of NJTREE is that it makes use of several types of trees, including protein trees, nucleotide trees, synonymous distance (dS) trees and non-synonymous distance (dN) trees. To get the most out of this, you are strongly recommended to provide a protein-guided codon alignment, which can be built by replacing each amino acid in a protein alignment with the corresponding three-base codon. You can also use the "backtrans" function to make such an alignment:

    njtree backtrans aa_alignment nt_sequences
    
where both aa_alignment and nt_sequences are in FASTA format.
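To make the replacement rule concrete, here is a small awk sketch of a protein-guided codon alignment. It only illustrates the rule described above and is not how "backtrans" is implemented. The file names and toy sequences are hypothetical, and the sketch assumes that names match between the two files and that each nucleotide sequence holds exactly one codon per non-gap residue; it does not check that the codons actually translate to the aligned residues.

```shell
# Toy inputs (hypothetical): an aligned protein file and unaligned coding sequences.
cat > demo.aa.mfa <<'EOF'
>g1_HUMAN
M-KV
>g2_DROME
MS-V
EOF
cat > demo.nt.fa <<'EOF'
>g1_HUMAN
ATGAAAGTT
>g2_DROME
ATGTCTGTA
EOF

# First file (NR == FNR): store nucleotide sequences by name.
# Second file: store aligned proteins and remember their order.
# END: each gap column becomes "---", each residue takes the next codon.
awk '
  /^>/ { name = substr($1, 2); if (NR != FNR) order[++n] = name; next }
  NR == FNR { nt[name] = nt[name] $0; next }
  { aa[name] = aa[name] $0 }
  END {
    for (i = 1; i <= n; i++) {
      name = order[i]; out = ""; pos = 1
      for (j = 1; j <= length(aa[name]); j++) {
        if (substr(aa[name], j, 1) == "-") out = out "---"
        else { out = out substr(nt[name], pos, 3); pos += 3 }
      }
      print ">" name; print out
    }
  }
' demo.nt.fa demo.aa.mfa > demo.codon.mfa

cat demo.codon.mfa
```

Every protein column maps to exactly three nucleotide columns, so the codon alignment is three times as wide as the protein alignment and keeps its gap pattern.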

Reconstruct Gene Trees

NJTREE provides NJ and ML methods for tree reconstruction. For the sake of accuracy, however, we recommend building trees in "best" mode:

where "ex1.nucl.mfa" is a codon alignment provided with the NJTREE package in the "examples" directory. Five additional trees, misc.phyml-aa.nhx, misc.phyml-nt.nhx, misc.nj-dn.nhx, misc.nj-ds.nhx and misc.nj-mm.nhx, will also be generated. These trees are used in the tree merge.

Infer Duplications/Losses

If you have prepared the alignment, you can invoke duplication/loss inference by:

Duplication tags ":D=" and loss tags ":E=" will be written to the NHX field.
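Once a tree carries these tags, ordinary text tools are enough for a quick tally. The sketch below counts duplication nodes in an NHX file. The tree in demo.nhx is made up for illustration and is not NJTREE output, and the sketch assumes the usual NHX convention in which ":D=Y" marks a duplication node and ":D=N" a speciation node.

```shell
# A made-up NHX tree with one duplication node and one speciation node.
cat > demo.nhx <<'EOF'
((a_HUMAN:0.1,b_HUMAN:0.2)[&&NHX:D=Y],c_DROME:0.3)[&&NHX:D=N];
EOF

# grep -o prints every match on its own line, so wc -l counts the matches;
# tr removes the padding some wc implementations add.
dup_nodes=$(grep -o ':D=Y' demo.nhx | wc -l | tr -d ' ')
echo "duplication nodes: $dup_nodes"
```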


To be continued ...