HOWTO:Trees (post-refactor)

For the Lazy

use strict;
use warnings;
use Bio::Tree::Tree;
use Bio::TreeIO;
my ($tree, $treeio);
 
# Step 1: Load a tree
#   from a file
#$tree = Bio::Tree::Tree->from_file("my-tree.xml"); 
#   from a string
$tree = Bio::Tree::Tree->from_string("(a,(b,c));"); 
#   using the TreeIO system (the format can be autodetected if the -format argument is missing)
#$treeio = Bio::TreeIO->new(-file => "my-tree.xml", -format => 'phyloxml'); 
#$tree = $treeio->next_tree;
 
# Step 2: Save a tree
#   you can get the Newick string directly
print "Newick format: " . $tree->newick . "\n";
#   or use a TreeIO writer
$treeio = Bio::TreeIO->new(-file => ">tree-out.xml", -format => 'phyloxml');
$treeio->write_tree($tree);
 
# Step 3: Analyze a tree
print "Leaf count: " . scalar($tree->leaves) . "\n";
print "Node count: " . scalar($tree->nodes) . "\n";
print "Total branch length: " . $tree->total_branch_length . "\n";
print "Max root-to-tip branch length: " . $tree->max_distance_to_leaf . "\n";
print "Max root-to-tip node depth: " . $tree->max_depth_to_leaf . "\n";
 
# Step 4: Modify a tree
#   print a human-readable ASCII diagram
print "Original tree:\n" . $tree->ascii; 
my $orig_root = $tree->root;
#   re-root the tree on the node labeled 'b'
$tree->reroot($tree->find('b'));
print "Rerooted on b:\n" . $tree->ascii;
#   re-root halfway along the branch leading to 'c'
#   (this can be more intuitive, but it adds a new internal
#   node to the tree)
$tree->reroot_above($tree->find('c'), 0.5); 
print "Rerooted above c:\n" . $tree->ascii;
#   return to the old root and remove the internal node created
#   in the previous re-rooting.
$tree->reroot($orig_root);
$tree->contract_linear_paths;
 
#   use key-value mappings to translate the tree's node labels
my $id_map = {
  'a' => 'Aardvark',
  'b' => 'Banana',
  'c' => 'Coyote'
};
$tree->translate_ids($id_map);
print "Translated IDs:\n" . $tree->ascii;
 
#   add a new node to the tree
#   ... first create a new object of the same class as the root
my $root_node = $tree->root;
my $new_node = ref($root_node)->new;
#   ... then add it as third child of the node parental to Banana
$tree->find('Banana')->parent->add_child($new_node);
$new_node->branch_length(1);
$new_node->id('z');
print "New node added:\n" . $tree->ascii;
 
#   now the tree has a multifurcation -- ask BioPerl to randomly
#   resolve the multifurcation
$tree->force_binary;
print "Forced to binary structure:\n" . $tree->ascii;
 
# Step 5: Extract sub-trees
#   load a larger tree to slice
$tree = Bio::Tree::Tree->from_string("(a,(b,(c,(d,(e,(f,(g,(h,i))))))));");
print "Full tree:\n" . $tree->ascii;
#   keep only the given nodes, passed as node objects
my $slice = $tree->slice($tree->find('a'), $tree->find('d'), $tree->find('i'));
print "Slice:\n" . $slice->ascii;
#   or pass the node labels directly
$slice = $tree->slice_by_ids('a', 'c', 'e', 'i');
print "Slice:\n" . $slice->ascii;

Running the script prints:

Newick format: (a,(b,c));
Leaf count: 3
Node count: 5
Total branch length: 0
Max root-to-tip branch length: 2
Max root-to-tip node depth: 1
Original tree:
          /-a
---------|
         |          /-b
          \--------|
                    \-c
Rerooted on b:
                    /-c
-b------- /--------|
                    \-------- /-a
 
 
Rerooted above c:
          /-c
---------|
         |          /-b
          \--------|
                    \-------- /-a
 
Translated IDs:
          /-Aardvark
---------|
         |          /-Banana
          \--------|
                    \-Coyote
New node added:
          /-Aardvark
         |
---------|          /-Banana
         |         |
          \--------|--Coyote
                   |
                    \-z
Forced to binary structure:
          /-Aardvark
---------|
         |          /-Banana
          \--------|
                   |          /-Coyote
                    \--------|
                              \-z
Full tree:
          /-a
---------|
         |          /-b
          \--------|
                   |          /-c
                    \--------|
                             |          /-d
                              \--------|
                                       |          /-e
                                        \--------|
                                                 |          /-f
                                                  \--------|
                                                           |          /-g
                                                            \--------|
                                                                     |          /-h
                                                                      \--------|
                                                                                \-i
Slice:
          /-a
---------|
         |          /-d
          \--------|
                    \-i
Slice:
          /-a
---------|
         |          /-c
          \--------|
                   |          /-e
                    \--------|
                              \-i
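
To make the first two figures in the output concrete, here is a minimal sketch in plain Perl that does not use BioPerl at all. It parses the same Newick string into nested hashes and counts leaves and nodes. The helper names parse_newick, count_nodes, and count_leaves are invented for this illustration and are not part of Bio::Tree::*; the parser assumes simple labels only (no branch lengths, quoting, or comments).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse a simple Newick string
# into nested hashes of the form { id => 'a', children => [ ... ] }.
sub parse_newick {
    my ($newick) = @_;
    # Split into structural characters and labels, skipping whitespace.
    my @tokens = $newick =~ /([(),;]|[^(),;\s]+)/g;
    my $pos = 0;

    my $parse_node;
    $parse_node = sub {
        my $node = { id => '', children => [] };
        if ($pos < @tokens && $tokens[$pos] eq '(') {
            $pos++;    # consume '('
            while (1) {
                push @{ $node->{children} }, $parse_node->();
                die "Unbalanced parentheses\n" if $pos >= @tokens;
                my $tok = $tokens[$pos++];
                last if $tok eq ')';
                die "Unexpected token '$tok'\n" unless $tok eq ',';
            }
        }
        # An optional label may name a leaf or follow a closing ')'.
        if ($pos < @tokens && $tokens[$pos] !~ /^[(),;]$/) {
            $node->{id} = $tokens[$pos++];
        }
        return $node;
    };

    my $root = $parse_node->();
    undef $parse_node;    # break the closure's reference to itself
    return $root;
}

# Count every node in the tree, internal nodes and leaves alike.
sub count_nodes {
    my ($node) = @_;
    my $total = 1;
    $total += count_nodes($_) for @{ $node->{children} };
    return $total;
}

# Count only the nodes that have no children.
sub count_leaves {
    my ($node) = @_;
    return 1 unless @{ $node->{children} };
    my $total = 0;
    $total += count_leaves($_) for @{ $node->{children} };
    return $total;
}

my $root = parse_newick("(a,(b,c));");
print "Leaf count: ", count_leaves($root), "\n";    # Leaf count: 3
print "Node count: ", count_nodes($root), "\n";     # Node count: 5
```

The three leaves are a, b, and c; the two remaining nodes are the root and the unnamed parent of b and c, which gives the node count of 5 shown above.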

Motivation

The evolutionary tree is a fundamental concept in biology, and the tree structure is a common data type in computer science and bioinformatics. Many biological studies involve a phylogenetic tree at some stage, whether to locate an organism in the tree of life, to infer the evolutionary history of a protein family, or to test detailed evolutionary hypotheses using mathematical models and a known evolutionary tree.

A powerful API for loading, modifying, analyzing, and saving trees is thus a necessity for any 'Swiss Army knife' biological toolkit such as BioPerl.

Goals

The ultimate goal of the BioPerl Tree API is to make working with phylogenetic trees simple.

Bio::Tree::* is not meant to be the fastest or most powerful tree library available; rather, it should provide the basic tools that 'glue' scripts need to work with trees that are output from or input to other programs. For more advanced functionality, such as manipulating large trees (e.g., the entire NCBI taxonomy) or inferring phylogenetic trees from molecular data, please look elsewhere. Joe Felsenstein maintains a phylogenetic software page that is a useful starting point.

Some key aspects of the API:

  • The most common tree operations should require little code and even less documentation
  • More advanced functionality should be well-abstracted and encapsulated into methods of reasonable size and complexity
  • Short yet accurate method names should be used when possible
  • Convenience methods should be provided to save users from needless typing and excess lines of boilerplate code