[Bioperl-l] proposed additions to Tree and cladogram

Georgii A Bazykin gbazykin at Princeton.EDU
Fri Feb 3 15:38:04 EST 2006

Hi all,

a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on hiv
phylogeny now, and rely on these additions heavily. They expand on
what was already present in the corresponding modules. I expected them
to be also of general usage (at least the first one).

However, I never got any answer, so I assumed that these additions
were considered superfluous by most.

I am now working on an addition to Tree::Draw::Cladogram module. For
my project, I need to color individual tree edges (including internal)
into colors from red to blue (according to the nosynonymous/synonymous
ratios of these branches). This should be technically easy (I guess I
will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in
Cladogram to color preceding edges), but I have two questions:

    - will this add-on be of general interest - should I try to do it
    "the right way", updating the pods etc.;
    - in general, are there any guidelines about how specific an issue
    a method should address to be included in bioperl distribution?

Yegor Bazykin

This is a forwarded message
From: Georgii Bazykin <gbazykin at princeton.edu>
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============

here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.

Yegor Bazykin

To NodeI:

# modified from total_branch_length in Tree:Tree module
# gets sum of branches in the subtree - descendents of given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the length of all branches of the subtree which starts at given node
 Returns : integer
 Args    : none


sub children_branch_length {
   my ($self) = @_;
   return 0 if($self -> is_Leaf) ;

   my $sum = 0;

   for ($self -> get_all_Descendents) {
       $sum += $_->branch_length || 0;

   return $sum;


=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this
           node.  Height is the maximum branchlength to get to the tip.
 Returns : The longest length to a leaf, in nodes
 Args    : none


sub height_nodes{
   my ($self) = @_;
   return 0 if( $self->is_Leaf );

   my $max = 0;
   foreach my $subnode ( $self->each_Descendent ) { 
       my $s = $subnode->height_nodes + 1;
       if( $s > $max ) { $max = $s; }
   return $max;


=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all the nodes and their descendents, only selecting leaves
           *NOTE* This is different from each_Descendent
 Returns : Array or Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.


sub get_all_Descendent_Leaves{
   my ($self, $sortby) = @_;
   $sortby ||= 'height';   
   my @nodes;
   foreach my $node ( $self->each_Descendent($sortby) ) {
       if ($node->is_Leaf) {
           push @nodes, $node;
       else {
           push @nodes, ($node->get_all_Descendents($sortby));
   return @nodes;

To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the length of all branches, excluding branches leading to leaves
 Returns : integer
 Args    : none


sub total_internal_branch_length {
   my ($self) = @_;
   my $sum = 0;
   if( defined $self->get_root_node ) {
       for ( $self->get_root_node->get_Descendents() ) {
           unless ($_->is_Leaf) {       # YB: THIS IS ALL I ADDED
               $sum += $_->branch_length || 0;
   return $sum;


To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test


# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that 
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and 
# daughter, 2 between two sisters, etc.)

sub distance_nodes {
    my ($self, at args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)], at args);
    if( ! defined $nodes ) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    my ($node1,$node2) = $self->_check_two_nodes($nodes);
    # algorithm:

    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)

    my %node1_ancestors;        # keys are internal ids, values are objects
    my %node1_cumul_dist;       # keys are internal ids, values 
    # are cumulative distance from node1 to given node
    my $place = $node1;         # start at node1
    my $cumul_dist = 0;

    while ( $place ){
        $node1_ancestors{$place->internal_id} = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++;                                                # YB
#YB     if ($place->branch_length) {
#YB         $cumul_dist += $place->branch_length; # include current branch
#YB                                               # length in next iteration
#YB     }
        $place = $place->ancestor;

    # now climb up node2, for each node checking whether 
    # it's in node1_ancestors
    $place = $node2;  # start at node2
    $cumul_dist = 0;
    while ( $place ){
        foreach my $key ( keys %node1_ancestors ){ # ugh
            if ( $place->internal_id == $key){ # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
        # include current branch length in next iteration
#YB     $cumul_dist += $place->branch_length || 0; 
        $cumul_dist++;                                                 # YB
        $place = $place->ancestor;
    $self->warn("Could not find distance!"); # should never execute, 
    # if so, there's a problem
    return undef;
===8<===========End of original message text===========

More information about the Bioperl-l mailing list