[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees

Tristan Lefebure tristan.lefebure at gmail.com
Fri Aug 8 12:02:32 EDT 2008

Yes indeed, with the svn code it took 10 minutes (compared to one week!)
Thanks, -Tristan

On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Chris Fields wrote:
>> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
>>>> BioPerl (with the only difference that I'm using taxids instead of taxon
>>>> names). Basically, the script generates a taxonomic tree given a list of
>>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
>>>> object, and then merges this object into a tree object that keeps growing. It
>>>> runs very well with a small number of taxa, but with many taxa (>1000), it
>>>> is very very very slow (about a week for 3000 taxa).
>>>> The slowness is due to the merge_lineage function (line 65), which
>>>> merges the existing tree object with a new taxon object. I guess that the
>>>> difficulty with a big tree (i.e. more than 1000 leaves) is finding the nodes
>>>> in common between the tree and the new taxon object...
>>>> Would you have any idea of how to get around the problem? Should I look
>>>> under the hood of merge_lineage to try to improve it for large trees?
>>> Yes, please do. It might have been me that wrote that, in which case I
>>> didn't do anything fancy or consider the above problem.
>> Actually I thought that was fixed;
> Oh yeah. Looks like I did something related to 'speedup for
> merge_lineage()' on the 18th Dec 2006. Tristan, check out
> Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
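
Tristan's guess in the quoted message, that the cost lies in finding the nodes shared between the growing tree and each new lineage, can be illustrated with a small sketch. The Python below is not BioPerl's merge_lineage and makes no claim about what the 2006 speedup actually changed. It only assumes that a lineage is a root-to-leaf list of taxids, and shows how a taxid-to-node dictionary lets each merge find shared nodes without walking the whole tree. The Node and IndexedTree classes and the taxid values are hypothetical.

```python
class Node:
    """Minimal tree node keyed by a taxid (hypothetical structure)."""

    def __init__(self, taxid, parent=None):
        self.taxid = taxid
        self.parent = parent
        self.children = []


class IndexedTree:
    """Tree that keeps a taxid -> Node dictionary alongside the nodes."""

    def __init__(self):
        self.root = None
        self.by_id = {}

    def merge_lineage(self, lineage):
        """Merge one root-to-leaf list of taxids; return how many nodes were created."""
        if not lineage:
            return 0
        created = 0
        if self.root is None:
            # First lineage: its first element becomes the root.
            self.root = Node(lineage[0])
            self.by_id[lineage[0]] = self.root
            created = 1
        elif lineage[0] != self.root.taxid:
            raise ValueError("lineage does not share the tree's root")
        parent = self.root
        for taxid in lineage[1:]:
            # Dictionary lookup: cost does not grow with the size of the tree.
            node = self.by_id.get(taxid)
            if node is None:
                node = Node(taxid, parent)
                parent.children.append(node)
                self.by_id[taxid] = node
                created += 1
            parent = node
        return created


# Placeholder taxids, not real NCBI identifiers.
tree = IndexedTree()
created_first = tree.merge_lineage([1, 10, 11, 100])
created_second = tree.merge_lineage([1, 10, 11, 101])
print(created_first, created_second, len(tree.by_id))  # prints: 4 1 5
```

With the dictionary, the work per merge is proportional to the length of the lineage rather than to the number of nodes already in the tree. If each merge instead searched the whole tree for matches, the total work over thousands of taxa would grow roughly quadratically, which is one plausible reading of the week-long run described above.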
