[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees
bix at sendu.me.uk
Fri Aug 8 03:50:50 EDT 2008
Chris Fields wrote:
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed
>>> with BioPerl (with the only difference that I'm using taxids instead
>>> of taxon names). Basically, the script generates a taxonomic tree
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it
>>> generates a taxon object and then merges this object into a tree object
>>> that keeps growing. It runs very well with a small number of taxa,
>>> but with many taxa (>1000), it is very very very slow (about a week
>>> for 3000 taxa).
>>> The slowness is due to the function merge_lineage (line 65), which
>>> merges the existing tree object with a new taxon object. I guess that
>>> the difficulty with a big tree (i.e. more than 1000 leaves) is to find
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I
>>> look under the hood of merge_lineage to try to improve it for large
>>> trees?
>> Yes, please do. It might have been me that wrote that, in which case I
>> didn't do anything fancy or consider the above problem.
> Actually I thought that was fixed;
Oh yeah. Looks like I did something related to 'speedup for
merge_lineage()' on the 18th Dec 2006. Tristan, check out
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
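Tristan locates the cost in finding the nodes that a new lineage shares with the existing tree. One common way to keep that step cheap is to keep a hash index from taxid to node, so each merge only walks the new lineage. The sketch below shows that idea in Python rather than Perl. It is not taken from BioPerl's merge_lineage, and the thread does not say what the 2006 speedup actually changed. The TaxonTree and Node classes are hypothetical. The sketch also assumes each lineage is a root-to-leaf list of NCBI taxids, and that lineages are mutually consistent, meaning a taxid always has the same parent.

```python
from typing import Dict, List, Optional


class Node:
    """Minimal tree node keyed by an NCBI taxid (hypothetical, not Bio::Tree::Node)."""

    def __init__(self, taxid: int, parent: Optional["Node"] = None) -> None:
        self.taxid = taxid
        self.parent = parent
        self.children: List["Node"] = []


class TaxonTree:
    """Growing taxonomy tree with a taxid -> node index (hypothetical sketch)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        # Hash index: checking whether a taxon is already in the tree is an
        # O(1) lookup instead of a search over every existing node.
        self.index: Dict[int, Node] = {}

    def merge_lineage(self, lineage: List[int]) -> None:
        """Merge a root-to-leaf list of taxids, reusing the part already present.

        Work is proportional to the lineage length, not to the tree size.
        Assumes lineages are consistent: a taxid always has the same parent.
        """
        if not lineage:
            return
        if self.root is None:
            self.root = Node(lineage[0])
            self.index[lineage[0]] = self.root
        elif lineage[0] != self.root.taxid:
            raise ValueError("lineage does not start at the tree's root")
        parent = self.root
        for taxid in lineage[1:]:
            node = self.index.get(taxid)
            if node is None:
                # First time this taxon is seen: attach it under its parent.
                node = Node(taxid, parent)
                parent.children.append(node)
                self.index[taxid] = node
            parent = node

    def leaves(self) -> List[int]:
        """Return the taxids of all nodes without children, sorted."""
        return sorted(t for t, n in self.index.items() if not n.children)


if __name__ == "__main__":
    # Illustrative lineages, merged one taxon at a time as in Tristan's script.
    tree = TaxonTree()
    for lineage in ([1, 2, 1224, 1236], [1, 2, 1239]):
        tree.merge_lineage(lineage)
    print(tree.leaves())  # [1236, 1239]
```

With the index, merging n taxa costs roughly n times the lineage depth. A merge that scans the whole tree for shared nodes instead does work that grows with the tree, which gets worse as the tree grows past a few thousand leaves.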