[Bioperl-l] bioperl-db performance
Alex.Zelensky at anu.edu.au
Tue Sep 7 00:58:24 EDT 2004
Ok, I found one bottleneck which if removed significantly speeds up
species (and hence sequence) object retrieval from bioperl-db, at least
in my case (mysql 4.0.20). For reasons I don't understand, the SQL
statement that retrieves the species classification is
about 10x faster if " ORDER BY node.left_value" is removed from it and
sorting of the classification array is done with perl (left_value has
to be included into the list of returned fields). An even higher
speedup (from >16'' to <1'') can be achieved by replacing the complex
query with a dumb perl loop that fetches parent nodes one by one with
a simple select by primary key.
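
To make the two workarounds above concrete, here is a rough sketch. It is written in Python against an in-memory SQLite table rather than perl and mysql, only so that it runs on its own. The table `node`, its columns, and both function names are hypothetical; the original query is not shown in this message, so the nested-set form used here is an assumption about its general shape, not the actual bioperl-db statement.

```python
import sqlite3

# Hypothetical, simplified taxonomy table; the real bioperl-db schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT,"
    " left_value INTEGER, right_value INTEGER)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Eukaryota", 1, 10),
        (2, 1, "Metazoa", 2, 7),
        (3, 2, "Homo", 3, 6),
        (4, 3, "Homo sapiens", 4, 5),
        (5, 1, "Fungi", 8, 9),
    ],
)


def lineage_sorted_in_client(conn, node_id):
    """Nested-set ancestor query with no ORDER BY; sort by left_value here."""
    rows = conn.execute(
        "SELECT anc.name, anc.left_value"
        " FROM node AS leaf, node AS anc"
        " WHERE leaf.id = ?"
        " AND anc.left_value <= leaf.left_value"
        " AND anc.right_value >= leaf.right_value",
        (node_id,),
    ).fetchall()
    rows.sort(key=lambda row: row[1])  # smallest left_value (the root) first
    return [name for name, _ in rows]


def lineage_by_parent_walk(conn, node_id):
    """Fetch one node per query by primary key, following parent_id upward."""
    names = []
    current = node_id
    while current is not None:
        row = conn.execute(
            "SELECT name, parent_id FROM node WHERE id = ?", (current,)
        ).fetchone()
        if row is None:  # dangling parent reference: stop instead of looping
            break
        names.append(row[0])
        current = row[1]
    names.reverse()  # root first, same order as the sorted variant
    return names
```

Both functions return the lineage root-first, so either could fill the classification array. The parent walk issues one query per taxonomic level, but each is a primary-key lookup, which is the trade-off described above.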
I didn't check whether this behavior is specific to mysql or the
particular versions of it that I have (4.0.16 and 4.0.20), but since
mysql is popular and 4.0.20 is the current production version, I think
it's better to fix this.
On 06/09/2004, at 3:30 PM, Alex Zelensky wrote:
> I have a project which is based on the bioperl-db. Till now I've been
> using the old (bioperl-1.1 branch) version of the code and schema, but
> it is becoming unacceptable (mainly because of the way taxonomy is
> stored), so I decided to upgrade to the current version. The new code
> is a huge leap forward in terms of design, clarity and consistency.
> However, I am experiencing severe performance problems.
> For example, retrieving a locally stored GenPept entry consistently
> takes 16-17 seconds (by primary or unique key, doesn't matter),
> compared to the 2-3'' it takes to get it directly from SRS using
> Bio::DB::GenBank or ~ 1'' from the old bioperl-db. Also, getting a
> species object (I use them a lot) from a local database (new
> bioperl-db) that contains nothing but an import of NCBI taxonomy takes
> >15'', compared to <1'' with the old bioperl-db. In both cases I use
> mysql 4.0.16 on a dual 866 MHz PowerPC G4 with 768 MB RAM.
> So, my questions are:
> 1. Is this performance drop an expected behavior (due to increased
> complexity of the code and new schema)?
> 2. If the answer to (1) is yes, then what is the way to improve it and
> how big an improvement can be achieved?
> 3. If the answer to (1) is no, where should I look for my problem?
> There was a related question on this list in May 2004, but it
> described sequence loading performance on a significantly slower
> machine, and the suggestion was to increase the horsepower.
> Thanks in advance!