[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Thu Aug 17 22:12:52 EDT 2006

On Aug 17, 2006, at 4:53 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> ...
> That is exactly what I did (on your suggestion). The problem that Hilmar
> points out is that HSPI should continue being a SimilarityPair in case
> anything checks that it is a SimilarityPair.

Okay, fine by me.  It was merely a suggestion thrown out there.   
Seemed like you were banging your head against the wall trying to  
work this out.

What I intended was something that wouldn't dramatically change what  
was returned from the methods (you would get SeqFeature::Similarity  
objects back).  Hilmar has a point, though; if checks are performed  
to see if the HSP is-a SeqFeatureI then there will be problems (as  
the failed tests probably show).
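
To make the is-a concern concrete, here is a minimal sketch in plain
Perl.  Every package name in it is a hypothetical stand-in (these are
not the real BioPerl classes), and it only assumes that some
downstream code guards on the class hierarchy with isa():

```perl
use strict;
use warnings;

# All package names below are hypothetical stand-ins for illustration;
# they are not the real BioPerl classes.

package My::SeqFeatureI;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::SimilarityPair;
our @ISA = ('My::SeqFeatureI');

# An HSP that keeps the old inheritance chain.
package My::HSP::Inheriting;
our @ISA = ('My::SimilarityPair');

# An HSP rewritten for speed that no longer inherits from SimilarityPair.
package My::HSP::Standalone;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

# Downstream code that guards on the class hierarchy.
sub accepts_feature {
    my ($obj) = @_;
    return $obj->isa('My::SeqFeatureI') ? 1 : 0;
}

my $old = My::HSP::Inheriting->new(score => 100);
my $new = My::HSP::Standalone->new(score => 100);

print "inheriting HSP accepted: ", accepts_feature($old), "\n";
print "standalone HSP accepted: ", accepts_feature($new), "\n";
```

The standalone object is rejected even if it implements every method
the caller needs, which is the kind of breakage an is-a check would
cause.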

> Would there be any problem with leaving HSPI as a SimilarityPair and
> having GenericHSP::new as:
> ...
> This gives a 1.43x speedup. (Simply overriding methods gives only a
> 1.14x speedup.)

I don't think it's worth that much effort, really.  There are other
ways to go about this, such as the pull parser you and Aaron
suggested, the hash-based approach, etc., which may be better.  My
concern is maintaining the API in the current set of classes unless
(as pointed out, again, by Hilmar) there is a tremendous advantage to
making changes that break it.  So far, sorry to say, it's debatable
whether a roughly 1.5-fold increase in speed, along with even small
API changes, is worth all the effort you are putting into it.  I
don't think changing what's already present in the current SearchIO
modules will accomplish much.

That being said, the nice thing about SearchIO is that you could
introduce new SearchIO::* modules using your own custom
handler/Search class combinations to work alongside the current ones;
that way everybody has an option (the old, slower, more OO ones vs.
the new, fast, hash-based ones).  Users may then choose the new API
for the speed advantages.  Make it easy for them to make the right
choice, i.e. Damian Conway's affordances.
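
As a rough sketch of the side-by-side idea, here is what two parsers
selected by a -format argument might look like.  Everything in it is
an assumption made for illustration: the My::* package names, the
new_parser() helper, and the three-column tab-delimited input are all
hypothetical, and none of this is existing SearchIO code:

```perl
use strict;
use warnings;

# Hypothetical parsers for illustration; neither is a real BioPerl module.

# Object-based flavour: each parsed line becomes an object with accessors.
package My::SearchIO::objects;
sub new { return bless {}, shift }
sub parse_line {
    my ($self, $line) = @_;
    my ($query, $hit, $evalue) = split /\t/, $line;
    return My::Hit->new(query => $query, name => $hit, evalue => $evalue);
}

package My::Hit;
sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub name   { return $_[0]{name} }
sub evalue { return $_[0]{evalue} }

# Hash-based flavour: same data, returned as a plain hash reference.
package My::SearchIO::hashes;
sub new { return bless {}, shift }
sub parse_line {
    my ($self, $line) = @_;
    my ($query, $hit, $evalue) = split /\t/, $line;
    return { query => $query, name => $hit, evalue => $evalue };
}

package main;

# Pick a parser class from the -format argument (hypothetical helper).
sub new_parser {
    my (%args) = @_;
    my $format = $args{'-format'};
    my $class  = "My::SearchIO::$format";
    die "Unknown format '$format'\n" unless $class->can('parse_line');
    return $class->new;
}

my $line = "query1\thitA\t1e-50";
my $obj  = new_parser(-format => 'objects')->parse_line($line);
my $hash = new_parser(-format => 'hashes')->parse_line($line);
printf "%s %s\n", $obj->name, $hash->{name};
```

Both flavours coexist, and the caller opts into the faster one only by
asking for it, so existing code that expects objects keeps working.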

You may not even have to use a handler, and you could even build your  
own Search interface classes to tailor-fit your specific needs.   
There's a lot of freedom there, which can be a dangerous thing.

Those SearchIO classes that get the most usage will likely lead, in
time, to deprecation of the ones that are infrequently used or
maintained.  This is the current idea behind Lincoln's
Bio::DB::SeqFeature, which I believe is intended to eventually replace
Bio::DB::GFF.  Once everybody realizes that GFF3 works better with
Bio::DB::SeqFeature, Bio::DB::GFF will likely no longer be actively
maintained and will then be deprecated.

Remember, your SearchIO modifications do not have to be included in  
this release of BioPerl, so don't rush them to make a release.  We  
could feasibly have 1-2 extra dev releases before v1.6, maybe more.   
Rushing to make a release was one of the initial problems with  
Bio::SeqFeatureI (I think) in the first 1.5 release.  Please correct  
me if I'm wrong there, Hilmar.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
