[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Fri Aug 18 07:56:29 EDT 2006

On Aug 18, 2006, at 2:00 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Aug 17, 2006, at 4:53 PM, Sendu Bala wrote:
>> I don't think it's worth that much effort really.  There are other
>> ways to go about this, such as your and Aaron's suggested pull
>> parser, the hash-based approach, etc., which may be better.
> Doing multiple things gives a better final result. Every penny counts,
> so to speak.

If you get a 4-5 fold increase without API changes, fine.  But I don't
think 1.5-fold is worth worrying about if it involves changing something
fundamental about the class (inheritance).

>> So far, sorry to say, it's debatable whether a 1.5-fold increase in
>> speed along with even small API changes is worth all the effort you
>> are putting into it.
> To be fair, no API change is required, and it only took a few minutes
> to implement and try the idea out :)

Maybe I'm missing something here; didn't you say it failed tests  
somewhere?  That's suggestive of API problems.

>> That being said, the nice thing about SearchIO is that you could
>> introduce new SearchIO::* modules using your own custom handler/
>> Search class combinations to work alongside the current ones; that
>> way everybody has an option (use the old, slower, more OO ones vs.
>> the new, fast, hash-based ones).  There, they may choose to use a new
>> API for the speed advantages.  Make it easier for them to make the
>> right choice, i.e. Damian Conway's affordances.
> Even if you were making a new SearchIO module, I think you'd want to
> have it return HSPI objects for the hsps. Otherwise to what extent is
> it a bioperl or searchio module? To what extent will people be able to
> easily use the new module with existing code that expects a SearchIO
> to eventually provide HSPI objects?
> Maybe I'm wrong about that - is it reasonable to just come up with a
> whole new system for returning the results, and have users learn to
> use the new system?

My point is, if you create something new that changes the API (i.e.
create new Result/Hit/HSP interfaces, then implement them) but keep
the old way around (ResultI/HitI/HSPI-based implementations), then
the users decide what to use, not us.  If the gain from using the new
classes is substantial enough, people will probably switch and get
used to the new API/methods.  If both are used, then both stay.
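
To make the coexistence idea concrete, here is a minimal sketch.  It is
in Python only for brevity, every name in it is hypothetical, and none
of it is BioPerl code or the SearchIO API; the input format is an
invented tab-separated one.  It only shows one parser front end offering
both an object-based and a hash-based result style, so the caller picks.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple

# Invented sample input: query, hit, e-value, bit score (tab-separated).
SAMPLE = "q1\thitA\t1e-30\t120.5\nq1\thitB\t2e-05\t45.0\n"


@dataclass
class HSP:
    """Hypothetical stand-in for the older, object-based result style."""
    query: str
    hit: str
    evalue: float
    bits: float

    def significant(self, cutoff: float = 1e-10) -> bool:
        # Convenience method that only the object style offers.
        return self.evalue <= cutoff


def _fields(line: str) -> Tuple[str, str, float, float]:
    # Shared low-level parsing used by both result styles.
    query, hit, evalue, bits = line.split("\t")
    return query, hit, float(evalue), float(bits)


def parse_objects(text: str) -> Iterator[HSP]:
    """Old style: one object per HSP."""
    for line in text.splitlines():
        yield HSP(*_fields(line))


def parse_hashes(text: str) -> Iterator[Dict[str, object]]:
    """New style: one plain dict per HSP, with no object layer on top."""
    for line in text.splitlines():
        query, hit, evalue, bits = _fields(line)
        yield {"query": query, "hit": hit, "evalue": evalue, "bits": bits}


# Both implementations stay registered; neither replaces the other.
PARSERS = {"objects": parse_objects, "hashes": parse_hashes}


def search_io(text: str, style: str = "objects"):
    """Return an iterator of results in the style the caller asked for."""
    return PARSERS[style](text)
```

Existing code keeps calling search_io(text) and gets objects as before;
code that wants the lighter results asks for search_io(text, "hashes").
Whether the lighter style is actually faster would have to be measured.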

If everybody wants to have SearchIO methods return only HSPI/HitI (no
new interface/API allowed), then create something new, basing it on
SearchIO but running it the way you want.  That's how SearchIO came
about in the first place.

I don't think that's unreasonable.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
