[Bioperl-l] SearchIO speed up
cjfields at uiuc.edu
Thu Aug 17 22:12:52 EDT 2006
On Aug 17, 2006, at 4:53 PM, Sendu Bala wrote:
> Chris Fields wrote:
> That is exactly what I did (on your suggestion). The problem that
> points out is that HSPI should continue being a SimilarityPair in case
> anything checks that it is a SimilarityPair.
Okay, fine by me. It was merely a suggestion thrown out there.
Seemed like you were banging your head against the wall trying to
work this out.
What I intended was something that wouldn't dramatically change what
was returned from the methods (you would get SeqFeature::Similarity
objects back). Hilmar has a point, though; if checks are performed
to see if the HSP is-a SeqFeatureI then there will be problems (as
the failed tests probably show).
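To make the is-a concern concrete, here is a minimal sketch in core Perl. The package names (My::SeqFeatureI, My::SimilarityPair, My::HSP::*) are hypothetical stand-ins rather than the real BioPerl classes, and the check shown is only an assumption about what downstream code might do:

```perl
use strict;
use warnings;

# Hypothetical stand-ins only; these are NOT the real BioPerl classes.
package My::SeqFeatureI;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::SimilarityPair;
our @ISA = ('My::SeqFeatureI');

# Current-style HSP: is-a SimilarityPair, and therefore is-a SeqFeatureI.
package My::HSP::Inheriting;
our @ISA = ('My::SimilarityPair');

# Re-parented HSP: has its own constructor and sits outside the hierarchy.
package My::HSP::Standalone;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

# Returns 1 if the object passes the kind of is-a check that
# downstream code may perform, 0 otherwise.
sub passes_isa_check {
    my ($obj) = @_;
    return $obj->isa('My::SeqFeatureI') ? 1 : 0;
}

for my $class ('My::HSP::Inheriting', 'My::HSP::Standalone') {
    my $hsp = $class->new(evalue => 1e-20);
    printf "%-22s is-a SeqFeatureI: %s\n", $class,
        passes_isa_check($hsp) ? 'yes' : 'no';
}
```

Any caller that gates on such a check would silently skip or reject the standalone objects, which is the kind of breakage the failed tests would point to.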
> Would there be any problem with leaving HSPI as a SimilarityPair and
> having GenericHSP::new as:
> This gives a 1.43x speedup. (Simply overriding methods gives only a
> 1.14x speedup.)
I don't think it's worth that much effort really. There are other
ways to go about this, such as your and Aaron's suggested pull
parser, the hash-based approach, etc., which may be better. My
concern is trying to maintain the API in the current set of classes
unless (as pointed out, again, by Hilmar) there is a tremendous
advantage to making changes that break the current API. So far,
sorry to say, it's debatable whether a 1.5-fold increase in speed
along with even small API changes is worth all the effort you are
putting into it. I don't think changing what's already present in
the current SearchIO modules will accomplish much.
That being said, the nice thing about SearchIO is that you could
introduce new SearchIO::* modules using your own custom handler/
Search class combinations to work alongside the current ones; that
way everybody has an option (use the older, slower, more OO ones vs.
the new, faster hash-based ones). There, they may choose to use a new
API for the speed advantages. Make it easier for them to make the
right choice, i.e., Damian Conway's affordances.
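As a rough sketch of the trade-off between the two styles, the following compares a hypothetical accessor-based HSP class with a plain hash record using the core Benchmark module. My::OO::HSP and make_hash_hsp are illustrative names only, not part of SearchIO, and the timings it prints say nothing about the real parsers:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical OO-style HSP with accessor methods (not Bio::Search::HSP::*).
package My::OO::HSP;
sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub evalue { return $_[0]{evalue} }
sub bits   { return $_[0]{bits} }

package main;

# The same record as a plain hash reference: no blessing, no method dispatch.
sub make_hash_hsp { my (%args) = @_; return {%args} }

my %fields = (evalue => 1e-30, bits => 250);

# Both styles should expose the same data.
my $obj  = My::OO::HSP->new(%fields);
my $hash = make_hash_hsp(%fields);
print "values agree\n" if $obj->bits == $hash->{bits};

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds and then prints a comparison table.
cmpthese(-1, {
    oo_style   => sub { my $h = My::OO::HSP->new(%fields); my $score = $h->bits },
    hash_style => sub { my $h = make_hash_hsp(%fields);    my $score = $h->{bits} },
});
```

The two styles can live side by side in separate modules, so existing code keeps the OO interface while new code opts in to the lighter records.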
You may not even have to use a handler, and you could even build your
own Search interface classes to tailor-fit your specific needs.
There's a lot of freedom there, which can be a dangerous thing.
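For what "no handler" could look like, here is a minimal pull-style sketch in core Perl: the caller asks for the next hit instead of receiving events. The tab-separated input layout and the name make_hit_iterator are assumptions made purely for illustration; this is not an existing SearchIO interface or report format:

```perl
use strict;
use warnings;

# Returns a closure. Each call pulls one hit from the filehandle as a plain
# hash reference, or returns undef at end of input.
# Assumed (hypothetical) input: tab-separated query, subject, e-value, bit score.
sub make_hit_iterator {
    my ($fh) = @_;
    return sub {
        while (defined(my $line = <$fh>)) {
            chomp $line;
            next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
            my ($query, $subject, $evalue, $bits) = split /\t/, $line;
            return { query  => $query,  subject => $subject,
                     evalue => $evalue, bits    => $bits };
        }
        return;
    };
}

# Demo input held in memory so the sketch runs without any files.
my $report = join "\n",
    "# query\tsubject\tevalue\tbits",
    "q1\ts1\t1e-50\t200",
    "q1\ts2\t0.002\t35",
    "";
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";

my $next_hit = make_hit_iterator($fh);
my @hits;
while (my $hit = $next_hit->()) {
    push @hits, $hit;
    print "$hit->{query} vs $hit->{subject}: e=$hit->{evalue}\n";
}
```

Nothing is parsed until the caller asks for it, so a script that only wants the first few hits never pays for the rest of the report.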
Those SearchIO classes that get the most usage will likely eventually
lead to deprecation of the ones infrequently used/maintained. This
is the current idea behind Lincoln's Bio::DB::SeqFeature, which I believe
is intended to eventually replace Bio::DB::GFF. When everybody
realizes that GFF3 works better with Bio::DB::SeqFeature, eventually
Bio::DB::GFF likely will no longer be actively maintained and
Remember, your SearchIO modifications do not have to be included in
this release of BioPerl, so don't rush them to make a release. We
could feasibly have 1-2 extra dev releases before v1.6, maybe more.
Rushing to make a release was one of the initial problems with
Bio::SeqFeatureI (I think) in the first 1.5 release. Please correct
me if I'm wrong there, Hilmar.
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign