[Bioperl-l] SearchIO speed up

aaron.j.mackey at gsk.com aaron.j.mackey at gsk.com
Thu Aug 10 13:39:59 EDT 2006

> ...Except I need to know if the community considers the speed problem 
> solved or not. More radical changes will make SearchIO even faster, eg. 
> Chris Fields and Jason (if I interpret the Project priority list item 
> correctly) have suggested an end to individual Hit and HSP objects, 
> which become just data members of a Result-like object. Ideally I don't 
> want to go down that route because we lose quite a bit of OO power;

As already mentioned, a lazy-evaluation approach would also work.

Jason and I did once talk about an entirely new parsing/object-building 
framework based on nested grammars.  In essence, the "top-level" parser 
simply "chunks" the input into blobs of (minimally parsed) text, each 
corresponding to one top-level Result object.  Each chunk/blob is the input 
to the next-level parser for Hits, which in turn produces chunks for HSPs. 
Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are* the same 
Generic*I-implementing objects we're already using.  Thus, if HSPs are 
never interrogated, they're never parsed; as soon as one is interrogated, 
it gets parsed, and so on.  In such an environment, you can imagine 
flyweight objects that are built very quickly/easily (recall that many 
previous analyses of BioPerl speed problems traced them not to parsing so 
much as to heavy-weight object creation).
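
To make the nested-chunk idea concrete, here is a minimal sketch in Python 
(3.8+). It is not BioPerl code and not the parser discussed in this thread. 
The RESULT/HIT/HSP line format, the class names (LazyResult, LazyHit, 
LazyHSP) and the parsed_count counter are all assumptions invented for 
illustration. Each level only splits its text into chunks for the level 
below, and a chunk is parsed only when one of its attributes is first read.

```python
from functools import cached_property


def split_chunks(lines, keyword):
    """Group lines into chunks, each starting at a line beginning with keyword."""
    chunks = []
    for line in lines:
        if line.startswith(keyword):
            chunks.append([line])
        elif chunks:
            chunks[-1].append(line)
    return chunks


class LazyHSP:
    parsed_count = 0  # instrumentation only: how many HSPs were really parsed

    def __init__(self, lines):
        self._lines = lines  # raw text, untouched until interrogated

    @cached_property
    def coords(self):
        LazyHSP.parsed_count += 1
        _, start, end = self._lines[0].split()
        return int(start), int(end)


class LazyHit:
    def __init__(self, lines):
        self._lines = lines

    @cached_property
    def name(self):
        return self._lines[0].split()[1]

    @cached_property
    def hsps(self):
        # Second-level chunking: happens only if someone asks for HSPs.
        return [LazyHSP(c) for c in split_chunks(self._lines[1:], "HSP")]


class LazyResult:
    def __init__(self, lines):
        self._lines = lines

    @cached_property
    def query(self):
        return self._lines[0].split()[1]

    @cached_property
    def hits(self):
        # First-level chunking: happens only if someone asks for Hits.
        return [LazyHit(c) for c in split_chunks(self._lines[1:], "HIT")]


def parse_results(text):
    """Top-level parser: only splits the input into per-Result blobs."""
    lines = [line for line in text.splitlines() if line.strip()]
    return [LazyResult(c) for c in split_chunks(lines, "RESULT")]


# Hypothetical toy report; real search reports are far messier.
REPORT = """\
RESULT q1
HIT h1
HSP 10 50
HSP 60 90
HIT h2
HSP 5 20
RESULT q2
HIT h3
HSP 1 9
"""

results = parse_results(REPORT)
hit_names = [hit.name for hit in results[0].hits]  # no HSP is parsed here
second_hsp = results[0].hits[0].hsps[1].coords     # parses exactly one HSP
```

In this sketch, listing the hit names never touches HSP text, and reading 
one HSP's coordinates parses that HSP alone. The objects still present the 
usual Result/Hit/HSP shape to the caller, which is the point of keeping the 
chunks "fat".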

I happen to have such a nested parser lying around for 
Bio::SearchIO::fasta.pm, but it also uses an Inline::C, yacc-generated C 
parser backend (yet another experiment in trying to get SearchIO to run 
faster), so it really isn't ready for prime time (being entirely untested, 
and probably not even finished).

