[Bioperl-l] SearchIO speed up

Sendu Bala bix at sendu.me.uk
Sun Aug 20 17:56:28 EDT 2006


Sendu Bala wrote:
> Chris Fields wrote:
>> ...
>>> My proposal involves the "chunks" being unparsed, raw text "blobs", that
>>> are essentially blessed into a package that does the parsing only when
>>> necessary (and even then, might choose different parsing strategies, based
>>> on what's been asked for).  Thus a potentially large amount of parsing and
>>> storage is skipped.  Additionally, you now have the option of not even
>>> storing the blobs in memory, just file seek pointers (requiring temp.
>>> storage for streaming pipe data sources), and thus can process very large
>>> reports without consuming memory (currently a problem).
>> Using file pointers is a great touch.  Sendu has a slight aversion to temp
>> files but he has already indicated other ways around this.
> 
> I'm in the midst of implementing an 'Aaron'-style pull-parser which I 
> have called PullParserI.

I've now committed this to bioperl-live. It is Bio::PullParserI, and the 
first module to implement it is my new hmmer parser, 
Bio::SearchIO::hmmer_pull (for want of a better name). The API here 
isn't set in stone, so I'd certainly encourage suggestions for improvement.
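The lazy "raw blob plus seek pointer" idea from the proposal quoted above can be sketched as follows. This is a minimal Python illustration, not the Perl API of Bio::PullParserI. It assumes records are separated by a line containing only `//` and that fields are simple `key: value` lines. `LazyRecord`, `index_records` and `spool_if_needed` are hypothetical names.

```python
import io
import shutil
import tempfile
from typing import BinaryIO, Dict, List, Optional


def spool_if_needed(stream: BinaryIO) -> BinaryIO:
    # Pipes cannot seek, so copy them to an anonymous temp file first,
    # as the quoted proposal suggests for streaming data sources.
    if stream.seekable():
        return stream
    tmp = tempfile.TemporaryFile()  # binary read/write by default
    shutil.copyfileobj(stream, tmp)
    tmp.seek(0)
    return tmp


class LazyRecord:
    """Hypothetical record handle that stores only where its raw text lives."""

    def __init__(self, fh: BinaryIO, start: int, end: int):
        self._fh = fh
        self.start = start
        self.end = end
        self._fields: Optional[Dict[str, str]] = None  # filled on first get()

    def raw(self) -> str:
        # Re-read the raw text from the file only when asked for it.
        self._fh.seek(self.start)
        return self._fh.read(self.end - self.start).decode()

    def get(self, key: str) -> Optional[str]:
        # Parse lazily: "key: value" lines are split on the first request only.
        if self._fields is None:
            self._fields = {}
            for line in self.raw().splitlines():
                if ":" in line:
                    k, v = line.split(":", 1)
                    self._fields[k.strip()] = v.strip()
        return self._fields.get(key)


def index_records(fh: BinaryIO, sep: bytes = b"//") -> List[LazyRecord]:
    """One pass that records byte offsets of each record, not its text.

    Text after the last separator line is ignored in this sketch.
    """
    records = []
    fh.seek(0)
    start = 0
    while True:
        line = fh.readline()
        if not line:
            break
        if line.rstrip(b"\r\n") == sep:
            # The record ends where the separator line begins.
            records.append(LazyRecord(fh, start, fh.tell() - len(line)))
            start = fh.tell()
    return records


if __name__ == "__main__":
    data = b"Query: seqA\nScore: 12.5\n//\nQuery: seqB\nScore: 3.1\n//\n"
    recs = index_records(spool_if_needed(io.BytesIO(data)))
    print(len(recs), recs[1].get("Query"), recs[0].get("Score"))
```

Only the offsets stay in memory; each record's text is read and split the first time one of its fields is requested. A real report parser would swap the naive `key: value` split for format-specific parsing, possibly choosing a cheaper strategy depending on what the caller asks for.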

I've made a start on a BLASTN parser so we can see a more familiar speed 
comparison, but it's not ready yet. Meanwhile, see the thread 'New hmmpfam 
parser'.

