[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Mon Aug 14 09:54:14 EDT 2006


On Aug 14, 2006, at 8:04 AM, Sendu Bala wrote:

> aaron.j.mackey at gsk.com wrote:
>> A "pull parser" need not read everything (i.e. the entire file) into
>> memory, just the current/next chunk, right?
>
> The problem arises when you need random access to the input data in
> order to do what you need to do, like get just the next chunk or bit of
> information.
>
> So I don't see a way for a generalized pull-parser to cope with piped
> input, because most operations are going to have to use seek() to work,
> and you can't seek piped input.
>
> What I do at the moment, then, is on detecting piped input, I'm forced
> to read all the input data in one go and spit it out into seekable
> memory or a temp file. After which normal behaviour resumes - you don't
> read everything, just the bit you want.

The traditional route has been to use a tempfile. Bio::Root::IO has
several methods for creating tempdirs/tempfiles.
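
Roughly what I have in mind, sketched in Python only to keep it short (this
is not BioPerl code; ensure_seekable is a made-up name, and it assumes the
input is a binary stream):

```python
import shutil
import tempfile


def ensure_seekable(stream):
    """Return a seekable binary stream holding the same data as `stream`.

    Hypothetical helper: seekable input (a regular file) is returned as-is;
    anything else (e.g. a pipe) is copied once into an anonymous temp file.
    """
    try:
        if stream.seekable():
            return stream
    except AttributeError:
        # Object has no seekable() method; treat it as non-seekable.
        pass

    spool = tempfile.TemporaryFile()   # binary mode, removed on close
    shutil.copyfileobj(stream, spool)  # one sequential pass over the input
    spool.seek(0)                      # rewind so the parser starts at the top
    return spool
```

The cost is one full sequential read of the piped data up front; after that
the parser can seek around as usual.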

I would at least keep a tempfile option available for people who deal
with large BLAST files. I think the XML files can also be quite large.
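
One way to offer both, again only a Python sketch under my own assumptions
(spool_input and the 10 MB default are made up for illustration): keep small
inputs in memory and let large ones spill to a temp file automatically.

```python
import shutil
import tempfile


def spool_input(stream, max_in_memory=10 * 1024 * 1024):
    """Copy a binary stream into a seekable spool.

    Hypothetical helper: data stays in memory until it exceeds
    `max_in_memory` bytes, then it is moved to a temp file on disk.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    shutil.copyfileobj(stream, spool)  # sequential copy, works for pipes
    spool.seek(0)                      # rewind for the parser
    return spool
```

A caller that always wants a tempfile could pass a very small max_in_memory.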

Speaking of XML, is the current idea to get this running on text-based
BLAST initially?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




