[Bioperl-l] SearchIO speed up

aaron.j.mackey at gsk.com aaron.j.mackey at gsk.com
Mon Aug 14 10:00:01 EDT 2006


I'm failing to understand, sorry.

The UNIX utility "more" (or "less" if you prefer) is a pull parser; it
reads only as much of the stream as it needs to satisfy the current
iteration (the next iteration occurring when the user asks for an
additional screen or line).  It does not copy data from a pipe into temp
storage.

That said, you can't use "more" to page backwards in piped content (unless 
your "more" is keeping a buffer, which some do).

So, I agree that you will need some form of storage for the *current* 
information to be parsed (and must process all of the stream necessary to 
obtain all such information), but not for any of the information yet to be 
accessed.
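
A rough sketch of what I mean, in Python rather than Perl and purely for
illustration. Everything here is made up for the example and is not
SearchIO code: the names records and ForwardOnly are hypothetical, and so
is the record format, where a line containing only "//" ends a record.
ForwardOnly stands in for a pipe by refusing to seek and by counting how
many lines have been pulled from it.

```python
class ForwardOnly:
    """Stand-in for a pipe: readable line by line, but not seekable."""

    def __init__(self, text):
        self._lines = iter(text.splitlines(keepends=True))
        self.lines_read = 0  # how much of the stream has been consumed

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._lines)
        self.lines_read += 1
        return line

    def seek(self, *args):
        raise OSError("cannot seek on a pipe")


def records(stream, terminator="//\n"):
    """Yield one record at a time, holding only the current record."""
    current = []
    for line in stream:
        if line == terminator:
            yield "".join(current)
            current = []  # drop the finished record; nothing else is kept
        else:
            current.append(line)
    if current:  # trailing record without a terminator
        yield "".join(current)


pipe = ForwardOnly("a1\na2\n//\nb1\n//\nc1\n")
parser = records(pipe)
first = next(parser)     # pulls exactly one record
consumed = pipe.lines_read
print(repr(first), consumed)  # 'a1\na2\n' 3
```

Asking for the first record consumes three lines of the stream and no
more; the rest is read only when the caller asks for the next record, and
seek() is never called.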

-Aaron


bioperl-l-bounces at lists.open-bio.org wrote on 08/14/2006 09:04:19 AM:

> aaron.j.mackey at gsk.com wrote:
> > A "pull parser" need not read everything (i.e. the entire file) into 
> > memory, just the current/next chunk, right?
> 
> The problem arises when you need random-access to the input data in 
> order to do what you need to do, like get just the next chunk or bit of 
> information.
> 
> So I don't see a way for a generalized pull-parser to cope with piped 
> input, because most operations are going to have to use seek() to work,
> and you can't seek piped input.
> 
> What I do at the moment, then, is on detecting piped input, I'm forced 
> to read all the input data in one go and spit it out into seekable 
> memory or a temp file. After which normal behaviour resumes - you don't 
> read everything, just the bit you want.