[Bioperl-l] Bio::SeqIO::game

Hilmar Lapp hlapp@gmx.net
Thu, 30 Nov 2000 11:29:04 -0800

Jason Stajich wrote:
> Brad - Thanks for the game parser updates and test files.
> I have some comments.  One thing we've been kicking around with the latest
> bioperl release and the Bio::DB rewrites is that the expected behavior
> when a class is reading a data file of sequences is for it to only read
> one sequence at a time.  The game code actually reads everything in at one
> time and does this multiple times depending on whether or not it is
> adding features or just reading in a primary seq.

These days sequences are getting bigger, not smaller. I think it is
absolutely worthwhile to avoid slurping in a whole file of sequences
whenever possible.
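
To illustrate the one-record-at-a-time idea, here is a minimal sketch in
plain Perl. It is not how Bio::SeqIO::game works and it is not a real XML
parser: next_seq_chunk() is a hypothetical helper, and it assumes that
records are delimited by a literal </seq> closing tag and that <seq>
elements are never nested. It sets the input record separator so that
each read pulls in only one record instead of the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO): return the next
# <seq>...</seq> chunk from an open handle, or undef when none is left.
# Assumes a literal "</seq>" terminator and no nested <seq> elements.
sub next_seq_chunk {
    my ($fh) = @_;
    local $/ = '</seq>';    # each read stops after the next closing tag
    while (defined(my $chunk = <$fh>)) {
        # Skip leftovers that hold no complete record (e.g. "</game>").
        next unless $chunk =~ /(<seq\b.*<\/seq>)\z/s;
        return $1;
    }
    return;
}

# Demo on an in-memory handle; a file or socket handle works the same way.
my $xml = qq{<game>\n}
        . qq{<seq id="a"><residues>ACGT</residues></seq>\n}
        . qq{<seq id="b"><residues>TTGA</residues></seq>\n}
        . qq{</game>\n};
open(my $fh, '<', \$xml) or die "cannot open in-memory handle: $!";

my @ids;
while (defined(my $rec = next_seq_chunk($fh))) {
    my ($id) = $rec =~ /<seq id="([^"]+)"/;
    push @ids, $id;
}
close $fh;

print join(',', @ids), "\n";
```

Only one record is held in memory per call, so memory use is bounded by
the largest single record rather than by the size of the file.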

> It also expects only file names to be passed in, but I think file handles
> should be supported as well and that it should be using the
> $self->_filehandle method that all SeqIO classes subscribe to.  This
> will not work with the current way of multiple passes on the document.

I agree. File handles should be supported as general streams. This way
you can pass in a socket or anything else that emulates a stream.
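
As a rough sketch of what accepting either a file name or a stream could
look like, the code below normalizes its argument to a readable handle.
as_filehandle() is a hypothetical name made up for this example; it is
not the $self->_filehandle method mentioned above, whose internals are
not shown in this thread. The sketch assumes that anything which is a
glob reference, or a blessed object with a getline method (IO::File,
IO::Socket and the like), is already a usable stream.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use File::Temp qw(tempfile);

# Hypothetical helper: accept a file name or an already-open handle
# and always return something that can be read with <$fh>.
sub as_filehandle {
    my ($src) = @_;
    # Plain glob references (\*STDIN, lexical handles) are used as-is.
    return $src if ref($src) eq 'GLOB';
    # So are handle objects such as IO::File or IO::Socket instances.
    return $src if blessed($src) && $src->can('getline');
    # Anything else is treated as a file name.
    open(my $fh, '<', $src) or die "cannot open '$src': $!";
    return $fh;
}

# Demo: one source is a file name, the other an in-memory stream.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "from file\n";
close $tmp_fh;

my $data = "from handle\n";
open(my $mem_fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @first_lines;
for my $src ($tmp_name, $mem_fh) {
    my $fh   = as_filehandle($src);
    my $line = <$fh>;
    chomp $line;
    push @first_lines, $line;
}

print join(' | ', @first_lines), "\n";
```

The reading code after as_filehandle() never needs to know where the
data comes from, which is the point of treating handles as general
streams.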

> There are some simple ways around this, one is to read everything from the
> stream/file and store it as one giant string and then re-pass this string
> as input to the SAX parser. This will use up a large amount of memory and
> break on very large files.  The problem is that the SAX parser is
> expecting to read to the end of the document not to the end of a
> <seq></seq> block.  Anyone else had a chance to look over this and think
> about it?

Unfortunately not. I also don't know the internals of the Perl SAX
interface, but in general SAX was defined exactly for one-pass,
chunk-by-chunk stream reading. Does the Perl SAX parser not adhere to
this concept, or are there peculiarities of the GAME DTD that prohibit


Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757