[Bioperl-l] Bio::SeqIO::game

Jason Stajich jason@chg.mc.duke.edu
Thu, 30 Nov 2000 12:01:27 -0500 (EST)

Brad - Thanks for the game parser updates and test files.  

I have some comments.  One thing we've been kicking around with the latest
bioperl release and the Bio::DB rewrites is that a class reading a data
file of sequences is expected to read only one sequence at a time.  The
game code instead reads the whole file in at once, and does so multiple
times depending on whether it is adding features or just reading in a
primary seq.

It also expects only file names to be passed in, but I think file handles
should be supported as well, and that it should be using the
$self->_filehandle method that all SeqIO classes subscribe to.  That
will not work with the current approach of making multiple passes over
the document, since a stream cannot always be rewound for a second pass.
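
To make the file name vs. file handle point concrete, here is a rough
sketch in plain Perl. It is not BioPerl code: open_input is a
hypothetical helper and the -file / -fh argument names are only meant to
mirror the usual SeqIO style. It resolves either kind of argument into a
readable handle, so the rest of a reader only ever deals with a handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn either a -file or a
# -fh argument into an open, readable filehandle.
sub open_input {
    my (%args) = @_;

    # A caller-supplied handle wins; we never reopen or rewind it.
    return $args{-fh} if defined $args{-fh};

    die "need -file or -fh\n" unless defined $args{-file};
    open(my $fh, '<', $args{-file})
        or die "cannot open $args{-file}: $!\n";
    return $fh;
}

# Usage: an in-memory handle stands in for a real stream here.
my $xml = "<seq>ACGT</seq>\n";
open(my $mem, '<', \$xml) or die "cannot open in-memory handle: $!\n";

my $fh         = open_input(-fh => $mem);
my $first_line = <$fh>;
print $first_line;
```

Once everything goes through a handle, any code that assumes it can
reopen the file by name for another pass has to change, which is the
conflict described above.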

There are some simple ways around this.  One is to read everything from
the stream/file, store it as one giant string, and then pass that string
as input to the SAX parser on each pass.  This will use up a large amount
of memory and break on very large files.  The underlying problem is that
the SAX parser expects to read to the end of the document, not to the end
of a <seq></seq> block.  Has anyone else had a chance to look over this
and think about it?
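
A minimal sketch of the "one giant string" workaround, in plain Perl. To
keep it self-contained, a regex stands in for the SAX parser, and the
slurp and seq_blocks helpers and the sample document are all made up for
illustration. It reads the handle once, makes two passes over the
resulting string, and then hands sequences out one at a time.

```perl
use strict;
use warnings;

# Read the whole stream into one string. Memory use grows with the
# size of the input, which is the drawback noted above.
sub slurp {
    my ($fh) = @_;
    local $/;    # undefined record separator: read to end of input
    my $text = <$fh>;
    return defined $text ? $text : '';
}

# Stand-in for one parser pass: return the text of every <seq> element.
# A regex is used only for illustration; it is not a real XML parser.
sub seq_blocks {
    my ($doc) = @_;
    return $doc =~ m{<seq\b[^>]*>(.*?)</seq>}gs;
}

# Illustrative input, read through an in-memory handle.
my $input = "<game><seq>ACGT</seq><seq>TTGA</seq></game>";
open(my $fh, '<', \$input) or die "cannot open in-memory handle: $!\n";
my $doc = slurp($fh);

# The handle is now exhausted, but the string can be scanned repeatedly.
my @pass_one = seq_blocks($doc);
my @pass_two = seq_blocks($doc);

# Hand the sequences out one at a time, as a next_seq-style caller expects.
my @queue = @pass_one;
while (defined(my $seq = shift @queue)) {
    print "$seq\n";
}
```

This keeps the multiple passes working with a handle as input, but the
whole document still sits in memory, so it does not fix the large-file
problem.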

I checked in some changes to try and keep this SeqIO compliant --
specifically expecting -file=>filename instead of just reading the 2nd
argument:

<   $self->{file}=@args[1];
>   ($self->{file} ) = $self->_rearrange( [ qw(FILE) ], @args);
>   $self->throw("did not specify a file to read, Filehandle suport is not implemented currently") if( !defined $self->{file});

Jason Stajich
Center for Human Genetics
Duke University Medical Center