[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
cjm at fruitfly.org
Tue Jul 13 12:42:24 EDT 2004
On Tue, 13 Jul 2004, Aaron J. Mackey wrote:
> On Jul 12, 2004, at 11:30 PM, Chris Mungall wrote:
> > Added ability to parse sequence data in GFF3 - see NOTES section &
> > email to bioperl list for details
> > +If you call
> > +
> > + $gffio->ignore_sequence_data_toggle(1)
> > +
> > +prior to parsing, the sequence data is ignored; this is useful if you
> > +just want the features. It avoids the memory overhead of building and
> > +caching sequences.
> Maybe just $gffio->ignore_sequence(1) would be sufficient? We tend to
> not add "_toggle" to every attribute; besides which "toggle" has the
> semantics that every time you call it, the value switches.
OK, will change
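The distinction Aaron draws between a plain attribute accessor and a "toggle" can be shown with a minimal standalone sketch. The package name MyGFFReader and both method bodies are hypothetical illustrations, not the actual Bio::Tools::GFF code. A get/set accessor is idempotent: calling it twice with 1 leaves the flag at 1. A true toggle flips the flag on every call.

```perl
use strict;
use warnings;

# Hypothetical package for illustration only; not Bio::Tools::GFF.
package MyGFFReader;

sub new {
    my ($class) = @_;
    return bless { _ignore_sequence => 0 }, $class;
}

# Conventional get/set accessor: stores a value when one is passed,
# and always returns the current value.
sub ignore_sequence {
    my $self = shift;
    $self->{_ignore_sequence} = shift if @_;
    return $self->{_ignore_sequence};
}

# A real "toggle": flips the flag on every call, ignoring any argument.
sub ignore_sequence_toggle {
    my $self = shift;
    $self->{_ignore_sequence} = $self->{_ignore_sequence} ? 0 : 1;
    return $self->{_ignore_sequence};
}

package main;

my $reader = MyGFFReader->new;
$reader->ignore_sequence(1);
$reader->ignore_sequence(1);    # still 1: setting is idempotent

my $toggler = MyGFFReader->new;
$toggler->ignore_sequence_toggle;   # now 1
$toggler->ignore_sequence_toggle;   # back to 0
```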
> > +Alternatively, you can call either
> > +
> > + $gffio->get_all_seqs()
> Again, would $gffio->get_seqs() suffice?
> > + $gffio->seq_id_by_h()
> Why have two separate APIs to get the same data? If you want to
> provide a hashref instead of an array of seqs, use the calling context
> of get_seqs() ...
OK; I will keep seq_id_by_h private (it's used within the module).
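Using calling context, as Aaron suggests, lets a single method serve both needs. The sketch below is hypothetical (the package, the constructor and the plain-hash "sequences" are stand-ins, not BioPerl objects): get_seqs() checks wantarray and returns a list of sequences in list context, or a hashref keyed by sequence ID in scalar context.

```perl
use strict;
use warnings;

# Hypothetical package; sequences are plain hashrefs standing in for Bio::Seq.
package MyGFFReader;

sub new {
    my ($class, @seqs) = @_;
    return bless { _seqs => [@seqs] }, $class;
}

# List context: sequences in file order.
# Scalar context: hashref mapping sequence ID => sequence.
sub get_seqs {
    my $self = shift;
    my @seqs = @{ $self->{_seqs} };
    return @seqs if wantarray;
    return +{ map { ( $_->{id} => $_ ) } @seqs };
}

package main;

my $gffio = MyGFFReader->new(
    { id => 'chr1', seq => 'ACGT' },
    { id => 'chr2', seq => 'GGCC' },
);

my @seqs      = $gffio->get_seqs;   # list context
my $seq_by_id = $gffio->get_seqs;   # scalar context
print $seq_by_id->{chr2}{seq}, "\n";   # prints GGCC
```

The `+{ ... }` forces Perl to read the braces as an anonymous hash rather than a block.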
> > +Note that these objects will not have the features attached - you have
> > +to do this yourself, OR call
> > +
> > + $gffio->features_attached_to_seqs_toggle(1)
> Again, $gffio->attach_features(1) seems sufficient ...
OK; although one might be led to expect that the argument for a method
with that name would be a list of SeqFeatures. Is the BP method name
syntax enshrined anywhere, or is it more a general set of principles
shared by the authors?
> > +Note that auto-attaching the features to seqs will incur a higher
> > +memory overhead as the features must be cached until the sequence data
> > +is found.
> Which would be the same if you "had to do this yourself". I think it's
> fair that if a sequence is to have 100 features attached to it, that
> those 100 features will require memory. There's no *extra* memory
> overhead here, is there?
Generally not, but the client app may wish to do something with each
feature or seq and then immediately discard it, in which case the caching
would not be required.
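To make the trade-off concrete, here is a hypothetical sketch of what auto-attaching implies when sequence data comes last: every feature is held in a pending cache keyed by seq_id until its sequence is read. The helper names cache_feature and attach_to_seq and the plain-hash records are illustrative assumptions only. A client that processes and discards each feature as it is parsed would never build %pending at all.

```perl
use strict;
use warnings;

# Hypothetical sketch: features seen before their sequence wait here.
my %pending;    # seq_id => [ features waiting for that sequence ]

# Hypothetical helper: hold a feature until its sequence turns up.
sub cache_feature {
    my ($feature) = @_;
    push @{ $pending{ $feature->{seq_id} } }, $feature;
}

# Hypothetical helper: when a sequence is read, move its cached
# features onto it and free the cache entry.
sub attach_to_seq {
    my ($seq) = @_;
    my $features = delete $pending{ $seq->{id} };
    $seq->{features} = defined $features ? $features : [];
    return $seq;
}

cache_feature({ seq_id => 'chr1', type => 'gene', start => 1,  end => 90 });
cache_feature({ seq_id => 'chr1', type => 'exon', start => 10, end => 40 });

my $seq = attach_to_seq({ id => 'chr1', seq => 'ACGT' x 25 });
printf "%s has %d features\n", $seq->{id}, scalar @{ $seq->{features} };
# prints "chr1 has 2 features"
```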
> > +=head1 TODO
> > +
> > +Make a Bio::SeqIO class specifically for GFF3 with sequence data
> This would lead to a much cleaner API, and could now easily be done via
> your improvements to Bio::Tools::GFF
OK, will add
> As an aside, instead of reimplementing your own simple FASTA parser, is
> it possible to pass along the Bio::Root::IO object to Bio::SeqIO::fasta
> directly, and let it do the work?
Hmm, when I wrote the parser I wrote it in such a way that the sequence
data could be interspersed with the feature data. It seems that this is
unnecessary, as the spec states that the sequence data must come at the
very end of the file.
So perhaps I should re-engineer it a bit so that it rejects anything that
doesn't follow the spec. That would also make it easier to hand the
sequence section to the existing FASTA parser.
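A stricter parser along these lines might look like the following standalone sketch. It assumes the GFF3 convention that a `##FASTA` directive marks the start of the trailing sequence section; the function name split_gff3 and its return shape are hypothetical, and this is not the Bio::Tools::GFF implementation. Everything before `##FASTA` is treated as features or directives, and anything after it that is not a FASTA header or sequence line is rejected. In the real module, the remainder of the stream could instead be passed to Bio::SeqIO::fasta, as Aaron suggests.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a GFF3 stream into feature rows and trailing
# FASTA sequences, dying on anything but FASTA after ##FASTA.
sub split_gff3 {
    my ($fh) = @_;
    my (@features, %seqs, $id);
    my $in_fasta = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if (!$in_fasta) {
            if ($line eq '##FASTA') { $in_fasta = 1; next; }
            next if $line =~ /^#/;                # other directives/comments
            push @features, [ split /\t/, $line ];  # 9 tab-separated columns
            next;
        }
        if ($line =~ /^>(\S+)/) {                 # FASTA header
            $id = $1;
            $seqs{$id} = '';
        } elsif (defined $id && $line =~ /^[A-Za-z*-]+$/) {
            $seqs{$id} .= $line;                  # sequence residues
        } else {
            die "Line $.: only FASTA is allowed after ##FASTA\n";
        }
    }
    return (\@features, \%seqs);
}

my $gff = join '',
    "##gff-version 3\n",
    "chr1\t.\tgene\t1\t8\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
    "TTGA\n";
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
my ($features, $seqs) = split_gff3($fh);
print scalar(@$features), " feature(s); chr1 = $seqs->{chr1}\n";
# prints "1 feature(s); chr1 = ACGTTTGA"
```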