[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit
Aaron J. Mackey
amackey at pcbi.upenn.edu
Tue Jul 13 08:17:20 EDT 2004
On Jul 12, 2004, at 11:30 PM, Chris Mungall wrote:
> Added ability to parse sequence data in GFF3 - see NOTES section &
> email to bioperl list for details
> +If you call
> + $gffio->ignore_sequence_data_toggle(1)
> +prior to parsing the sequence data is ignored; this is useful if you
> +just want the features. It avoids the memory overhead in building and
> +caching sequences
Maybe just $gffio->ignore_sequence(1) would be sufficient? We tend not
to add "_toggle" to every attribute; besides, "toggle" implies that the
value switches every time you call it.
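To illustrate the difference, here is a minimal sketch. The My::GFFReader package and both method bodies are hypothetical stand-ins, not the actual Bio::Tools::GFF code. It contrasts a plain getter/setter, which is safe to call repeatedly, with a true toggle:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object; NOT the real Bio::Tools::GFF.
package My::GFFReader;

sub new {
    my ($class) = @_;
    my $self = { ignore_sequence => 0 };
    return bless $self, $class;
}

# Combined getter/setter: the flag changes only when an argument is passed,
# and passing the same value twice leaves it unchanged.
sub ignore_sequence {
    my ($self, @args) = @_;
    if (@args) {
        $self->{ignore_sequence} = $args[0] ? 1 : 0;
    }
    return $self->{ignore_sequence};
}

# A true toggle, for contrast: every call flips the stored value.
sub toggle_ignore_sequence {
    my ($self) = @_;
    $self->{ignore_sequence} = $self->{ignore_sequence} ? 0 : 1;
    return $self->{ignore_sequence};
}

package main;

my $gffio = My::GFFReader->new;
$gffio->ignore_sequence(1);
$gffio->ignore_sequence(1);        # second call changes nothing
print "after two sets: ", $gffio->ignore_sequence, "\n";

$gffio->toggle_ignore_sequence;    # flips the flag back off
print "after one toggle: ", $gffio->ignore_sequence, "\n";
```

With the setter, the caller always knows the resulting state; with a toggle, the result depends on how many times it has been called before.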
> +Alternatively, you can call either
> + $gffio->get_all_seqs()
Again, would $gffio->get_seqs() suffice?
> + $gffio->seq_id_by_h()
Why have two separate APIs to get the same data? If you want to
provide a hashref instead of an array of seqs, use the calling context
of get_seqs() ...
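Something along these lines is what I have in mind. This is only a sketch: the My::GFFReader package is a hypothetical stand-in, plain strings stand in for sequence objects, and the choice of "list in list context, hashref keyed by ID in scalar context" is an assumption about what callers would want. It relies on Perl's built-in wantarray:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object; NOT the real Bio::Tools::GFF.
package My::GFFReader;

sub new {
    my ($class, %seqs) = @_;
    my $self = { seqs => {%seqs} };   # cache keyed by sequence ID
    return bless $self, $class;
}

# One accessor, two return shapes, selected by the caller's context.
sub get_seqs {
    my ($self) = @_;
    my $by_id = $self->{seqs};
    if (wantarray) {
        # List context: the sequences themselves, ordered by ID.
        return map { $by_id->{$_} } sort keys %$by_id;
    }
    # Scalar context: a hashref keyed by ID (a shallow copy, so callers
    # cannot modify the internal cache).
    my %copy = %$by_id;
    return \%copy;
}

package main;

# Plain strings stand in for sequence objects in this sketch.
my $reader = My::GFFReader->new(chr1 => 'ACGT', chr2 => 'GGCC');

my @seqs  = $reader->get_seqs;    # list context
my $by_id = $reader->get_seqs;    # scalar context

print scalar(@seqs), " sequences; chr2 is $by_id->{chr2}\n";
```

The caller picks the shape simply by how the result is assigned, so there is only one method name to document.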
> +Note that these objects will not have the features attached - you have
> +to do this yourself, OR call
> + $gffio->features_attached_to_seqs_toggle(1)
Again, $gffio->attach_features(1) seems sufficient ...
> +Note that auto-attaching the features to seqs will incur a higher
> +memory overhead as the features must be cached until the sequence data
> +is found
Which would be the same if you "had to do this yourself". I think it's
fair that if a sequence is to have 100 features attached to it, those
100 features will require memory. There's no *extra* memory overhead
here, is there?
> +=head1 TODO
> +Make a Bio::SeqIO class specifically for GFF3 with sequence data
This would lead to a much cleaner API, and could now easily be done via
your improvements to Bio::Tools::GFF.
As an aside, instead of reimplementing your own simple FASTA parser, is
it possible to pass along the Bio::Root::IO object to Bio::SeqIO::fasta
directly, and let it do the work?