[Bioperl-l] Bio::Seq, search for specific features

Frank Schwach fs5 at sanger.ac.uk
Thu Sep 9 04:10:36 EDT 2010


So, something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features, and that would then be
subclassed as e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan? Is there any way I
can get involved, or is it better to wait for other features to be
developed first?
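
To make the question concrete, here is a minimal sketch of the kind of
layout I mean. The package names are the hypothetical ones from above
(none of them exist in BioPerl), the method names add_feature and
features_by_tag are assumptions for illustration only, and features are
plain hashrefs rather than Bio::SeqFeatureI objects so that the example
runs on its own:

```perl
use strict;
use warnings;

# Hypothetical abstract interface; not part of BioPerl.
package Bio::Seq::FeatureContainer;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Subclasses must override these two methods.
sub add_feature     { die ref(shift) . " must implement add_feature\n" }
sub features_by_tag { die ref(shift) . " must implement features_by_tag\n" }

# Hypothetical in-memory backend that indexes features by tag and value.
package Bio::Seq::FeatureContainer::Memory;
our @ISA = ('Bio::Seq::FeatureContainer');

sub new {
    my ($class) = @_;
    return bless { features => [], index => {} }, $class;
}

sub add_feature {
    my ($self, $feat) = @_;
    push @{ $self->{features} }, $feat;
    # Features are plain hashrefs here to keep the sketch self-contained;
    # real code would call has_tag / get_tag_values on feature objects.
    for my $tag (keys %{ $feat->{tags} }) {
        for my $value (@{ $feat->{tags}{$tag} }) {
            push @{ $self->{index}{$tag}{$value} }, $feat;
        }
    }
    return $self;
}

# Hash lookup instead of a grep over every feature.
sub features_by_tag {
    my ($self, $tag, $value) = @_;
    return @{ $self->{index}{$tag}{$value} || [] };
}

package main;

my $store = Bio::Seq::FeatureContainer::Memory->new;
$store->add_feature({ start => 10, end => 50,  tags => { id => ['my_gene'] } });
$store->add_feature({ start => 80, end => 120, tags => { id => ['other_gene'] } });

my @hits = $store->features_by_tag('id', 'my_gene');
print scalar(@hits), " feature(s) found\n";
```

An Sqlite subclass would override the same two methods and translate
them into SQL, so calling code would not need to know which backend is
in use.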

Cheers,

Frank



On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> 
> chris
> 
> On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:
> 
> > Hi Jason,
> > 
> > Yes, I guess that would be the simplest way of doing it - basically just
> > doing it the way the docs suggest for getting at a specific feature but
> > hiding the grep behind a Bio::Seq method with search parameters. But we
> > could also build a hash of feature tags as the Bio::Seq is built so that
> > retrieval is more efficient. This could also be used to implement a bin
> > indexing scheme for range queries, similar to what Bio::DB::GFF does.
> > Is a move to an sqlite backend planned for the near future?
> > 
> > Frank
> > 
> > 
> > 
> > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> >> And the implementation would just be something like this?
> >> 
> >> my @features = grep {
> >>     $_->has_tag('id') && ($_->get_tag_values('id'))[0] eq 'my_gene'
> >> } $seq->get_SeqFeatures();
> >> 
> >> I think any fuller implementation would come if we moved from the
> >> in-memory arrays & hash-based system to a sqlite db on the back-end
> >> for how Sequence and Feature objects are stored.
> >> This would be somewhat slower but wouldn't have the performance/memory
> >> problems we get for sequences with many annotations.
> >> 
> >> -jason
> >> Frank Schwach wrote, On 9/7/10 5:09 AM:
> >>> I am working a lot with feature-rich Bio::Seq objects these days and
> >>> thought that it would be really nice if I could do something like:
> >>> 
> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >>> 
> >>> instead of having to grep for the feature every time.
> >>> There could then be 'by_tag' and 'by_region' options as well.
> >>> 
> >>> According to the Bio::Seq docs, something like this seems to be planned
> >>> at some stage. I would be willing to contribute to this feature if I can
> >>> and if this isn't already being implemented by somebody else.
> >>> Does anybody know the state of this feature?
> >>> 
> >>> Frank
> >>> 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




