[Bioperl-l] Bio::DB::SeqFeature::Store::memory ->filter_by_type very slow

Mark A. Jensen maj at fortinbras.us
Fri Feb 5 10:57:29 EST 2010


My guess is that ..memory::find_types is costly, as it descends into
Bio::DB::GFF::Typename objects. Could we implement a cache
or memo?
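
A minimal sketch of what such a memo could look like, in plain Perl with no
BioPerl dependency. The names type_matches and %type_match_cache are
hypothetical, and the matching rule (a bare "BAC" query also matches
"BAC:FPC") is assumed from Lincoln's description quoted below. It is not
taken from the find_types source and may differ from it, for example in
case handling:

```perl
use strict;
use warnings;

# Hypothetical cache: "query\0feature_type" => 1 or 0.
my %type_match_cache;

# Returns 1 if $feature_type ("primary_tag:source") satisfies $query,
# where $query is either "primary_tag" or "primary_tag:source".
# Each distinct (query, feature type) pair is computed only once.
sub type_matches {
    my ($query, $feature_type) = @_;
    my $key = "$query\0$feature_type";
    return $type_match_cache{$key} //= do {
        my ($q_tag, $q_src) = split /:/, $query, 2;
        my ($f_tag, $f_src) = split /:/, $feature_type, 2;
        $q_tag = '' unless defined $q_tag;
        $f_tag = '' unless defined $f_tag;
        $f_src = '' unless defined $f_src;
        # A query without a source matches any source.
        ($q_tag eq $f_tag && (!defined $q_src || $q_src eq $f_src)) ? 1 : 0;
    };
}

# Usage: filter a list of type strings by the generic term 'BAC'.
my @types = ('BAC:FPC', 'gene:curated', 'BAC:other', 'BAC:FPC');
my @hits  = grep { type_matches('BAC', $_) } @types;
print scalar(@hits), " matching types\n";
```

Since a GFF3 file usually has only a handful of distinct types, the cache
stays small while the per-feature cost drops to a hash lookup.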
----- Original Message ----- 
From: "Lincoln Stein" <lincoln.stein at gmail.com>
To: "Jelle Scholtalbers" <j.scholtalbers at gmail.com>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Friday, February 05, 2010 10:46 AM
Subject: Re: [Bioperl-l] Bio::DB::SeqFeature::Store::memory ->filter_by_type 
very slow


> I think the problem with the filter function is that the type you request
> may be "BAC" but the feature's type is "BAC:FPC", and you want to be able to
> filter by the more generic type terms. Nevertheless I'm sure we can do
> better than 60 min running time and so I'll have to look at how this
> function works more carefully. I can't do this right now, unfortunately, so
> perhaps someone on the mailing list would be willing to take a look?
>
> Lincoln
>
> On Mon, Feb 1, 2010 at 7:24 AM, Jelle Scholtalbers <j.scholtalbers at gmail.com> wrote:
>
>> Hi,
>> I used the Bio::DB::SeqFeature::Store::memory module to load in a GFF3 file
>> which I could then use in my script in a 'queryable' way. To retrieve
>> features I used for example
>>        $db->features(-type => 'BAC:FPC', -seq_id=>'chromosome0')
>> However, when profiling my script I found that 60% of the
>> running time went into filter_by_type from
>> Bio::DB::SeqFeature::Store::memory.
>> Replacing this call with
>>     my @features = grep { $_->type eq 'BAC:FPC' }
>>                    $db->features(-seq_id => 'chromosome0');
>> gave me the same results in a fraction of the earlier run time.
>> My script went from 60 min to 4 min for the same result, with this as
>> the only change (the call is made often).
>> Can/should this be fixed, or is this just the faster way to do it?
>>
>> Cheers,
>> Jelle
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> -- 
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>