[Bioperl-l] extracting CDS portion of RefSeqs

Cook, Malcolm MEC at stowers-institute.org
Wed Dec 21 12:59:34 EST 2005


re: "only genbank.pm supports the SeqBuilder interface" 

Ah, that explains why I saw no speedup when I compared reading EMBL with
and without running 


(and also why my feature processing sub still worked even when I
didn't call add_wanted_slot for 'features'!!)
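The builder setup this message refers to did not survive in the archive. As a rough sketch only, assuming a BioPerl install where Bio::SeqIO provides sequence_builder() and Bio::Seq::SeqBuilder provides want_none() and add_wanted_slot() (as described on the SeqBuilder page linked in the quoted reply), a setup aimed at CDS extraction might look like the following. The sub name cds_by_id is hypothetical, and it assumes one CDS per record, as is usual for mRNA RefSeqs. Per Hilmar's note, only the genbank parser honors these slot settings, so with -format => 'embl' they would simply be ignored.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return { display_id => CDS nucleotide string } for a
# GenBank file, asking the parser to build only the slots we actually use.
sub cds_by_id {
    my ($file) = @_;
    my $seqio   = Bio::SeqIO->new(-format => 'genbank', -file => $file);
    my $builder = $seqio->sequence_builder();
    $builder->want_none();                                       # skip every slot by default...
    $builder->add_wanted_slot('display_id', 'seq', 'features');  # ...except these three

    my %cds;
    while (my $seq = $seqio->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
            # spliced_seq joins split locations (join(...)) into one sequence.
            # Assumes one CDS per record; a later CDS would overwrite an earlier one.
            $cds{ $seq->display_id } = $feat->spliced_seq->seq;
        }
    }
    return \%cds;
}
```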

FYI - using the $builder as above to read 46 GenBank mRNA RefSeqs
containing lots of REFERENCE data gave me a ~33% speedup.
HOWEVER, I got a 52% speedup if instead I pre-filtered the GenBank
flatfile using:
	perl -n -e "print if (m'^(LOCUS|ACCESSION)' ||
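The one-liner above is cut off in the archive, so its exact pattern is unknown. A hypothetical stand-alone version of the same idea (drop the GenBank sections you don't need, such as REFERENCE blocks, before BioPerl ever sees them) can be written as a small stateful line filter. The sub name make_genbank_filter and the keyword list are assumptions, and whether Bio::SeqIO::genbank still gives you everything you need from the slimmed records is worth checking on your own data.

```perl
use strict;
use warnings;

# Hypothetical helper: build a line filter that keeps only the named
# top-level GenBank sections. A keyword in column 1 (LOCUS, REFERENCE,
# FEATURES, ...) starts a section; indented lines (AUTHORS, feature
# qualifiers, sequence lines) belong to the section above them.
sub make_genbank_filter {
    my %keep = map { $_ => 1 } @_;
    my $in_wanted = 0;    # is the current section one we keep?
    return sub {
        my ($line) = @_;
        if ($line =~ m{^//}) {    # end-of-record marker: always keep it
            $in_wanted = 0;
            return 1;
        }
        # A new column-1 keyword switches the state; indented lines inherit it.
        $in_wanted = $keep{$1} ? 1 : 0 if $line =~ /^(\S+)/;
        return $in_wanted;
    };
}

# Example use as a stream filter (keyword list is an assumption, not the
# original poster's pattern):
# my $want = make_genbank_filter(qw(LOCUS ACCESSION VERSION FEATURES ORIGIN));
# while (my $line = <>) { print $line if $want->($line) }
```

Because it looks at one line at a time, it streams and does not need to hold a whole RefSeq release in memory.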

-- Malcolm

>-----Original Message-----
>From: Hilmar Lapp [mailto:hlapp at gmx.net] 
>Sent: Friday, December 16, 2005 11:55 AM
>To: Cook, Malcolm
>Cc: bioperl-l at portal.open-bio.org; Amit Indap
>Subject: Re: [Bioperl-l] extracting CDS portion of RefSeqs
>On Dec 15, 2005, at 11:06 AM, Cook, Malcolm wrote:
>> Regarding performance, I've never tried it, but you might look at
>> http://doc.bioperl.org/bioperl-live/Bio/Seq/SeqBuilder.html, which
>> shows you how to tell SeqIO that you only need to read sequence
>> and features.
>BTW right now only genbank.pm supports the SeqBuilder interface. If 
>any of those people who posted recently that they'd like to 
>volunteer read this, this would be a nice opportunity to take a fully 
>working implementation as an example and transfer it to other 
>applicable parsers, e.g. embl and swiss.
>	-hilmar
>Hilmar Lapp                            email: lapp at gnf.org
>GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
