[Bioperl-l] Is bio-perl right?

Ewan Birney birney@ebi.ac.uk
Fri, 29 Sep 2000 08:21:03 +0100 (GMT)

On 28 Sep 2000, John S. J. Anderson wrote:

> Greetings --
> I'm trying to decide if bio-perl provides the tools I need to do
> something, or if I'm better off rolling my own custom solution.
> Basically, I want to retrieve a large number (hundreds) of sequence
> files from Genbank (via Entrez, so there are a number of potential
> formats) and then parse each file according to the header
> information. I need to split the sequence in each file into coding and
> non-coding, and I would like to map each segment back onto a genome
> (probably by tracking location relative to ORF starts and stops).
> I know there's been some traffic on the list recently about the
> difficulty of sufficiently generalizing the GenBank format via a
> bio-perl parser, but I haven't played around with the code at all. (To
> cross with another thread, the documentation (and lack of time) has
> been the biggest barrier to my picking up bio-perl.)

I think bioperl should get you 75% of the way there:

   Picking up hundreds of sequence files - ok

   Parsing GenBank (latest 0.6.2 candidate release is the one to go for)

   Then the basic loop is going to go something like

   # script for looping over genbank entries, printing out
   # start-end of CDS exons

   use Bio::SeqIO;
   use Bio::Seq; # don't really need this, because Bio::SeqIO uses it

   # assumption: input is read from STDIN; substitute your own filehandle
   $seqio = Bio::SeqIO->new('-format' => 'GenBank', -fh => \*STDIN);
   while( $seq = $seqio->next_seq ) {
      foreach $feat ( $seq->top_SeqFeatures ) {
          if( $feat->primary_tag eq 'CDS_span' ) {
             # feature is a CDS line with a join statement
             foreach $sub ( $feat->sub_SeqFeature ) {
                print "start ",$sub->start," end ",$sub->end,"\n";
                # do what you like
             }
          } elsif ( $feat->primary_tag eq 'CDS' ) {
             # feature is a CDS line without a join statement
             # yes - this part is potentially badly designed in bioperl!
             print "start ",$feat->start," end ",$feat->end,"\n";
          }
      }
   }


> So, is bio-perl the Right Thing for this job, or should I look into
> developing my own stuff?

I would hope that Bioperl is "the right thing".

Give it a whirl; I'd be interested to hear about your experiences. Feel
free to write them up directly in the Wiki docs as well at


(choose perhaps BioperlGettingStarted and then just click "edit page" and
you are away).

I will add this mini-script to the wiki docs myself... ;)

> Thanks for any advice,
> john.
> -- 
> ------------------------------------------------------------------------
> John S J Anderson                                           NCBI,NLM,NIH
> IRTA Fellow                                              Bldg 38A, B2N14  
> janderso@ncbi.nlm.nih.gov                                   301.594.6087
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420