[Bioperl-l] Starting from scratch, a trivial problem?

Brian Osborne brian_osborne at cognia.com
Fri May 7 22:15:11 EDT 2004


The first part can be accomplished by using one of the Bio::DB* modules.
This is what it would look like using a RefSeq id:

use Bio::DB::RefSeq;
my $db = Bio::DB::RefSeq->new;
my $seq_obj = $db->get_Seq_by_acc("NC_001133");

See the module documentation or bptutorial for more on this approach.

The second part can be achieved by looking at the coordinates of features.
I'm assuming you're working with a whole chromosome Genbank sequence file,
like NC_001133. In this sort of file each "gene" is contained within a
single feature, and you already know that the coordinates are found in those
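join() statements. You could find the end of one gene and the start of the
next, then use Perl's substr() function or Bioperl's subseq() method to get
that intergenic region (note that substr() counts from 0, while subseq()
takes 1-based, inclusive coordinates). See the Feature-Annotation HOWTO for
information on handling sequence features.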
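
As a rough sketch of the coordinate arithmetic only, here is a plain-Perl
version that does not use Bioperl at all. The function name
intergenic_region, the toy sequence, and the gene coordinates are all made
up for illustration. With a real Bioperl sequence object you would
presumably take the coordinates from $feat->start and $feat->end on the
features returned by $seq_obj->get_SeqFeatures, and could call
$seq_obj->subseq($from, $to) instead of substr().

```perl
use strict;
use warnings;

# Return the sequence lying between two genes, given 1-based inclusive
# [start, end] coordinates for each gene. The genes may be passed in
# either order; adjacent or overlapping genes yield an empty string.
# (Hypothetical helper, not part of Bioperl.)
sub intergenic_region {
    my ($sequence, $gene_a, $gene_b) = @_;

    # Sort so that $left is the gene that starts first.
    my ($left, $right) = sort { $a->[0] <=> $b->[0] } ($gene_a, $gene_b);

    my $from = $left->[1] + 1;     # first base after the left gene
    my $to   = $right->[0] - 1;    # last base before the right gene
    return '' if $to < $from;

    # substr() is 0-based, so shift the 1-based start down by one.
    return substr($sequence, $from - 1, $to - $from + 1);
}

# Toy data: gene X at 3..6 and gene Y at 11..14 (made-up coordinates).
my $chromosome = 'AACCCCGGGGTTTTAA';
my $region = intergenic_region($chromosome, [3, 6], [11, 14]);
print "$region\n";
```

Passing the two genes in the opposite order gives the same region, which
covers the "gene X and gene Y (or gene Y and gene X)" case in the outline.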

Once you have this sequence you'll want to create a new Sequence object with
it. The bptutorial can help you out here as well; I think you'll want to
create a basic Sequence object of type Bio::Seq. Once you have this it's
easy to write to a file; see the SeqIO HOWTO.
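
A minimal sketch of that last step, assuming Bioperl is installed and that
FASTA is an acceptable output format. The helper name write_region_fasta,
the display id, the sequence string, and the output filename are
placeholders, not anything from the original question.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Wrap a raw sequence string in a Bio::Seq object and write it to a
# FASTA file. (Hypothetical helper; the names are illustrative only.)
sub write_region_fasta {
    my ($id, $sequence, $path) = @_;

    my $seq_obj = Bio::Seq->new(
        -display_id => $id,
        -seq        => $sequence,
        -alphabet   => 'dna',
    );

    # A leading ">" in -file opens the file for writing.
    my $out = Bio::SeqIO->new(-file => ">$path", -format => 'fasta');
    $out->write_seq($seq_obj);
    $out->close;

    return $seq_obj;
}

# Placeholder id and sequence; in practice these would come from the
# retrieved record and the extracted intergenic region.
write_region_fasta('NC_001133_intergenic', 'GGGG', 'intergenic.fasta');
```

To handle several sequences, one option would be to open the Bio::SeqIO
output once and call write_seq() inside the loop, rather than reopening the
file for each record.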

I couldn't imagine a more perfect script to illustrate Bioperl than this; it
touches on all the fundamentals: retrieving sequences, features and
annotations, making new Sequence objects, then SeqIO. Tell us how it goes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of JAMES IBEN
Sent: Friday, May 07, 2004 2:17 PM
To: bioperl-l at portal.open-bio.org
Cc: npanasik at jhu.edu
Subject: [Bioperl-l] Starting from scratch, a trivial problem?

  Hello.  I'm relatively new to programming in perl to begin with, so I
apologize for the foolish question.
  I would like to write a program to pull genomic sequences and clip out a
specific intergenic region in species where it exists, but I am running into
difficulty trying to incorporate the bioperl modules into my script to
accomplish this. This, coupled with the ability to script only the most
rudimentary programs, has left me at a bit of a loss.
  I believe the program should run with this rough outline unless someone
has a more informed opinion:

-Obtain sequence from online databank
-Search annotations for sequential occurrence of gene X and gene Y (or gene
Y and gene X)
-Print to output file Sequence ID and sequence occurring between addresses
of X and Y
-loop to next sequence

  I would be very grateful if someone could please point me in the direction
of perhaps a similar example script or a lower-level resource.  It seems
like it would be a fairly trivial problem.

Thanks for your time,
