[Bioperl-l] Bio::DB::RefSeq and NC_007092

armendarez77 at hotmail.com armendarez77 at hotmail.com
Tue Mar 2 16:06:17 EST 2010


I am writing a script to remotely access annotation files and parse information using Bio::DB::RefSeq and Bio::DB::Genbank.  I was testing it with random RefSeq accession numbers (NC_######) when something odd happened.  When I used the accession number 'NC_007092', the script seemed to freeze.  After some time, 'Out of Memory' was printed to the terminal.  

When I investigated the annotation file associated with NC_007092, a MapViewer page opened.  It turns out that NC_007092 is a genome shotgun sequence, but it does not start with 'NZ' as I though all shotgun sequences did.  

Is this a random event that I don't have to worry much about or is there a way to pre-screen accession numbers to ensure they are associated with complete genome RefSeq files?  

I've included my script in case there is something I missed that could have prevented this.

Thank you,



use strict;
use Bio::Perl;
use Getopt::Long;
use IO::Handle;

my $accessionNumber;

    options for $0
    accessionNumber    -a    accession number

my $description = annotation_info($accessionNumber);

print "$description\n";

sub annotation_info{
    my $seqObj;
    my $accNum = shift(@_);
    my $rs = Bio::DB::RefSeq->new();
    my $gb = Bio::DB::GenBank->new();

    if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore in their accession number
        $seqObj = $rs->get_Seq_by_id($accNum);
    elsif($accNum !~ /_/){ #GenBank annotation
        $seqObj = $gb->get_Seq_by_id($accNum);
    return $seqObj->desc();

Hotmail: Trusted email with Microsoft’s powerful SPAM protection.

More information about the Bioperl-l mailing list