[Bioperl-l] Having problems with parsing SwissProt Records

Anand Venkatraman bioperlanand at yahoo.com
Wed Oct 27 00:44:47 EDT 2004


I am using BioPerl 1.4 to parse SwissProt records.

I am having two problems:

Problem 1: I am unable to get all the accession
numbers from the AC line of a SwissProt record. Some
SwissProt records have multiple accession numbers,
whereas others have only one. My code (see below)
gets only the first accession number.
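For reference, a SwissProt entry lists its accession numbers on one or more AC lines, separated by semicolons. The sketch below is plain Perl with no BioPerl, and accessions_from_entry is a hypothetical helper, not a BioPerl method; it just collects every accession from those lines. Inside BioPerl, my understanding is that accession_number() returns only the first (primary) accession and that the others are stored as secondary accessions, reachable through get_secondary_accessions() on the Bio::Seq::RichSeq object, but that is worth confirming for 1.4. The entry in the usage part uses made-up accession numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect every accession number
# from the AC lines of one SwissProt flat-file entry. An entry may carry one
# accession or several, and the list may continue over more than one AC line.
sub accessions_from_entry {
    my ($entry_text) = @_;
    my @accessions;
    for my $line (split /\n/, $entry_text) {
        next unless $line =~ /^AC\s+(.*)$/;
        my $list = $1;
        # Each accession is terminated by ";", e.g. "A00001; B00002;"
        push @accessions, $list =~ /([^;\s]+);/g;
    }
    return @accessions;
}

# Usage with a made-up entry whose accessions span two AC lines
my $entry = <<'END';
ID   TEST_HUMAN     Reviewed;         100 AA.
AC   A00001; B00002; C00003;
AC   D00004;
DE   Made-up test entry.
//
END
print join(",", accessions_from_entry($entry)), "\n";   # A00001,B00002,C00003,D00004
```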

Problem 2: I am also trying to get the associated
EMBL and GO cross-references for a given SwissProt
entry. The problems I am having are:
[a]: From the EMBL cross-reference I get only the
nucleotide ID, not the protein ID.
[b]: In some cases I cannot get the GO IDs. With the
code below, I get the GO ID for some records but not
for others. Also, if a record has 3 or 4 GO lines, the
code captures only the first GO ID (when it captures
one at all).
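Both kinds of cross-reference come from DR lines. An EMBL line carries two identifiers, the nucleotide entry first and the protein_id second, and a GO line carries one GO ID, so an entry with several GO lines should yield several IDs. The plain-Perl sketch below assumes the "DR   DB; id1; id2; ...." layout and uses a hypothetical helper, xrefs_from_entry, not a BioPerl method. In BioPerl, I believe the second EMBL identifier lands in the DBLink object's optional_id rather than its primary_id, but please check that against 1.4. The DR lines in the usage part are made-up examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): pull EMBL and GO cross-references
# out of the DR lines of one SwissProt flat-file entry.
sub xrefs_from_entry {
    my ($entry_text) = @_;
    my (@embl, @go);
    for my $line (split /\n/, $entry_text) {
        # Strip the "DR" tag and the trailing "." of the line
        next unless $line =~ /^DR\s+(.*?)\.?\s*$/;
        my ($db, @fields) = split /;\s*/, $1;
        if ($db eq 'EMBL') {
            # First field: nucleotide ID; second field: protein ID
            push @embl, [ $fields[0], $fields[1] ];
        }
        elsif ($db eq 'GO') {
            # Every GO line contributes its own ID
            push @go, $fields[0];
        }
    }
    return (\@embl, \@go);
}

# Usage with made-up DR lines
my $entry = <<'END';
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   GO; GO:0005737; C:cytoplasm; TAS.
DR   GO; GO:0005634; C:nucleus; IEA.
//
END
my ($embl, $go) = xrefs_from_entry($entry);
printf "EMBL nucleotide=%s protein=%s\n", @$_ for @$embl;
print "GO: @$go\n";
```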

This is the code:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# SwissProt flat file given on the command line
my $sp_file = shift @ARGV or die "Usage: $0 swissprot_file\n";
my $seqio_object = Bio::SeqIO->new(-file   => $sp_file,
                                   -format => "swiss");

while (my $seq_object = $seqio_object->next_seq) {
    # Only look at human entries
    if ($seq_object->species
        && $seq_object->species->binomial =~ m/Homo sapiens/) {
        print "Accession: ", $seq_object->accession_number(), "\t";
        my $annotation = $seq_object->annotation();

        # Print each EMBL or GO cross-reference (DR line) of the entry
        foreach my $dblink ( $annotation->get_all_Annotations('dblink') ) {
            if ( $dblink->database eq "EMBL" || $dblink->database eq "GO" ) {
                print "\t", $dblink->database, ":", $dblink->primary_id, "\t";
            }
        }
        print "\n";
    }
}



Any suggestions?

Thanks in advance for the help.



More information about the Bioperl-l mailing list