[Bioperl-l] Fwd: How to extract xref information from seq object that is fetched from GenBank?

Sang Chul Choi goshng at gmail.com
Wed Aug 30 16:43:57 EDT 2006


It turns out that I asked the same question as Mira did last December,
and Hilmar answered:
GenBank has the protein-mRNA cross-reference in the feature table,
hence you would need to look into the tag/value pairs of a sequence's
features. DBSOURCE I believe is only present for those entries
originating from UniProt (i.e., not natively from GenBank).

On top of all that, the tags GenBank uses for their entry annotation
are not the ones BioPerl uses to tag its annotation objects - BioPerl
is not an API solely for GenBank. Consult the Bio::SeqIO::genbank POD
for documentation on what goes where in the BioPerl object model.
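Following Hilmar's pointer to the feature table, below is a minimal sketch that walks a protein record's features and collects a qualifier linking back to the nucleotide entry. It assumes `$seq` is a BioPerl sequence object (for example from Bio::SeqIO with `-format => 'genbank'`) and that the link is stored in a `coded_by` qualifier. That tag name is an assumption, so check the record's actual feature table. The helper name `coded_by_links` is hypothetical. Only the standard `get_SeqFeatures`, `has_tag` and `get_tag_values` methods are called.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the values of a /coded_by qualifier from
# every feature of a protein record. $seq is assumed to be a Bio::Seq;
# only get_SeqFeatures, has_tag and get_tag_values are used.
# The tag name 'coded_by' is an assumption; check the feature table.
sub coded_by_links {
    my ($seq) = @_;
    my @links;
    for my $feat ($seq->get_SeqFeatures) {
        push @links, $feat->get_tag_values('coded_by')
            if $feat->has_tag('coded_by');
    }
    return @links;
}
```

Each returned string should name the nucleotide accession and location, which could then be fetched with Bio::DB::GenBank.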

I totally agree with this point. I have another problem, though. For some
SwissProt proteins, I could use Bio::DB::SwissProt to fetch the protein
sequence and then read its 'dblink' annotation, which points to where
I can get the DNA sequence encoding that protein.
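For the SwissProt route, here is a minimal sketch of reading those 'dblink' annotations. It assumes each one is a Bio::Annotation::DBLink, read through its `database` and `primary_id` accessors. It also assumes that nucleotide links in Swiss-Prot records use the database name EMBL. The helper name `dblinks` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return [database, primary_id] pairs for every
# 'dblink' annotation on a sequence object (e.g. one fetched with
# Bio::DB::SwissProt). Each annotation is assumed to be a
# Bio::Annotation::DBLink. If $db is given (e.g. 'EMBL'), only links
# to that database are kept, compared case-insensitively.
sub dblinks {
    my ($seq, $db) = @_;
    my @out;
    for my $link ($seq->annotation->get_Annotations('dblink')) {
        next if defined $db && lc($link->database) ne lc($db);
        push @out, [ $link->database, $link->primary_id ];
    }
    return @out;
}
```

Calling `dblinks($seq, 'EMBL')` would then list only the nucleotide cross-references, if the assumption about the database name holds.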

So there are two choices, depending on the type of protein sequence:
Bio::DB::SwissProt for SwissProt proteins, and Bio::DB::GenBank for GenBank proteins.
However, the SwissProt protein P38038 could not be fetched with
Bio::DB::SwissProt. I got these messages:
-------------------- WARNING ---------------------
MSG: id (P38038) does not exist
-------------------- WARNING ---------------------
MSG: acc (P38038) does not exist

Then I tried Bio::DB::GenBank to fetch this protein sequence, and
it worked. Since then I've been trying to get the DBSOURCE information of the protein,
which I think is the only place that tells me where to get the DNA sequence.

So I am somewhat stuck. I'm using BioPerl 1.4. I'm wondering whether getting
DBSOURCE information from a GenBank file is really hard, or whether there is a way
to do it.

I realize this may be a basic question, and I apologize for my limited
knowledge of BioPerl. I would appreciate your help.

Thank you very much,

Sang Chul

---------- Forwarded message ----------
From: Sang Chul Choi <goshng at gmail.com>
Date: Aug 30, 2006 4:07 PM
Subject: How to extract xref information from seq object that is
fetched from GenBank?
To: bioperl-l <bioperl-l at bioperl.org>


I am trying to fetch protein-coding DNA sequences from a public database.
I first used Bio::DB::GenBank to fetch the protein sequence by its SwissProt ID
or GenBank ID. Then I look for DBSOURCE, which points to
where I can fetch the DNA sequence. But I don't know how to get that information.
Often there are many links in the 'xrefs' of DBSOURCE. For example:
DBSOURCE    swissprot: locus CYSJ_ECOLI, accession P38038;
            class: standard.
            extra accessions:P14782,Q2MA65,created: Oct 1, 1994.
            sequence updated: Jun 27, 2006.
            annotation updated: Jun 27, 2006.
            xrefs: M23008.1 , AAA23650.1, U29579.1, AAA69274.1, U00096.2,
            AAC75806.1, AP009048.1, BAE76841.1, H65057, 1DDGA, 1DDGB, 1DDIA,
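If the DBSOURCE text itself is available as a string, the xrefs list above is just comma-separated text. Below is a minimal, BioPerl-independent sketch. The helper name `dbsource_xrefs` is hypothetical. The sketch assumes `xrefs:` is the last label in the string passed in, so everything after it is treated as identifiers.

```perl
use strict;
use warnings;

# Hypothetical helper: return the identifiers listed after "xrefs:" in a
# DBSOURCE block given as plain text. Assumes "xrefs:" is the last label
# in $text; returns an empty list if no "xrefs:" label is found.
sub dbsource_xrefs {
    my ($text) = @_;
    my ($list) = $text =~ /xrefs:\s*(.*)/s or return;
    # Split on commas and whitespace (including line breaks); drop empties.
    return grep { length } split /[\s,]+/, $list;
}
```

On the block above this yields 12 identifiers. Filtering with `grep { /\.\d+$/ }` keeps the 8 versioned accessions (M23008.1, AAA23650.1, and so on), which here are the GenBank sequence entries. The unversioned ones such as H65057 and 1DDGA appear to be PIR and PDB identifiers.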

I thought I could use the Annotation object like this to get the information,
but I am starting to think I was wrong, because I could not get the DBSOURCE
information through the Annotation object.

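use strict;
use warnings;
use Bio::DB::GenBank;

my $gb  = Bio::DB::GenBank->new;
my $seq = $gb->get_Seq_by_acc('P38038');    # protein entry, fetched via GenBank
my $ann_coll = $seq->annotation;
# print every annotation's tag name and its text form
for my $ann ($ann_coll->get_Annotations) {
    print $ann->tagname, " ", $ann->as_text, "\n";
}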

How can I get this DBSOURCE information?

Thank you,

Sang Chul

Live, Learn, and Love!
E-mail : goshng at empal dot com
            goshng at gmail dot com
Home : +1-919-434-8298
Address : 1528 Macalpine Circle
               Morrisville, NC 27560