[Bioperl-l] Not picking up Dbxrefs EMBL records

Hilmar Lapp hlapp at gnf.org
Tue Aug 9 12:40:12 EDT 2005

This is a RefSeq accession. In GenBank format the db_xrefs you see are 
qualifiers on features in the feature table, not top-level db_xrefs 
(i.e., for the entry itself), although semantically of course that's 
what they are. Bioperl (i.e., the Bioperl SeqIO parser for genbank 
format) doesn't interpret them that way, however, and leaves them where 
they are, namely as annotation for the features. The single exception 
is that the parser does look for the taxon ID in the feature table and 
sets the $seq->species->ncbi_taxon_id property accordingly.
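
To see where the db_xrefs end up after parsing, something along these 
lines should work. It is only a sketch: the helper name feature_dbxrefs 
and the file name in the usage comment are made up for illustration, and 
it assumes the features carry the usual db_xref tag.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect every db_xref tag value found on the features of one sequence.
# (feature_dbxrefs is a hypothetical helper name, not part of Bioperl.)
sub feature_dbxrefs {
    my ($seq) = @_;
    my @xrefs;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        push @xrefs, $feat->get_tag_values('db_xref');
    }
    return @xrefs;
}

# Usage: perl list_dbxrefs.pl NM_214434.gb   (file name is hypothetical)
if (my $file = shift @ARGV) {
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        # The taxon ID is the one db_xref the parser lifts to the entry.
        my $species = $seq->species;
        my $taxon   = $species ? $species->ncbi_taxon_id : undef;
        print $seq->accession_number, "\ttaxon: ",
              (defined $taxon ? $taxon : 'n/a'), "\n";
        print "\t$_\n" for feature_dbxrefs($seq);
    }
}
```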

GenBank format doesn't have top-level db_xrefs at all. You will need 
EMBL format for that. As I said before, the PUBMED line is not a 
db_xref for the entry either but a db_xref for the reference, so you 
will need to retrieve the references 
($seq->annotation->get_Annotations('reference')) and use each 
reference's $ref->pubmed or $ref->medline property.
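
A minimal sketch of pulling the PubMed and MEDLINE IDs off the 
references, using only the calls named above. The helper name 
reference_ids and the file name are assumptions for illustration.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Return one hash ref per reference with its PubMed and MEDLINE IDs.
# Either value may be undef if the record does not carry it.
# (reference_ids is a hypothetical helper name, not part of Bioperl.)
sub reference_ids {
    my ($seq) = @_;
    my @ids;
    for my $ref ($seq->annotation->get_Annotations('reference')) {
        push @ids, { pubmed => $ref->pubmed, medline => $ref->medline };
    }
    return @ids;
}

# Usage: perl list_refs.pl NM_214434.gb   (file name is hypothetical)
if (my $file = shift @ARGV) {
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $id (reference_ids($seq)) {
            print join("\t",
                $seq->accession_number,
                'PUBMED:'  . (defined $id->{pubmed}  ? $id->{pubmed}  : ''),
                'MEDLINE:' . (defined $id->{medline} ? $id->{medline} : ''),
            ), "\n";
        }
    }
}
```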

BTW this will still hold true if you first load the sequences into 
bioperl-db and then retrieve them; there isn't really any magic being 
applied that would transform db_xrefs into a common unified picture.

I use a SequenceProcessor (see Bio::Seq::BaseSeqProcessor and the 
--pipeline option to load_seqdatabase.pl) to promote db_xref tags found 
in the feature table of genbank records to Bio::Annotation::DBLink 
annotation on the sequence object. Very easy to implement and you are 
in total control of the annotation structure.
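
Roughly, such a processor could look like the sketch below. This is not 
the module I actually use; the package name My::DbxrefPromoter is made 
up, and skipping the taxon db_xref is just one possible choice (the 
parser already handles that one).

```perl
package My::DbxrefPromoter;   # hypothetical module name
use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);
use Bio::Annotation::DBLink;

# Called once per sequence; returns the (modified) sequence as a list.
sub process_seq {
    my ($self, $seq) = @_;
    my %seen;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        for my $xref ($feat->get_tag_values('db_xref')) {
            # Values look like "GeneID:404088"; split on the first colon.
            my ($db, $id) = split /:/, $xref, 2;
            next unless defined $id && length $id;
            # Assumption: leave taxon alone, the parser already uses it.
            next if $db eq 'taxon';
            # Several features often repeat the same db_xref; add it once.
            next if $seen{"$db:$id"}++;
            $seq->annotation->add_Annotation('dblink',
                Bio::Annotation::DBLink->new(-database   => $db,
                                             -primary_id => $id));
        }
    }
    return ($seq);
}

1;
```

The module name is then what you would hand to the --pipeline option of 
load_seqdatabase.pl; check the script's documentation for the exact 
syntax.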


On Aug 9, 2005, at 9:21 AM, SG Edwards wrote:

> Hi,
> My installation does not pick up ANY dbxrefs for gene records, e.g. 
> Pubmed or MEDLINE (in either EMBL or Genbank format). When I load them 
> into the database they go in fine, but no dbxref_ids are mapped to the 
> bioentry_id in the bioentry_dbxref table. Therefore, nothing appears in 
> the dbxref table either!
> The system works fine for UniProt protein entries into the database. I 
> am currently installing BioPerl v 1.5 to see if this resolves the 
> problem.
> An example: NM_214434 from Genbank which has the dbxrefs:
> Pubmed 1503277
> Taxon  9823
> GeneID 404088
> Quoting Hilmar Lapp <hlapp at gnf.org>:
>> Are you referring to references and their PMID? These you would find 
>> in the Reference table, which has a foreign key to dbxref, which would
>> only store the PUBMED or MEDLINE ID (not both at this time). Can you
>> give an example accession that's giving you grief?
>> 	-hilmar
>> On Aug 8, 2005, at 1:17 AM, SG Edwards wrote:
>>> Hi folks,
>>> I have a BioSQL database (PostgreSQL 7.4.3, BioPerl 1.4, bioperl-db
>>> 1.2) set up containing protein and gene data. However, when I load
>>> gene sequence records (EMBL or Genbank) using:
>>> perl load_seqdatabase.pl -driver Pg -safe -lookup -dbname milk -dbuser s0460205 -dbpass password -format embl /home/s0460205/file_name.txt
>>> from bioperl-db it does not pick up any dbxrefs, i.e. there is no
>>> dbxref_id for MEDLINE etc.
>>> Has anyone else come across this problem and is there a fix?
>>> Cheers,
>>> Stephen
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757