[Bioperl-l] DBSOURCE parsing

Chris Fields cjfields at uiuc.edu
Mon Nov 27 16:47:12 EST 2006


I am working on Stockholm and GenPept format parsing, both of which
have DBLink objects, and I have a couple of questions.  First (not a
huge issue, more of a curiosity): is it possible to pass a callback to
Annotation objects for the overloaded operators?  I'm thinking of
situations where the same data is displayed differently in other
formats (such as Stockholm).
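
To make the callback idea concrete, here is a minimal sketch, written in
Python only as a language-neutral illustration.  Nothing in it is BioPerl
API: DBLinkSketch, formatter and stockholm_style are hypothetical names,
and the Stockholm-like "DATABASE; ID;" layout is an assumption.

```python
from typing import Callable, Optional


class DBLinkSketch:
    """Hypothetical stand-in for a database cross-reference annotation."""

    def __init__(self, database: str, primary_id: str,
                 formatter: Optional[Callable[["DBLinkSketch"], str]] = None):
        self.database = database
        self.primary_id = primary_id
        # Optional per-object display callback (assumed design, not BioPerl).
        self.formatter = formatter

    def __str__(self) -> str:
        # The overloaded string conversion defers to the callback if present.
        if self.formatter is not None:
            return self.formatter(self)
        return f"{self.database}:{self.primary_id}"


def stockholm_style(link: DBLinkSketch) -> str:
    # Assumed Stockholm-like rendering of a cross-reference.
    return f"{link.database}; {link.primary_id};"


default_view = str(DBLinkSketch("Pfam", "PF00533"))
stockholm_view = str(DBLinkSketch("Pfam", "PF00533", stockholm_style))
```

The object keeps one set of data; only the rendering changes with the
callback that was passed in.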

Also, would it be feasible for DBLink objects to contain annotations
of their own (comments, other DBLink objects, etc.) to represent more
complex data?  I am thinking in particular of GenPept records such as
the following:

DBSOURCE    swissprot: locus BRCA1_HUMAN, accession P38398;
             class: standard.
             created: Oct 1, 1994.
             sequence updated: Feb 1, 1995.
             annotation updated: Nov 14, 2006.
             xrefs: U14680.1, AAA73985.1, L78833.1, AAC37594.1,  
             AAP12647.1, A58881, 1JM7A, 1JNXX, 1N5OX, 1OQAA, 1T15A,  
             1T2UA, 1T2VA, 1T2VB, 1T2VC, 1T2VD, 1T2VE, 1Y98A
             xrefs (non-sequence databases): UniGene:Hs.194143,  
             TRANSFAC:T04074, Ensembl:ENSG00000012048, KEGG:hsa:672,  
             MIM:113705, MIM:114480, Reactome:P38398,  
             GO:0031436, GO:0008274, GO:0005634, GO:0000151, GO:0050681,
             GO:0003677, GO:0019899, GO:0003713, GO:0015631, GO:0008270,
             GO:0030521, GO:0007059, GO:0006978, GO:0008630, GO:0042759,
             GO:0046600, GO:0016481, GO:0045739, GO:0031398, GO:0045893,
             GO:0016567, GO:0042981, GO:0042127, GO:0006357, GO:0006359,
             InterPro:IPR011364, InterPro:IPR001357, InterPro:IPR002378,
             InterPro:IPR001841, PANTHER:PTHR13763, Pfam:PF00533,  
             PIRSF:PIRSF001734, PRINTS:PR00493, SMART:SM00292,  
             PROSITE:PS50172, PROSITE:PS00518, PROSITE:PS50089
DBSOURCE    pdb: molecule 1T2U, chain 65, release Apr 22, 2004;
             deposition: Apr 22, 2004;
             class: Antitumor Protein;
             source: Mol_id: 1; Organism_scientific: Homo Sapiens;
             Organism_common: Human; Gene: Brca1; Expression_system:  
             Coli; Expression_system_common: Bacteria;
             Exp. method: X-Ray Diffraction.

DBSOURCE    pir: locus I49350;

             summary: #length 1812 #molecular-weight 198788 #checksum  
             genetic: #gene Brca1
             superfamily: transcriptional regulator, BRCA1 type; RING  
             PIR dates: 02-Jul-1996 #sequence_revision 02-Jul-1996  
DBSOURCE    prf: locus 2202221A;

             state: hepatoma/colonic tumor;
             taxonomy: Mammalia.

My thought is that the first line would supply the main DBLink object
data, and all subsequent lines would become annotation objects
(comments, DBLinks, etc.) in an annotation collection contained within
the main DBLink object.  I don't think there would be any danger of
circular references if this is handled correctly.
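
A rough sketch of that layout, again in Python as a language-neutral
illustration rather than BioPerl code.  Link, parse_dbsource and the
"dblink" tag are hypothetical names.  The field-splitting rule (a new
field starts where the first colon on a line is followed by whitespace
or end of line) is a naive assumption that suits the swissprot block but
would mis-split the nested "source:" field of the pdb block.  SAMPLE is
abbreviated and re-wrapped from the swissprot record above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    database: str
    primary_id: str
    # Children never point back at their parent, so this is a plain tree
    # and cannot form a reference cycle.
    annotations: Dict[str, List[object]] = field(default_factory=dict)

    def add(self, tag: str, value: object) -> None:
        self.annotations.setdefault(tag, []).append(value)


HEAD = re.compile(r"^DBSOURCE\s+(\w+):\s+(.*?);?$")
# "xrefs: ..." starts a field; "GO:0031436, ..." is a continuation line.
FIELD_START = re.compile(r"^([A-Za-z][^:]*):(?:\s+(.*))?$")


def parse_dbsource(block: str) -> Link:
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    head = HEAD.match(lines[0])
    if head is None:
        raise ValueError("not a DBSOURCE block")
    # First line: "locus BRCA1_HUMAN, accession P38398" -> key/value pairs.
    ids = {}
    for item in head.group(2).split(","):
        parts = item.split(None, 1)
        if len(parts) == 2:
            ids[parts[0]] = parts[1]
    primary = ids.get("accession") or ids.get("locus") or head.group(2)
    link = Link(head.group(1), primary)

    # Remaining lines: gather each field together with its wrapped lines.
    fields: List[List[str]] = []
    for ln in lines[1:]:
        m = FIELD_START.match(ln)
        if m:
            fields.append([m.group(1), m.group(2) or ""])
        elif fields:
            fields[-1][1] += " " + ln

    for tag, text in fields:
        text = text.strip().rstrip(".;")
        if tag.startswith("xrefs"):
            # Cross-references become child links inside the parent.
            for item in text.split(","):
                item = item.strip()
                if not item:
                    continue
                db, sep, ident = item.partition(":")
                # Bare accessions carry no database name; leave it empty.
                link.add("dblink", Link(db, ident) if sep else Link("", item))
        else:
            link.add(tag, text)  # everything else is kept as a comment
    return link


SAMPLE = """
DBSOURCE    swissprot: locus BRCA1_HUMAN, accession P38398;
            class: standard.
            created: Oct 1, 1994.
            xrefs: U14680.1, AAA73985.1,
            A58881, 1JM7A
            xrefs (non-sequence databases): UniGene:Hs.194143,
            KEGG:hsa:672, GO:0031436
"""

record = parse_dbsource(SAMPLE)
```

Here record is the main link (swissprot, P38398), while the class and
created lines and the seven cross-references sit in its own annotation
collection.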

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
