[Bioperl-l] Genbank parsing using Bioperl

Chris Fields cjfields at uiuc.edu
Fri Apr 21 11:26:53 EDT 2006

I'm adding my 2c since I've got a bit of time on my hands.  I'll add that I
found most of these answers by looking through the mailing list archives (now
searchable through Gmane) and the BioPerl wiki.

I believe Sean pointed out the HOWTO on the BioPerl wiki: 



In theory, you should be able to determine from each CDS feature which gene
or transcript (mRNA) feature it belongs to, and normally vice versa.  I may
be wrong (I mainly work with bacterial genome sequences), but I believe this
depends entirely on how well the features are annotated, which can vary
greatly between sequencing centers, so it can be a bit tricky depending on
the source of the GenBank file.  I would instead try a well-curated database
with a consistent interface across different genome projects.  In other
words, something like what Sean suggested, such as Ensembl:  


You can use the Ensembl Perl API to retrieve data from Ensembl databases:


You could also have a look at Entrez Gene; Brian's working on modules (in
CVS) for retrieving and parsing Entrez Gene's output:


You'll need the Bio::ASN1 parser for Brian's modules:


Both Ensembl and Entrez Gene are constantly updated with transcript/protein
information and are likely what you are looking for.
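If you do want to stay with the RefSeq GenBank files, one possible heuristic for alternatively spliced genes is to match each CDS to the mRNA whose exon structure it fits: every gap between the CDS join() segments must be exactly an intron of that mRNA, and the CDS start and end must land inside mRNA exons. Below is a minimal sketch of that check, written in Python only to keep it short. It assumes you have already pulled each mRNA and CDS feature (its /gene value, its /transcript_id or /protein_id, its strand and its join() segments) into plain records, for example while looping over features read with Bio::SeqIO in BioPerl. The Feature record, the function names and the example coordinates are hypothetical, and this is not a BioPerl method. When several mRNAs differ only in their UTRs, one CDS will fit more than one of them, so the sketch returns every compatible transcript rather than guessing.

```python
from dataclasses import dataclass


# Hypothetical record for one mRNA or CDS feature. "ident" holds the
# /transcript_id (mRNA) or /protein_id (CDS) value. Exons are 1-based,
# inclusive (start, end) pairs in forward-strand coordinates.
@dataclass
class Feature:
    gene: str
    ident: str
    strand: int
    exons: list


def introns(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ex = sorted(exons)
    return [(a[1] + 1, b[0] - 1) for a, b in zip(ex, ex[1:])]


def cds_fits_mrna(cds, mrna):
    """True if the CDS is compatible with this mRNA's exon structure."""
    if cds.gene != mrna.gene or cds.strand != mrna.strand:
        return False
    cds_start = min(s for s, _ in cds.exons)
    cds_end = max(e for _, e in cds.exons)

    def in_exon(pos):
        return any(s <= pos <= e for s, e in mrna.exons)

    # Both CDS ends must fall inside mRNA exons, not in an intron or outside.
    if not (in_exon(cds_start) and in_exon(cds_end)):
        return False
    # The mRNA introns lying within the CDS span must equal the CDS introns.
    spanned = [(s, e) for s, e in introns(mrna.exons)
               if s > cds_start and e < cds_end]
    return introns(cds.exons) == spanned


def pair_proteins(cds_features, mrna_features):
    """Map each protein ID to the transcript IDs it is compatible with."""
    return {
        cds.ident: [m.ident for m in mrna_features if cds_fits_mrna(cds, m)]
        for cds in cds_features
    }


if __name__ == "__main__":
    # Made-up gene with two transcripts; NM_B skips the middle exon.
    mrnas = [
        Feature("geneX", "NM_A", 1, [(100, 200), (300, 400), (500, 600)]),
        Feature("geneX", "NM_B", 1, [(100, 200), (500, 600)]),
    ]
    cdss = [
        Feature("geneX", "NP_A", 1, [(150, 200), (300, 400), (500, 550)]),
        Feature("geneX", "NP_B", 1, [(150, 200), (500, 550)]),
    ]
    print(pair_proteins(cdss, mrnas))  # {'NP_A': ['NM_A'], 'NP_B': ['NM_B']}
```

A CDS that comes back with zero or several candidate transcripts still needs a manual look, which is where the annotation quality mentioned above comes in.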


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Prabu R
> Sent: Friday, April 21, 2006 8:25 AM
> To: bioperl-l at lists.open-bio.org; Sean Davis
> Subject: Re: [Bioperl-l] Genbank parsing using Bioperl
> Dear All,
> Sorry, I made a small mistake in my earlier mail.
> I am not actually using GenBank releases, but the RefSeq genome build .gbk
> files from NCBI (ftp.ncbi.nih.gov/genomes/).
> Those files are GenBank-formatted and contain RefSeq IDs.
> Kindly help.
> R. Prabu
> ----------------------------
> Dear all!
> I am a novice BioPerl user, trying to parse GenBank files with BioPerl
> modules to get some specific features and details.
> Could anyone please tell me whether we can retrieve a gene, its transcript
> ID and its protein ID from the GenBank file?
> I mainly need to extract a one-to-one relationship between transcript ID
> and protein ID.
> I have been trying this, and I was able to get these details if the gene
> is not alternatively spliced.
> If a gene contains multiple mRNA/CDS features, I am not able to build the
> relationship between a transcript and its protein.
> Kindly help me find out whether this is possible in BioPerl.
> Thanks in advance,
> R. Prabu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
