[Bioperl-l] Genbank parsing using Bioperl
cjfields at uiuc.edu
Fri Apr 21 11:26:53 EDT 2006
I'm adding my 2c since I've got a bit of time on my hands. I'll add that I
found most of these answers by searching the mailing list archives (now
searchable through Gmane) and the BioPerl wiki.
I believe Sean pointed out the HOWTO on the BioPerl wiki:
In theory, you should be able to retrieve from the CDS feature which gene
feature or transcript each coding feature belongs to, and normally vice
versa. I may be wrong (I work with bacterial genome sequences mainly), but
I believe this is completely dependent on how well the features are
annotated (which can vary greatly between different sequencing centers) so
can be a bit tricky depending on the source of the GenBank file. I would
instead try a well-curated database with a consistent interface across
different genome projects, such as Ensembl, as Sean suggested:
You can use the Ensembl Perl API to retrieve data from Ensembl databases:
You could also have a look at Entrez Gene; Brian's working on modules (in
CVS) for retrieving and parsing Entrez Gene's output:
You'll need the Bio::ASN1 parser for Brian's modules:
Both Ensembl and Entrez Gene are constantly updated for transcript/protein
information and are likely what you are looking for.
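If you do want to attempt the pairing from the feature table itself, the idea
above (linking each CDS back to its transcript) can be sketched roughly as
follows. This is only an illustration in plain Python rather than BioPerl, and
it rests on assumptions that are mine, not anything guaranteed by the GenBank
format: the features have already been pulled out into simple dictionaries,
the mRNA and CDS features share a gene name, and a CDS belongs to an mRNA when
its splice junctions match the mRNA's junctions inside the coding span. The
IDs, field names, and function names are all hypothetical.

```python
from collections import defaultdict


def junctions(exons):
    """Return the set of (exon_end, next_exon_start) pairs for a feature."""
    exons = sorted(exons)
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}


def in_exon(pos, exons):
    """True if the position falls inside one of the exons."""
    return any(start <= pos <= end for start, end in exons)


def compatible(mrna_exons, cds_exons):
    """Heuristic: could this CDS have been translated from this mRNA?"""
    cds_start = min(start for start, _ in cds_exons)
    cds_end = max(end for _, end in cds_exons)
    # Both ends of the coding region must lie in exonic sequence of the mRNA.
    if not (in_exon(cds_start, mrna_exons) and in_exon(cds_end, mrna_exons)):
        return False
    # mRNA splice junctions inside the coding span must equal the CDS junctions.
    inner = {(a, b) for a, b in junctions(mrna_exons)
             if cds_start <= a and b <= cds_end}
    return inner == junctions(cds_exons)


def pair_transcripts(features):
    """Map each transcript ID to a protein ID, or None if not exactly one fits."""
    by_gene = defaultdict(lambda: {"mRNA": [], "CDS": []})
    for feat in features:
        if feat["type"] in ("mRNA", "CDS"):
            by_gene[feat["gene"]][feat["type"]].append(feat)
    pairs = {}
    for group in by_gene.values():
        for mrna in group["mRNA"]:
            hits = [cds["id"] for cds in group["CDS"]
                    if compatible(mrna["exons"], cds["exons"])]
            pairs[mrna["id"]] = hits[0] if len(hits) == 1 else None
    return pairs


# Made-up example: one alternatively spliced gene with two mRNA/CDS features.
FEATURES = [
    {"type": "mRNA", "gene": "geneA", "id": "tx1",
     "exons": [(100, 200), (300, 400), (500, 600)]},
    {"type": "mRNA", "gene": "geneA", "id": "tx2",
     "exons": [(100, 200), (500, 600)]},
    {"type": "CDS", "gene": "geneA", "id": "prot1",
     "exons": [(150, 200), (300, 400), (500, 550)]},
    {"type": "CDS", "gene": "geneA", "id": "prot2",
     "exons": [(150, 200), (500, 550)]},
]

PAIRS = pair_transcripts(FEATURES)
```

Transcripts that differ only outside the coding span, or files where the gene
qualifier is missing or inconsistent, will defeat this kind of heuristic,
which is why I'd still lean on a curated source for anything serious.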
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Prabu R
> Sent: Friday, April 21, 2006 8:25 AM
> To: bioperl-l at lists.open-bio.org; Sean Davis
> Subject: Re: [Bioperl-l] Genbank parsing using Bioperl
> Dear All,
> I am sorry for making a small mistake in my earlier mail.
> I am not actually using GenBank releases, but the RefSeq genome build gbk
> files from NCBI (ftp.ncbi.nih.gov/genomes/).
> Those files are GenBank-formatted and contain RefSeq IDs.
> Kindly help.
> R. Prabu
> Dear all!
> I am a novice bioperl user, trying to parse Genbank files with Bioperl
> modules to get some specific features and details.
> Could anyone please tell me whether we can retrieve a Gene, its Transcript ID,
> and its Protein ID from the Genbank file?
> I mainly need to extract a one-to-one relationship between Transcript ID
> and Protein ID.
> I was trying this, and I was able to get these details if the gene is not
> alternatively spliced.
> If a gene contains multiple mRNA/CDS features, I am not able to build the
> relationship between each Transcript and its Protein.
> Kindly help me to find out whether this is possible in Bioperl.
> Thanks in advance,
> R. Prabu