[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?
clarsen at vecna.com
Tue Oct 27 16:07:55 EDT 2009
This is a good strategy when the gi is given. However, I failed to
mention that the example I gave is turning out to be unusual (15%?):
most virus 'mature peptides' we will apply this analysis to do not in
fact have a gi number or unique identifier associated with them. There
are thousands of dengue virus files to be processed to give mature
peptides.

Should have mentioned this... Hence the problem: we can't look it up
because only the parent polyprotein has a gi. There's nothing to look
up /by/ in most cases. So we still have to build the set of proteins
that are cleaved out of every polyprotein locally and at high
throughput, from the information available in each entry
(sadly, kind of a runaround; it should be in the GenBank entry).
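
To make the local approach concrete, here is a rough sketch (plain
Python, no parser) of cutting a mature peptide out of the parent CDS
translation using only feature coordinates. It assumes forward-strand
features, 1-based inclusive coordinates as written in GenBank
locations, and a CDS given as a list of (start, end) segments so that a
join() across a slippage site can be expressed. The function names and
the toy sequence are made up for illustration; positions that fall
inside an overlapping slippage region are ambiguous and are simply
resolved to the first matching segment here.

```python
def cds_offset(pos, cds_segments):
    """Map a 1-based genome position to a 0-based offset within the CDS.

    cds_segments: list of (start, end) tuples, 1-based inclusive, in
    translation order on the forward strand (an assumption of this sketch).
    """
    offset = 0
    for start, end in cds_segments:
        if start <= pos <= end:
            # First matching segment wins; ambiguous inside overlaps.
            return offset + (pos - start)
        offset += end - start + 1
    raise ValueError("position %d is outside the CDS" % pos)


def mat_peptide_seq(translation, cds_segments, pep_start, pep_end):
    """Slice the mature peptide out of the parent CDS translation.

    pep_start / pep_end: 1-based inclusive nucleotide coordinates of the
    mat_peptide feature on the same record as the CDS.
    """
    first = cds_offset(pep_start, cds_segments)
    last = cds_offset(pep_end, cds_segments)
    # The peptide must start on a codon boundary and end on the last
    # base of a codon, otherwise the coordinates do not fit this CDS.
    if first % 3 != 0 or (last + 1) % 3 != 0:
        raise ValueError("mat_peptide is not codon-aligned with the CDS")
    return translation[first // 3:(last + 1) // 3]


# Toy example: an 8-residue "polyprotein" encoded at 101..124.
toy_translation = "MKTAYIAK"
simple = mat_peptide_seq(toy_translation, [(101, 124)], 107, 115)

# Same protein, but with the CDS written as join(101..109,109..123),
# i.e. a -1 slip at position 109.
slipped = mat_peptide_seq(toy_translation, [(101, 109), (109, 123)], 112, 117)
```

In real use the translation and the coordinates would come from the CDS
and mat_peptide features of each parsed GenBank record; reverse-strand
and partial features would need extra handling.
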
On Oct 27, 2009, at 3:54 PM, Peter wrote:
> On Tue, Oct 27, 2009 at 7:15 PM, Chris Larsen <clarsen at vecna.com>
>> Hello Peter!
>> For instance, check this:
>> No mat_peptide sequence is given. We want that...
> Looking at the GenBank file displayed, the mat_peptide features
> (mature peptides) do not include a translation entry (like the parent
> CDS feature does). However, they do have protein IDs - which are
> actually links in the HTML version.
> This leads me to suggest a third option as an alternative to the two
> ideas you outlined. You could parse the GenBank file(s), and for each
> mat_peptide feature look up the protein ID via Entrez EFetch (e.g. as
> a FASTA file, or a GenPept file). If you only have a relatively small
> number of viruses and proteins this is probably going to be pretty
> easy. At least, I could do it in Biopython and I am sure the same is
> true with the BioPerl GenBank parser and their EFetch interface.
> However, for a large dataset, handling it all locally (your options
> (1) and (2)) sounds best.
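
For the minority of records where the mat_peptide features do carry a
protein ID, the EFetch lookup suggested above could be sketched with
the standard library alone. This only illustrates the request; the
helper names and the placeholder IDs are made up, and for anything
beyond a handful of IDs the BioPerl or Biopython EFetch wrappers are
probably the better route.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_protein_url(protein_ids, rettype="fasta"):
    """Build an EFetch URL for one or more protein IDs.

    rettype="fasta" requests FASTA text; rettype="gp" requests GenPept.
    """
    params = {
        "db": "protein",
        "id": ",".join(protein_ids),
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_proteins(protein_ids, rettype="fasta"):
    """Download the records as text (needs network access)."""
    with urlopen(efetch_protein_url(protein_ids, rettype)) as handle:
        return handle.read().decode("utf-8")


# Placeholder IDs for illustration only, not real accessions.
example_url = efetch_protein_url(["XX_000001.1", "XX_000002.1"])
```
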
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133