[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?

Peter biopython at maubp.freeserve.co.uk
Tue Oct 27 16:17:19 EDT 2009

On Tue, Oct 27, 2009 at 8:07 PM, Chris Larsen <clarsen at vecna.com> wrote:
> Peter,
> This is a good strategy when the gi is given. However I failed to mention
> that we are finding the example I gave is unusual (15%?)---most virus
> 'mature peptides' we will apply this analysis to do not in fact have a gi
> number or unique identifier associated with them. There are thousands of
> dengue virus files to be processed to give mature proteins.
> Should have mentioned this...Hence the problem--we can't look it up because
> only the parent polyprotein has a gi. There's nothing to look up /by/ in most
> cases. So we still have to build a set of proteins that are cleaved out of
> every polyprotein, by local and high throughput methods, by building it out
> of the available information (sadly, kind of a run around-- it should be in
> the genbank entry).
> Chris

Ah. That's a shame. I did just take a few minutes to try out the
EFetch idea (using Biopython) and it does work beautifully for
this "nice" example virus which the NCBI have annotated.

I also note that in the example given, all the mature peptides
have nice and simple locations (in terms of their co-ordinates
for the nucleotides), no ribosomal slippages etc. This means
grabbing the relevant bits of the genome and translating it is
also pretty easy (option 2 in your original email).
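
To make the "grab and translate" step concrete, here is a minimal
library-free sketch. It assumes a simple forward-strand location of
the form start..end (1-based, inclusive, as GenBank writes them) and
the standard genetic code. The function names and the toy sequence
are made up for illustration; with Biopython you would more likely
use the feature's extract() method and the sequence's translate()
method instead.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_mat_peptide(genome, start, end):
    """Return the nucleotides of a simple forward-strand feature.

    start and end are 1-based and inclusive, as in a GenBank
    location written "start..end" (hypothetical helper).
    """
    if not 1 <= start <= end <= len(genome):
        raise ValueError("location %d..%d is outside the sequence" % (start, end))
    return genome[start - 1:end]


def translate(nucleotides):
    """Translate codon by codon; codons not in the table become X."""
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("length is not a multiple of three")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3)
    )


# Toy sequence and coordinates, invented for this example only.
toy_genome = "CCATGGCTAAAGGTTGACC"
peptide_dna = extract_mat_peptide(toy_genome, 3, 14)
print(peptide_dna, translate(peptide_dna))  # ATGGCTAAAGGT MAKG
```

This only covers the easy case described above. Joined locations,
reverse-strand features and ribosomal slippage would all need extra
handling.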

Have you got a more typical entry you can point us at?

If there is nothing publicly available, I wouldn't mind you
emailing me one or two to look at off list (and if you don't
mind, they might make good test cases or documentation
examples for the Bio* projects).

