[Bioperl-l] Re: translation using Bioperl

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 26 Jul 2000 10:18:44 +0100

Heikki Lehvaslaiho wrote:
> Dear Jonathan,
> Thanks for persisting. I learned something new. (I had to check it
> with a couple of experts before I believed!)
> To summarize: The amino acid that starts a polypeptide in the translation
> process is always methionine, even if an alternative initiator codon is
> used. That means that even when using the standard codon table, the
> correct translation of the sequence 'AGT TGA ...' is 'MV...'.
> I've fixed this in bioperl live and 06 CVS branches.
> If you want to catch those cases where translation is NOT started by a
> valid initiator codon, you can now set ->verbose to true in your
> PrimarySeq object and it will warn you.
>         -Heikki

Looks like I opened a real can of worms. Thanks for input everyone.
I'll try to summarize the opinions and comments so far.

(Jonathan, hold on. This is a bit more complicated issue than I

Ewan :

> Can we watch the ->verbose attribute. This will have to be in on
> Bio::PrimarySeqI --- and i would prefer we did not burden other
> implementations with having to set this.

I will remove the ->verbose call. 

This is a separate discussion, but I'd like to use a global verbose
setting to determine how many warnings classes produce. The code is
there in Bio::Root, but it is dispersed across various modules, which
is not good.

Will Fischer:

> Errrr... How's that again?  I'm guessing you mean 'AGT GTA', but even
> so, _only_ valid start codons encode the initial methionine.  A start
> codon _is_ a start codon because there is a special tRNA (loaded with
> a modified methionine) that matches it.

Sorry, I copied the wrong string. In the test file (t/PrimarySeq.t), I
have 'TTGGTGGCGTCAAC', which translates to 'MVAST'.

Really, although no one knows quite what happens, if you have an
alternative start codon that is recognized by ribosomes, a methionine
is put in. There must be some additional signals in the mRNA that cause
a codon mismatch, but this is plain guessing.
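
As an illustration of the rule above, here is a minimal Python sketch
(not the Bioperl code) that reproduces the 'TTGGTGGCGTCAAC' -> 'MVAST'
result from the test file. The function names are hypothetical. The
initiator set TTG/CTG/ATG is the one listed for the NCBI standard table.
Resolving the trailing partial codon 'AC' to 'T', because every
completion encodes threonine, is an assumption made so that the sketch
matches the quoted result.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Initiator codons of the standard table.
START_CODONS = {"TTG", "CTG", "ATG"}


def translate_codon(codon):
    """Translate one codon; a 2-base fragment is resolved only if unambiguous."""
    if len(codon) == 3:
        return CODON_TABLE.get(codon, "X")
    if len(codon) == 2:
        # Assumption: accept the fragment when all four completions agree.
        candidates = {CODON_TABLE.get(codon + base, "X") for base in BASES}
        return candidates.pop() if len(candidates) == 1 else "X"
    return ""  # a single trailing base is dropped


def translate(seq):
    """Translate seq, putting in Met when the first codon is a valid initiator."""
    seq = seq.upper()
    protein = [translate_codon(seq[i:i + 3]) for i in range(0, len(seq), 3)]
    if protein and seq[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


print(translate("TTGGTGGCGTCAAC"))  # MVAST
print(translate("AGTTGA"))          # S*  (AGT is not an initiator)
```

Note that the wrongly copied 'AGT TGA' string comes out as 'S*' here,
since AGT is not an initiator in the standard table.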

> Correct behavior (IMHO) would be to check whether the first codon
> matches a valid start (in the genetic code being used): if yes, 
> put in Met; otherwise, put in the default amino-acid and
> (perhaps) complain.  

That is what I did, except for the complaining.

Andrew Dalke:
> Short answer, I agree except that it's impossible for bioperl, as
> a library, to do this.  It is the responsibility of users of library
> to decide what to do when the first codon isn't a start codon.
> The difficulty is knowing the "perhaps" part.  Detail below.
> ...
> For relevance to this topic, I agree that it's the relevant behaviour,
> but not of the bioperl or biopython tools.  If you consider those codes
> as a library, it's the responsibility of the person using the library to
> make the call on what to do.  The library should merrily translate away
> and the *calling* code detect "hmm, this doesn't start off with an M, I
> think I'll complain."

I think I have to disagree here. In my opinion, these libraries have
two equally important roles:

1. To give default 'computational' behaviour
   (e.g. blindly translate any nucleotide sequence).

2. Have enough biological sense to give results identical to
   sequence repositories (EMBL, GenBank, DDBJ).
   (e.g. translate a valid CDS correctly)

(3. A third role is to do everything better than sequence

To achieve the level 2 for translations, this is what I suggest to

Add one more optional, boolean argument, $fullCDS, to method
If it is true:

1. Check and replace the initial amino acid.

2. Remove the trailing stop character
   Note that this is the default behaviour now. In my opinion,
   the trailing stops should be left there by default.

3. Warn if a) first codon is not a valid initiator
           b) last codon is not a stop
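
The three steps above could be sketched roughly as follows. This is a
standalone Python illustration, not the Bioperl method: the name
translate_cds and the full_cds keyword are hypothetical stand-ins for
the proposed $fullCDS argument. It assumes the standard table with
initiators TTG/CTG/ATG, translates only complete codons, and leaves
trailing stops in place when the flag is off (the default I argue for
above, not the current one).

```python
import warnings
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"TTG", "CTG", "ATG"}


def translate_cds(seq, full_cds=False):
    """Blind translation by default; CDS-aware behaviour when full_cds is true."""
    seq = seq.upper()
    # Only complete codons are translated in this sketch.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = [CODON_TABLE.get(codon, "X") for codon in codons]
    if not full_cds or not protein:
        return "".join(protein)

    # Steps 1 and 3a: replace the initial amino acid, or warn.
    if codons[0] in START_CODONS:
        protein[0] = "M"
    else:
        warnings.warn(f"first codon {codons[0]} is not a valid initiator")

    # Steps 2 and 3b: remove the trailing stop character, or warn.
    if protein[-1] == "*":
        protein.pop()
    else:
        warnings.warn(f"last codon {codons[-1]} is not a stop")
    return "".join(protein)


print(translate_cds("TTGGTGGCGTCATGA"))                 # LVAS*
print(translate_cds("TTGGTGGCGTCATGA", full_cds=True))  # MVAS
```

With the flag set, 'AGTGTATGA' would give 'SV' plus a warning about the
first codon, and 'ATGGTG' would give 'MV' plus a warning about the
missing stop.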

Comments anyone?