[Bioperl-l] three letter codes for amino acids?

Hilmar Lapp hlapp@gmx.net
Wed, 10 Jan 2001 11:35:09 -0800

Heikki Lehvaslaiho wrote:
> I noticed that it is not possible to use three letter codes for amino
> acids in any bioperl sequence objects. I think it should be possible at
> least to output in three letter code. Mapping three letter code back
> to one letter code is not too hard, either, but is it a good idea to
> have?
> I propose to put a method 'seq3' into PrimarySeq.pm which is called from
> Seq.pm, too.
> =head2 seq3
>  Title   : seq3
>  Usage   : $string = $obj->seq3()
>  Function: Read only method that returns the amino acid sequence
>            as a string of three letter codes. moltype has to be
>            'protein'. Output follows the IUPAC standard plus
>            'Ter' for terminator.
>  Returns : A scalar
>  Args    : character used for stop, optional, defaults to '*'
>            character used for unknown, optional, defaults to 'X'
> =cut
> Any opinions?

Considering sequence atoms as symbols seems the most natural
concept to me. Having single letters representing each symbol
makes symbol arrays and strings more or less equivalent in Perl.
This might not hold for multi-letter representations, so in the
first place I'd expect an array to be returned. However, this is
inconsistent with $seq->seq(), and reportedly inefficient due to
Perl's array implementation.

I know you could still split at every 3rd letter as a simple way
to get an array. I'd nevertheless have the method accept a third
optional parameter denoting the 'join' character, with a default of ''.
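
To make this concrete, here is a rough sketch of what I have in mind.
It is not an existing bioperl method: the name seq3_sketch, the lookup
table %THREE and the standalone-function form are made up for
illustration, and I am assuming that the first two arguments name the
one-letter characters that stand for stop and unknown in the input.

```perl
use strict;
use warnings;

# Hypothetical lookup table: one-letter code => IUPAC three-letter code.
my %THREE = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
    B => 'Asx', Z => 'Glx', X => 'Xaa',
);

# seq3_sketch($seq, $stop, $unknown, $join)
# Assumption: $stop and $unknown are the input characters for
# terminator and unknown residue; $join is the proposed third
# optional parameter, placed between the three-letter codes.
sub seq3_sketch {
    my ($seq, $stop, $unknown, $join) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    $join    = ''  unless defined $join;

    my @codes;
    foreach my $aa (split //, $seq) {
        if ($aa eq $stop) {
            push @codes, 'Ter';
        } elsif ($aa eq $unknown or not exists $THREE{uc $aa}) {
            push @codes, 'Xaa';    # anything unrecognized becomes unknown
        } else {
            push @codes, $THREE{uc $aa};
        }
    }
    return join $join, @codes;
}

# Default join of '' gives one string, consistent with $seq->seq().
my $plain = seq3_sketch('MKV*');

# Splitting at every 3rd letter recovers the symbols as an array.
my @symbols = $plain =~ /(.{3})/g;

# With an explicit join character the symbols stay visually separate.
my $dashed = seq3_sketch('MKV*', '*', 'X', '-');
```

With a non-empty join character, split() on that character gives the
array back without relying on every symbol being exactly 3 letters wide.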

Just my few thoughts.

Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757