Gatherer, D. (Derek)
Fri, 14 Jul 2000 09:21:43 +0200
I realise now I should have included the sources for the alphabets:
The Stanfel alphabet was devised by Larry Stanfel:
Stanfel LE (1996) A new approach to clustering the amino acids. J. theor.
Biol. 183, 195-205.
I took the Dayhoff and Sneath alphabets from Stanfel's paper: Sneath from
Figure 1, and Dayhoff from the text on p. 197. The original references for
these alphabets are given in Stanfel's reference list.
The remaining alphabets were taken from:
Karlin S, Ost F and Blaisdell BE (1989) Patterns in DNA and amino acid
sequences and their statistical significance. Chapter 6 of: Mathematical
Methods for DNA Sequences. Waterman MS (ed.) CRC Press, Boca Raton, FL.
Table 4 of that chapter.
In answer to your question about synonymous and non-synonymous, I think that
Dayhoff's alphabet is the closest to this, since according to Stanfel (ref
"A quite different approach to amino acid classification is taken by those
interested in evolutionary differences. Dayhoff et al (1978) is a
well-known example. Given a grouping of proteins into functional families,
one tallies the frequencies with which one amino acid apparently substitutes
for another, and computes similarities or distances as some functions of
these values. The more frequently alpha substitutes for beta, the greater
the similarity. Though Dayhoff et al. did not apply any formal clustering
methodologies ... [snipped critical digression on Dayhoff's methods...], the
amino acids were partitioned into groups which showed generally greater
substitutability within themselves than between different groups."
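The tallying step described in that passage can be sketched roughly as follows. This is only an illustration of counting apparent substitutions in aligned sequences, not Dayhoff's actual procedure: the toy alignments and the `tally_substitutions` helper are hypothetical, and the sequences in each pair are assumed to have equal length with no gaps.

```perl
use strict;
use warnings;

# Toy aligned sequence pairs (hypothetical data, for illustration only).
my @aligned_pairs = (
    [ 'ACDKL', 'ACEKL' ],
    [ 'MKVDE', 'MRIDE' ],
    [ 'GDKLV', 'GEKLI' ],
);

# Count how often each unordered pair of different residues shares an
# alignment column. Assumes both sequences in a pair have the same length.
sub tally_substitutions {
    my (@pairs) = @_;
    my %count;
    for my $pair (@pairs) {
        my ($s1, $s2) = @$pair;
        for my $i (0 .. length($s1) - 1) {
            my @column = (substr($s1, $i, 1), substr($s2, $i, 1));
            my ($x, $y) = sort @column;
            next if $x eq $y;          # identical residues are not substitutions
            $count{"$x$y"}++;
        }
    }
    return \%count;
}

# Print the residue pairs, most frequently exchanged first.
my $counts = tally_substitutions(@aligned_pairs);
for my $key (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$key\t$counts->{$key}\n";
}
```

With these toy alignments the D/E and I/V pairs are each counted twice and K/R once. Pairs that exchange often would end up in the same group of a reduced alphabet.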
Hope this helps
From: Heikki Lehvaslaiho [mailto:firstname.lastname@example.org]
Sent: 13 July 2000 10:06
Subject: [Bioperl-l] Bio::Tools::OddCodes
I just noticed that Derek Gatherer
(D.Gatherer@organon.nhe.akzonobel.nl) has submitted a nice class to
recode amino acid sequences according to various amino acid properties.
See below for synopsis. Thanks Derek!
On a related note: I'd like to ask if anyone on the list knows whether
there is a commonly agreed definition of yet another way of classifying
amino acid changes. Synonymous and non-synonymous are commonly used
terms in population genetics papers to classify coding region mutations,
but they are never defined. Most probably their definition is based on
something similar to Derek's 'functional' alphabet.
Can anyone tell me if synonymous amino acid changes are _identical_ to
Derek's functional change?
Bio::Tools::OddCodes - Object holding alternative alphabet coding for
one protein sequence
Takes a sequence object from, e.g., an input stream and creates an object
for the purposes of rewriting that sequence in another alphabet.
These are abbreviated amino acid sequence alphabets, designed to
simplify the statistical aspects of analysing protein sequences,
by reducing the combinatorial explosion of the 20-letter alphabet.
These abbreviated alphabets range in size from 2 to 8.
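To give a feel for the reduction, here is a small sketch that counts the possible words of a given length for different alphabet sizes. The `kmer_space` helper is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Number of distinct words of length $k over an alphabet of $size letters.
sub kmer_space {
    my ($size, $k) = @_;
    return $size ** $k;
}

# Compare the full 20-letter alphabet with the largest (8) and
# smallest (2) abbreviated alphabets for words of length 3.
for my $size (20, 8, 2) {
    printf "%2d letters: %d possible 3-mers\n", $size, kmer_space($size, 3);
}
```

For words of length 3 this gives 8000 possibilities with 20 letters, 512 with 8 letters and 8 with 2 letters.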
Creating the OddCodes object, eg:
my $inputstream = Bio::SeqIO->new( -file => "seqfile", -format
my $seqobj = $inputstream->next_seq();
my $oddcode_obj = Bio::Tools::OddCodes->new($seqobj);
my $seqobj = Bio::PrimarySeq->new(-seq=>'[cut and paste a
here]', -moltype => 'protein', -id => 'test');
my $oddcode_obj = Bio::Tools::OddCodes->new($seqobj);
do the alternative coding, returning the answer as a reference to a
my $output = $oddcode_obj->structural();
my $output = $oddcode_obj->chemical();
my $output = $oddcode_obj->functional();
my $output = $oddcode_obj->charge();
my $output = $oddcode_obj->hydrophobic();
my $output = $oddcode_obj->Dayhoff();
my $output = $oddcode_obj->Sneath();
my $output = $oddcode_obj->Stanfel();
display sequence in new form, eg:
my $new_coding = $$output;
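The recoding methods hand back a scalar reference, which is why the last line dereferences it with `$$output`. The same pattern can be sketched without BioPerl as below. The `recode_charge` helper and its three-letter P/N/U grouping are assumptions made for illustration, and are not necessarily the letters or groups the module itself uses.

```perl
use strict;
use warnings;

# Hypothetical three-letter charge alphabet (an assumption for illustration):
#   P = positive (H, K, R), N = negative (D, E), U = uncharged (all others)
sub recode_charge {
    my ($seq) = @_;
    my $recoded = uc $seq;
    # tr/// maps each character once, so D -> N does not clash with
    # asparagine (N) -> U. The replacement list is shorter than the search
    # list, so its final character (U) is reused for the remaining residues.
    # Characters outside the 20 standard residues are left unchanged.
    $recoded =~ tr/HKRDEACFGILMNPQSTVWY/PPPNNU/;
    return \$recoded;    # return a reference, as in the synopsis
}

my $output     = recode_charge('MKDEL');
my $new_coding = $$output;    # dereference to get the recoded string
print "$new_coding\n";        # prints UPNNU
```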
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho email@example.com
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468