[Bioperl-l] Help in Programming a substitution routine

manni122 markus.liebscher at gmx.de
Mon Mar 9 11:39:43 EDT 2009

Hi there,
i need a little help in programming. I am still stucking in a problem and
have no idea how to get out a solution. I have a fasta alignment file (with
two aligned sequences) and I want to optimize the genetic code for a better
alignment score. Everything is fine up to the line where I'd like to compare
the codons of each sequence.
If both codons are equal i push them into a new array, if not I'd like to
lookup all possible codons for the respective amino acid and compare all
codons from one amino acid of one site with the corresponding on the other:
lets say in sequence 1 is a "A" (GCC, GCA, GCG,GCT) and in sequence 2 is a
"E" (GAA,GAG). So the best hit should be the combination GCA and GAA or GCG
and GAG. But I don't have clue how to get this working - as you might see in
the code...
I've probably made a lot of mistakes. But I would feel better if someone
could suggest some code...

#the code starts here
use Bio::SeqIO;
use Bio::AlignIO;

my ($i, $j, $seq1, $seq2, $seqName1, $seqName2);

#the complete hash is not shown for the sake of clarity
my(%cod2aa) = (
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine

my %aa2cod = reverse %cod2aa;

my $input  = Bio::AlignIO->new(-file => 'align1.fas' , '-format' =>

  while (my $aln = $input->next_aln()) {
  push @nuc_seqs, $aln;
  $seq1 = $aln->get_seq_by_pos(1)->seq();
  $seq2 = $aln->get_seq_by_pos(2)->seq();

#create an array for the sequences
push @sequences, $seq1;
push @sequences, $seq2;

#separate whole sequence into single triplets (subroutine is not shown)
foreach (@sequences) {
push @tmpArray, CreateTripletArray($_);

#count number of triplets in the array
$count = @tmpArray;

#compare pairwise and substitute if possible (assuming the length of both
genes is equal)
foreach $i(0 .. (($count/2)+1)) {

  if ($value1 eq $value2) {
        #push equal triplets in new arrays
		push @newArray1, $value1;
		push @newArray2, $value2;
  else {
        #split codons to compare each base in a triplet
		@zwvalue1 = split(//, $value1);
       	        @zwvalue2 = split(//, $value2);
                #I want to get back all codons for one amino acid
		$aminoacid1 = $cod2aa{$value1};
		$aminoacid2 = $cod2aa{$value2};
                push @codon1, $aa2cod{$aminoacid1};
		push @codon2, $aa2cod{$aminoacid2};
                #then something like go through all bases of both codons a
every position to find the best fitting partners
		if (exists $cod2aa{$value1} and $cod2aa{$value2} ) {
			if (@zwvalue1[0] eq @zwvalue2[0]){
			push @newArray1, @zwvalue1[0];
			push @newArray2, @zwvalue2[0];
View this message in context: http://www.nabble.com/Help-in-Programming-a-substitution-routine-tp22413520p22413520.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

More information about the Bioperl-l mailing list