[Bioperl-l] Degenerate primer calculation
Samantha.Thompson at greenbiologics.com
Tue Jan 13 04:22:15 EST 2009
From: Sara Kalla [mailto:skalla at rice.edu]
Sent: 12 January 2009 20:10
To: Samantha Thompson
Cc: bioperl-l List
Subject: Re: [Bioperl-l] Degenerate primer calculation
Samantha Thompson wrote:
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: 08 December 2008 16:41
> To: Samantha Thompson
> Cc: bioperl-l List
> Subject: Re: [Bioperl-l] Degenerate primer calculation
> On Dec 8, 2008, at 9:59 AM, Samantha Thompson wrote:
>> I also have another similar sequence analysis/primer problem.
>> What I'd like to do is produce degenerate primers from amino acid
>> sequences. What I did initially was take the codon usage table and
>> rewrite it as a hash in Perl in the form of degenerate codon usage,
>> e.g. lysine (K) would be AAR and its reverse complement would be YTT.
>> So my form then takes an amino acid sequence (derived as a consensus
>> from the multiple alignment of homologous proteins) and converts it
>> into degenerate codons, giving a degenerate primer (actually several
>> primers synthesised with different bases, pooled together) to search
>> for homologues of the protein in unsequenced organisms.
>> I would like to improve this by being able to take a consensus
>> more in the form of a Prosite motif (I think that's the right one),
>> such as [TS]YW[RKSD], and then develop a degenerate nucleotide
>> sequence corresponding to this.
>> So I'm wondering if BioPerl contains anything like this (both Prosite
>> motif format parsing and degenerate code from multiple alignments or
>> such a motif), or if I need to write this myself (which I want to if
>> it doesn't exist already).
>> Thanks again,
> Bio::Tools::CodonTable reverse translates, but I don't think it
> accepts patterns. Maybe a pipeline including Bio::Tools::SeqPattern?
> Might be an interesting programming challenge if it isn't already set
> up for that.
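The reverse-translation step described above (K becomes AAR, whose reverse complement is YTT) can be sketched as follows. This is a minimal illustration in Python rather than Perl, and it does not use BioPerl: the names `degenerate_codon`, `reverse_complement` and `back_translate` are hypothetical, and the table is the standard genetic code. It assumes a simple position-by-position collapse of all codons for an amino acid, which over-covers the six-codon amino acids (for example L collapses to YTN, which also matches the F codons), so a hand-written hash like the one mentioned above may well differ for L, R and S.

```python
from itertools import product

# IUPAC nucleotide codes and the plain bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}

# Complement of every IUPAC code (R = A/G pairs with Y = C/T, and so on).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def degenerate_codon(amino_acid):
    """Collapse all codons for one amino acid into a single IUPAC codon."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    # For each codon position, take the set of bases seen and look up its code.
    return "".join(
        BASES_TO_CODE[frozenset(c[i] for c in codons)] for i in range(3)
    )


def reverse_complement(seq):
    """Reverse-complement a sequence written in IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def back_translate(peptide):
    """Turn a peptide into a degenerate nucleotide sequence, codon by codon."""
    return "".join(degenerate_codon(aa) for aa in peptide)


print(degenerate_codon("K"))                        # AAR
print(reverse_complement(degenerate_codon("K")))    # YTT
```

Deriving the degenerate codons from the codon table, instead of typing them in by hand, keeps the two in step, at the cost of the over-coverage noted above.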
> I'm trying to have a go at solving this problem and I'm looking at
> Bio::Tools::SeqPattern. What I would like to be able to obtain from a
> motif is a list of all the sequences that the motif could correspond
> to. E.g. IKL[GP]NM could be IKLGNM or IKLPNM, so I take both of these
> sequences and turn them into degenerate codons for each amino acid.
> The complicated part (I thought) here is creating a degenerate codon
> that corresponds to either G or P. The way I will do this is by
> creating each of the 3 degenerate bases of the new codon separately,
> based on a 2D matrix which holds the result of 'crossing' each of the
> nucleotide bases of the degenerate code with each other. So when you
> cross the codon for G (GGN) with the codon for P (CCN) you get a codon
> that contains the degeneracy of both (SSN). So then you have a
> degenerate nucleotide sequence for your peptide motif.
> I have written this part already but I am wondering about the expand
> function of Bio::Tools::SeqPattern. I'm not quite sure what it means
> by the expanded sequence (if there is just one?) that it returns. I'm
> trying to get every possible permutation of the motif. Is there any
> function that does this, or will I have to write one to parse it
> myself?
> This would be great, but what would make things even better would be
> if I could take multiple sequence alignments and produce
> patterns/motifs from them. Is there a part of BioPerl that does
> something like this?
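The two steps described above, listing every sequence a motif can stand for and 'crossing' two degenerate codons base by base, might look roughly like this. It is a sketch in Python, not the poster's Perl code and not the behaviour of Bio::Tools::SeqPattern; the names `CROSS`, `cross_codons`, `parse_motif` and `expand_motif` are hypothetical. The motif parser is assumed to handle only plain residues and bracketed alternatives such as [GP], not the rest of the Prosite syntax (x, {..}, repeat counts).

```python
import re
from itertools import product

# IUPAC nucleotide codes and the plain bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}

# The 2D crossing matrix: CROSS[a][b] is the IUPAC code that covers
# every base allowed by code a or by code b.
CROSS = {
    a: {b: BASES_TO_CODE[frozenset(IUPAC[a]) | frozenset(IUPAC[b])] for b in IUPAC}
    for a in IUPAC
}


def cross_codons(first, second):
    """Cross two degenerate codons position by position."""
    return "".join(CROSS[a][b] for a, b in zip(first, second))


def parse_motif(motif):
    """Split a simple motif into one string of allowed residues per position."""
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", motif)
    return [group or single for group, single in tokens]


def expand_motif(motif):
    """List every peptide the motif can stand for."""
    return ["".join(p) for p in product(*parse_motif(motif))]


print(expand_motif("IKL[GP]NM"))    # ['IKLGNM', 'IKLPNM']
print(cross_codons("GGN", "CCN"))   # SSN
```

Building the matrix from set unions avoids typing in all 15 x 15 entries by hand. For a position with more than two alternatives, the codons can be crossed one after another, since the result is just the union at each base.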
Correct me if I'm wrong (or if it's not relevant)... If you use the
example above with G (GGN) and P (CCN) and combine to give SSN, wouldn't
you also get everything that had an A (GCN) or an R (CGN) at that
position?
Yes, you would. G and P are a bit of a bad example that I randomly
suggested; you might be more likely to be looking for something like a
change in hydrophobic residue, such as V or A, in which case your overall
degenerate codon would be GYN. It's probably more effective
when you are just looking for third-base wobble, or other very similar
codons. When you cross degenerate codons they do tend to approach
maximum degeneracy (NNN) pretty quickly, so it's about picking the right
amino acids from your consensus/pattern.
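
The point about SSN also covering A and R can be checked by expanding a degenerate codon back into plain codons and translating each one. The following is a small Python sketch using the standard genetic code; the names `expand_codon`, `encoded_amino_acids` and `degeneracy` are hypothetical and not part of BioPerl.

```python
from itertools import product

# IUPAC nucleotide codes and the plain bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def expand_codon(codon):
    """List every plain codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[base] for base in codon))]


def encoded_amino_acids(codon):
    """Return the set of amino acids a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand_codon(codon)}


def degeneracy(seq):
    """Count how many plain sequences a degenerate sequence stands for."""
    total = 1
    for base in seq:
        total *= len(IUPAC[base])
    return total


print(sorted(encoded_amino_acids("SSN")))   # ['A', 'G', 'P', 'R']
print(sorted(encoded_amino_acids("GYN")))   # ['A', 'V']
print(degeneracy("SSN"), degeneracy("GYN"), degeneracy("NNN"))  # 16 8 64
```

Comparing the encoded set with the residues actually wanted at a position shows how much a crossed codon over-covers, and the degeneracy count gives the size of the primer pool it implies.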