[Bioperl-l] IUPAC support for DNA alignment
hlapp at gmx.net
Fri Jun 27 17:31:55 EDT 2008
On Jun 27, 2008, at 1:35 PM, Yee Man Chan wrote:
> Hi guys
> What about providing two switches; one for full score and one for
> probabilistic score?
> Assume match is +3 and mismatch -1
> Full score version:
> 1) T - U = +3 (I assume U is the same as T for alignment purposes)
> 2) A - W = +3
> 3) A - D = +3
> 4) A - N = +3
> 5) A - X = -1 (not so sure about this one)
> Probabilistic score version:
> 1) T - U = +3
> 2) A - W = +3/2-1/2 = +1
> 3) A - D = +3/3-1*2/3 = +1/3
> 4) A - N = +3/4-1*3/4 = 0
> 5) A - X = -1
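To make the two proposed schemes concrete, here is a minimal Python sketch. It is not BioPerl code: the function names and the IUPAC table layout are made up for illustration, and extending the probabilistic score to two ambiguous codes by averaging over all base pairs is an assumption that goes beyond the quoted examples. Unknown symbols such as X are simply scored as a mismatch, as in the proposal.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
# U is treated as T, as assumed in the quoted proposal.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH, MISMATCH = 3, -1  # scores used in the quoted example


def full_score(a, b):
    """Full match score if the two codes share at least one base."""
    if a not in IUPAC or b not in IUPAC:
        return MISMATCH  # unknown symbol such as X
    return MATCH if set(IUPAC[a]) & set(IUPAC[b]) else MISMATCH


def probabilistic_score(a, b):
    """Expected score, treating every constituent base as equally likely."""
    if a not in IUPAC or b not in IUPAC:
        return Fraction(MISMATCH)
    pairs = [(x, y) for x in IUPAC[a] for y in IUPAC[b]]
    p_match = Fraction(sum(x == y for x, y in pairs), len(pairs))
    return p_match * MATCH + (1 - p_match) * MISMATCH


for a, b in [("T", "U"), ("A", "W"), ("A", "D"), ("A", "N"), ("A", "X")]:
    print(a, b, full_score(a, b), probabilistic_score(a, b))
```

With match +3 and mismatch -1 this reproduces the numbers quoted above: +3, +1, +1/3, 0 and -1 for the probabilistic version.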
Note that there are also M, R, V, and H, and their complements (which
by definition would not match your example of 'A').
Note also that the above implicitly assumes 50% GC content or equal
likelihood of the code-constituent bases, which in reality for most
coding sequences is not true.
Also, if you have a known polymorphism at the site, the three bases of a
3-letter ambiguity code may not be equally likely. For example, if you use
the letter D (A, G, or T) for an [A/G] SNP, you may not want to give 1/3 of
the weight to T.
I would at least allow for the possibility to assign expected base
frequencies and weight the ambiguous possibilities by those.
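As a rough illustration of what weighting by expected base frequencies could look like, here is a small Python sketch. The function name, the frequency dictionaries, and the choice to renormalize the frequencies over the bases covered by the ambiguity code are all assumptions made for this example; none of it is an existing BioPerl interface.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH, MISMATCH = 3, -1  # same scores as in the quoted example


def weighted_score(base, code, freqs):
    """Expected score of an unambiguous base against an IUPAC code.

    freqs maps A/C/G/T to expected frequencies; they are renormalized
    over the bases the code stands for (an assumption of this sketch).
    """
    members = IUPAC[code]
    total = sum(freqs.get(b, 0.0) for b in members)
    if total == 0:
        return float(MISMATCH)  # no covered base has any weight
    p_match = freqs.get(base, 0.0) / total if base in members else 0.0
    return p_match * MATCH + (1 - p_match) * MISMATCH


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
gc_rich = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}   # hypothetical 60% GC
snp_ag = {"A": 0.5, "G": 0.5, "T": 0.0}              # D used for an [A/G] SNP

print(weighted_score("A", "D", uniform))  # equal weights, 1/3 as before
print(weighted_score("A", "D", gc_rich))  # A is less likely than G here
print(weighted_score("A", "D", snp_ag))   # T gets no weight at all
```

With uniform frequencies this falls back to the equal-likelihood score (1/3 for A against D). With the hypothetical GC-rich background the same pair scores 1/7, and with the SNP-specific weights it scores +1, since T no longer dilutes the match.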
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :