[Bioperl-l] IUPAC support for DNA alignment
apapanicolaou at ice.mpg.de
Wed Jul 2 16:20:51 EDT 2008
>> Full score version:
>> 1) T - U = +3 (I assume U is the same as T for alignment purpose,
yea... unless you are aligning RNAs and thus have wobble pairing
<http://en.wikipedia.org/wiki/Wobble_base_pair> UG :-) let's start
> Note that there are also M, R, V, and H, and their complements (which
> by definition would not match your example of 'A').
oh, I assumed Yee Man was just giving us a trimmed down example. Hilmar
is very right.
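
For reference, a minimal sketch of the standard IUPAC nucleotide codes with a compatibility check. It is plain Python, not BioPerl, and the names IUPAC, bases, complement_code and compatible are made up for illustration; U is simply treated as T, as assumed above.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
# U is folded into T here (an assumption for DNA-style alignment).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bases(code):
    """Return the set of concrete bases an IUPAC code stands for."""
    return frozenset(IUPAC[code.upper()])


def complement_code(code):
    """Return the IUPAC code covering the complements of the given code."""
    target = frozenset(IUPAC[code.upper()].translate(COMPLEMENT))
    for symbol, members in IUPAC.items():
        if symbol != "U" and frozenset(members) == target:
            return symbol
    raise ValueError(f"no complement found for {code!r}")


def compatible(a, b):
    """True if the two codes share at least one concrete base."""
    return bool(bases(a) & bases(b))


# Codes that can stand for 'A', and the complements of M, R, V and H.
covers_a = sorted(c for c in IUPAC if compatible(c, "A"))
complements = {c: complement_code(c) for c in "MRVH"}
```

With this table, covers_a lists A, D, H, M, N, R, V and W, and the complements of M, R, V and H come out as K, Y, B and D. K, Y and B do not cover A, while D (A/G/T) does.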
> Note also that the above implicitly assumes 50% GC content or equal
> likelihood of the code-constituent bases, which in reality for most
> coding sequences is not true. Also, if you have a known polymorphism at
> the site, for 3-letter ambiguities not all 3 may be equally likely.
> For example, if you have letter D for an [A/G] SNP, one may not want to
> give 1/3 of weight to possibility T.
> I would at least allow for the possibility to assign expected base
> frequencies and weight the ambiguous possibilities by those.
Ehm, wouldn't we now be walking in the twilight of modeling it? That
might be a bit harder work for Yee Man, which I was trying to avoid.
Perhaps for starters Yee Man can just document how the user can provide
their own substitution matrix? ((s)he may have done already, sorry dunno
if Yee is a boy or girl; like alexie too :-))
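
To show what a user-supplied matrix built along Hilmar's lines could look like, here is a rough sketch that weights the bases behind each ambiguity code by expected base frequencies. It is plain Python, not the BioPerl code under discussion; the function names, the match/mismatch values of +1/-1 and the example frequencies are all assumptions made for illustration.

```python
# IUPAC codes and the bases they cover (U treated as T, an assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weights(code, freqs):
    """Probability of each concrete base behind a code, given base frequencies."""
    members = IUPAC[code.upper()]
    total = sum(freqs[b] for b in members)
    if total == 0:
        raise ValueError(f"all bases of {code!r} have zero frequency")
    return {b: freqs[b] / total for b in members}


def expected_score(a, b, freqs, match=1.0, mismatch=-1.0):
    """Expected match/mismatch score over all base pairs the two codes allow."""
    wa, wb = weights(a, freqs), weights(b, freqs)
    return sum(
        pa * pb * (match if x == y else mismatch)
        for x, pa in wa.items()
        for y, pb in wb.items()
    )


def build_matrix(freqs, match=1.0, mismatch=-1.0):
    """Full substitution matrix as a dict keyed by (code, code)."""
    return {
        (a, b): expected_score(a, b, freqs, match, mismatch)
        for a in IUPAC
        for b in IUPAC
    }


# Equal base frequencies: D gives 1/3 of its weight to each of A, G and T.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Hypothetical site-specific frequencies for an [A/G] SNP: T gets no weight.
snp_site = {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}

score_uniform = expected_score("A", "D", uniform)
score_snp = expected_score("A", "D", snp_site)
matrix = build_matrix(uniform)
```

With equal frequencies, A against D scores about -0.33 (one match in three); with the hypothetical A/G-only frequencies the same pair scores 0.0, because T no longer takes a third of the weight. Anything beyond a simple reweighting like this is where the real modeling starts.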
"You can't find a hermit to teach you herming, because of course that rather spoils the whole thing."
-- (Terry Pratchett, Small Gods)
Department of Entomology,
Max Planck Institute for Chemical Ecology,
D-07745 Jena, Germany.