[Bioperl-l] IUPAC support for DNA alignment

Alexie Papanicolaou apapanicolaou at ice.mpg.de
Wed Jul 2 16:20:51 EDT 2008

>> Full score version:
>> 1) T - U = +3 (I assume U is the same as T for alignment purpose, 
>> right?)
> Right.
Yeah... unless you are aligning RNAs and thus have G-U wobble pairing 
<http://en.wikipedia.org/wiki/Wobble_base_pair> :-)  Let's start 
simple though...
> Note that there are also M, R, V, and H, and their complements (which 
> by definition would not match your example of 'A').
Oh, I assumed Yee Man was just giving us a trimmed-down example. Hilmar 
is quite right.
> Note also that the above implicitly assumes 50% GC content or equal 
> likelihood of the code-constituent bases, which in reality for most 
> coding sequences is not true. Also, if you have a known polymorphism at 
> the site, for 3-letter ambiguities not all 3 may be equally likely. 
> For example, if you have letter D for a [A/G] SNP, one may not want to 
> give 1/3 of weight to possibility T.
> I would at least allow for the possibility to assign expected base 
> frequencies and weight the ambiguous possibilities by those.
>     -hilmar
Ehm, wouldn't we now be wandering into the twilight of modeling it? That 
might be a bit more work for Yee Man, which I was trying to avoid. 
Perhaps for starters Yee Man can just document how users can provide 
their own substitution matrix? ((S)he may have done so already; sorry, I 
don't know whether Yee Man is a he or a she, much like "Alexie" :-))
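
To make the two options concrete, here is a rough sketch (in Python, not BioPerl code and not Yee Man's implementation) of how a user-supplied matrix could be generated. It scores a pair of IUPAC codes as the expected match/mismatch score over their constituent bases, optionally weighted by base frequencies as Hilmar suggests. The +3 match score comes from the example above; the -1 mismatch score, the function names, and the frequency values in the usage note are my own assumptions for illustration.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH = 3.0      # from the "T - U = +3" example in this thread
MISMATCH = -1.0  # assumed value, not taken from the thread


def expected_score(x, y, freqs=None):
    """Expected score of aligning IUPAC codes x and y.

    freqs maps A/C/G/T to weights (hypothetical parameter); the default
    treats all constituent bases as equally likely. U is treated as T.
    """
    x = x.upper().replace("U", "T")
    y = y.upper().replace("U", "T")
    if freqs is None:
        freqs = {b: 0.25 for b in "ACGT"}

    def distribution(code):
        # Renormalise the weights over the bases this code allows.
        bases = IUPAC[code]
        total = sum(freqs[b] for b in bases)
        return {b: freqs[b] / total for b in bases}

    px, py = distribution(x), distribution(y)
    return sum(
        px[a] * py[b] * (MATCH if a == b else MISMATCH)
        for a in px
        for b in py
    )


def build_matrix(freqs=None):
    """Full substitution matrix as a dict keyed by (code, code)."""
    codes = list(IUPAC) + ["U"]
    return {(a, b): expected_score(a, b, freqs) for a in codes for b in codes}
```

With the defaults, expected_score("A", "D") gives T a full third of the weight. Passing something like freqs={"A": 0.45, "C": 0.05, "G": 0.45, "T": 0.05} down-weights T for an [A/G] site, which is the kind of adjustment Hilmar describes. The dict from build_matrix() could then be written out in whatever matrix format the aligner accepts.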


