[Bioperl-l] IUPAC support for DNA alignment
apapanicolaou at ice.mpg.de
Wed Jul 2 16:11:31 EDT 2008
I agree if it is not too much trouble for you.
I think that if users put Xs in DNA sequences, they will be masking
characters (I'm foolish enough to do it), so perhaps X = 0? Does anyone
else use Xs?
I dunno about Ns either. They could be unknown characters (in which case
the full score version could be 0 as well) or really mean all four
nucleotides are equally likely. Is it too much trouble to allow users to
set X and N manually (since it is the same whether they align with an
Yee Man Chan wrote:
> Hi guys
> What about providing two switches; one for full score and one for
> probabilistic score?
> Assume match is +3 and mismatch -1
> Full score version:
> 1) T - U = +3 (I assume U is the same as T for alignment purposes, right?)
> 2) A - W = +3
> 3) A - D = +3
> 4) A - N = +3
> 5) A - X = -1 (not so sure about this one)
> Probabilistic score version:
> 1) T - U = +3
> 2) A - W = +3/2-1/2 = +1
> 3) A - D = +3/3-1*2/3 = +1/3
> 4) A - N = +3/4-1*3/4 = 0
> 5) A - X = -1
> What do you think?
> Yee Man
> On Fri, 27 Jun 2008 aaron.j.mackey at gsk.com wrote:
>> You could replicate what they do here with EST_GENOME (re-engineered to
>> accept ambiguity codes):
>> But I think the answer is user-dependent -- some might want the "full
>> score" (as in the above case), others might want the "(probabilistically)
>> averaged score", etc. So, let the scoring matrix be subclass-able (or
>> mix-able), so that users can specify the exact desired behavior via a
>> handful of predefined (and useful) behaviors.
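
A minimal Python sketch of the two switches quoted above, with X and N overridable by the user. Every name here (`IUPAC`, `pair_score`, `mode`, `overrides`) is hypothetical and not part of BioPerl. Match and mismatch default to the +3/-1 from the example, X defaults to the mismatch score as in item 5, and averaging over all base pairs when both symbols are ambiguous is an assumption, since the thread only covers a plain base against an ambiguity code.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes and the bases each one stands for.
# U is treated as T, as assumed in the thread.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def pair_score(a, b, match=3, mismatch=-1, mode="full", overrides=None):
    """Score one aligned pair of symbols (hypothetical helper).

    mode="full":          match score if the two base sets overlap at all.
    mode="probabilistic": average score over all pairs of concrete bases.
    overrides:            optional {symbol: score}, e.g. {"X": 0, "N": 0},
                          applied whenever that symbol is in the pair.
    """
    a, b = a.upper(), b.upper()
    overrides = overrides or {}
    for sym in (a, b):
        if sym in overrides:
            return Fraction(overrides[sym])
    # Assumption: without an override, X scores as a mismatch (item 5).
    if "X" in (a, b):
        return Fraction(mismatch)
    bases_a, bases_b = set(IUPAC[a]), set(IUPAC[b])
    if mode == "full":
        return Fraction(match if bases_a & bases_b else mismatch)
    if mode == "probabilistic":
        total = sum(match if x == y else mismatch
                    for x in bases_a for y in bases_b)
        return Fraction(total, len(bases_a) * len(bases_b))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for other in "UWDNX":
        first = "T" if other == "U" else "A"
        print(first, other,
              pair_score(first, other, mode="full"),
              pair_score(first, other, mode="probabilistic"))
```

Passing `overrides={"X": 0}` gives the X = 0 masking behaviour suggested above, and `overrides={"N": 0}` treats N as an unknown character in either mode. Exact fractions are used so that A - D comes out as 1/3 rather than a rounded float.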
"You can't find a hermit to teach you herming, because of course that rather spoils the whole thing."
-- (Terry Pratchett, Small Gods)
Department of Entomology,
Max Planck Institute for Chemical Ecology,
D-07745 Jena, Germany.