[Bioperl-l] IUPAC support for DNA alignment
hlapp at gmx.net
Fri Jun 27 08:47:12 EDT 2008
On Jun 27, 2008, at 6:02 AM, Alexie Papanicolaou wrote:
> I'm the user who asked for it. I don't know of any conventions, but
> perhaps people can help with this?
> I'm not an expert at all but here is my opinion:
> If you don't know the codon position (or even whether it is coding),
> then you can't estimate the codon degeneracy. If you don't know the
> frequency of the bases represented in the degenerate site, then you
> can't model it at the DNA level either. So any solution will be ad
> hoc.
> Regarding 2-base degenerate positions: my suggestion is that when
> aligning, say, a population that is polymorphic for that site with
> one that is not, and the user is interested in the distance between
> the populations, it would make sense to give the pair the full
> match score.
> Regarding 3 bases: I don't really know (see N below), but I'd go for
> a full match again, assuming the user builds the consensus.
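A minimal sketch of the "full match if compatible" rule suggested above, written in Python purely for illustration (it is not an existing BioPerl interface): a pair receives the match score whenever the two IUPAC codes share at least one base, otherwise the mismatch score. The function name, the +1/-1 default scores, and the choice to apply the same rule to N (which the post leaves open) are all assumptions.

```python
# Hypothetical sketch: "full match if compatible" scoring for IUPAC DNA codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def compatible_score(x, y, match=1.0, mismatch=-1.0):
    """Return the full match score if the two codes share at least one base."""
    bases_x = set(IUPAC[x.upper()])
    bases_y = set(IUPAC[y.upper()])
    return match if bases_x & bases_y else mismatch


print(compatible_score("A", "R"))  # 1.0: R stands for A or G
print(compatible_score("C", "R"))  # -1.0: no base in common
print(compatible_score("A", "B"))  # -1.0: B stands for C, G or T
```

Note that under this rule any code scores a full match against N, since N is compatible with every base.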
Are you suggesting that a determined site and a degenerate site aligned
pairwise should score as much as two identical determined sites?
My (possibly naive) default would be to average over all
possibilities, each weighted by base frequency (if base frequencies
are assumed unequal or independent), thus integrating out the
uncertainty. (For standard matrices, I think this would also result in
N receiving zero score.)
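A minimal sketch of this weighted-average idea, again in Python for illustration only and not BioPerl's API: each ambiguity code is expanded to the bases it stands for, each base is weighted by its frequency renormalised within that code, the two sites are treated as independent, and the expected score over all base pairs is returned. The function names, the uniform default frequencies, and the +1/-1 scoring scheme are hypothetical.

```python
# Hypothetical sketch: frequency-weighted expected score for IUPAC DNA codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def base_weights(code, freqs):
    """P(base | code): base frequencies renormalised over the code's bases."""
    bases = IUPAC[code.upper()]
    total = sum(freqs[b] for b in bases)
    return {b: freqs[b] / total for b in bases}


def expected_score(x, y, score, freqs=UNIFORM):
    """Average score(a, b) over every base pair the two codes could stand for,
    treating the two sites as independent."""
    wx = base_weights(x, freqs)
    wy = base_weights(y, freqs)
    return sum(px * py * score(a, b)
               for a, px in wx.items()
               for b, py in wy.items())


def plus_minus_one(a, b):
    """Hypothetical +1/-1 match/mismatch score for determined bases."""
    return 1.0 if a == b else -1.0


print(expected_score("A", "A", plus_minus_one))  # 1.0
print(expected_score("A", "R", plus_minus_one))  # 0.0 (half A/A, half A/G)
```

With equal base frequencies, R against A scores 0.0 here rather than the full match score Alexie proposes; with unequal frequencies the result shifts toward whichever base is more common.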
In the end, though, maybe there should be an option for the user to
simply provide a substitution matrix?
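If users supply their own matrix, scoring ambiguity codes reduces to a table lookup. A minimal sketch, assuming a hypothetical plain-text format (a header row of column symbols, then one row per symbol starting with the row symbol, with `#` lines treated as comments); the format, the function names, and the example values are illustrative only and do not describe an existing BioPerl interface.

```python
# Hypothetical sketch: scoring with a user-supplied substitution matrix.
def parse_matrix(text):
    """Parse a whitespace-separated matrix: a header row of column symbols,
    then one row per symbol beginning with the row symbol. Blank lines and
    lines starting with '#' are ignored."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    header = [symbol.upper() for symbol in rows[0]]
    matrix = {}
    for row in rows[1:]:
        symbol, values = row[0].upper(), row[1:]
        if len(values) != len(header):
            raise ValueError(f"row {symbol!r} has {len(values)} values, "
                             f"expected {len(header)}")
        for column, value in zip(header, values):
            matrix[(symbol, column)] = float(value)
    return matrix


def matrix_score(matrix, x, y):
    """Look up the score for aligning x with y (raises KeyError if undefined)."""
    return matrix[(x.upper(), y.upper())]


# Example values are made up for illustration.
USER_MATRIX = """
# columns: A C G T N
   A   C   G   T   N
A  2  -1  -1  -1   0
C -1   2  -1  -1   0
G -1  -1   2  -1   0
T -1  -1  -1   2   0
N  0   0   0   0   0
"""

m = parse_matrix(USER_MATRIX)
print(matrix_score(m, "a", "n"))  # 0.0
print(matrix_score(m, "C", "G"))  # -1.0
```

This leaves every policy decision (full match, weighted average, or something else) to whoever writes the matrix.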
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :