[Bioperl-l] SiteMatrix changes

Sendu Bala bix at sendu.me.uk
Thu Aug 31 11:26:29 EDT 2006

Stefan Kirov wrote:
> Perhaps I do not understand your idea, but it seems to me the changes
> you made to SiteMatrix are wrong. Why did you have to remove the 
> pseudo-counts? The correction can be set to 0 which will disable it 
> in case this is necessary. Pseudo counts are intended to account for 
> the probabilistic uncertainty.

What has adding the number 1 to some but not all input numbers got to do
with pseudo counts? Can you explain your thinking?

> On the other hand, the correction should be disabled by default if
> frequencies, instead of raw counts, are used for the construction of the
> object (still, having 0 is a bad idea).

Why is having 0 a bad idea? It is correct if the user is creating a
simple count-based matrix. I don't think the module should be trying to 
do any kind of analysis, especially given that it has no idea of the 
source of its input data. It must just accept what it is given. If a 
user or other module wants to do pseudo-count correction, they can do it 
themselves in the most appropriate way for their data.

I can't imagine that sometimes adding 1 is /ever/ an appropriate way of
doing it, but please explain if it is.
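
For comparison, a minimal sketch of a uniform pseudo-count correction, in
which the same value is added to every base in a column. This is Python
rather than BioPerl, the function name add_pseudocounts and the example
column are hypothetical, and it is not what SiteMatrix does or did:

```python
def add_pseudocounts(counts, pseudo=1.0):
    """Return per-base frequencies after adding the SAME pseudo-count to every base.

    `counts` maps bases to raw counts for one matrix column (assumed DNA).
    Setting pseudo=0 leaves the raw frequencies untouched, zeros included.
    """
    bases = "ACGT"
    total = sum(counts.get(b, 0) for b in bases) + pseudo * len(bases)
    if total == 0:
        raise ValueError("column has no counts and pseudo is 0")
    return {b: (counts.get(b, 0) + pseudo) / total for b in bases}


# Hypothetical column built from 10 aligned sites.
column = {"A": 9, "C": 0, "G": 1, "T": 0}
corrected = add_pseudocounts(column, pseudo=1.0)    # every base gets +1
uncorrected = add_pseudocounts(column, pseudo=0.0)  # plain count-based frequencies
```

Because every base receives the same increment, the corrected values still
sum to 1, and a caller who wants a plain count-based matrix can pass
pseudo=0 and keep the zeros.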

> Next, the rules you have enforced for the IUPAC do not make sense to 
> me. For example in case the frequency for A is 0.45, G 0.45, C 0.05 
> and T 0.05, according to your rules the result would be N, which makes
> no sense.

Why does that make no sense? IUPAC has no concept of frequencies, nor does
it have a cutoff. When there is a chance of all four bases (complete ambiguity),
the IUPAC code is N. If you want it to return 'R' in this case, the
IUPAC method would need to be extended to allow input of a user-defined
threshold defining what frequencies to ignore.
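
A minimal sketch of what such an extension could look like, assuming a
per-column dictionary of base frequencies. This is Python rather than
BioPerl, and the function name iupac_code and its threshold parameter are
hypothetical, not part of the SiteMatrix API:

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of possible bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_code(freqs, threshold=0.0):
    """Return the IUPAC code for the bases whose frequency exceeds `threshold`.

    With the default threshold of 0, any base with a non-zero frequency
    counts as possible, so no frequency information is interpreted.
    """
    present = frozenset(b for b, f in freqs.items() if f > threshold)
    if not present:
        raise ValueError("no base exceeds the threshold")
    return IUPAC[present]


# The column from the quoted example.
column = {"A": 0.45, "G": 0.45, "C": 0.05, "T": 0.05}
strict = iupac_code(column)                  # all four bases possible
relaxed = iupac_code(column, threshold=0.1)  # C and T ignored by user choice
```

With the default threshold the column above yields N, since all four bases
are possible; a caller who passes threshold=0.1 gets R, but only because
they chose to ignore the rarer bases.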
