[Bioperl-l] How to generate negative pairs for motif binding sites in Perl?
Alex Zhang
mayagao1999 at yahoo.com
Thu Aug 25 22:53:05 EDT 2005
Hi all,
I have a problem using Perl to make 100 motif binding sites built from negative pairs. Could anybody give me some suggestions?
Thank you very much in advance.
Alex
Description of the problem: generating motif binding sites with negatively dependent pairs
1. There are 16 (4 × 4) ordered combinations of 2 nucleotides:
AA, AT, AC, AG,
TT, TC, TG, TA,
CC, CT, CG, CA,
GG, GC, GT and GA
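A minimal Perl sketch of how these 16 pairs can be enumerated with two nested loops; the variable names are illustrative only. Order matters, so AG and GA are different pairs.

```perl
use strict;
use warnings;

my @nucleotides = qw(A C G T);

# Build all 4 x 4 = 16 ordered pairs; "AG" and "GA" count as different pairs.
my @pairs;
for my $first (@nucleotides) {
    for my $second (@nucleotides) {
        push @pairs, $first . $second;
    }
}

print scalar(@pairs), " pairs: @pairs\n";
```

This prints `16 pairs: AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT`.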
For example, if we say that the pair "AA" is a positive dependent pair, we mean that an "A" tends to occur together with another "A" across many sequences, with probability x%.
In other words, it looks like:
..........
...AA.....
...AA.....
...AA.....
...AA.....
...AA.....
...AA.....
..........
In contrast to a positive pair, a negative pair such as "AG" looks like this in some sequences:
..........
...A......
...A......
...A......
....G.....
....G.....
....G.....
..........
This means that "A" is less likely to occur together with "G" across these sequences than the other nucleotides (C, G, T) are. But if we count the frequency of each nucleotide along each column, we find that "A" and "G" have the highest frequencies in their respective columns.
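To see this pattern in data, one can count nucleotides column by column and compare a column's overall G frequency with its G frequency among sites that have A in the previous column. The Perl sketch below does this on a hypothetical six-site toy alignment made up for illustration; `column_counts` and `conditional_freq` are hypothetical helper names, not BioPerl functions.

```perl
use strict;
use warnings;

# Hypothetical toy alignment of 2-base sites (not real data): A dominates
# column 0 and G dominates column 1, but they rarely occur together.
my @sites = qw(AC AT AA AG CG TG);

# Count each nucleotide per column: $counts[$col]{$base} = count.
sub column_counts {
    my @seqs = @_;
    my @counts;
    for my $seq (@seqs) {
        for my $col (0 .. length($seq) - 1) {
            $counts[$col]{ substr($seq, $col, 1) }++;
        }
    }
    return @counts;
}

# Among sites with base $x at column $i, the fraction that also have $y at
# column $j, i.e. an estimate of P(y at j | x at i). Returns undef if no
# site has $x at column $i.
sub conditional_freq {
    my ($x, $i, $y, $j, @seqs) = @_;
    my ($with_x, $with_both) = (0, 0);
    for my $seq (@seqs) {
        next unless substr($seq, $i, 1) eq $x;
        $with_x++;
        $with_both++ if substr($seq, $j, 1) eq $y;
    }
    return $with_x ? $with_both / $with_x : undef;
}

my @counts = column_counts(@sites);
for my $col (0 .. $#counts) {
    my $c = $counts[$col];
    print "column $col: ", join(' ', map { "$_=$c->{$_}" } sort keys %$c), "\n";
}
printf "P(G in column 1)              = %.2f\n", $counts[1]{G} / @sites;
printf "P(G in column 1 | A in col 0) = %.2f\n", conditional_freq('A', 0, 'G', 1, @sites);
```

On this toy data, A is the most common base in column 0 (4 of 6) and G in column 1 (3 of 6), yet only 1 of the 4 sites starting with A has G next, so the script reports 0.50 for G overall versus 0.25 after A.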
Generating 4 negative pairs gives motif binding sites of length 8 (2 positions per pair). Finally, we want to make 100 such binding sites.
2. (1) Randomly pick 4 pairs from the 16 combinations to be used as "negative pairs" in the sequences. For example, we might get the pairs AG, CT, CT and GG.
(2) Suppose the probability for each negative pair is 70%. For the first pair, AG, we let the 1st nucleotide of each of the 100 binding sites be A with probability 70%. In other words, about 70 of the 100 binding sites have A at the 1st position.
If the 1st position is A, then the 2nd position is G with probability 57%, and A, C or T each with probability (1 - 0.57)/3;
If the 1st position is not A, then the 2nd position is automatically G;
(3) Repeat this for the other three negative pairs.
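For the example pair AG in step 2, these probabilities give the following frequencies for the 2nd column (a worked check, assuming each site is drawn independently):

```latex
\begin{aligned}
P(\text{2nd} = G) &= P(\text{1st} = A)\,P(G \mid A) + P(\text{1st} \neq A)\,P(G \mid \text{not } A) \\
                  &= 0.7 \times 0.57 + 0.3 \times 1 = 0.399 + 0.3 = 0.699 \approx 0.7, \\
P(\text{2nd} = b) &= 0.7 \times \frac{1 - 0.57}{3} \approx 0.100 \quad \text{for each } b \in \{A, C, T\}, \\
P(\text{2nd} = G \mid \text{1st} = A) &= 0.57 < 0.699 = P(\text{2nd} = G).
\end{aligned}
```

So the 2nd column ends up with the same roughly 70%/10% profile as the 1st column, while G is less frequent after A (57%) than overall (about 70%), which is the negative dependence described in part 1.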
3. In general, suppose we have a negative pair XY.
(a) Let the 1st nucleotide of each of the 100 sites be X with probability 70% and each of the other three nucleotides with probability 10%;
(b) If the 1st nucleotide is X, let the 2nd nucleotide be Y with probability 57% and each of the other three nucleotides with probability (1 - 57%)/3;
(c) Otherwise, let the 2nd nucleotide be Y automatically;
(d) Repeat (a), (b) and (c) for the other three pairs.
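Below is one possible Perl sketch of steps (a) to (d). It assumes each site is drawn independently and that the 4 pairs are picked with replacement, as the example AG, CT, CT, GG suggests. The subroutine and variable names (`draw_biased`, `pick_pairs`, `generate_sites`, `$P_FIRST`, `$P_SECOND`) and the seed are hypothetical choices, not part of BioPerl.

```perl
use strict;
use warnings;

# Parameters from the problem description (variable names are hypothetical).
my $N_SITES  = 100;   # number of binding sites to generate
my $N_PAIRS  = 4;     # 4 negative pairs -> sites of length 8
my $P_FIRST  = 0.70;  # (a) P(1st nucleotide of the pair = X)
my $P_SECOND = 0.57;  # (b) P(2nd nucleotide = Y | 1st = X)

my @NUCS = qw(A C G T);

# Return $want with probability $p; otherwise return one of the other three
# nucleotides uniformly, so each of them has probability (1 - $p) / 3.
sub draw_biased {
    my ($want, $p) = @_;
    return $want if rand() < $p;
    my @others = grep { $_ ne $want } @NUCS;
    return $others[ int rand @others ];
}

# Step 2(1): pick $n pairs from the 16 combinations, with replacement
# (the example AG, CT, CT, GG contains CT twice).
sub pick_pairs {
    my ($n) = @_;
    return map { $NUCS[ int rand @NUCS ] . $NUCS[ int rand @NUCS ] } 1 .. $n;
}

# Steps (a)-(d): every negative pair XY contributes two positions to each site.
sub generate_sites {
    my ($n_sites, @pairs) = @_;
    my @sites = ('') x $n_sites;
    for my $pair (@pairs) {
        my ($x, $y) = split //, $pair;
        for my $site (@sites) {            # $site aliases the array element
            my $first  = draw_biased($x, $P_FIRST);                 # (a)
            my $second = $first eq $x ? draw_biased($y, $P_SECOND)  # (b)
                                      : $y;                         # (c)
            $site .= $first . $second;
        }
    }
    return @sites;
}

srand(42);   # arbitrary fixed seed so that runs are reproducible
my @pairs = pick_pairs($N_PAIRS);
my @sites = generate_sites($N_SITES, @pairs);
print "negative pairs: @pairs\n";
print "$_\n" for @sites;
```

Because each site is drawn independently, the number of sites with X at a pair's 1st position is only about 70, not exactly 70. If exact counts matter for step (a), one option is to shuffle a fixed list of 70 X's and 10 of each other nucleotide, for example with `shuffle` from the core List::Util module.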