[Bioperl-l] Bio::PopGen modules performance
bli1 at bcm.tmc.edu
Fri Nov 4 14:18:05 EST 2005
I used the Bio::PopGen modules to calculate various statistics such as
Tajima's D, pi and so on. For a single data set, the performance is
fine. But to get a sense of significance, I used Hudson's "ms" program
to generate 10000 simulated populations. When I ran the Bio::PopGen
modules on the 10000 samples, it took a long time (600 samples
finished in about 10 hours, with a population size of about 200 and
about 500 segregating sites). If I have a collection of, say, 100 data
sets, and each one needs 10000 simulated populations, I do not think
it is doable. I am wondering whether this speed is expected for these
modules, or whether I can improve the performance by optimizing my own
code. I think 10000 simulations are typical for a population genetics
analysis. Does anybody have experience with this issue, and can anyone
give me suggestions for improving the performance?
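
To put rough numbers on why this does not look doable, here is a
back-of-the-envelope sketch in Python. It only extrapolates the timing
I reported above. It assumes that run time scales linearly with the
number of simulated samples and that everything runs in a single
process; the variable names are purely illustrative.

```python
# Figures taken from the timing reported above.
done_samples = 600        # simulated samples finished so far
elapsed_hours = 10.0      # wall-clock time for those samples
sims_per_dataset = 10_000
n_datasets = 100

# Assumption: cost per sample is constant (linear scaling, one process).
hours_per_sample = elapsed_hours / done_samples
hours_per_dataset = hours_per_sample * sims_per_dataset
total_hours = hours_per_dataset * n_datasets

print(f"{hours_per_sample * 60:.1f} min per simulated sample")
print(f"{hours_per_dataset / 24:.1f} days per data set")
print(f"{total_hours / 24 / 365:.1f} years for {n_datasets} data sets")
```

Under those assumptions this works out to roughly one minute per
simulated sample, about a week per data set, and close to two years
for 100 data sets.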
Thanks a lot!