[Bioperl-l] PopGen modules

Bingshan Li bli1 at bcm.tmc.edu
Thu Nov 10 01:15:43 EST 2005

Hi folks,

I recently started to play with the PopGen modules but am confused by the
difference between "number of individuals" and "sample size". My
understanding is that sample size is the number of haploid genomes
(chromosomes) sampled, and number of individuals is the number of
diploids. For example, if 100 humans are genotyped, the sample size
should be 200 and the number of individuals 100. Am I right? I could be
completely wrong, but assume I am right for now.

I constructed a population object (named $pop) from prettybase-format data, then ran:

 use Bio::PopGen::Statistics;

 my $stats = Bio::PopGen::Statistics->new();
 my $number_individuals = $pop->get_number_individuals();
 my $seg_sites = $stats->segregating_sites_count($pop);
 my $theta1 = $stats->theta($pop);                               # from the population object
 my $theta2 = $stats->theta($number_individuals, $seg_sites);    # n = number of individuals
 my $theta3 = $stats->theta($number_individuals*2, $seg_sites);  # n = number of chromosomes

In the above code, $theta1 equals $theta2 but differs from $theta3, and
I think $theta3 should be the correct answer.
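
To make the difference concrete, here is a minimal sketch in plain Perl,
assuming theta here means Watterson's estimator, theta = S / a_n with
a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled sequences.
The helper watterson_theta and the figure of 50 segregating sites are
hypothetical, chosen only for illustration; this is not the Bio::PopGen
implementation.

```perl
use strict;
use warnings;

# Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i,
# where n is the number of sampled sequences (chromosomes).
# Hypothetical helper for illustration, not part of Bio::PopGen.
sub watterson_theta {
    my ($n, $seg_sites) = @_;
    die "need at least two sequences\n" if $n < 2;
    my $a_n = 0;
    $a_n += 1 / $_ for 1 .. $n - 1;
    return $seg_sites / $a_n;
}

my $individuals = 100;   # diploid individuals genotyped
my $seg_sites   = 50;    # hypothetical count of segregating sites

# Same S, but n counted as individuals versus as chromosomes.
my $theta_individuals = watterson_theta($individuals, $seg_sites);
my $theta_chromosomes = watterson_theta(2 * $individuals, $seg_sites);

printf "n = %d: theta = %.4f\n", $individuals, $theta_individuals;
printf "n = %d: theta = %.4f\n", 2 * $individuals, $theta_chromosomes;
```

Because a_n grows with n, passing the number of individuals instead of
the number of chromosomes gives a larger theta for the same S.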

I used Hudson's "ms" program to simulate 200 chromosomes, and using 200
as the sample size gives correct answers (double confirmed with other

Please let me know if I am too naive about this.
