[Bioperl-guts-l] [BioPerl - Bug #3328] (New) segregating sites calculation fails on gapped sequences

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Fri Feb 17 12:39:41 EST 2012

Issue #3328 has been reported by Jason Stajich.

Bug #3328: segregating sites calculation fails on gapped sequences

Author: Jason Stajich
Status: New
Priority: Normal
Assignee: Bioperl Guts
Category: Bio::PopGen
Target version: 

   I am Cheng-Ruei Lee, a graduate student in Duke Biology. I am analyzing many DNA alignments from a plant species.
   I first used Bio::PopGen::Utilities->aln_to_population() to read in the FASTA-format alignments, and then used Bio::PopGen::Statistics to calculate some statistics without an outgroup. Most genes work fine, but I think a bug occurs when it encounters alignments like this:


   I got this data set from other people. I guess that, because of the annotation program they used, the annotated coding sequence is much longer in genotype 1 than in the other genotypes. This creates a long stretch of gaps at the very beginning of the alignment. Whenever Bio::PopGen encounters this kind of gene, the singleton count increases sharply; it seems that the long stretch of gapped sites is also counted as singletons. Some Fu & Li statistics are inflated as well. The "number of segregating sites" does not seem to be affected. (As a result, there are genes with hundreds of singleton sites but only a few total segregating sites.)
   Could this be a bug in Bio::PopGen::Utilities when reading in the data, or in the singleton calculation?
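
   The suspected effect can be illustrated with a minimal, standalone sketch. It is written in Python and does not use or reproduce BioPerl's code; the function name site_counts, the GAP_CHARS set, and the toy alignment are all made up for illustration and are not the reporter's data. It only shows how treating a gap character as an ordinary allele can inflate the singleton count when one sequence extends past the others, and how skipping gapped columns avoids that.

```python
from collections import Counter

# Assumption: '-' and '.' mark alignment gaps; adjust for other conventions.
GAP_CHARS = {"-", "."}


def site_counts(seqs, skip_gapped=True):
    """Return (segregating_sites, singleton_sites) for aligned sequences.

    A site is segregating if it shows at least two distinct states, and a
    singleton site if at least one of those states occurs exactly once.
    With skip_gapped=False a gap is treated like any other state.
    """
    segregating = 0
    singletons = 0
    for column in zip(*seqs):
        if skip_gapped and any(c in GAP_CHARS for c in column):
            continue  # ignore columns that contain a gap in any sequence
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic column
        segregating += 1
        if 1 in counts.values():
            singletons += 1
    return segregating, singletons


# Hypothetical toy alignment: the first sequence is longer at the 5' end,
# so the other three start with a run of gaps.
toy = [
    "ATGCCATGA",
    "-----ATGA",
    "-----ATGT",
    "-----ATGA",
]

naive = site_counts(toy, skip_gapped=False)    # gaps counted as a state
filtered = site_counts(toy, skip_gapped=True)  # gapped columns skipped
print(naive, filtered)
```

   On this toy input the naive count is (6, 6) and the gap-aware count is (1, 1): the five leading columns each look like a singleton only because the lone non-gap base is compared against gaps. Whether Bio::PopGen takes a comparable path internally is exactly the open question of this report.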

Cheng-Ruei Lee <cl134 at duke.edu>
