[Bioperl-guts-l] [Bug 2178] New: Bio::Enzyme's cutter() is wrong for V, H, D, B ambiguities

bugzilla-daemon at portal.open-bio.org
Sun Jan 7 01:59:25 EST 2007


           Summary: Bio::Enzyme's cutter() is wrong for V,H,D,B ambiguities
           Product: Bioperl
           Version: 1.5 branch
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: tobias.thierer at web.de

The current code (from the 1.5 release of December 2006; presumably it is
still in the main trunk too) is:

sub cutter {
    my ($self)=@_;
    $_ = uc $self->string;

    my $cutter = tr/[ATGC]//d;
    my $count =  tr/[MRWSYK]//d;
    $cutter += $count/2;
    $count =  tr/[VHDB]//d;
    $cutter += $count*3/4;
    return $cutter;
}

Here, the line "$cutter += $count*3/4;" is wrong. V, H, D and B are ambiguity
symbols that each match three different nucleotides, so they contribute less
to the effective recognition sequence length than, e.g., Y, which matches only
two nucleotides. A symbol that matches n of the 4 nucleotides has an effective
length of 1 - log(n) / log(4): 1 for n = 1, 0.5 for n = 2 (consistent with the
"$count/2" term), and 0 for N (n = 4). Hence, the line should read:

   $cutter += $count * (1 - log(3) / log(4));

(1-log(3)/log(4)) is approximately 0.2075.
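
For illustration, here is a minimal standalone sketch of the corrected
calculation. The function name effective_cutter_length() and the example
sites are made up for this sketch and are not part of BioPerl; it takes a
plain string instead of an enzyme object and counts with tr/// without
modifying its input.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): effective recognition length of a
# site, where a symbol matching n of the 4 bases adds 1 - log(n)/log(4).
sub effective_cutter_length {
    my ($site) = @_;
    my $seq = uc $site;

    # tr/// with an empty replacement list and no /d only counts matches.
    my $one   = ($seq =~ tr/ACGT//);     # n = 1, adds 1 each
    my $two   = ($seq =~ tr/MRWSYK//);   # n = 2, adds 0.5 each
    my $three = ($seq =~ tr/VHDB//);     # n = 3, adds about 0.2075 each
    # N (n = 4) adds 0, so it is simply not counted.

    return $one + $two / 2 + $three * (1 - log(3) / log(4));
}

printf "%.4f\n", effective_cutter_length('GAATTC');   # prints 6.0000
printf "%.4f\n", effective_cutter_length('GGYRCC');   # prints 5.0000
printf "%.4f\n", effective_cutter_length('GDGCHC');   # prints 4.4150
```

With the "$count*3/4" weighting, the last site would come out as 5.5, i.e.
the two three-base symbols would count for more than two-base symbols do.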
