[Bioperl-guts-l] [Bug 1867] New: Bio::SimpleAlign gap_line and all_gap_line methods ignore $gapchar

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Wed Sep 21 08:30:51 EDT 2005


http://bugzilla.open-bio.org/show_bug.cgi?id=1867

           Summary: Bio::SimpleAlign gap_line and all_gap_line methods
                    ignore $gapchar
           Product: Bioperl
           Version: main-trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: s9904982 at sms.ed.ac.uk


In the Bio::SimpleAlign module, there's a flaw in the gap_line and all_gap_line
method, which I tracked down through a flaw in the remove_gaps method.

In both the gap_line and the all_gap_line, the function that looks for
occurences of $gapchar is incorrect - it does not take into acount the variable
$gapchar, but instead always looks for '-'.  

Currently:

sub gap_line {
    my ($self,$gapchar) = @_;
    $gapchar = $gapchar || $self->gap_char;
    my %gap_hsh; # column gaps vector
    foreach my $seq ( $self->each_seq ) {
                my $i = 0;
        map {$gap_hsh{$_->[0]} = undef} grep {$_->[1] eq '-'} 
                  map {[$i++, $_]} split(//, uc ($seq->seq));
    }
    my $gap_line;
    foreach my $pos ( 0..$self->length-1 ) {
          $gap_line .= (exists $gap_hsh{$pos}) ? $gapchar:'.';
    }
    return $gap_line;
}

Should be:

sub gap_line {
    my ($self,$gapchar) = @_;
    $gapchar = $gapchar || $self->gap_char;
    my %gap_hsh; # column gaps vector
    foreach my $seq ( $self->each_seq ) {
                my $i = 0;
        map {$gap_hsh{$_->[0]} = undef} grep {$_->[1] eq $gapchar} 
                  map {[$i++, $_]} split(//, uc ($seq->seq));
    }
    my $gap_line;
    foreach my $pos ( 0..$self->length-1 ) {
          $gap_line .= (exists $gap_hsh{$pos}) ? $gapchar:'.';
    }
    return $gap_line;
}

This is doubly confusing since the $gap_line that is returned has $gapchar
inserted into it, even though $gapchar is allways assumed to be '-'.  So if
your
alignment is:

alpha   ACGTAGC---ACT--ACAGCTAGC
beta    ACGTAGCGGTACT--ACTGCTACC
gamma   ACGCAGC--TTAT--ACTGCTAGC

and you call $aln->gap_line();
you get 
......---...--.........

but if you call $aln->gap_line('N')
you get
......NNN...NN.........

even though there are no N's.

The all_gap_char method has the same flaw.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


More information about the Bioperl-guts-l mailing list