[Bioperl-guts-l] [Bug 2016] Bio::Align::metafasta

bugzilla-daemon at newportal.open-bio.org
Wed Jun 7 09:06:57 EDT 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=2016





------- Comment #11 from bix at sendu.me.uk  2006-06-07 09:06 -------
(In reply to comment #10)
> I had a look at commit history and it was Brian who put the metafasta tests
> into AlignIO.t. No wonder the code did not look familiar to me! These tests are
> more specific than other format tests in that file.
> 
> Meta-info should not be seen by Bio::SimpleAlign object. It should live in
> other dimension and be completely independent of the sequence. 
> 
> I never intended the test data to be used for MSA testing. That is why the two
> sequences in the data file happen to be of different length. 
> 
> The main problem is in AlignIO::consensus_string. It fails if the sequences in
> the alignment are not of equal length. I tested this with normal
> Bio::LocatableSeq objects. I could pad the sequence in the test file, but that
> does not solve the problem with  consensus_string().

What do you mean? Padding the sequences until they're the same length solves
the problem for me.


> ... I fixed it by testing if the $letter in a given position is undef (i.e. we
> are reading beyond the sequence length), then assume that it is a gap
> character. Tests are added into SimpleAlign.t.
> 
> I am leaving this bug open in case I have missed something. If Chris and Sendu
> are happy how this is resolved, please close it.

I don't think that's the right way to fix it, or at least it makes more sense
to do it the way I did. Your way is contrary to both the docs and the
pre-existing behaviour of SimpleAlign. My way is a copy/paste job from
Bio::AlignIO::fasta.pm. fasta.pm pads the sequences because fasta multiple
alignment files don't have to include trailing gaps to make all sequences
the same length. But the generic idea of a multiple alignment does need all the
sequences to be the same length.
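
To illustrate the idea of padding at parse time, here is a minimal sketch. It
is Python rather than the actual Perl in fasta.pm, and the function name
pad_to_equal_length and the '-' gap character are assumptions made for the
example, not anything taken from BioPerl.

```python
def pad_to_equal_length(seqs, gap="-"):
    """Right-pad every sequence with gap characters up to the longest one.

    `seqs` maps a sequence name to its (possibly gapped) string.
    Hypothetical helper for illustration only; not BioPerl code.
    """
    if not seqs:
        return {}
    width = max(len(s) for s in seqs.values())
    # str.ljust appends the fill character on the right, i.e. trailing gaps.
    return {name: s.ljust(width, gap) for name, s in seqs.items()}


# Two sequences of different length, as a fasta alignment file may contain.
aligned = pad_to_equal_length({"seq1": "ACGT-A", "seq2": "ACG"})
```

The point is that the padding happens in the format-specific reader, before
the generic alignment object ever sees the sequences.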

fasta.pm knows about the problem and deals with it silently. But in other
formats, sequences of unequal length could indicate a major problem with the
input file, and the user should discover this through the exception. SimpleAlign
shouldn't silently 'correct' this problem by adding gaps, because we could end
up treating something as an alignment that wasn't even an alignment file.
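
As a rough sketch of the behaviour I'm arguing for in the generic alignment
code: check the lengths and fail loudly instead of padding. Again this is
Python for illustration only; UnequalLengthError and require_equal_length are
hypothetical names, not part of SimpleAlign.

```python
class UnequalLengthError(ValueError):
    """Raised when sequences in a supposed alignment differ in length."""


def require_equal_length(seqs):
    """Return the common length of all sequences, or raise if they differ.

    `seqs` maps a sequence name to its string. Hypothetical helper for
    illustration only; it does not pad or otherwise modify its input.
    """
    lengths = {name: len(s) for name, s in seqs.items()}
    distinct = set(lengths.values())
    if len(distinct) > 1:
        # Report every length so the user can see which sequence is off.
        raise UnequalLengthError(f"sequences differ in length: {lengths}")
    return distinct.pop() if distinct else 0


try:
    require_equal_length({"seq1": "ACGT-A", "seq2": "ACG"})
    outcome = "accepted"
except UnequalLengthError:
    outcome = "rejected"
```

A reader for a format like fasta can pad before handing over the sequences;
anything else that reaches this check with ragged sequences gets an exception.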


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

