[Bioperl-l] Bio::AlignIO ignores questionmarks?
dmessina at wustl.edu
Fri Apr 14 01:14:25 EDT 2006
I'm by no means an expert with this module, but I'll take a shot.
Running your code through a debugger, I'm seeing that
Bio::AlignIO::fasta is gobbling the question marks:
line 66: $MATCHPATTERN = '^A-Za-z\.\-';
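and then, where $entry contains a line of sequence from the input file,
line 118: $entry =~ s/[$MATCHPATTERN]//g;
Because the pattern starts with ^ inside the brackets, the character class
is negated: the substitution deletes every character that is not a letter,
'.' or '-', and that includes '?'.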
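To see the effect in isolation, here is a minimal sketch in plain Perl, with no BioPerl needed. It reuses the pattern exactly as quoted above; the helper name strip_like_fasta_parser and the sample sequence are made up for illustration and are not part of Bio::AlignIO::fasta.

```perl
use strict;
use warnings;

# Pattern as quoted above from Bio::AlignIO::fasta (line 66 in the
# version I stepped through; other versions may differ).
my $MATCHPATTERN = '^A-Za-z\.\-';

# Hypothetical helper, NOT a BioPerl function: it applies the same
# substitution the parser applies to each line of sequence.
sub strip_like_fasta_parser {
    my ($entry) = @_;
    # Interpolates to [^A-Za-z\.\-], i.e. "anything that is not a
    # letter, a dot or a hyphen", and deletes every such character.
    $entry =~ s/[$MATCHPATTERN]//g;
    return $entry;
}

my $raw     = 'ACGT????AC-GT';   # made-up example sequence line
my $cleaned = strip_like_fasta_parser($raw);

print "raw:     $raw (", length($raw), " chars)\n";
print "cleaned: $cleaned (", length($cleaned), " chars)\n";
```

The cleaned string comes out four characters shorter, with the gap character kept and the question marks removed, which would explain why the downstream residues shift and the sequence ends up padded at the 3' end.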
As far as I can tell, a question mark is not a valid character for
the FASTA format (see http://en.wikipedia.org/wiki/FASTA_format) --
perhaps that's the reason Bio::AlignIO::fasta doesn't permit them?
And then by the time missing_char() is applied, the question marks
are already gone.
What happens if you read in your sequence with question marks in a
format that explicitly permits question marks?
On Apr 13, 2006, at 7:38 PM, Kai Müller wrote:
> I'm very new to BioPerl and have a maybe silly question.
> When using Bio::AlignIO to load a set of sequences, the question
> marks are simply lost (they refer to missing characters as opposed
> to gap [-] or ambiguity [N]). I thought that 'missing_char()' might
> help, but it didn't (I probably used it the wrong way).
> When $filename contains sequences with ????, the following snippet
> produces an alignment with the ???? lost, the downstream nucleotides
> just shifted, and the resulting length differences filled by '---'
> at the 3' end:
> my $aln_in = Bio::AlignIO->new(-file => "$filename", '-format' =>
> my $aln = $aln_in->next_aln();
> my $testout = Bio::AlignIO->new(-fh => \*STDOUT , '-format' =>
> Can somebody give me a hint here?
> thanks and all the best,
> Kai Müller