[Bioperl-l] Bio::Index::Fastq '@' in qual

Sofia Robb sofia2341 at gmail.com
Mon Oct 24 10:58:13 EDT 2011


I am having problems running Bio::Index::Fastq.  I get the following error when a quality line begins with '@'.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: No description line parsed
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68

Here is an example of a fastq record that is causing this error. The last line, which starts with an '@', is actually the qual line.


I see that Chris has partially addressed this on the mailing list.

However, as he pointed out at the time, it appears this may be a fairly large problem.

My fastq seq and qual lines are always exactly one line each, so I think that adding a line count and only checking for '@' on lines where $line_count % 4 == 0 would work, since the header line is always the first of every 4 lines (0, 4, 8, etc.).

But if there are multiple lines of seq and qual, I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence.

## only for single-line seq and qual
my $line_count = 0;
while (<$FASTQ>) {
    # a header can only be the first line of a 4-line record (0, 4, 8, ...)
    if (/^@/ and $line_count % 4 == 0) {
        # $begin is the position of the first character after the '@'
        my $begin = tell($FASTQ) - length( $_ ) + 1;
        foreach my $id (&$id_parser($_)) {
            $self->add_record($id, $i, $begin);
        }
    }
    $line_count++;    # count every line, not just the headers
}
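
For the multi-line case described above, here is a rough standalone sketch of the idea: read sequence lines until the '+' line, then read the same number of quality lines, so that a quality line starting with '@' is never tested as a header. The name index_fastq_offsets is made up for illustration and is not part of BioPerl. It also assumes the quality really is wrapped onto the same number of lines as the sequence, which I have not checked against real multi-line files.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): map each record ID to the byte
# offset of its '@' header line, allowing multi-line seq and qual.
sub index_fastq_offsets {
    my ($fh) = @_;
    my %offset_of;
    while (1) {
        my $start  = tell($fh);              # offset of the candidate header
        my $header = <$fh>;
        last unless defined $header;         # end of file
        next if $header =~ /^\s*$/;          # tolerate blank lines between records
        $header =~ /^@(\S+)/
            or die "Expected '\@' header at byte $start\n";
        my $id = $1;

        # Sequence lines run until the '+' (or '+id') separator line.
        my $seq_lines = 0;
        my $saw_plus  = 0;
        while (defined(my $line = <$fh>)) {
            if ($line =~ /^\+/) { $saw_plus = 1; last }
            $seq_lines++;
        }
        die "No '+' line for record $id\n" unless $saw_plus;

        # Assumption: qual spans as many lines as seq. These lines are
        # consumed without ever being tested for a leading '@'.
        for my $n (1 .. $seq_lines) {
            my $line = <$fh>;
            die "Truncated quality for record $id\n" unless defined $line;
        }
        $offset_of{$id} = $start;
    }
    return \%offset_of;
}
```

The returned offsets point at the '@' itself, so a caller would seek() to the offset and read the record from there. Comparing total quality length against total sequence length instead of counting lines might be more robust if the two are wrapped differently.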

BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?

There's one called cdbfasta which looks like it might work — does anyone have experience with it?


P.S. I am CCing Peter Cock in case BioPython has solved this issue already — if so, perhaps their solution could be applied here.
