[Bioperl-l] fastq index

Davis, Caleb F cdavis at bcm.edu
Fri Dec 31 01:47:15 EST 2010


Thank you for the rec!

Here's what I get with 1.6.1: 

%perl make_fq_inx_test.pl test.inx test.fastq
%perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: No description line parsed
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
STACK: fetch_fastq_test.pl:11
-----------------------------------------------------------

Is it a bug?
--Caleb

These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html

##########  make_fq_inx_test.pl  ###########
# Complete code for making an index for several
# fastq files
use Bio::Index::Fastq;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fastq->new(
	'-filename' => $Index_File_Name,
	'-write_flag' => 1);
$inx->make_index(@ARGV);


##########  fetch_fastq_test.pl  ###########
# Print out several sequences present in the index
# in Fastq format
use Bio::Index::Fastq;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name);
my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT);

foreach my $id (@ARGV) {
	my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object   <-------------------  THROW
	$out->write_seq($seq);
}

Example data--

##########  test.fastq  ###########
@FVBWUVC01BR7MP
GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG
+
24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232
@FVBWUVC01D7NSE
GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
@FVBWUVC01D7SUB
TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
+
FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
@FVBWUVC01BFN75
TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944
@FVBWUVC01AYO0N
AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC
+
IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12
@FVBWUVC01EYPM9
GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA
+
FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/
@FVBWUVC01BWHY4
AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT
+
HGGGDDD:555:4449==>=<<555=BBAAAA at 8888894224266;..
@FVBWUVC01ELH7H
CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT
+
IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>==<HGD;
@FVBWUVC01CTTAY
CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT
+
FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557


-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu] 
Sent: Wednesday, December 29, 2010 9:35 PM
To: Cook, Malcolm
Cc: Davis, Caleb F; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] fastq index

May just wrap this for the indexer.  Thanks for the pointer Malcolm!

chris

On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote:

> If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/
> 
> No bioperl wrapper, but, hey, that's what `system` is for
> 
> Cheers,
> 
> Malcolm
> 
> 
> On 12/29/10 2:28 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> 
> On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote:
> 
>> Hi all,
>> 
>> Retrieving fastq from an index with bio::index::fastq is not working for me. I try using the index creation and retrieval code as given here:
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>> using the fastq sequence given here:
>> http://www.bioperl.org/wiki/FASTQ_sequence_format
>> but I get this error:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
>> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
>> STACK: fetch_fastq_test.pl:11
>> 
>> The only other report of this behavior I could find is here:
>> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836
>> 
>> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.
>> 
>> --Caleb
> 
> Caleb,
> 
> Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version.  If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem.
> 
> I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this.  Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct).
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list