[Bioperl-l] Bio::Index::Fastq - Interface for indexing (multiple) fastq files failure

Jason Stajich jason at bioperl.org
Mon Apr 5 18:53:19 EDT 2010

Hi David - I am not sure this is going to be the right tool for the job.

I'm concerned that none of the Bio::Index:: modules will really work for 
Illumina/NGS-sized data, because once you get beyond about 4M hash keys 
things slow down quite dramatically and/or don't finish.

I think we have to consider SQLite implementations, or some more explicit 
way to handle larger keysize for hashes in DB_File or BerkeleyDB. 
A similar slowdown can be seen if you just index a fasta file converted 
from the fastq of a single Illumina lane.

I ended up writing a simple DBD::SQLite storage system for my NGS 
data, but I don't think it is very space-efficient, and there are probably 
better ways to go about this, such as some of the indexing schemes that 
the NGS alignment programs have applied.  I assume other people have hit 
this problem, but I've not seen much discussion.
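
For illustration only, here is a minimal sketch of what a DBD::SQLite-backed 
read index could look like. It is not the storage system mentioned above. The 
table name (reads), its columns (id, path, byte_offset) and both subroutine 
names are assumptions made up for this example, and it assumes strict 4-line 
fastq records (no wrapped sequence or quality lines), as Illumina pipelines 
typically emit. It stores only the file path and byte offset of each record, 
then seeks back into the original file on lookup.

```perl
use strict;
use warnings;
use DBI;

# Index every record of the given fastq files into an SQLite database.
# Assumes 4 lines per record; returns the open database handle.
sub build_fastq_index {
    my ($db_file, @fastq_files) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
        { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE IF NOT EXISTS reads ('
           . 'id TEXT PRIMARY KEY, path TEXT NOT NULL, '
           . 'byte_offset INTEGER NOT NULL)');
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO reads (id, path, byte_offset) VALUES (?, ?, ?)');

    for my $path (@fastq_files) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        while (1) {
            my $offset = tell $fh;          # start of the candidate record
            my $header = <$fh>;
            last unless defined $header;    # end of file
            next unless $header =~ /^\@(\S+)/;   # skip stray blank lines
            my $id = $1;
            for (1 .. 3) {                  # sequence, '+' line, qualities
                defined(my $line = <$fh>)
                    or die "Truncated record for $id in $path";
            }
            $sth->execute($id, $path, $offset);
        }
        close $fh;
    }
    $dbh->commit;    # one transaction keeps bulk inserts fast
    return $dbh;
}

# Return the 4-line fastq record for $id as one string, or nothing if
# the ID is not in the index.
sub fetch_fastq_record {
    my ($dbh, $id) = @_;
    my ($path, $offset) = $dbh->selectrow_array(
        'SELECT path, byte_offset FROM reads WHERE id = ?', undef, $id);
    return unless defined $path;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my @lines = map { scalar <$fh> } 1 .. 4;
    close $fh;
    return join '', @lines;
}

# Example use (file names are placeholders):
#   my $dbh = build_fastq_index('reads.sqlite', 's_1_1_sequence.txt');
#   print fetch_fastq_record($dbh, 'SOLEXA-02_0001:1:55:1110:290#0/1');
```

Keeping only offsets in the database keeps it much smaller than storing the 
sequences themselves, at the cost of needing the original fastq files to stay 
in place. Whether this scales well to a full lane is not something this 
sketch demonstrates.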


KOVALIC, DAVID K [AG/1000] wrote, On 4/5/10 11:28 AM:
> Hi,
> Using the example code for this module I get an error building an index
> file:
> 		>index_fastq Run101_s11_test_index
> 100108_SOLEXA-02_0101_PE_61810AAXX/s_1_1_sequence.txt
> 		sdbm store returned -1, errno 22, key
> "SOLEXA-02_0001:1:55:1110:290#0/1" at
> /bli/lib/SunOS/perl5/site_perl/5.6.1/Bio/Index/Abstract.pm line 714,
> <FASTQ>  line 30369077.
> The code to build the index file and retrieve sequences works on a small
> test file but not on the real data (a fastq file from a single Illumina
> GAII lane). Here is the code for building the index.
> Index_fastq:
> 		# Complete code for making an index for several fastq files
> 		use Bio::Index::Fastq;
> 		use strict;
> 		unless (scalar @ARGV >= 2) {
> 		    die "ERROR index_fastq: Usage: index_fastq <index file name> <(list of) fastq file(s) to index>\n\n";
> 		}
> 		my $Index_File_Name = shift;
> 		my $inx = Bio::Index::Fastq->new('-filename'   => $Index_File_Name,
> 		                                 '-write_flag' => 1);
> 		$inx->make_index(@ARGV);
> Any idea what the problem might be and how to fix it? Let me know if you
> have any insight. Thanks,
> David