[Bioperl-l] Bio::Index::Fastq - Interface for indexing (multiple) fastq files failure
cjfields at illinois.edu
Mon Apr 5 19:57:02 EDT 2010
On Apr 5, 2010, at 6:15 PM, Peter wrote:
> On Mon, Apr 5, 2010 at 11:53 PM, Jason Stajich <jason at bioperl.org> wrote:
>> Hi David - I am not sure this is going to be the right tool for the job.
>> I'm concerned that none of the Bio::Index:: will really work for
>> Illumina/NGS size data because once you get beyond about 4M hash
>> keys things slow down quite dramatically and/or don't finish.
>> I think we have to consider SQLite implementations or some more
>> explicit way to handle larger keysize for hashes in the DB_File or
>> BerkeleyDB approach. A similar slowdown can be seen if you
>> just index a FASTA file converted from the FASTQ of a single Illumina lane.
> Another example, and this was in Python rather than Perl, but
> SQLite got a thumbs up over an in house hash based approach:
> I think a new SQLite based Bio* OBF successor to the existing
> BDB based OBDA standard for indexing files could be very interesting.
It would be nice to get some performance numbers on a few real data sets. SQLite is a very easy option (I'm using it routinely as well).
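
As a starting point for that kind of comparison, here is a minimal sketch of an SQLite-backed offset index for a FASTQ file. It is written in Python only because sqlite3 ships in the standard library. The table name (offsets), the column names, and the function names build_index and fetch_record are all hypothetical; this is not an existing Bio* or OBDA API. It also assumes strict four-line FASTQ records, which will not hold for wrapped files.

```python
import sqlite3


def build_index(fastq_path, db_path):
    """Record the byte offset of each FASTQ record in an SQLite table.

    Assumption: every record is exactly four lines (@id, sequence, +, quality).
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(id TEXT PRIMARY KEY, byte_offset INTEGER)"
    )
    # "with conn" wraps all inserts in one transaction and commits at the end,
    # which matters a great deal for bulk-load speed in SQLite.
    with open(fastq_path, "rb") as handle, conn:
        while True:
            offset = handle.tell()
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines between records
            # Skip the sequence, separator, and quality lines.
            for _ in range(3):
                handle.readline()
            # The read ID is the first whitespace-delimited token after "@".
            read_id = header[1:].split()[0].decode("ascii")
            conn.execute(
                "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                (read_id, offset),
            )
    conn.close()


def fetch_record(fastq_path, db_path, read_id):
    """Look up a read ID and return its four-line record as text."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT byte_offset FROM offsets WHERE id = ?", (read_id,)
    ).fetchone()
    conn.close()
    if row is None:
        raise KeyError(read_id)
    with open(fastq_path, "rb") as handle:
        handle.seek(row[0])
        return b"".join(handle.readline() for _ in range(4)).decode("ascii")
```

Timing build_index and a batch of random fetch_record calls against the same lane indexed with DB_File would give a first rough comparison; the sketch makes no claim about which one wins.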