[Bioperl-l] Next-gen modules
bix at sendu.me.uk
Wed Jun 17 18:10:57 EDT 2009
Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>> Tristan Lefebure wrote:
>>> Regarding next-gen sequences and bioperl, in my experience,
>>> another issue is bioperl speed. For example, if you want to trim bad
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and
>>> some methods in Bio::Seq::Quality, well, you've got to be patient
>>> (but maybe I missed some shortcuts...).
>> This is my concern as well. Or, rather, is there actually a
>> significant set of users out there who are dealing with next-gen
>> sequencing and would consider using BioPerl for their work?
>> I'm working with all the 1000-genomes data at the Sanger, and we at
>> least are probably never going to use BioPerl for the work.
> Are you using pure perl or (gasp) something else? ;>
We use some perl stuff, some C stuff. My own stuff is OO perl, but much
lighter weight than BioPerl. Absolute minimal object creation.
>>> A pure perl solution will be between 100 and 1000x faster... Would it
>>> be possible to have an ultra-light quality object with few simple
>>> methods for next-gen reads?
>> The fastq parser itself already seems pretty fast. The way to get the
>> speedup is to not create any Bio::Seq* objects but just return the
>> data directly. At that point it's not taking much advantage of
>> BioPerl. But certainly it could be done...
> I suppose the best way to assess what needs to be done is to come up with a
> set of 'use cases' specifying what users want so we can design around
> them, otherwise we're shooting in the dark.
Indeed. Though at least I think we can all agree it would be nice to
have the functionality there even if it's slow. There will always be at
least some use-cases where the run speed doesn't matter.
> I'm personally wondering if this could be done as a sequence database,
> something similar in theme to Lincoln's SeqFeature::Store, but sequence
> only, and returns quality objects in a similar manner (ala Storable)?
> Not sure whether that's feasible, but it appears at least scalable.
I think not. Well, at least SeqFeature::Store doesn't scale. Try storing
millions of features in a database and watch it crawl to complete
unusability. I can't imagine a db scaling to holding hundreds of TB of
data either. I'm also not sure what the benefit is. There are already
high-speed ways of indexing your fastq or bam files.
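
As an illustration of the "don't create any Bio::Seq* objects, just return
the data directly" idea discussed above, here is a minimal sketch. It is
written in Python rather than Perl purely for brevity, and it is not BioPerl
code or anyone's production parser. The names read_fastq and trim_ends, the
quality threshold of 20 and the ASCII offset of 33 are all assumptions made
for the example (older Solexa/Illumina files use an offset of 64 and a
slightly different score scale). It also assumes strict four-line FASTQ
records with no line wrapping.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) tuples from a FASTQ stream.

    Assumes exactly four lines per record; no per-read objects are built.
    """
    while True:
        header = handle.readline()
        if not header:           # end of file
            return
        if not header.strip():   # tolerate stray blank lines
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()        # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Trim bases scoring below min_q from both ends of a read.

    offset=33 is an assumption (Sanger-style encoding); pass offset=64
    for older Solexa/Illumina files.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]
```

A caller would loop over read_fastq(open(path)) and pass each sequence and
quality string to trim_ends. Every record stays a plain tuple of strings,
which is where the speedup over building a full sequence object per read
would be expected to come from.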