[Bioperl-l] Next-gen modules
e.stupka at ucl.ac.uk
Wed Jun 17 16:36:31 EDT 2009
Better than colorspaced discussions for sure ;)
On 17 Jun 2009, at 21:35, Chris Fields wrote:
> So, #1 priority is to get fastq up-to-speed, then maybe assess other
> Illuminating discussion, thanks Elia!
> urgh, excuse unintended bad pun above...
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>> Interesting that you mention the database issue. We found that for
>> specific memory/CPU intensive things we also switch to using dbs.
>> For example, after many years of loyal use of disconnected_ranges
>> we switched to a simple SQL implementation of it, because of the
>> large performance gains it would give us. Similarly in Ensembl as
>> well as in the old days of bioperl-db we opted for doing subseq
>> within SQL where possible.
>> Some lean way of SQL'izing specific components could be less
>> "disruptive" than avoiding object creation and provide significant
>> gains in performance. Could be set as an optional flag, and could
>> use temporary ad hoc SQL databases?
>> Still, priority now is to make SeqIO compliant with all those
>> formats, then we can worry about performance :)
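
A minimal sketch of the "temporary ad hoc SQL database" idea above, applied to merging overlapping ranges in the spirit of disconnected_ranges. It is written in Python with an in-memory SQLite database purely for illustration; the table name, column names, and the helper merge_ranges_sql are hypothetical and are not the implementation described in the thread. It assumes closed integer ranges that merge only when they share at least one position.

```python
import sqlite3

# A range start begins a merged block if no other range starts earlier and
# still covers it. A range end closes a merged block if no other range
# reaches past it. Each block start is paired with the nearest block end.
MERGE_SQL = """
SELECT DISTINCT s.range_start,
       (SELECT MIN(e.range_end)
          FROM ranges e
         WHERE e.range_end >= s.range_start
           AND NOT EXISTS (SELECT 1 FROM ranges x
                            WHERE x.range_start <= e.range_end
                              AND x.range_end > e.range_end))
  FROM ranges s
 WHERE NOT EXISTS (SELECT 1 FROM ranges x
                    WHERE x.range_start < s.range_start
                      AND x.range_end >= s.range_start)
 ORDER BY s.range_start
"""


def merge_ranges_sql(ranges):
    """Return the non-overlapping union of (start, end) pairs, sorted by start."""
    conn = sqlite3.connect(":memory:")  # throwaway database, gone on close
    try:
        conn.execute(
            "CREATE TABLE ranges (range_start INTEGER NOT NULL,"
            " range_end INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO ranges (range_start, range_end) VALUES (?, ?)", ranges
        )
        return conn.execute(MERGE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(merge_ranges_sql([(1, 5), (3, 8), (10, 12), (11, 15), (20, 25)]))
```

For large inputs an index on range_start and range_end would probably be needed before the correlated subqueries pay off; that tuning is left out of the sketch.
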
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>> Tristan Lefebure wrote:
>>>>> Regarding next-gen sequences and bioperl, following my
>>>>> experience, another issue is bioperl speed. For example, if you
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,
>>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>> This is my concern as well. Or, rather, is there actually a
>>>> significant set of users out there who are dealing with next-gen
>>>> sequencing and would consider using BioPerl for their work?
>>>> I'm working with all the 1000-genomes data at the Sanger, and we
>>>> at least are probably never going to use BioPerl for the work.
>>> Are you using pure perl or (gasp) something else? ;>
>>> Judging by the feedback there is definitely a set of users who
>>> would like to integrate nextgen into bioperl somehow, probably to
>>> take advantage of other aspects of bioperl.
>>>>> A pure perl solution will be between 100 and 1000x faster...
>>>>> Would it be possible to have an ultra-light quality object with
>>>>> few simple methods for next-gen reads?
>>>> The fastq parser itself already seems pretty fast. The way to get
>>>> the speedup is to not create any Bio::Seq* objects but just
>>>> return the data directly. At that point it's not taking much
>>>> advantage of BioPerl. But certainly it could be done...
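
The "return the data directly" approach above can be sketched as follows: records are handed back as plain tuples and low-quality ends are trimmed on the raw strings, with no per-read objects. This is an illustration in Python rather than Perl, and it is not BioPerl code; the helper names read_fastq and trim_ends, the four-lines-per-record layout, the quality threshold of 20, and the ASCII offset of 33 are all assumptions (older Solexa/Illumina files use an offset of 64 and a different score scale).

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) string tuples from a FASTQ stream.

    Assumes the simple layout of exactly four lines per record.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Strip bases scoring below min_q from both ends; interior bases are kept."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    data = io.StringIO("@read1\nACGTACGT\n+\n##IIII##\n")
    for name, seq, qual in read_fastq(data):
        print(name, *trim_ends(seq, qual))
```

The trade-off is the one noted in the thread: a loop like this gains speed by giving up the rest of the object model, so anything downstream that expects sequence objects has to build them itself.
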
>>> I suppose the best way to assess what needs to be done is come up
>>> with a set of 'use cases' specifying what users want so we can
>>> design around them, otherwise we're shooting in the dark.
>>> I'm personally wondering if this could be done as a sequence
>>> database, something similar in theme to Lincoln's
>>> SeqFeature::Store, but sequence only, and returns quality objects
>>> in a similar manner (à la Storable)? Not sure whether that's
>>> feasible, but it appears at least scalable.
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>> Senior Lecturer, Bioinformatics
>> UCL Cancer Institute
>> Paul O' Gorman Building
>> University College London
>> Gower Street
>> WC1E 6BT
>> Office (UCL): +44 207 679 6493
>> Office (ICMS): +44 0207 8822374
>> Mobile: +44 7597 566 194
>> Mobile (Italy): +39 338 8448801