[Bioperl-l] Next-gen modules
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Wed Jun 17 15:15:20 EDT 2009
In answer to your question, yes! We have 6 Illumina datasets which we have searched against sequence databases using FASTA, and I used SearchIO to parse the results. This is where BioPerl comes into its own: wrapped around fast, optimised solutions written in C or Java. Sure, I could have written something in sed/awk/pure Perl/C etc. that would have parsed out the information I needed faster, but the SearchIO solution only took a few minutes to parse a huge FASTA results file, and for me (and many others, I suspect) a few minutes is not a problem.
From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala
Sent: Wed 17/06/2009 7:20 PM
To: tristan.lefebure at gmail.com
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Next-gen modules
Tristan Lefebure wrote:
> Regarding next-gen sequences and bioperl, in my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at the ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but maybe
> I missed some shortcuts...).
This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?
I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.
> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects but just return the data
directly. At that point it's not taking much advantage of BioPerl. But
certainly it could be done...
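
For illustration only, here is a minimal sketch of the object-free approach discussed above: read each FASTQ record as plain strings and trim low-quality bases from both ends without building any Bio::Seq* objects. The helper names next_fastq_record and trim_ends are hypothetical and not part of BioPerl. The sketch assumes unwrapped four-line records, and that subtracting an ASCII offset (64 is assumed here for Solexa/Illumina output, 33 would apply to Sanger-style files) yields a usable quality score. The threshold of 20 is likewise an arbitrary choice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): read one four-line FASTQ
# record from a filehandle and return (id, sequence, quality) as strings.
# Returns an empty list at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;    # separator line, ignored
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $qual);
    die "malformed FASTQ header: $header\n" unless $header =~ s/^\@//;
    return ($header, $seq, $qual);
}

# Hypothetical helper: drop bases from both ends while their quality score
# (ASCII code minus $offset) is below $min_q. Returns the trimmed sequence
# and quality strings; both are empty if no base passes the threshold.
sub trim_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    my @q = map { $_ - $offset } unpack 'C*', $qual;
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Small demonstration on an in-memory record (made-up data).
my $demo = <<'END';
@read1
ACGTACGT
+
BBhhhhBB
END

open my $fh, '<', \$demo or die "cannot open in-memory handle: $!";
while (my ($id, $seq, $qual) = next_fastq_record($fh)) {
    ($seq, $qual) = trim_ends($seq, $qual, 20, 64);
    print "\@$id\n$seq\n+\n$qual\n";
}
close $fh;
```

With the assumed offset of 64, 'B' scores 2 and 'h' scores 40, so the demonstration read is trimmed to GTAC. To process a real file, the in-memory handle would be replaced by one opened on the FASTQ file. As noted above, this gains speed precisely by giving up what the BioPerl objects provide.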