[Bioperl-l] SeqIO: paired end reads
cjfields at illinois.edu
Sun Aug 7 11:51:19 EDT 2011
On Aug 7, 2011, at 4:40 AM, Peter Cock wrote:
> On Friday, August 5, 2011, Lee Katz <lskatz at gmail.com> wrote:
>> Thank you. I figured out through the Newbler manual that there is a
>> sequence to separate the paired end reads. Then, the forum at
>> http://seqanswers.com/forums/showthread.php?t=12940 showed me that the
>> linker sequence is "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC".
> There is more than one Roche 454 linker sequence depending on the chemistry
> used; one is the same as its reverse complement, one isn't.
> There is nothing in the SFF file format (nor the Roche specific XML manifest
> last time I checked) that handles the paired end information explicitly.
Yep, it's all implied AFAIK.
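
To make the "implied" part concrete, here is a minimal Python sketch (not BioPerl code, and not how Newbler or sff_extract do it) that checks whether the linker quoted above is its own reverse complement and splits a read at an exact linker match. The function names and the example read are made up for illustration, and exact string matching is an assumption: it will miss any linker that was called imperfectly.

```python
# Linker sequence quoted from the seqanswers thread above.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_on_linker(read, linker=LINKER):
    """Split a read at the first exact linker match.

    Returns (left, right) with the linker removed, or None if the
    linker is not found. Hypothetical helper: exact matching only.
    """
    pos = read.find(linker)
    if pos == -1:
        return None
    return read[:pos], read[pos + len(linker):]


if __name__ == "__main__":
    # This particular linker is palindromic in the reverse complement sense.
    print(reverse_complement(LINKER) == LINKER)
    # Made-up read: two short flanks joined by the linker.
    print(split_on_linker("ACGTACGT" + LINKER + "TTGCAAGG"))
```

Because this linker reads the same on both strands, a single search is enough; for a linker that is not its own reverse complement you would have to search for both orientations.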
>> I think a useful addition to bioperl could be to have paired end reads.
> Maybe, but to do this well you'd want to do flow space alignment of the
> reads to the linker sequence to find the imperfectly called linker.
> Personally I use sff_extract which is a free open source command line tool
> for this (calling an external alignment tool for paired end 454).
I think it could be done, but I would implement something like this as a wrapper around faster tools (like sff_extract or similar). Implementing the functionality in pure (bio)perl/(bio)python doesn't make much sense if there are newer/faster tools out there.
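
Just to illustrate what "finding an imperfectly called linker" involves, here is a rough Python sketch of approximate matching. Note the swap: this works in base space with plain edit distance (substitutions, insertions, deletions), not in flow space as Peter suggests, and it is not what sff_extract does internally. The function name, the max_edits threshold, and the example read are all assumptions for illustration; for real data a dedicated tool is the better choice.

```python
# Linker sequence quoted earlier in this thread.
LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"


def find_linker(read, linker=LINKER, max_edits=3):
    """Find the best approximate occurrence of linker inside read.

    Semi-global edit-distance dynamic programming: the match may start
    and end anywhere in the read at no cost. Returns (edits, start, end)
    so that read[start:end] is the matched region, or None if no match
    has at most max_edits edits. Hypothetical helper, base space only.
    """
    m = len(linker)
    # prev[i] = (edits, start) for linker[:i] against a substring of the
    # read ending at the previous read position.
    prev = [(i, 0) for i in range(m + 1)]
    best = None
    for j, base in enumerate(read, start=1):
        cur = [(0, j)]  # empty linker prefix matches anywhere for free
        for i in range(1, m + 1):
            mismatch = 0 if linker[i - 1] == base else 1
            diag = (prev[i - 1][0] + mismatch, prev[i - 1][1])
            up = (cur[i - 1][0] + 1, cur[i - 1][1])  # linker base missing in read
            left = (prev[i][0] + 1, prev[i][1])      # extra base in read
            cur.append(min(diag, up, left))
        edits, start = cur[m]
        if edits <= max_edits and (best is None or edits < best[0]):
            best = (edits, start, j)
        prev = cur
    return best


if __name__ == "__main__":
    # Made-up read with one base dropped from the linker's AAA run,
    # mimicking a homopolymer undercall.
    damaged = LINKER[:10] + LINKER[11:]
    read = "ACGTACGT" + damaged + "TTGCAAGG"
    hit = find_linker(read)
    if hit is not None:
        edits, start, end = hit
        print(edits, read[:start], read[end:])
```

The cost is proportional to read length times linker length per read, which is one reason a wrapper around a faster external tool makes more sense than doing this in pure Perl or Python.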
>> This is outside of the domain of bioperl, but now I am left wondering how
>> I could specify the distance between reads in Newbler, if the linker
>> is fixed.
> How to do that depends on the alignment or assembly tool you are using.
Yep. I don't think there is a defined way to specify that in any format that I know of.