[Bioperl-l] AlignIO and Gbrowse_syn
Russell.Smithies at agresearch.co.nz
Wed Aug 11 17:31:32 EDT 2010
I know there was some brief discussion about .maf format a few weeks ago, but I've had an enquiry (below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO, but I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since the strand info is lost, it is read as the very first base of the original sequence. As a result, all reverse-strand match coordinates in the resulting clustalw file are incorrect.
* AlignIO is unable to parse multiple, separate aligned regions within the same .maf file; it interleaves them.
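To make the second and third issues concrete, here is a minimal Python sketch (not BioPerl's AlignIO, and not a GBrowse_syn tool). It assumes the UCSC MAF convention, which LASTZ's maf output is assumed to follow: each "a" line opens a separate alignment block, "s" lines carry a zero-based start, and on "-" rows that start is counted on the reverse complement of the source. The names `MafRow`, `forward_coords` and `parse_maf` are hypothetical helpers for illustration. The sketch keeps each block separate and maps reverse-strand rows back to 1-based forward-strand coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MafRow:
    src: str       # sequence name from the "s" line
    start: int     # zero-based start as written in the file
    size: int      # number of non-gap bases in the aligned region
    strand: str    # "+" or "-"
    src_size: int  # full length of the source sequence
    text: str      # aligned bases, including '-' gaps

    def forward_coords(self) -> Tuple[int, int]:
        """Return 1-based inclusive (start, end) on the original forward strand."""
        if self.strand == "+":
            return self.start + 1, self.start + self.size
        # Assumed convention: on "-" rows, start counts from the END of the
        # original sequence, so flip it back onto the forward strand.
        fwd_start0 = self.src_size - self.start - self.size
        return fwd_start0 + 1, fwd_start0 + self.size


def parse_maf(maf_text: str) -> List[List[MafRow]]:
    """Split MAF text into blocks, one list of rows per "a" line (no interleaving)."""
    blocks: List[List[MafRow]] = []
    current: List[MafRow] = []
    for raw in maf_text.splitlines():
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # A new alignment block starts; close the previous one.
            if current:
                blocks.append(current)
            current = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, seq = line.split()
            current.append(MafRow(src, int(start), int(size), strand, int(src_size), seq))
        elif not line and current:
            # A blank line also terminates the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical two-block example; "query" matches on the reverse strand.
SAMPLE = """##maf version=1
a score=100
s target 10 5 + 100 ACGTA
s query   2 5 -  50 ACGTA

a score=50
s target 40 3 + 100 GGC
s query   0 3 -  50 GGC
"""

blocks = parse_maf(SAMPLE)
for i, block in enumerate(blocks, 1):
    for row in block:
        print(i, row.src, row.strand, row.forward_coords())
```

With this convention, the query row that starts at 0 on the "-" strand maps to forward positions 48-50 of a 50 bp sequence, i.e. the last bases of the original sequence, which is the discrepancy described above. Carrying `strand` alongside the converted coordinates is what a downstream writer would then need to encode the strand into sequence IDs.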
I would be interested to hear whether anyone has already found a way to integrate LASTZ and GBrowse_syn, and also whether any work to improve AlignIO's support for the maf format is planned.