[Bioperl-l] AlignIO and Gbrowse_syn
cjfields at illinois.edu
Wed Aug 11 18:02:38 EDT 2010
Until recently we have had very few requests to support .maf, which is why little work has been done on it. We welcome any help improving it.
On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?
> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
> *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
> *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
> *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
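For illustration only, the two parsing issues listed above (reverse-strand coordinates and interleaved alignment blocks) can be sketched outside BioPerl. This is a minimal Python sketch, not AlignIO's or GBrowse_syn's implementation. It assumes the MAF layout described by UCSC: each alignment block starts with an `a` line, and each `s` line carries src, start, size, strand, srcSize and text, with zero-based starts that, for `-` strand rows, count from the start of the reverse-complemented sequence. The names `MafSeq`, `parse_maf_blocks` and `forward_range` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class MafSeq(NamedTuple):
    """One 's' line of a MAF alignment block (field names are illustrative)."""
    src: str
    start: int     # zero-based start, on the strand given in the file
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # full length of the source sequence
    text: str      # aligned text, including gap characters


def parse_maf_blocks(maf_text: str) -> List[List[MafSeq]]:
    """Split MAF text into separate blocks, one list of rows per 'a' line."""
    blocks: List[List[MafSeq]] = []
    current = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if fields[0] == "a":
            current = []  # every 'a' line opens a new, independent block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            current.append(
                MafSeq(src, int(start), int(size), strand, int(src_size), text)
            )
    return blocks


def forward_range(seq: MafSeq) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) on the original forward strand."""
    if seq.strand == "-":
        # Assumption: the start counts from the reverse-complemented sequence,
        # so flip it using the source length.
        start0 = seq.src_size - (seq.start + seq.size)
    else:
        start0 = seq.start
    return start0 + 1, start0 + seq.size


SAMPLE = """##maf version=1
a score=100
s chrA 10 5 + 100 ACGTA
s chrB  0 5 -  50 TACGT

a score=200
s chrA 40 3 + 100 GGC
s chrB 20 3 +  50 GGC
"""

for number, block in enumerate(parse_maf_blocks(SAMPLE), start=1):
    for row in block:
        print(number, row.src, row.strand, forward_range(row))
```

In this made-up sample, the reverse-strand row that starts at 0 maps to positions 46-50 of a 50-base sequence, i.e. the end of the original sequence rather than its beginning, and the two blocks stay separate instead of being merged into one alignment.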