[Bioperl-l] AlignIO and Gbrowse_syn
sheldon.mckay at gmail.com
Tue Aug 17 16:42:50 EDT 2010
The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.
On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> We have had very few requests to support .maf until recently, which is why there has been little done with it. We welcome any help to improve it.
> On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:
>> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
>> If GBrowse_syn is using .maf format, does AlignIO need more work?
>> Any comments?
>> I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
>> *Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification)
>> *The coordinate system for reverse strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-rev complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result all reverse match coordinates in the resulting clustalw output file are incorrect.
>> *AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them
>> I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
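The reverse-strand coordinate and multi-block issues described above can be illustrated with a minimal sketch. It is written in Python rather than BioPerl, is not part of AlignIO or GBrowse_syn, and assumes the UCSC MAF convention: blocks begin with an "a" line and end at a blank line, "s" lines carry src, start, size, strand, srcSize and text, starts are zero-based, and a "-" strand start is counted on the reverse-complemented source. The names MafSeq, iter_maf_blocks and forward_coords are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class MafSeq(NamedTuple):
    """One 's' line of a MAF alignment block (hypothetical helper type)."""
    src: str
    start: int      # zero-based; on '-' strand, relative to the reverse complement
    size: int       # number of non-gap bases in this row
    strand: str     # '+' or '-'
    src_size: int   # full length of the source sequence
    text: str       # aligned sequence, including gaps


def iter_maf_blocks(lines: Iterable[str]) -> Iterator[List[MafSeq]]:
    """Yield each alignment block separately instead of interleaving them."""
    block: List[MafSeq] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
                block = []
        elif fields[0] == "s":
            src, start, size, strand, src_size, text = fields[1:7]
            block.append(MafSeq(src, int(start), int(size), strand,
                                int(src_size), text))
        # Comment lines ('#') and other line types are ignored in this sketch.
    if block:
        yield block


def forward_coords(seq: MafSeq) -> Tuple[int, int]:
    """Return 1-based inclusive (start, end) on the original forward strand."""
    if seq.strand == "+":
        return seq.start + 1, seq.start + seq.size
    if seq.strand == "-":
        # Flip the interval from the reverse complement back to the original.
        return seq.src_size - (seq.start + seq.size) + 1, seq.src_size - seq.start
    raise ValueError(f"unexpected strand: {seq.strand!r}")


sample = [
    "##maf version=1",
    "a score=10",
    "s chrA 0 4 + 100 ACGT",
    "s chrB 96 4 - 100 ACGT",
    "",
    "a score=5",
    "s chrA 10 2 + 100 AC",
    "s chrB 0 2 - 100 AC",
]
blocks = list(iter_maf_blocks(sample))
```

With this sample, the second block's chrB row starts at 0 on the minus strand of a 100-base sequence, so forward_coords maps it to (99, 100) on the original sequence. The strand still has to be recorded separately, for example in the sequence ID as the GBrowse_syn clustalw convention expects.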
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org