[Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output

Cook, Malcolm MEC at stowers-institute.org
Fri Jun 23 14:27:19 EDT 2006


Guy,

I've just downloaded and installed your latest 1.1.0 version of
exonerate but unfortunately did not find any mention in the ChangeLog of
addressing this bug, though I still see in the TODO:

    o Should GFF show all coordinates on the +ve strand? (jason_p2g eg)

I was half expecting to see this fixed in this version based on this old
thread.  

Can you please confirm that it has not yet been addressed, and accept my
request that you continue to keep this change on your list for future
versions...

Also, might you elaborate on this entry from the ChangeLog.  I don't see
it mentioned in the manpage.

    o Added %tcs etc to --ryo for dumping coding sequences 

Thanks,

Malcolm Cook

>-----Original Message-----
>From: bioperl-l-bounces at portal.open-bio.org 
>[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater
>Sent: Friday, September 02, 2005 11:52 AM
>To: Cook, Malcolm
>Cc: bioperl-l
>Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on 
>exonerate output
>
>On Fri, 2 Sep 2005, Cook, Malcolm wrote:
>
>> Hmmmm - I'd better get some clarification from Guy too.  
>>  
>> Guy, if you don't mind reading the thread below and chiming in on our
>> discussion of interpreting the output of your excellent exonerate
>> program:
>>  
>> The sections of the manpage (
>> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
>> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
>> relevant are these 2 excerpts:
>>  
>>  1) When an alignment is reported on the reverse complement of a
>> sequence, the coordinates are simply given on the reverse complement
>> copy of the sequence. Hence positions on the sequences are never
>> negative. Generally, the forward strand is indicated by '+', 
>the reverse
>> strand by '-', and an unknown or not-applicable strand (as 
>in the case
>> of a protein sequence) is indicated by '.' "
>>  
>> 2)  --forwardcoordinates <boolean> By default, all coordinates are
>> reported on the forward strand. Setting this option to false 
>reverts to
>> the old behaviour (pre-0.8.3) whereby alignments on the reverse
>> complement of a sequence are reported using coordinates on 
>the reverse
>> complement. 
>>  
>> We see GFF DUMP coordinates still reported on the reverse stand
>> regardless of the setting of --forwardcoordinates.  So these two
>> excerpts from you manpage seem contradictory to me.     Unless I
>> understand `--forwardcoordinates FALSE` to only effect the 
>coordinates
>> reported in the alignment section, not in the GFF DUMP 
>section, which is
>> what it appears to do in practice.
>>  
>> Guy, can you confirm that the --forwardcoordinates option 
>has no effect
>> on GFF output?
>>  
>
>Hi,
>
>Yes, it has no effect, and this is a bug
>(sorry - it was due to my misinterpretation of the GFF2 spec)
>- its on the list of things to be fixed for exonerate 1.1 (soon)
>
>> Further, can you tell us if you plan to comport more closely 
>to the GFF
>> spec, in particular in this case by reporting 
>forwardcoordinates in the
>> GFF DUMP section too?   I see 
>> I see in your TODO list "    o Should GFF show all coordinates on the
>> +ve strand? (jason_p2g eg)".  Hear hear!  I second the motion.
>>  
>> And TODO item " GFF3 support ? http://song.sf.net/" gets my 
>vote too....
>> though this is more of a sticky wicket....
>>  
>
>Yup, GFF3 support is on the list,
>but probably it will not be done in time for exonerate 1.1
>Of course, I'd welcome a patch ...    ;)
>
>(I'm mainly working on getting the cdna2genome
> and genome2genome models working properly for 1.1)
>
>Cheers,
>
>Guy.
>
>> Cheers and Thanks!
>>  
>> Malcolm Cook
>>  
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Friday, September 02, 2005 9:46 AM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> I've already talked to Guy about some of this and I assume 
>fixes will be
>> part of the next release, but it can't hurt to have more people
>> requesting.  The main problem right now is reverse strand hits in GFF
>> output are still screwed up even if you provide the 
>--forwardcoordinates
>> option. 
>> 
>> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
>> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
>> reconstruct everything needed from that, I gave a stab at it 
>once, but
>> there was something missing (or maybe it was pre --forwardcoordinates
>> option).   
>> 
>> 
>> -jason 
>> 
>> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>> 
>> 
>> Jason,
>>  
>> Thanks for the scripts and clues (esp re: using the --ryo option to
>> inject the needed length into the exonerate output to compensate).
>>  
>> I'm considering asking exonerate author to comport with GFF spec.  Do
>> you think this is a road to take?
>>  
>> Cheers,
>>  
>> Malcolm
>>  
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Wednesday, August 31, 2005 12:35 PM
>> To: Cook, Malcolm
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
>> output
>> 
>> 
>> 
>http://fungal.genome.duke.edu/~jes12/software/scripts/process_e
>xonerate_
>> gff3.pl
>> 
>> You may still want to massage it some, but I use the script in this
>> basic form, maybe with a few tweaks:
>> 
>> Note that it requires you to run exonerate with specific 
>--ryo options
>> so that it includes the length of the query and hit sequences in the
>> report output. should be covered in the perldoc in the script.
>> 
>> Without the ryo options enabled,  you'll need to modify the 
>script more
>> to have access to the original sequence db, use 
>Bio::DB::Fasta,  and put
>> in some $dbh->length($seqid) calls instead.
>> 
>> I don't think the part which writes HSP/match lines is 
>actually correct
>> - it is trying to roll gapped HSPs from the similarity features. 
>> 
>> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
>> instance and/or grepping out the lines I really think I need.  
>> You may want to s/exon/CDS/ for the protein2genome output as well.
>> 
>> -jason
>> 
>> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> 
>> 
>> Jason, 
>> 
>> This message is in regards to an old thread  in which you offered to
>> shared a 'script for munging over' exonerate output for lading in
>> DB::GFF (c.f.
>> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
>> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>> 
>> Would you be willing to still share that script, if you've got it
>> around? 
>> 
>> Thanks, and regards, 
>> 
>> Malcolm Cook -  <mailto:mec at stowers-institute.org>
>> mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
>
>-- 
>%!PS % <------ Guy St.C. Slater ------> 
>http://www.ebi.ac.uk/~guy/  <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 
>12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 
>f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 
>0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list