[Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output

Jason Stajich jason.stajich at duke.edu
Fri Sep 2 10:46:26 EDT 2005

I've already talked to Guy about some of this and I assume fixes will
be part of the next release, but it can't hurt to have more people
requesting them. The main problem right now is that reverse-strand hits
in the GFF output are still wrong, even if you provide the
--forwardcoordinates option.
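For reference, exonerate reports alignment coordinates as in-between (0-based) positions and writes a reverse-strand hit with start > end, whereas GFF wants 1-based inclusive coordinates with start <= end plus a separate strand column. Below is a minimal Perl sketch of that conversion. It assumes forward-strand in-between input. `reverse_to_forward` and `inbetween_to_gff` are hypothetical helper names. The `len - pos` flip assumes coordinates counted along the reverse complement (the behaviour before --forwardcoordinates), which is why the sequence length is needed at all.

```perl
use strict;
use warnings;

# Hypothetical helper: map in-between coordinates counted along the reverse
# complement of a sequence of length $len onto the forward strand
# (in-between position p on the reverse strand is len - p on the forward one).
sub reverse_to_forward {
    my ($start, $end, $len) = @_;
    return ($len - $start, $len - $end);
}

# Hypothetical helper: convert exonerate in-between (0-based, between-residue)
# coordinates into GFF 1-based inclusive coordinates; start > end is taken
# to mean the hit is on the '-' strand.
sub inbetween_to_gff {
    my ($start, $end) = @_;
    my $strand = $start > $end ? '-' : '+';
    my ($lo, $hi) = $start > $end ? ($end, $start) : ($start, $end);
    return ($lo + 1, $hi, $strand);
}

# A reverse-strand hit reported as 139 -> 100 becomes GFF 101..139 on '-'
my @gff = inbetween_to_gff(139, 100);
print join("\t", @gff), "\n";
```

Running every reverse-strand feature through `inbetween_to_gff` before writing GFF keeps start <= end, as the GFF spec requires.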

It would help if someone wanted to write or donate a VULGAR-to-GFF
subroutine (or rather, VULGAR to a list of Bio::Search::HSP::GenericHSP
objects); we could reconstruct everything needed from that. I took a
stab at it once, but something was missing (or maybe that was before
the --forwardcoordinates option existed).
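Here is a rough sketch of the VULGAR half of that idea, in plain Perl with no BioPerl dependency. It walks the VULGAR triplets (label, query length, target length), folds match/codon/gap/frameshift steps into the current block, and starts a new block at each intron (labels 5, I, 3). The result is one gapped, exon-level block per hash. `vulgar_to_blocks` and its hash keys are hypothetical. Wrapping each block in a Bio::Search::HSP::GenericHSP is left out, because identity and alignment strings need the sequences, which a VULGAR line does not carry.

```perl
use strict;
use warnings;

# VULGAR labels that belong to an intron (5' splice site, intron, 3' splice
# site); every other label (M, C, G, N, S, F, ...) is folded into the
# current block.
my %INTRON = map { $_ => 1 } qw(5 I 3);

# Hypothetical parser: split one VULGAR line into blocks separated by introns.
# Each block is a hash ref with 1-based inclusive query/target coordinates.
sub vulgar_to_blocks {
    my ($line) = @_;
    my @f = split ' ', $line;
    shift @f if @f && $f[0] eq 'vulgar:';
    die "truncated VULGAR line\n" if @f < 9;
    my ($qid, $qpos, $qend, $qstrand,
        $tid, $tpos, $tend, $tstrand, $score, @trip) = @f;
    die "incomplete VULGAR triplets\n" if @trip % 3;
    my $qdir = $qstrand eq '-' ? -1 : 1;    # protein queries report '.'
    my $tdir = $tstrand eq '-' ? -1 : 1;

    my (@blocks, $seg);    # $seg = [q_from, q_to, t_from, t_to] in progress
    my $flush = sub {
        return unless $seg;
        my ($qa, $qb, $ta, $tb) = @$seg;
        push @blocks, {
            query_id      => $qid,
            query_start   => ($qa < $qb ? $qa : $qb) + 1,
            query_end     => $qa < $qb ? $qb : $qa,
            target_id     => $tid,
            target_start  => ($ta < $tb ? $ta : $tb) + 1,
            target_end    => $ta < $tb ? $tb : $ta,
            target_strand => $tstrand,
            score         => $score,
        };
        undef $seg;
    };
    while (my ($label, $qlen, $tlen) = splice @trip, 0, 3) {
        # Advance along each sequence in the direction of its strand
        my $qnext = $qpos + $qdir * $qlen;
        my $tnext = $tpos + $tdir * $tlen;
        if ($INTRON{$label}) {
            $flush->();
        } elsif ($seg) {
            @$seg[1, 3] = ($qnext, $tnext);
        } else {
            $seg = [$qpos, $qnext, $tpos, $tnext];
        }
        ($qpos, $tpos) = ($qnext, $tnext);
    }
    $flush->();
    die "VULGAR triplets do not reach the reported ends\n"
        unless $qpos == $qend && $tpos == $tend;
    return @blocks;
}
```

Exonerate prints such lines when run with `--showvulgar yes`; each `vulgar:` line can be passed to `vulgar_to_blocks` as-is, and the end-point check catches lines that were truncated or misparsed.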


On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:

> Jason,
> Thanks for the scripts and clues (esp. re: using the --ryo option to
> inject the needed length into the exonerate output to compensate).
> I'm considering asking the exonerate author to comply with the GFF spec.
> Do you think this is a road worth taking?
> Cheers,
> Malcolm
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, August 31, 2005 12:35 PM
> To: Cook, Malcolm
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on  
> exonerate output
> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_gff3.pl
> You may still want to massage it some, but I use the script in this  
> basic form, maybe with a few tweaks:
> Note that it requires you to run exonerate with specific --ryo
> options so that the report output includes the lengths of the query
> and hit sequences. This should be covered in the script's perldoc.
> Without the --ryo options enabled, you'll need to modify the script
> further so it has access to the original sequence database: use
> Bio::DB::Fasta and put in some $dbh->length($seqid) calls instead.
> I don't think the part which writes HSP/match lines is actually  
> correct - it is trying to roll gapped HSPs from the similarity  
> features.
> I end up ignoring all but the 'exon' and 'gene' lines for my  
> gbrowse instance and/or grepping out the lines I really think I need.
> You may want to s/exon/CDS/ for the protein2genome output as well.
> -jason
> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>> Jason,
>> This message is in regard to an old thread in which you offered
>> to share a 'script for munging over' exonerate output for loading
>> into DB::GFF (cf. http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html).
>> Would you still be willing to share that script, if you've got it
>> around?
>> Thanks, and regards,
>> Malcolm Cook - mec at stowers-institute.org - 816-926-4449
>> Database Applications Manager - Bioinformatics
>> Stowers Institute for Medical Research - Kansas City, MO  USA
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
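The quoted reply describes a post-processing workflow. It gets the query and hit lengths from --ryo output or from the sequence database, keeps only the gene and exon lines, and optionally renames exon to CDS. That workflow can be sketched roughly as below. This is not process_exonerate_gff3.pl. The SEQLEN tag and the --ryo format string are assumptions: they rely on exonerate's %qi/%ql/%ti/%tl specifiers for sequence id and length, so check `exonerate --help` for your version. The FASTA reader is a dependency-free stand-in for the Bio::DB::Fasta length() lookup.

```perl
use strict;
use warnings;

# Hypothetical tag: assumes exonerate was run with something like
#   --ryo "SEQLEN %qi %ql %ti %tl\n"
# so that each alignment is followed by one line of sequence lengths.
my $LENGTH_TAG = 'SEQLEN';

# Collect sequence lengths from the --ryo lines of an exonerate report.
sub lengths_from_ryo {
    my (@lines) = @_;
    my %len;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f == 5 && $f[0] eq $LENGTH_TAG;
        $len{$f[1]} = $f[2];    # query id => query length
        $len{$f[3]} = $f[4];    # target id => target length
    }
    return \%len;
}

# Fallback without --ryo: sum the residues of each record read from a
# FASTA filehandle, keyed by the first word of the header line.
sub lengths_from_fasta {
    my ($fh) = @_;
    my (%len, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S*)/) {
            $id = $1;
            $len{$id} = 0;
        } elsif (defined $id) {
            $line =~ s/\s+//g;    # ignore stray whitespace and \r
            $len{$id} += length $line;
        }
    }
    return \%len;
}

# Keep only GFF lines whose type (column 3) is listed in %$keep, renaming
# types via %$rename (e.g. exon => 'CDS' for protein2genome output).
# Lines that do not have exactly nine tab-separated columns are dropped.
sub filter_gff {
    my ($lines, $keep, $rename) = @_;
    $rename ||= {};
    my @out;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @col = split /\t/, $copy, -1;
        next unless @col == 9 && $keep->{$col[2]};
        $col[2] = $rename->{$col[2]} if exists $rename->{$col[2]};
        push @out, join("\t", @col);
    }
    return @out;
}
```

For example, `filter_gff(\@report_lines, { gene => 1, exon => 1 }, { exon => 'CDS' })` would reproduce the "keep gene and exon, s/exon/CDS/" step. The length hashes would then feed whatever coordinate fix-ups or ##sequence-region lines you need.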

Jason Stajich
Duke University
