[Bioperl-l] gff_string on an HSPI object is not Bio::DB::GFF friendly

Mark Wilkinson markw at illuminae.com
Fri Jan 9 10:27:51 EST 2004

Hi all, 

I'm wondering if the gff_string call on an HSPI object is perhaps
backwards (or if it is Bio::DB::GFF that is backwards).  It certainly
appears that I get "mirror image" data from that call compared to what I
need for Gbrowse.

For example, I BLAST an EST (a101) against GenBank.  I then take the
BLAST report and parse it until I have an HSP object in my hand. Now...

If I do ->gff_string on that HSP object I get this:

DB<14> p $hsp->gff_string
a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231

But by Gbrowse GFF standards what I expect to see (I think) is this:

gi|12329259 BLASTN similarity 138 160 23 + 0 Target a101 1 200

I know that Gbrowse GFF is a bit weird, but before I go coding something
new to deal with this problem I want to make sure that my interpretation
of the problem is correct, and that nobody has actually coded a solution
already (other than my GbrowseGFF ResultWriterI, which is what I am
working on updating right now).

One possibility is to modulate the output by passing an argument like
gff_string('query') or gff_string('hit') to indicate which of the
sequences you consider to be the "reference" sequence.  I tried calling
gff_string on $hsp->query and $hsp->hit, but they have lost all
information about each other, so that doesn't help.
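
To make the "flip" concrete, here is a minimal sketch that works on the
text of the line rather than on the HSP object.  flip_gff_line is a
hypothetical helper, not part of BioPerl, and it assumes the
whitespace-separated, twelve-field layout shown above.  It also assumes
that the coordinate ranges should swap along with the sequence names,
which may or may not be what Bio::DB::GFF wants.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): swap the reference and Target
# columns of a whitespace-separated line like the gff_string output above.
sub flip_gff_line {
    my ($line) = @_;
    my ($ref, $source, $type, $start, $end, $score, $strand, $phase,
        $tag, $target, $tstart, $tend) = split ' ', $line;
    die "expected 12 fields with a Target attribute\n"
        unless defined $tend && $tag eq 'Target';
    # Assumption: strand and phase are copied unchanged; minus-strand
    # hits would need more care than this sketch gives them.
    return join ' ', $target, $source, $type, $tstart, $tend,
                     $score, $strand, $phase, 'Target', $ref, $start, $end;
}

print flip_gff_line(
    'a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231'
), "\n";
```

A gff_string('query') / gff_string('hit') argument could do the same
swap inside the HSP object, where both sets of coordinates are still
available.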

If anyone has a preference on how this should behave please say so.  It
may be that we don't want BioPerl to exhibit Gbrowse GFF behaviour under
any circumstances, because it really is quite peculiar in the case of
alignment features.  My opinion is that the current bioperl output is
more comprehensible than what Gbrowse is expecting ("Target" surely
means what you hit with your query, rather than your query itself...??),
but since Gbrowse & Bio::DB::GFF are so tightly integrated with BioPerl
it would probably be better to have some BioPerl way to generate the
output format expected by Bio::DB::GFF.

Also, what is the "correct" way to represent alignment features in
GFF3?  Does ->gff_string output HSPs correctly in GFF3 format?  If not,
then we should probably revisit this issue in its entirety.
Scott/Lincoln, is there a compelling reason for Gbrowse to require its
input in the format that it does, or could it be "flipped"?


Mark Wilkinson
Assistant Professor (Bioinformatics)
Dept. of Medical Genetics
University of British Columbia's iCAPTURE Centre
Vancouver, BC, Canada

It just goes to show you that SOAP::Lite is more intuitive than you might 
think, if you know enough Perl and have the patience to dive into the 
source code.
		-Byrne Reese
