[Bioperl-l] hmmer3.pm question re query and hit coordinates

Fields, Christopher J cjfields at illinois.edu
Thu Jul 12 11:24:13 EDT 2012

On Jul 12, 2012, at 8:43 AM, Kai Blin wrote:

> On 2012-07-11 23:25, Wibowo Arindrarto wrote:
> Hi,
>> The current Biopython parser for the plain text format parses the
>> very first line to find out which HMMER flavor produces the result.
>> Both 'hmm from' and 'hmm to' are query coordinates if the flavor is
>> hmmsearch or phmmer; and they're hit coordinates if the flavor is
>> hmmscan.
> Whoops. I mostly looked at hmmscan when writing the parser, because
> that's the file format I needed for my code. The code clearly should
> follow the way the hmmer2 parser works, and differentiate between
> hmmsearch and hmmscan type output.
> As I said on the bug report, I'm happy to look at code fixing this.

Seems like it should be easy enough to address if there is something in the output that indicates the report type.
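
A rough sketch of that idea, in Python rather than Perl, and not the actual BioPerl or Biopython code. It assumes the plain-text report starts with a banner line of the form "# hmmscan :: ...", as described above for the Biopython parser. The function names and the lookup table are hypothetical, made up for illustration only.

```python
import re

# Assumption: the first line of the report looks like
# "# hmmscan :: search sequence(s) against a profile database".
_BANNER_RE = re.compile(r"^#\s*(\w+)\s*::")

# What the 'hmm from'/'hmm to' columns describe for each flavor,
# following the rule stated in the quoted message.
_HMM_COORDS_OWNER = {
    "hmmsearch": "query",
    "phmmer": "query",
    "hmmscan": "hit",
}


def detect_flavor(first_line):
    """Return the program name found in the report's banner line."""
    match = _BANNER_RE.match(first_line)
    if match is None:
        raise ValueError("no HMMER banner found in: %r" % first_line)
    return match.group(1).lower()


def hmm_coords_owner(first_line):
    """Return 'query' or 'hit' for the 'hmm from'/'hmm to' columns."""
    flavor = detect_flavor(first_line)
    if flavor not in _HMM_COORDS_OWNER:
        raise ValueError("unsupported HMMER flavor: %s" % flavor)
    return _HMM_COORDS_OWNER[flavor]
```

A parser would call hmm_coords_owner() once on the first line and then assign the 'hmm from'/'hmm to' values to the query or the hit accordingly, rather than hard-coding the hmmscan behaviour.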

>> This information is not available in other HMMER command line
>> output formats (tblout and domtblout), which as Peter has
>> mentioned, required us to treat different flavors of the table
>> output as different formats for the time being.
> As far as I'm aware, BioPerl currently doesn't parse the table output
> format.

The only reason to do so is if the table provides additional information the actual hits don't (this can be the case with BLAST reports).

> Seeing how much repeated pain we run into with all these parsers in
> the different Bio* projects, I wonder if there was a smarter way to
> deal with parsing. Maybe at least some shared grammar file that we
> could use for testing, to make sure we at least have the same
> expectations about file formats in the different language
> implementations. Ideally we'd auto-generate the parsers from the
> grammar specification, but I guess that'll stay wishful thinking for
> quite a bit.

I would fully support something like this. I have been thinking about it with Marpa::XS (which now has a compiled library, libmarpa, to make it less Perl-centric), and there has been talk of using a similar toolkit with the BioRuby folks.  We could always have a plain Perl/Python/Ruby/etc. fallback for the most common formats.

