[Bioperl-guts-l] [Bug 2632] hmmpfam parsers are broken (both hmmer and hmmer_pull)

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Nov 6 14:35:15 EST 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2632





------- Comment #11 from online at davemessina.com  2008-11-06 14:35 EST -------
I emailed the Pfam folks about this. I'm adding the response below so that it's
recorded and so that we can use it as a guide if we decide to add support for
the CS lines in the future.

-----
Hi Dave,

Thanks for submitting this question as a ticket; sorry it's taken us a
while to get back to you about it.

> On the BioPerl project we noticed what appears to be a change in format in
> Pfam's models between version 22 and 23 -- querying them with hmmpfam shows
> 'CS' lines when using the latter.

The "CS" line in the hmmpfam output gives the consensus secondary
structure for each match state. The CS lines will appear for a hit when
the matching HMM was built from a seed alignment that included
secondary structure information. The SS information wasn't present in
the seeds in Pfam release 22; it was added for release 23, but only
for those sequences that could be mapped to a protein structure.

> This change broke the BioPerl parser for hmmpfam. We have worked around it
> temporarily by simply ignoring the 'CS' lines, but ideally we could capture
> them once we understand what they are.

We also had a problem with the CS lines, which caused our "pfam_scan.pl"
script to go into an endless loop... We too fixed the problem by simply
ignoring CS lines in the hmmpfam output, but we'd like to fix our parser
to include that information when we get around to it. 

> Could you provide or point me to a brief description of this part of the
> output?

The bible for anything related to HMMER is the user guide, written by
Sean Eddy, the author and ultimate source of all knowledge about HMMER:

http://hmmer.janelia.org/
ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf

There's a description of the hmmpfam output on page 27, which briefly
describes the contents of the CS line.

I hope that's useful. Good luck with the BioPerl parser.

John.

-- 
------------------------------------------------------------------------
John Tate                                  Phone: (+44/0)1223 494 724
The Wellcome Sanger Institute              Email: pfam-help at sanger.ac.uk
Wellcome Trust Genome Campus
Hinxton Hall,                           Protein families database (Pfam)
Cambridge, CB10 1SA, UK                 http://pfam.sanger.ac.uk/
------------------------------------------------------------------------
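
For anyone picking this up later, the workaround described above (ignoring
CS lines) and the eventual goal (capturing them) can be sketched roughly as
follows. This is only an illustration in Python, not the BioPerl or
pfam_scan.pl code: the function name split_cs_lines is hypothetical, the
sample alignment block is invented, and the sketch assumes that a CS line is
any line whose first whitespace-delimited token is exactly "CS". Check that
assumption against the HMMER user guide before relying on it, since a query
sequence that happened to be named "CS" would be misclassified.

```python
def split_cs_lines(lines):
    """Separate 'CS' annotation lines from the other lines of an alignment block.

    Returns (kept, cs_fragments): 'kept' holds every non-CS line unchanged,
    'cs_fragments' holds the annotation string taken from each CS line.
    """
    kept = []
    cs_fragments = []
    for line in lines:
        # Split off the first token only; the remainder is the annotation.
        tokens = line.split(None, 1)
        # Assumption (not taken from the HMMER docs): a CS line is
        # identified by the bare label "CS" as its first token.
        if len(tokens) == 2 and tokens[0] == "CS":
            cs_fragments.append(tokens[1].rstrip())
        else:
            kept.append(line)
    return kept, cs_fragments


if __name__ == "__main__":
    # Invented example block, for illustration only.
    block = [
        "                   CS    HHHHHHHTT-EEEE",
        "                   *->lkeaaeelgaevvvv<-*",
        "                      l+ea +   ae+++v   ",
        "       query    10    LREAFKRHNAEIILV    24",
    ]
    kept, cs = split_cs_lines(block)
    print(len(kept), cs)
```

Passing only the kept lines to an existing parser reproduces the "ignore CS"
workaround, while the collected fragments are what a future parser would
attach to the hit.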



