[Bioperl-guts-l] [Bug 2632] hmmpfam parsers are broken (both hmmer and hmmer_pull)

bugzilla-daemon at portal.open-bio.org
Thu Nov 6 19:05:56 EST 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2632

------- Comment #12 from cjfields at bioperl.org  2008-11-06 19:05 EST -------
The CS line data is akin to the RNA secondary structure data in Infernal
output:  

http://www.bioperl.org/wiki/Infernal

I actually derived a new HSP class to deal with that (ModelHSP); it adds an
extra meta() method that holds that string.  We could try switching to that
class or using a similar method if needed.  ModelHSP also disables several
alignment-related methods, but that can be revised if needed.
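
To make the "capture instead of ignore" idea concrete, here is a minimal
sketch, written in Python purely for illustration rather than as BioPerl code.
It assumes that a CS line in an hmmpfam alignment block is any line whose
first whitespace-delimited token is exactly "CS"; that assumption, the
function name split_cs_lines, and the sample block are all hypothetical and
not taken from the HMMER documentation or the BioPerl parsers. The collected
string plays the role that meta() plays on ModelHSP.

```python
def split_cs_lines(block_lines):
    """Separate 'CS' annotation lines from the other lines of an alignment block.

    Assumption (not verified against the HMMER user guide): a CS line is a
    line whose first whitespace-delimited token is exactly "CS".
    Returns (cs_string, remaining_lines).
    """
    cs_parts = []
    remaining = []
    for line in block_lines:
        tokens = line.split()
        if tokens and tokens[0] == "CS":
            # Keep everything after the "CS" label, minus surrounding whitespace.
            cs_parts.append(line.strip()[2:].strip())
        else:
            remaining.append(line)
    # Concatenate the pieces so a wrapped alignment yields one string.
    return "".join(cs_parts), remaining


# Made-up block for illustration only; not real hmmpfam output.
sample_block = [
    "                CS    HHHHHHHH",
    "                   *->acdefghk<-*",
    "                      acdefghk",
    "    query     1    acdefghk    8",
]

cs_string, other_lines = split_cs_lines(sample_block)
```

The remaining lines could then go through the existing alignment handling
unchanged, which matches the current workaround of skipping CS lines. Note
that this sketch does not preserve column alignment with the model line, and
it would misfire on a query sequence that happens to be named "CS".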

(In reply to comment #11)
> I emailed the Pfam folks about this. I'm adding the response below so that it's
> recorded and so that we can use it as a guide if we decide to add support for
> the CS lines in the future.
> 
> -----
> Hi Dave,
> 
> Thanks for submitting this question as a ticket; sorry it's taken us a
> while to get back to you about it.
> 
> > On the BioPerl project we noticed what appears to be a change in format in
> > Pfam's models between version 22 and 23 -- querying them with hmmpfam
> > shows 'CS' lines when using the latter.
> 
> The "CS" line in the hmmpfam output gives the consensus secondary
> structure for each match state. The CS lines will appear for a hit when
> the matching HMM  was built from a seed alignment that included
> secondary structure information. The SS information wasn't present in
> the seeds in Pfam release 22 but it was added for release 23, but only
> for those sequences that could be mapped to a protein structure.
> 
> > This change broke the BioPerl parser for hmmpfam. We have worked around it
> > temporarily by simply ignoring the 'CS' lines, but ideally we could
> > capture them once we understand what they are.
> 
> We also had a problem with the CS lines, which caused our "pfam_scan.pl"
> script to go into an endless loop... We too fixed the problem by simply
> ignoring CS lines in the hmmpfam output, but we'd like to fix our parser
> to include that information when we get around to it. 
> 
> > Could you provide or point me to a brief description of this part of the
> > output?
> 
> The bible for anything related to HMMER is the user guide, written by
> Sean Eddy, the author and ultimate source of all knowledge about HMMER:
> 
> http://hmmer.janelia.org/
> ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf
> 
> There's a description of the hmmpfam output on page 27, which briefly
> describes the contents of the CS line.
> 
> I hope that's useful. Good luck with the BioPerl parser.
> 
> John.
> 
> -- 
> ------------------------------------------------------------------------
> John Tate                                  Phone: (+44/0)1223 494 724
> The Wellcome Sanger Institute              Email: pfam-help at sanger.ac.uk
> Wellcome Trust Genome Campus
> Hinxton Hall,                           Protein families database (Pfam)
> Cambridge, CB10 1SA, UK                 http://pfam.sanger.ac.uk/
> ------------------------------------------------------------------------
> 



