[Bioperl-guts-l] [Bug 2632] hmmpfam parsers are broken (both hmmer and hmmer_pull)

bugzilla-daemon at portal.open-bio.org
Wed Nov 19 12:47:19 EST 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2632





------- Comment #27 from fossandon at vtr.net  2008-11-19 12:47 EST -------
(In reply to comment #21)
> > Can you attach an example report for this?  That would be greatly appreciated. 
> Yes, I took a look at this and wasn't able to reproduce the given example, so
> it would be terrific if you could attach your report.
> Thanks!

Hello again. It seems my previous regular expression for skipping the CS line
was too naive, but there is a way to fix it. I realized that the parser relies
on a counter ($count) to know how to parse each line of an alignment block:
###
Counter 0=                   ACSGlrsLiallalgllyayl...frrslwrrllllllaiipiailaNvl
Counter 1=                    CS l+s+    + g  +a+ +  +  ++ +r+l+l+  + ++a+l+
Counter 2=  Unison:820   189 -CSLLGSFTVPSTKGIGLAAQdilHNNPSSQRALCLC-LV-LLAVLG--- 232
###
I "think" that a valid line 0 can never start with "<spaces>CS<spaces>" (what
the regular expression matches), since line 0 is the HMM consensus line, where
gaps are shown as non-space characters ('.'). On the other hand, line 1 (the
match line) can be confused with a CS line, as in this case, and line 2 could
also match if the query name is "CS something".

So the skip-line regular expression must only be applied when $count == 0, that
is, before parsing the 3 lines. With that, I "hope" all possible cases are
covered. You can reproduce the error with any report by editing a middle line
(line 1 in the example) so that it starts with whitespace followed by "CS ".
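To make the counter argument concrete, here is a minimal sketch in Perl. It is a
hypothetical stand-alone model, not the actual hmmer.pm code: the sub name
parse_blocks, its $gate_on_count flag, and the CS annotation line in the sample
input are made up for illustration. With the gate on (the patched rule), the
match line that starts with " CS " is kept and one complete block is parsed;
with the gate off (the old rule), that line is dropped and the three-line cycle
never completes.

###
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model, NOT the real hmmer.pm parser. It only shows how a
# line counter cycles through the three lines of an hmmpfam alignment
# block and where the CS-skip test is allowed to fire.
#   $count 0 = HMM consensus, 1 = match line, 2 = target sequence line
sub parse_blocks {
    my ( $gate_on_count, @lines ) = @_;
    my $count = 0;
    my ( @blocks, @current );
    for my $line (@lines) {
        my $looks_like_cs = $line =~ m/^\s+CS\s+/;
        # Old rule: skip any CS-looking line. Patched rule: only skip it
        # when we are about to read line 0 of a block.
        next if $looks_like_cs && ( !$gate_on_count || $count == 0 );
        push @current, $line;
        if ( ++$count == 3 ) {    # a full consensus/match/target triple
            push @blocks, [@current];
            @current = ();
            $count   = 0;
        }
    }
    return @blocks;
}

my @report = (
    '             CS    CCCCHHHHHHHHHHHHH',    # made-up CS annotation line
    '                   ACSGlrsLiallalgllyayl...frrslwrrllllllaiipiailaNvl',
    '                    CS l+s+    + g  +a+ +  +  ++ +r+l+l+  + ++a+l+',
    '  Unison:820   189 -CSLLGSFTVPSTKGIGLAAQdilHNNPSSQRALCLC-LV-LLAVLG--- 232',
);

my @fixed = parse_blocks( 1, @report );    # skip gated on $count == 0
my @old   = parse_blocks( 0, @report );    # unconditional skip
printf "gated: %d block(s), ungated: %d block(s)\n", scalar @fixed, scalar @old;
# prints "gated: 1 block(s), ungated: 0 block(s)"
###

Gating on the counter rather than tightening the pattern keeps the skip rule
simple: the same regular expression is reused, and only its position in the
cycle changes.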

Please check whether my logic has any flaws. Here is the patch that fixed the
bug for me:
###
Index: hmmer.pm
===================================================================
--- hmmer.pm    (revision 15004)
+++ hmmer.pm    (working copy)
@@ -729,7 +729,7 @@
                             && /^\s+RF\s+[x\s]+$/o )
                       );
                     # fix for bug 2632
-                    next if ($_ =~ m/^\s+CS\s+/o);
+                    next if ($_ =~ m/^\s+CS\s+/o && $count == 0);
                     if ( /^Histogram/o || m!^//!o || /^Query sequence/o ) {
                         if ( $self->in_element('hsp') ) {
                             $self->end_element( { 'Name' => 'Hsp' } );
###

Hope it helps!

