[Bioperl-l] hmmer3.pm question re query and hit coordinates

Fields, Christopher J cjfields at illinois.edu
Wed Jul 11 15:02:46 EDT 2012


Peng, 

Has this been filed as a bug yet?  

    https://redmine.open-bio.org/projects/bioperl

Seems like it would be fairly easy to fix, but I want to track it just in case.

chris

On Jul 11, 2012, at 12:45 PM, Peng Zhou wrote:

> Hello guys,
> 
> Just a follow-up, it seems to me the bioperl-live version is still having the same problem - calling hit "query" while query sequence "hit". I also looked into the test script written for hmmer3 (bioperl-live/t/SearchIO/hmmer.t), and it doesn't deal with the alignment part - I guess that's why this bug was not discovered.
> 
> To be simple, here's an output of hmmsearch v3.0:
> # hmmsearch :: search profile(s) against a sequence database
> # HMMER 3.0 (March 2010); http://hmmer.org/
> # Copyright (C) 2010 Howard Hughes Medical Institute.
> # Freely distributed under the GNU General Public License (GPLv3).
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # query HMM file:                  /project/youngn/zhoup/Scripts/spada/profile/21_all.hmm
> # target sequence database:        /project/youngn/zhoup/Data/misc3/spada/Athaliana/01_genome/12_refseq_orf.fa
> # output directed to file:         /project/youngn/zhoup/Data/misc3/spada/Athaliana/11_hmmSearchX/01_raw.txt
> # number of worker threads:        4
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> Query:       CRP0000  [M=75]
> Scores for complete sequences (score includes all domains):
>    --- full sequence ---   --- best 1 domain ---    -#dom-
>     E-value  score  bias    E-value  score  bias    exp  N  Sequence                 Description
>     ------- ------ -----    ------- ------ -----   ---- --  --------                 -----------
>     5.5e-25   95.0  14.4    5.7e-25   95.0  10.0    1.0  1  Chr2_540228_540404_+     
> 
> Domain annotation for each sequence (and alignments):
> >> Chr2_540228_540404_+  
>    #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
>  ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
>    1 !   95.0  10.0   3.6e-30   5.7e-25      20      74 ..       4      59 .]       1      59 [] 0.95
> 
>   Alignments for each domain:
>   == domain 1    score: 95.0 bits;  conditional E-value: 3.6e-30
>                CRP0000 20 tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg.rrrCfCtkpc 74
>                           ++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c
>   Chr2_540228_540404_+  4 GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC 59
>                           568899***99********************************************* PP
> 
> And here is a dump of the parsed HSP object:
> $VAR1 = bless( {
>                  'VERBOSE' => 0,
>                  'IDENTICAL' => 0,
>                  'RANK' => 1,
>                  'STRANDED' => 'NONE',
>                  'EVALUE' => '3.6e-30',
>                  'HSP_LENGTH' => 56,
>                  'ALGORITHM' => 'HMMSEARCH'
>                  'SCORE' => '95.0',
>                  'GAP_SYMBOL' => '-',
>                  'CONSERVED' => 0,
>                  
>                  'HIT_NAME' => 'Chr2_540228_540404_+',
>                  'HIT_DESC' => '',
>                  'HIT_START' => '20',
>                  'HIT_END' => '74',
>                  'HIT_LENGTH' => 56,
>                  'HIT_SEQ' => 'tegpkvaeartCesqShkFkGpCvsdtnCasvCrtEgfpgGecrg-rrrCfCtkpc',
>                  'HIT_FRAME' => 0,
>                  
>                  'QUERY_NAME' => 'CRP0000',
>                  'QUERY_DESC' => undef,
>                  'QUERY_START' => '4',
>                  'QUERY_END' => '59',
>                  'QUERY_LENGTH' => '75',
>                  'QUERY_FRAME' => 0,
>                  'QUERY_SEQ' => 'GMGPVTVEARTCESKSHRFKGPCVSTHNCANVCHNEGFGGGKCRGfRRRCYCTRHC',
>                  
>                  'HOMOLOGY_SEQ' => '++gp+++eartCes+Sh+FkGpCvs +nCa+vC++Egf gG+crg rrrC+Ct++c',
>                }, 'Bio::Search::HSP::HMMERHSP' );
> 
> Clearly, the "HIT_START", "HIT_END", "HIT_SEQ" should actually be exchanged with "QUERY_START", "QUERY_END" and "QUERY_SEQ" values.
> 
> Thanks,
> 
> Peng,
> 
> On Tuesday, July 19, 2011 11:23:20 PM UTC-5, Givan, Scott A. wrote:
> I'll try the bioperl-live version. Thanks guys.
> Scott Givan
> 541-740-4685
> Sent from an iPhone (so expect typos).
> 
> On Jul 19, 2011, at 10:34 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
> 
> > This might be a disconnect between the HMMER3 version in bioperl-live and the one in Kai's bioperl-hmmer3 repo.  I believe the one in bioperl-live is newer.  Scott, can you give that a try?
> > 
> > chris
> > 
> > On Jul 19, 2011, at 9:45 PM, Thomas Sharpton wrote:
> > 
> >> Hi Scott,
> >> 
> >> Thanks for writing. I'm on the road at the moment so I have to be briefer and less thorough than I'd like to be.
> >> 
> >> What you are observing is not the intended behavior. Oddly, it's not what I recall obtaining in my tests on this software, though I was mostly interested in hmmsearch at the time and may have been sloppier than I should have been when it came to hmmscan.
> >> 
> >> What version of HMMER3 you're using? There have been some small formatting changes in the past that might be causing a burp in the parser, though I'm doubting it.
> >> 
> >> Kai Blin wrote some test scripts (found here: bioperl-live/t/SearchIO/hmmer.t) that, if I recall correctly, evaluate query/hit coordinates. It might be worth giving this a shot if you haven't already.
> >> 
> >> Also, if you don't mind, I'm happy to run your code on your output file on my end. It might help me diagnose the problem.
> >> 
> >> Sorry this is being a thorn in your side! I've cc'ed the list in case anyone else has insight into this matter.
> >> 
> >> Best,
> >> Thomas
> >> 
> >> On Jul 19, 2011, at 10:43 AM, Givan, Scott A. wrote:
> >> 
> >>> Hi Thomas,
> >>> 
> >>> I'm using modules in the bipoerl-hmmer3 git repository to parse hmmscan
> >>> reports. When I parse the files and walk through the HSP's like:
> >>> 
> >>> while (my $hit = $rslt->next_model) {
> >>> 
> >>>    while (my $domain = $hit->next_hsp) {
> >>> 
> >>> And retrieve the "hit" coordinates like:
> >>> 
> >>>        print "hit coords: ", $domain->start('hit'), "-", $domain->end('hit'),
> >>> "\n";
> >>> 
> >>> The coordinates returned correspond to what I would call the "query",
> >>> since they are for the sequence I fed to hmmscan to search the profile
> >>> database. Likewise, when retrieving the query coordinates like
> >>> $domain->start('query'), I get what I consider the "hit" coordinates,
> >>> since they are for the domain profile. Is this the intended behavior?
> >>> 
> >>> Thanks.
> >>> 
> >>> scott
> >>> 
> >>> -- 
> >>> Scott A. Givan
> >>> Associate Director
> >>> Informatics Research Core Facility
> >>> 240e Bond Life Sciences Center
> >>> Research Assistant Professor
> >>> Molecular Microbiology and Immunology
> >>> University of Missouri, Columbia
> >>> 
> >>> TEL 573-882-2948
> >>> FAX 573-884-9676
> >>> http://ircf.rnet.missouri.edu
> >>> 
> >>> 
> >>> 
> >> 
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 




More information about the Bioperl-l mailing list