[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output

Jason Stajich jason.stajich at duke.edu
Thu Feb 9 17:13:16 EST 2006


Uh, that was done in Sept; see the CVS log...

On Feb 9, 2006, at 4:33 PM, Joel Steele wrote:

> Greetings again,
> It's the colon...
> observe.
>
> -=Code Snippet=-
> #!/usr/bin/perl -w
> use strict;
>
> #the string as reported from your error.
> my $string1 = 'Query  1   WWWKWRW  7';
>
> #your string with a colon thrown in for testing.
> my $string2 = 'Query:  1   WWWKWRW  7';
>
> foreach ($string1, $string2){
> 	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
> 		print "Match Found in $_\n";
> 		print $1."\n";
> 		print $2."\n";
> 		print $3."\n";
> 		print $4."\n";
> 		print $5."\n";
> 	}else{
> 		print "no Match for $_\n";
> 	}
> }
>
> -=End Code=-
>
> The Output
>
> -=Code Snippet=-
> no Match for Query  1   WWWKWRW  7
> Match Found in Query:  1   WWWKWRW  7
> Query:  1
> Query
> 1
> WWWKWRW
> 7
>
> -=End Code=-
>
>
> Now I would suggest changing the regexp
>
> From:
> /^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> To:
> /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> in SearchIO::Blast.
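The suggested change can be checked in isolation. Below is a minimal sketch, assuming only the pattern quoted above; the helper name parse_alignment_line is hypothetical and is not part of Bio::SearchIO::blast. It makes the colon optional so that both the older "Query:" lines and the newer "Query" lines are accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Alignment-line pattern with an optional colon after Query/Sbjct,
# so both the "Query:" and the colon-less "Query" layouts match.
my $ALIGN_RE = qr/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/;

# Hypothetical helper (not a BioPerl function): returns
# (label, start, sequence, end) on a match, or an empty list otherwise.
sub parse_alignment_line {
    my ($line) = @_;
    return unless $line =~ $ALIGN_RE;
    return ($2, $3, $4, $5);
}

# The two test strings from the snippet above.
for my $line ('Query  1   WWWKWRW  7', 'Query:  1   WWWKWRW  7') {
    my @fields = parse_alignment_line($line);
    print @fields ? "matched: @fields\n" : "no match: $line\n";
}
```

With the optional colon, both test strings should be reported as matches with the same four fields.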
>
> General suggestion:
> Again I would like to suggest that everyone get used to using the strict
> pragma. Though it may not be applicable to this particular problem, it
> becomes essential if you wish to progress in your use of Perl.
> It is a core module so there is nothing to download from CPAN. It helps
> with development and once your code can run without warnings and errors
> you can remove it. This is not a targeted attack as some may interpret it,
> rather a general FYI for those out there new to Perl or programming in
> general.
> Better to start learning the rules early before bad habits creep in.
> One more thing. There is a wonderfully supportive Perl community available
> to anyone who wants to join at PerlMonks.org. Check it out; who knows, you
> may even catch a glimpse of Larry Wall while you're there.
>
> -Joel Steele
>
> "The surest way to corrupt a youth is to instruct him to hold in higher
> regard those who think alike than those who think differently." -
> Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." -
> Galileo
>
>
>
>
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>
>> To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields
>> <cjfields at uiuc.edu>,        Jason Stajich <jason.stajich at duke.edu>
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>> output
>> Date: Thu, 09 Feb 2006 14:13:31 -0600
>>
>> dear roger,
>> this error message I got, when I tried to parse Blast output (version
>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>> of Blast output files
>> with version 2.2.13 and for that I don't get any error message.....it
>> just doesn't work
>>
>> Hubert
>>
>>
>>
>> Roger Hall wrote:
>>
>>> Guys - I'm looking at the error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> This is my line of thought:
>>> 1. "no data for midline $_" is a unique message generated by blast.pm in
>>> one location only, at the point of a. reading three lines b. dropping
>>> lines with spaces only c. identifying the Query, Midline, and Match
>>> lines (0 <= $i < 3)
>>> 2. There is a regexp match that fails in order to reach that error
>>> message
>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>> 4. It does anyway
>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>> reports
>>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>>> anywhere has me thoroughly confused - I asked Hubert for the additional
>>> file, assuming that I didn't have it.
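One way to test a newline/metacharacter hypothesis like this is to print the suspect line with every non-printing character escaped. The following is only a sketch under that assumption; show_hidden is a hypothetical helper name, and the sample string is made up for illustration rather than taken from a real report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# (space through tilde) with a \xNN escape so that stray carriage
# returns, tabs, or other invisible bytes show up in the output.
sub show_hidden {
    my ($text) = @_;
    (my $shown = $text) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    return $shown;
}

# Illustrative input only: an alignment line with a DOS-style line ending.
print show_hidden("Query  1   WWWKWRW  7\r\n"), "\n";
```

If the escaped output shows nothing beyond the expected line ending, hidden characters can probably be ruled out and the pattern itself becomes the next thing to check.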
>>>
>>> My next thought is to write a quick script to test perl behavior on
>>> "Fedora Core 4".
>>>
>>> Thoughts?
>>>
>>> Did I misread the issue entirely? :}
>>>
>>> Roger
>>>
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>> Sent: Thursday, February 09, 2006 10:16 AM
>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>> Cc: bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>> output
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>> To: Hubert Prielinger
>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>> hi chris,
>>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>> do you have any other idea, the problem I have is that I have to parse
>>>>> a lot of textfiles....
>>>>> or shall I look for another option to parse those files...
>>>>>
>>>>> regards
>>>>> Hubert
>>>>>
>>>>>
>>>> The code from Bioperl 1.5.1 works fine for me for blast
>>>> 2.2.13 reports but unless you post your blast report we can't
>>>> really determine the problem.
>>>>
>>>> If you are still getting the same error like this I am not
>>>> convinced you have upgraded to 1.5.1, which includes a fix for
>>>> the fact that NCBI changed the HSP result format to remove
>>>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon
>>>> as it was apparent sometime in September.
>>>>
>>>>
>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>
>>>> If you are just getting no results but also no warnings wrt
>>>> parsing, are you sure your logic is correct?
>>>>
>>>> If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>> while (my $result = $search->next_result) {
>>>>     print $result->query_name, "\n";
>>>>     # iterate over each hit on the query sequence
>>>>     while (my $hit = $result->next_hit) {
>>>>         print $hit->name, "\n";
>>>>         # iterate over each HSP in the hit
>>>>         while (my $hsp = $hit->next_hsp) {
>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                 $hsp->hit_string, "\n";
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>
>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>> similar script to the above.  I removed the file parsing logic and it
>>> seemed to work just fine.  It may very well be a logic issue or that he
>>> hasn't installed the latest fix.
>>>
>>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>> even though the returned output was from nr, the top of the blast output
>>> showed that it was v2.2.12:
>>>
>>> BLASTP 2.2.12 [Aug-07-2005]
>>>
>>> I double-checked my local version and it's definitely v.2.2.13:
>>> -------------------------------------
>>> C:\Perl\Scripts>blastcl3 -
>>>
>>> blastcl3 2.2.13   arguments:...
>>> -------------------------------------
>>>
>>> If you use RemoteBlast using the same settings, the version in the header
>>> looks like this:
>>>
>>> BLASTP 2.2.13 [Nov-27-2005]
>>>
>>> I'm wondering if all the blast executables (blast and netblast) from NCBI
>>> have text output like v.2.2.12, while the wwwblast outputs a new format
>>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>
>>>
>>>> To clarify some stuff -
>>>> Chris, I don't necessarily think the XML is the best way forward
>>>> for BLAST reports generated locally, it isn't as detailed as
>>>> the Text format and it is what most people expect to be able
>>>> to scroll through and parse -- it is also harder for the
>>>> format to change dramatically if you have a static binary on
>>>> your machine =).  I think for remoteblast the XML format
>>>> should be the way forward but I expect Bioperl to maintain
>>>> support of any plain text BLAST report format that people use
>>>> on a regular basis.
>>>>
>>>>
>>>>
>>>
>>> Does XML lack some specific info that text output has?  Didn't know that.
>>> I believe that XML should be default in RemoteBlast since it will not
>>> break, but I agree with you about text output.  I also agree that it will
>>> need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>
>>>
>>>> -jason
>>>>
>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>
>>>>>
>>>>>> My guess is you're running into text parsing problems in
>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>> (1.5.1) or
>>>>>> bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>> the last problem (more recent, not related to the first) has been
>>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>>>> been committed yet and may change.
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>> Prielinger
>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>> To: bioperl-l at bioperl.org
>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>> output
>>>>>>>
>>>>>>> Hi,
>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>> is that a bug......
>>>>>>>
>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>> anything.....
>>>>>>> I'm using bioperl 1.4
>>>>>>>
>>>>>>> before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>> I had installed
>>>>>>>
>>>>>>> thanks in advance
>>>>>>>
>>>>>>> Hubert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



