[Bioperl-l] Error parsing BLAST report

Freimuth, Robert freimuth at pathology.wustl.edu
Thu Aug 17 15:53:04 EDT 2006


Hi,

Thanks for the suggestion.  I have entered it as bug 2081
(http://bugzilla.open-bio.org/show_bug.cgi?id=2081) and uploaded both
the test code and BLAST report as attachments.

Thanks for the help,
Bob



> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> Sent: Thursday, August 17, 2006 10:58 AM
> To: Freimuth, Robert; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Error parsing BLAST report
> 
> Robert,
> 
> This sounds like a possible bug; the error you are getting is 
> from BLAST
> 2.2.11 output, so it should work.  The BLAST parsing errors 
> fixed in CVS had
> to do with parsing BLAST 2.2.13 (and later) output.
> 
> Could you add this as a bug to Bugzilla with your test script 
> and test case
> data that generates the error?
> 
> How to post the bug:
> 
> http://www.bioperl.org/wiki/Bugs
> 
> Bugzilla:
> 
> http://bugzilla.open-bio.org/
> 
> Chris
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Freimuth, Robert
> > Sent: Wednesday, August 16, 2006 11:42 PM
> > To: Brian Osborne; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Error parsing BLAST report
> > 
> > Hi,
> > 
> > Thank you for your reply.  I downloaded bioperl-1.5.1 from
> > http://bioperl.org/DIST/ and installed it (which appeared 
> successful),
> > but the one-liner:
> > 
> > perl -MBio::Root::Version -e 'print 
> $Bio::Root::Version::VERSION, "\n"'
> > 
> > prints 1.5 (I expected 1.5.1).
> > 
> > When I run the test case that I reported earlier, I get the 
> following
> > output:
> > 
> > -------------------- WARNING ---------------------
> > MSG: There is no HSP data for hit 'ENSP00000327738'.
> > You have called a method (Bio::Search::Hit::GenericHit::length_aln)
> > that requires HSP data and there was no HSP data for this hit,
> > most likely because it was absent from the BLAST report.
> > Note that by default, BLAST lists alignments for the first 250 hits,
> > but it lists descriptions for 500 hits. If this is the case,
> > and you care about these hits, you should re-run BLAST using the
> > -b option (or equivalent if not using blastall) to increase 
> the number
> > of alignments.
> > 
> > ---------------------------------------------------
> > Alignment length for ENSP00000327738 is -
> > Alignment length for ENSP00000350182 is 250
> > Alignment length for ENSP00000327738 is 398
> > 
> > Could someone that is running 1.5.1 please verify the output of the
> > one-liner above (did I somehow get the wrong file from the 
> ftp site?)
> > and try to reproduce the error with the test case?
> > 
> > Thanks for the help.  I'm stumped.
> > 
> > Bob
> > 
> > > -----Original Message-----
> > > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > > Sent: Wednesday, August 16, 2006 6:21 PM
> > > To: Freimuth, Robert; bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] Error parsing BLAST report
> > >
> > > Robert,
> > >
> > > The standard answer to a complaint about SearchIO these days
> > > is to upgrade
> > > to version 1.5.1 - what Bioperl version are you using?
> > >
> > > Brian O.
> > >
> > >
> > > On 8/16/06 3:56 PM, "Freimuth, Robert"
> > > <freimuth at pathology.wustl.edu> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm trying to parse a BLAST report using the following code:
> > > >
> > > > use warnings;
> > > > use strict;
> > > >
> > > > use Bio::SearchIO;
> > > >
> > > > my $file = 'NP_006065_blast.out';
> > > >
> > > > my $searchio = new Bio::SearchIO( -format => 'blast',
> > > >                                   -file   => $file );
> > > >
> > > > while( my $result = $searchio->next_result() )
> > > > {
> > > >     while( my $hit = $result->next_hit )
> > > >     {
> > > >             my $hit_acc_num = $hit->accession();
> > > >
> > > >             # get the total length of the aligned region
> > > for query or
> > > > sbjct seq
> > > >             # (includes all HSPs, calculated after tiling)
> > > >
> > > >             my $align_len = $hit->length_aln( 'query' );
> > > >
> > > >             print "Alignment length for $hit_acc_num is
> > > $align_len\n";
> > > >     }
> > > > }
> > > >
> > > > There are 104 one-line descriptions in the report, and
> > > alignments for
> > > > each one of them (the blast report was created using
> > > > b_num_alignments_shown => 500 and 
> v_num_descriptions_shown => 500).
> > > > However, when I run the above code I get 14 errors like the
> > > following:
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: There is no HSP data for hit 'ENSP00000327738'.
> > > > You have called a method 
> (Bio::Search::Hit::GenericHit::length_aln)
> > > > that requires HSP data and there was no HSP data for this hit,
> > > > most likely because it was absent from the BLAST report.
> > > > Note that by default, BLAST lists alignments for the 
> first 250 hits,
> > > > but it lists descriptions for 500 hits. If this is the case,
> > > > and you care about these hits, you should re-run BLAST using the
> > > > -b option (or equivalent if not using blastall) to increase
> > > the number
> > > > of alignments.
> > > >
> > > > ---------------------------------------------------
> > > >
> > > > There is an alignment for this (and the other 13 
> sequences) in the
> > > > report.  In fact, if I edit the report and delete all but the
> > > > description and the alignment for ENSP00000327738, it
> > > parses fine (no
> > > > error).
> > > >
> > > > I continued editing the report and produced the following
> > > minimal test
> > > > case that reproduces the error.  Note that the description for
> > > > ENSP00000350182 appears twice, BUT THE ERROR IS FOR 
> ENSP00000327738.
> > > >
> > > > *********** BLAST REPORT FOR TEST CASE ***********
> > > >
> > > > BLASTP 2.2.11 [Jun-05-2005]
> > > >
> > > >
> > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> > > > Schaffer,
> > > > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. 
> Lipman (1997),
> > > > "Gapped BLAST and PSI-BLAST: a new generation of protein
> > > database search
> > > > programs",  Nucleic Acids Res. 25:3389-3402.
> > > >
> > > > Query= NP_006065
> > > >          (442 letters)
> > > >
> > > > Database: Homo_sapiens.NCBI36.apr.pep.fa
> > > >            48,851 sequences; 23,910,368 total letters
> > > >
> > > > Searching..................................................done
> > > >
> > > >
> > >      Score
> > > > E
> > > > Sequences producing significant alignments:
> > >      (bits)
> > > > Value
> > > >
> > > > ENSP00000350182 pep:novel 
> clone::BX322644.8:4905:15090:-1 gene:E...
> > > > 120   3e-27
> > > > ENSP00000350182 pep:novel 
> clone::BX322644.8:4905:15090:-1 gene:E...
> > > > 120   3e-27
> > > > ENSP00000327738 pep:known-ccds 
> chromosome:NCBI36:4:189297592:189...
> > > > 115   8e-26
> > > >
> > > >> ENSP00000350182 pep:novel clone::BX322644.8:4905:15090:-1
> > > > gene:ENSG00000137397
> > > >            transcript:ENST00000357569
> > > >           Length = 425
> > > >
> > > >  Score =  120 bits (301), Expect = 3e-27
> > > >  Identities = 76/261 (29%), Positives = 140/261 (53%), 
> Gaps = 21/261
> > > > (8%)
> > > >
> > > > Query: 9
> > > IEKEVTCPICLELLTEPLSLDCGHSFCQACITAKIKESVIISRGESSCPVCQTRFQPGNL
> > > > 68
> > > >            +++EV CPICL++L +P+++DCGH+FC  CIT +I E+   S G
> > > CP+C+T  +   +
> > > > Sbjct: 10
> > > LQEEVICPICLDILQKPVTIDCGHNFCLKCIT-QIGET---SCGFFKCPLCKTSVRKNAI
> > > > 65
> > > >
> > > > Query: 69
> > > RPNRHLANIVERVKEVKMSP-QEGQKRDVCEHHGKKLQIFCKEDGKVICWVCELSQEHQG
> > > > 127
> > > >            R N  L N+VE+++ ++ S  Q  +K   C  H +    FC++DGK
> > > +C+VC  S++H+
> > > > Sbjct: 66
> > > RFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCCESKDHKS
> > > > 125
> > > >
> > > > Query: 128
> > > HQTFRINEVVKECQEKLQVALQRLIKEDQEAEKLED------DIRQERTAWKIERQKILK
> > > > 181
> > > >            H    I E  +  Q ++Q  +Q L ++++E  +++       D+  ++
> > >   + E+Q+IL
> > > > Sbjct: 126
> > > HNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQV--EHEKQRILT
> > > > 183
> > > >
> > > > Query: 182
> > > GFNEMRVILDNEEQRELQKL----EEGEVNVLDNLAAATDQLVQQRQDASTLISDLQRRL
> > > > 237
> > > >             F  +  +L+ E+   L ++     EG       +A+   QL     D
> > >   L+  L+ +
> > > > Sbjct: 184
> > > EFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQL----NDLKKLVDSLKTKQ
> > > > 239
> > > >
> > > > Query: 238 TGSSVEMLQDVIDVMKRSESW 258
> > > >                 ++L+     + RSE +
> > > > Sbjct: 240 NMPPRQLLEVTQPHLPRSEEF 260
> > > >
> > > >
> > > >> ENSP00000327738 pep:known-ccds
> > > > chromosome:NCBI36:4:189297592:189305643:1
> > > >            gene:ENSG00000184108 transcript:ENST00000332517
> > > >            CCDS3851.1
> > > >           Length = 468
> > > >
> > > >  Score =  115 bits (289), Expect = 8e-26
> > > >  Identities = 101/410 (24%), Positives = 180/410 (43%),
> > > Gaps = 39/410
> > > > (9%)
> > > >
> > > > Query: 8
> > > DIEKEVTCPICLELLTEPLSLDCGHSFCQACITAKIKESVIISRGESSCPVCQTRFQPGN
> > > > 67
> > > >            ++ +E+TC ICL+  + P++ +CGHSFC  C+    +E
> > > SCP C    +  +
> > > > Sbjct: 9
> > > NLREELTCFICLDYFSSPVTTECGHSFCLVCLLRSWEE----HNTPLSCPECWRTLEGPH
> > > > 64
> > > >
> > > > Query: 68
> > > LRPNRHLANIVERVKEVKMSPQEGQKRDVCEHHGK-----KLQIFCKEDGKVICWVCELS
> > > > 122
> > > >             + N  L  +    ++++   Q  Q  D    +G+     K     ++
> > > G        ++
> > > > Sbjct: 65
> > > FQSNERLGRLASIARQLR--SQVLQSEDEQGSYGRMPTTAKALSDDEQGGSAF-----VA
> > > > 117
> > > >
> > > > Query: 123
> > > QEHQGHQTFRINEVVKECQEKLQVALQRLIKEDQEA------EKLEDDIRQERTAWKIER
> > > > 176
> > > >            Q H  ++    +E  +  +EKLQ  L  L    +EA      EK
> > > + QE T  K  +
> > > > Sbjct: 118
> > > QSHGANRVHLSSEAEEHHREKLQEILNLLRVRRKEAQAVLTHEKERVKLCQEET--KTCK
> > > > 175
> > > >
> > > > Query: 177
> > > QKILKGFNEMRVILDNEEQRELQKLEEGEVNVLDNLAAATDQLVQQRQDASTLISDLQRR
> > > > 236
> > > >            Q ++  + +M   L  EEQ +LQ LE+ E   +  L     +L QQ +
> > >  S +I+ ++
> > > > Sbjct: 176
> > > QVVVSEYMKMHQFLKEEEQLQLQLLEQEEKENMRKLRNNEIKLTQQIRSLSKMIAQIESS
> > > > 235
> > > >
> > > > Query: 237
> > > LTGSSVEMLQDVIDVMKRSESWTXXXXXXXXXXXXXXFRVPDLSGMLQVLKELTDVQYYW
> > > > 296
> > > >               S+ E L++V   ++RSE                   +  
> ++GM ++L++ +
> > > > Sbjct: 236
> > > SQSSAFESLEEVRGALERSE----PLLLQCPEATTTELSLCRITGMKEMLRKFS------
> > > > 285
> > > >
> > > > Query: 297
> > > VDVMLNPGSATSNVAISVDQRQVKTVRTCTFKNSNPCDF-SAFGVFGCQYFSSGKYYWEV
> > > > 355
> > > >             ++ L+P +A + + +S D + VK   +      NP  F  +  V G
> > > Q F+SG++YWEV
> > > > Sbjct: 286
> > > TEITLDPATANAYLVLSEDLKSVKYGGSRQQLPDNPERFDQSATVLGTQIFTSGRHYWEV
> > > > 345
> > > >
> > > > Query: 356 
> DVSGKIAWILGVHSKISSLNKRKSSGFAFDPSVNYSKVYSRYRPQYGYWV 405
> > > >            +V  K  W +G+     S    +       P   +S +  +    Y  WV
> > > > Sbjct: 346 
> EVGNKTEWEVGICKDSVS----RKGNLPKPPGDLFSLIGLKIGDDYSLWV 391
> > > >
> > > >
> > > >   Database: Homo_sapiens.NCBI36.apr.pep.fa
> > > >     Posted date:  Jun 15, 2006  8:56 PM
> > > >   Number of letters in database: 23,910,368
> > > >   Number of sequences in database:  48,851
> > > >
> > > > Lambda     K      H
> > > >    0.319    0.133    0.398
> > > >
> > > > Gapped
> > > > Lambda     K      H
> > > >    0.267   0.0410    0.140
> > > >
> > > >
> > > > Matrix: BLOSUM62
> > > > Gap Penalties: Existence: 11, Extension: 1
> > > > Number of Hits to DB: 20,900,506
> > > > Number of Sequences: 48851
> > > > Number of extensions: 899179
> > > > Number of successful extensions: 6075
> > > > Number of sequences better than 1.0e-25: 105
> > > > Number of HSP's better than  0.0 without gapping: 18
> > > > Number of HSP's successfully gapped in prelim test: 87
> > > > Number of HSP's that attempted gapping in prelim test: 5632
> > > > Number of HSP's gapped (non-prelim): 157
> > > > length of query: 442
> > > > length of database: 23,910,368
> > > > effective HSP length: 107
> > > > effective length of query: 335
> > > > effective length of database: 18,683,311
> > > > effective search space: 6258909185
> > > > effective search space used: 6258909185
> > > > T: 11
> > > > A: 40
> > > > X1: 16 ( 7.4 bits)
> > > > X2: 38 (14.6 bits)
> > > > X3: 64 (24.7 bits)
> > > > S1: 41 (21.8 bits)
> > > > S2: 289 (115.9 bits)
> > > >
> > > > *********** END BLAST REPORT FOR TEST CASE ***********
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks,
> > > >
> > > > Bob
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > >
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 



More information about the Bioperl-l mailing list