[Bioperl-l] Can't parse blast report written by Bio::SearchIO::Writer::TextResultWriter

Chris Fields cjfields at uiuc.edu
Thu May 8 20:03:11 EDT 2008


You can always post it as an enhancement request in bugzilla.  I don't  
think it would be too hard to implement.

chris

On May 8, 2008, at 5:35 PM, Prachi Shah wrote:

>> I suspect somehow you are not reconstituting the Hit or Result  
>> objects properly,
>> but I didn't try and debug this myself.
>
> Its possible, but I haven't been to point out what is going wrong. But
> then, the writer object is able to write the report without incident.
> I am at a loss.
>
>> You can specify a sort order function to the Result object now to  
>> specify the Hit order,
>> maybe we should add sort function to Hit object for retrieving the  
>> underlying HSPs in a
>> programmable order.  Seems like that would be a cleaner fix.
>
> That would be ideal! But, until that is available, I will have to
> make-do with such a solution.
>
> Thanks,
> Prachi
>
>
>> On May 8, 2008, at 1:54 PM, Prachi Shah wrote:
>>
>>> Hi all,
>>>
>>> I am trying to order of HSPs within each BLAST Hit in the order of
>>> ascending P-values. So, I parse my WU-BLAST report using  
>>> Bio::SearchIO
>>> and create new Result, Hit and HSP objects in the order and then  
>>> write
>>> out another BLAST report with the
>>> Bio::SearchIO::Writer::TextResultWriter module. All this works fine.
>>> But, when I try to parse this new blast report with
>>> Bio::SearchIO::blast, I get the following error:
>>>
>>> ------------- EXCEPTION  -------------
>>> MSG: no data for midline Query: 0       1
>>> STACK Bio::SearchIO::blast::next_result
>>> /tools/perl/5.6.1/lib/site_perl/5.6.1/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel bin/testBlastParse.pl:12
>>> --------------------------------------
>>>
>>> I have copied below sample sections of both blast reports and the
>>> code. Any hints/ pointers/ suggestions are greatly appreciated.
>>>
>>> Thanks,
>>> Prachi
>>>
>>>
>>>
>>> The old vs new blast reports look slightly different, esp. note the
>>> HSP start and stop coordinates for the QUERY sequence.
>>>
>>> **Snippet of OLD blast report (generated by WU-BLAST):
>>> ----------------------------------------------------------------------------------------------------
>>> Query=  orf19.4890
>>>       (4931 letters)
>>>
>>> Database:  Ca21_Chromosomes
>>>          9 sequences; 14,324,492 total letters.
>>> Searching.... 
>>> 10....20....30....40....50....60....70....80....90....100% done
>>>
>>> WARNING:  hspmax=1000 was exceeded by 8 of the database sequences,  
>>> causing the
>>>         associated cutoff score, S2, to be transiently set as high  
>>> as 113.
>>>
>>>                                                                     
>>> Smallest
>>>                                                                      Sum
>>>                                                             High   
>>> Probability
>>> Sequences producing High-scoring Segment Pairs:               
>>> Score  P(N)      N
>>>
>>> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)          
>>> 24655  0.        1
>>> Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)           
>>> 1682  3.4e-68   3
>>> Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)            
>>> 908  3.0e-34   3
>>> Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)            
>>> 859  4.7e-30   1
>>> Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)             
>>> 492  7.3e-24   3
>>> Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)            
>>> 528  9.8e-21   2
>>> Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)            
>>> 520  1.4e-19   5
>>> Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)            
>>> 502  1.7e-14   2
>>> Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)          
>>> 313  2.9e-06   2
>>>
>>>
>>>> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
>>>
>>>       Length = 3,188,577
>>>
>>> Plus Strand HSPs:
>>>
>>> Score = 506 (82.0 bits), Expect = 4.9e-14, P = 4.9e-14
>>> Identities = 850/1549 (54%), Positives = 850/1549 (54%), Strand =  
>>> Plus / Plus
>>>
>>> Query:   3450 ATGCATATGGTAATGTTAA-AATCACTGATTTTGGA- 
>>> TTTTGTGCTAAATTAAC-T-GAT 3505
>>>             | | ||| | | || |||| |||  ||||| ||| | ||||| || |  || |   
>>> | | |
>>> Sbjct: 155924 AGGGATACGATTAT-TTAAGAATT-CTGATATTGAAATTTTG-GC- 
>>> ATTTTCATATAGTT
>>> 155979
>>>
>>> Query:   3506 CAAAGA--AATAAACGTGCC-ACAATGGTGGGGACACCATATTGG- 
>>> ATGGCACCTGAAGT 3561
>>>             |||| |  ||||||  |    | |||| ||     | ||| | |  |||   |   
>>> | | |
>>> Sbjct: 155980 CAAACATTAATAAATATATTGAAAATGTTGATTTAATCAT-TAGTCATG--- 
>>> CTGGTACT
>>> 156035
>>>
>>> Query:   3562  
>>> GGTTAAACAAAAGGAATATGATGAAAAAGTTGATGTTTGGTCATTGGGGATTATGACTAT 3621
>>>             || | ||  |  |  || || | | |   ||||   |    ||||      
>>> ||||   ||
>>> Sbjct: 156036 GGATCAATCATTG--AT-TGTTTACAT--TTGAA-- 
>>> TAAACCATTAATTGTTATTGTTAA
>>> 156088
>>>
>>> Query:   3622 TGAAATGATTGAAGGAGAACCACCTTATTTGAA-T- 
>>> GAAGAACCATTAAAAGCATTATAT 3679
>>> ----------------------------------------------------------------------------------------------------
>>>
>>> **Snippet of NEW blast report (generated using
>>> Bio::SearchIO::Writer::TextResultWriter)
>>> ----------------------------------------------------------------------------------------------------
>>> uery= orf19.4890
>>>      (4,931 letters)
>>>
>>> Database: Ca21_Chromosomes
>>>          9 sequences; 14,324,492 total letters
>>>
>>>                                                                 
>>> Score       E
>>> Sequences producing significant alignments:                       
>>> (bits)    value
>>> Ca21chr1 Assembly 21, Ca21chr1 (3188577  
>>> nucleotides)                24655  0.
>>> Ca21chr5 Assembly 21, Ca21chr5 (1190941 nucleotides)
>>> 1682  3.4e-68
>>> Ca21chr6 Assembly 21, Ca21chr6 (1033553 nucleotides)
>>> 908   3.0e-34
>>> Ca21chr2 Assembly 21, Ca21chr2 (2232049 nucleotides)
>>> 859   4.7e-30
>>> Ca21chr7 Assembly 21, Ca21chr7 (949626 nucleotides)
>>> 492   7.3e-24
>>> Ca21chr4 Assembly 21, Ca21chr4 (1603475 nucleotides)
>>> 528   9.8e-21
>>> Ca21chrR Assembly 21, Ca21chrR (2286425 nucleotides)
>>> 520   1.4e-19
>>> Ca21chr3 Assembly 21, Ca21chr3 (1799426 nucleotides)
>>> 502   1.7e-14
>>> Ca19-mtDNA Assembly 19, Ca19-mtDNA (40420 nucleotides)
>>> 313   2.9e-06
>>>
>>>
>>>> Ca21chr1 Assembly 21, Ca21chr1 (3188577 nucleotides)
>>>
>>>        Length = 3188577
>>>
>>> Score = 3705.3 bits (24655), Expect = 0., P = 0.
>>> Identities = 4931/4931 (100%)
>>> Frame =  -1 / +1
>>>
>>> Query: 1        
>>> ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT -58
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248574  
>>> ATAAAGGATGCCAAATAGTAGTAGTAAAATAGTAAATAGAATTGCAAAACAAAAATGATT
>>> 2248633
>>>
>>> Query: -59      
>>> AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG -118
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248634  
>>> AAATAGCCCTTTATCAATAAATTTTTAAAGTTAGTTTCTTCTGGAACCCTACCCTCTTGG
>>> 2248693
>>>
>>> Query: -119     
>>> TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG -178
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248694  
>>> TGTTAATCTTTTAAGTTAATATTTATAGTTAATAAAGTAGAAGTGTCTATTTATTGATTG
>>> 2248753
>>>
>>> Query: -179     
>>> TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA -238
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248754  
>>> TTGTTGTTGTTGATTAAGAATATAAAGAAAAACAGAAAAGAAAAAAAGAAGGTTTAAAAA
>>> 2248813
>>>
>>> Query: -239     
>>> AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA -298
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248814  
>>> AGTTAATTGTGAAGTAAAAGGGTTGAAAAATTTTTTTTTTTTCTGTTTCTCTCTTTGAGA
>>> 2248873
>>>
>>> Query: -299     
>>> TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA -358
>>>               
>>> ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct: 2248874  
>>> TTCTTTGACATATTTATTATTATAACACTATGCTATACTAAAAACAGTACTACCAATTGA
>>> 2248933
>>>
>>> Query: -359     
>>> ATTAAATTAAATTAAATTAAATTAAATTATTAGACCAATTTCAATAAAGATAAGCAATTT -418
>>>
>>> ----------------------------------------------------------------------------------------------------
>>>
>>> **Here is the snippet of code that reads the old report, generates  
>>> new
>>> objects and writes new report:
>>> ----------------------------------------------------------------------------------------------------
>>>   my $blast_report = Bio::SearchIO->new(-format => 'blast',
>>>                     -file   => $blastOutputTmp);
>>>
>>>   my $writer =
>>> Bio::SearchIO::Writer::TextResultWriter->new(-no_wublastlinks => 0);
>>>   my $out_blast_report = Bio::SearchIO->new(-writer => $writer,
>>>                         -file   => ">$blastOutputFile");
>>>
>>>   my $sorted_blast_report;
>>>
>>>   while( my $result = $blast_report->next_result ) {
>>>
>>>       my (%parameters, %statistics);
>>>
>>>   foreach my $param ($result->available_parameters) {
>>>
>>>       $parameters{$param} = $result->get_parameter($param);
>>>   }
>>>
>>>   foreach my $stat ($result->available_statistics) {
>>>
>>>       $statistics{$stat} = $result->get_statistic($stat);
>>>   }
>>>
>>>       my $generic_result =
>>> Bio::Search::Result::BlastResult->new(-query_name          =>
>>> $result->query_name,
>>>                                  -query_length        =>
>>> $result->query_length,
>>>                                  -database_name       =>
>>> $result->database_name,
>>>                                  -database_entries    =>
>>> $result->database_entries,
>>>                                  -parameters          => \ 
>>> %parameters,
>>>                                  -statistics          => \ 
>>> %statistics,
>>>                                  -algorithm           => $result- 
>>> >algorithm,
>>>                                  -query_description   =>
>>> $result->query_description,
>>>                                  -algorithm_reference =>
>>> $result->algorithm_reference,
>>>                                  -algorithm_version   =>
>>> $result->algorithm_version,
>>>                                  -database_letters    =>
>>> $result->database_letters);
>>>
>>>       while( my $hit = $result->next_hit ) {
>>>
>>>       my $generic_hit = Bio::Search::Hit::BlastHit->new(-name
>>> => $hit->name,
>>>                                 -algorithm    => $hit->algorithm,
>>>                                 -description  => $hit->description,
>>>                                 -length       => $hit->length,
>>>                                 -score        => $hit->score,
>>>                                 -bits         => $hit->bits,
>>>                                 -significance => $hit- 
>>> >significance);
>>>
>>>       my (@hsp_sorted, @hsps);
>>>           while( my $hsp = $hit->next_hsp ) {
>>>
>>>           push(@hsps, $hsp);
>>>       }
>>>
>>>       @hsp_sorted = sort {$a->pvalue <=> $b->pvalue} @hsps;
>>>
>>>       for(my $i=0; $i<=$#hsp_sorted; $i++) {
>>>
>>>           $generic_hit->add_hsp($hsp_sorted[$i]);
>>>
>>>       }
>>>
>>>       $generic_result->add_hit($generic_hit);
>>>
>>>   }
>>>
>>>   $out_blast_report->write_result($generic_result);
>>>
>>>   }
>>> ----------------------------------------------------------------------------------------------------
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





More information about the Bioperl-l mailing list