[Bioperl-guts-l] [Bug 2576] New: SearchIO is ignoring an excellent match
bugzilla-daemon at portal.open-bio.org
Wed Aug 27 20:35:38 EDT 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2576
Summary: SearchIO is ignoring an excellent match
Product: BioPerl
Version: 1.5 branch
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Bio::Search/Bio::SearchIO
AssignedTo: bioperl-guts-l at bioperl.org
ReportedBy: jayoung at fhcrc.org
CC: jayoung at fhcrc.org
Hi again,
I'm parsing a lot of blast reports (NCBI blastall BLASTN v 2.2.18) using
SearchIO and have been doing some spot checks on output.
I have some blast outputs where SearchIO seems to be ignoring the first hit,
even though it's a good one. But on other outputs it picks up the first hit as
expected. A simplified version of my script is below.
In the example I'm attaching, there are 6 hits. The result object has 6 hits
according to $result->num_hits(). BUT when I cycle through the hits using
$result->next_hit(), the first, really good hit (E-value e-122) doesn't appear.
(An aside: did NCBI recently start leaving off the leading 1 in E-values,
printing 1e-122 as e-122? I doubt this is the problem, as the second hit has
E-value e-105 and it parses fine.)
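For scripts that read such E-values directly, note that Perl converts a bare string like "e-122" to the number 0 (and warns "isn't numeric" under use warnings), because there is no leading digit. A minimal sketch, assuming you want to restore the missing mantissa before any numeric comparison; normalize_evalue is a hypothetical helper, not part of BioPerl, and this is not claimed to be the cause of the missing hit:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): add the leading "1" that
# BLAST sometimes omits, so "e-122" becomes "1e-122" and numifies correctly.
sub normalize_evalue {
    my ($evalue) = @_;
    $evalue =~ s/^\s+|\s+$//g;                    # trim surrounding whitespace
    $evalue = "1$evalue" if $evalue =~ /^[eE]/;   # "e-122" -> "1e-122"
    return $evalue;
}

# Show raw vs. normalized values and their numeric interpretation.
for my $raw ('e-122', 'e-105', '2e-10', '0.003') {
    my $norm = normalize_evalue($raw);
    printf "%-7s -> %-7s (numeric: %g)\n", $raw, $norm, $norm;
}
```

Values that already carry a mantissa (such as 2e-10 or 0.003) pass through unchanged, so the helper is safe to apply to every E-value string.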
Another odd thing that might shed some light is this: without a -signif
parameter, $result->num_hits gives the correct answer (6 hits), but if I add
-signif=>'1e-5' when I create the SearchIO object, then the result object has
only 5 hits, even though all 6 hits have E-values better than the cutoff I
specified.
I guess one solution is to rerun all the BLAST searches with -m 8 (tabular)
output, but I would love to stick to doing this with BioPerl if possible.
thanks,
Janet
The script:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SearchIO;

foreach my $file (@ARGV) {
    # Variant with an E-value cutoff (this version reports only 5 hits):
    # my $blastObj = Bio::SearchIO->new(-file => $file, -format => 'blast',
    #                                   -signif => '1e-5');
    my $blastObj = Bio::SearchIO->new(-file => $file, -format => 'blast');
    while ( my $result = $blastObj->next_result() ) {
        print "num hits ", $result->num_hits(), "\n";
        while ( my $hit = $result->next_hit() ) {
            my $hitname = $hit->name();
            # Report the fraction of identical positions for every HSP
            while ( my $hsp = $hit->next_hsp() ) {
                my $frac = $hsp->frac_identical();
                print "hitname $hitname frac_ident $frac\n";
            }
        }
    }
}
-------------------------------------------------------------------
Dr. Janet Young (Trask lab)
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung at fhcrc.org
-------------------------------------------------------------------