[Bioperl-l] help using SEARCH IO to parse blast results

dennis prickett (IAH-C) dennis.prickett at bbsrc.ac.uk
Wed Nov 28 05:18:26 EST 2007

Dear Alison
Or, if you are absolutely only interested in the top hit you could limit
it to that in the blast  command by adding the parameters  " -b 1 ".  

This will truncate the report to 1 hsp per query (or -b 5 for 5 hsps,
etc).  Your blasts run faster and then you won't have to worry about how
to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
United Kingdom

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,


It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to finish
my thesis.  


I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want
to have the flexibility to obtain information on say the top5.  I am
pretty sure I have done this wrong, any advice on how to correct my
script to do this, would be great.


Thanks so much,





#!/usr/local/bin/perl -w


# Parsing BLAST reports with BioPerl's Bio::SearchIO module

# alison waller November 2007

use strict;

use warnings;

use Bio::SearchIO;


# to run type: blast_parse_aw.pl input.txt #of hits


my $infile =shift(@ARGV);

my $outfile ="$ARGV[0].parsed";

my $tophit = $ARGV[1]; # I want to specify in the command line how many
hits to report for each query


open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";

open (OUT,">$outfile");


$report = new Bio::SearchIO(


              -format => "blast"); 




# Go through BLAST reports one by one              

while($result = $report->next_result) 


      if ($top_hit=$result->next_hit) # this might be wrong - I want to
specify how many hits to print results for

            # Print some tab-delimited data about this hit


            print $result->query_name, "\t";

            print $hit->description, "\t";

            print $hit->significance, "\t";

            print $hit->bits,"\t";    

            print $hsp->evalue, "\t";

            print $hsp->percent_identity, "\t";

            print $hsp->length('total'),"\t";

            print $hsp->num_identical,"\t";

            print $hsp->gaps,"\t";

            print $hsp->strand('query'),"\t";

            print $hsp->strand('hit'), "\n";


      else print "no hits\n";










Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5



Bioperl-l mailing list
Bioperl-l at lists.open-bio.org

More information about the Bioperl-l mailing list