[Bioperl-l] help using SEARCH IO to parse blast results

alison waller alison.waller at utoronto.ca
Mon Nov 26 16:06:35 EST 2007


Hello all,

 

It's the usual story, I'm an engineer turned biologist who now needs help
with bioinformatics so I can analyze huge amounts of data to finish my
thesis.  

 

I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want to
have the flexibility to obtain information on say the top5.  I am pretty
sure I have done this wrong, any advice on how to correct my script to do
this, would be great.

 

Thanks so much,

 

Alison

 

 

#!/usr/local/bin/perl -w

 

# Parsing BLAST reports with BioPerl's Bio::SearchIO module

# alison waller November 2007

use strict;

use warnings;

use Bio::SearchIO;

 

# to run type: blast_parse_aw.pl input.txt #of hits

 

my $infile =shift(@ARGV);

my $outfile ="$ARGV[0].parsed";

my $tophit = $ARGV[1]; # I want to specify in the command line how many hits
to report for each query

 

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";

open (OUT,">$outfile");

 

$report = new Bio::SearchIO(

         -file=>"$inFile",

              -format => "blast"); 

 

print
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tgaps\t
Qstrand\tHstrand\n";

 

# Go through BLAST reports one by one              

while($result = $report->next_result) 

{

      if ($top_hit=$result->next_hit) # this might be wrong - I want to
specify how many hits to print results for

            # Print some tab-delimited data about this hit

           { 

            print $result->query_name, "\t";

            print $hit->description, "\t";

            print $hit->significance, "\t";

            print $hit->bits,"\t";    

            print $hsp->evalue, "\t";

            print $hsp->percent_identity, "\t";

            print $hsp->length('total'),"\t";

            print $hsp->num_identical,"\t";

            print $hsp->gaps,"\t";

            print $hsp->strand('query'),"\t";

            print $hsp->strand('hit'), "\n";

          }

      else print "no hits\n";

   } 

 

 

 

            

 

 

 

 

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  

 



More information about the Bioperl-l mailing list