[Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file

Jason Stajich jason at bioperl.org
Sun Nov 29 10:54:53 EST 2009


or
while( my $hit = $result->next_hit ) {
}
On Nov 28, 2009, at 6:55 PM, Mark A. Jensen wrote:

> Hi Tim--
> There's a bug in my code; should be
> for my $hit ($result->hits) {
> ...
> }
> and you're right about the comma. My bad.
>
> But I don't think you need this-- you're already looping over your
> query sequences and doing blastn on each one. So in the middle of
> your loop, you can simply write the blast result that you got:
>
> my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", - 
> format=>"blast" );
> $blio->write_result($result);
>
> and forget about the foreach my $qid loop entirely.
>
> The files should show up in the directory from which you're
> running the script.
> cheers, MAJ
>
>
>
> ----- Original Message ----- From: "Tim Koehler" <timbourine81 at googlemail.com 
> >
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, November 26, 2009 11:02 AM
> Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of  
> eachqueryinnew file
>
>
> ups, sent too early...
>
> Hey Mark,
>
> thanks for the answer. But I am still struggling, especially where  
> to put in
> your code.
>
> Here ist the code I have, so far:
>
> #!/usr/bin/perl -w
>
> ### should I put your code here as push is a perl command?
> my %hits_by_query;
> for ($result->hits) {
> ### I inserted a comma after name}}; if there is no comma, there was  
> the
> error: Scalar found where operator expected at
> 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
> ###        (Missing operator before  $hit?)
> ###Useless use of push with no values at
> 12_BLAST_two_sequence_each_query_one_file.PL line 7.
> ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line  
> 7, near
> "} $hit"
> ###BEGIN not safe after errors--compilation aborted at
> 12_BLAST_two_sequence_each_query_one_file.PL line 13.
> push @{$hits_by_query{$hit->name}}, $hit;
> ###here, every time this terror appears: Name "main::result" used  
> only once:
> possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at
> 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> }
>
>
> use strict;
> use Bio::Tools::Run::StandAloneBlast;
> use Bio::SeqIO;
> use Bio::SearchIO;
> use Bio::Search::Result::BlastResult;
>
> my $Seq_in = Bio::SeqIO->new (
> -file =>
> "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ 
> 1_to_BLAST_two_seq.fasta",
> -format => 'fasta'
> );
> while (my $query = $Seq_in->next_seq()) {
> my $factory = Bio::Tools::Run::StandAloneBlast->new(
> 'program' => 'blastn',
> 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ 
> 3_BLAST_db',
> _READMETHOD => "Blast"
> );
>
> my $blast_report = $factory->blastall($query);
>
> ### Should I need to use a module? are the commands here at the right
> position? errors, e.g., Global symbol "$hit" requires explicit  
> package name
> #my %hits_by_query;
> #for ($result->hits) {
> ### inserted comma after name}}
> # push @{$hits_by_query{$hit->name}}, $hit;
> #}
>
> foreach my $qid ( keys %hits_by_query ) {
> my $result = Bio::Search::Result::BlastResult->new();
> $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - 
> format=>'blast' );
> $blio->write_result($result);
> }
>
> ###where are the files stored? what is their name. Sorry, but I  
> cannot get
> behind that :(
>
> while( my $result = $blast_report->next_result ) {
> ## $result is a Bio::Search::Result::ResultI compliant object
> while( my $hit = $result->next_hit ) {
>  ## $hit is a Bio::Search::Hit::HitI compliant object
>  while( my $hsp = $hit->next_hsp ) {
>   ## $hsp is a Bio::Search::HSP::HSPI compliant object
>   if( $hsp->length('total') > 50 ) {
>    if ( $hsp->percent_identity >= 75 ) {
>    print  "Query= ",        $result->query_name,
>       "Hit= ",        $hit->name,
>           "Length= ",     $hsp->length('total'),
>           "Percent_id= ", $hsp->percent_identity,
>       "Subject=",        $hsp->hit_string,"\n";
>    }
>   }
>  }
> }
> }
> }
>
> Again, a big thanks in advance :)
>
> All the best,
>
> Tim
>
>
> On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:
>
>> Hey Mark,
>>
>> thanks for the answer
>>
>> On 25.11.2009 20:21, Mark A. Jensen wrote:
>> > whoops: change the following line:
>> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", - 
>> format=>'blast' );
>> >
>> > to
>> >
>> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - 
>> format=>'blast' );
>> >
>> > (I always forget that...)
>> > MAJ
>> >
>> > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us 
>> >
>> > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
>> > Sent: Wednesday, November 25, 2009 1:20 PM
>> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of  
>> each
>> > queryinnew file
>> >
>> >
>> >> hey Tim--
>> >>
>> >> Sound like you need to go about collecting your queries inside  
>> out:
>> >>
>> >> my %hits_by_query;
>> >> for ($result->hits) {
>> >>  push @{$hits_by_query{$hit->name}} $hit;
>> >> }
>> >>
>> >> I believe now each hash element, keyed by the query name, will  
>> contain
>> >> an arrayref to the set of hits assoc with that query.
>> >>> From here, I believe
>> >>
>> >> use Bio::Search::Result::BlastResult;
>> >> use Bio::SearchIO;
>> >>
>> >> foreach my $qid ( keys %hits_by_query ) {
>> >>  my $result = Bio::Search::Result::BlastResult->new();
>> >>  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
>> >>  my $blio = Bio::SearchIO->new( -file => $qid.".bls", - 
>> format=>'blast'
>> );
>> >>  $blio->write_result($result);
>> >> }
>> >>
>> >> will do what you want.
>> >>
>> >> hope this helps -
>> >> Mark
>> >>
>> >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
>> >> To: <bioperl-l at lists.open-bio.org>
>> >> Sent: Wednesday, November 25, 2009 12:40 PM
>> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
>> >> query innew file
>> >>
>> >>
>> >>> Dear bioperl users,
>> >>>
>> >>> I am a real newbie and have - maybe a very trivial - question.
>> >>>
>> >>> I searched the mailing list archive and many howtos but I have  
>> not
>> found
>> >>> a concrete answer to my problem. So hopefully you can help me :)
>> >>>
>> >>> Background: I use the latest Bioperl version (installed it two  
>> weeks
>> >>> before).
>> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta  
>> file
>> >>> including different sequences, I get a BLAST output with many  
>> queries
>> >>> each having several hits / sbjcts.
>> >>>
>> >>> My problem is how to parse *all* hits of *one* query into a  
>> single new
>> >>> file. And this for all the queries I have in my BLAST output  
>> file.
>> >>>
>> >>> Or is it better the other way round; first to make fasta files  
>> with
>> only
>> >>> single sequences inside and BLAST each file? But how can I  
>> automize
>> that
>> >>> using Bioperl?
>> >>>
>> >>> I tried Bio::SearchIO but can only parse all queries and their
>> >>> respective hits in only one file...
>> >>> I think iteration is also necessary here, but I do not really  
>> know how
>> >>> to include that into Bio::SearchIO.
>> >>> Or do I have to use Module:Bio::Index::Blast?
>> >>>
>> >>> I can index a file (see below), but I have no idea what comes  
>> next...
>> >>>
>> >>> ###How I index a file...
>> >>>
>> >>> #!/usr/bin/perl -w
>> >>>
>> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
>> >>>
>> >>> use Bio::Index::Fasta;
>> >>>
>> >>>
>> >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
>> >>> $id = "48882";
>> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
>> >>> -write_flag => 1);
>> >>> $inx->make_index($file_name);
>> >>>
>> >>>
>> >>> Hopefully, you can give me at least hints what to look for.
>> >>>
>> >>> A big THANKS in advance!
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Tim
>> >>> _______________________________________________
>> >>> Bioperl-l mailing list
>> >>> Bioperl-l at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>>
>> >>>
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> >>
>> >
>>
>> Tim Köhler
> MPI for Terrestrial Microbiology
> Karl-von-Frisch-Straße
> D-35043 Marburg / Germany
>
> Email: koehlerd at mpi-marburg.mpg.de
> Phone: +49 6421 178-740
> Fax:   +49 6421 178-999
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org




More information about the Bioperl-l mailing list