[Bioperl-l] blastxml format

Chris Fields cjfields at uiuc.edu
Wed Oct 25 11:04:14 EDT 2006


Iterations (which are related to PSIBLAST) aren't currently handled in
blastxml, which is why the tag isn't being parsed.  I'll give it a look but
I don't think it will be properly fixed anytime soon, since we're gearing up
for a developer release and are sorting out various bugs in relation to
that.

In the meantime, you could always try changing the relevant tag in the
%MAPPING hash in your local copy of Bio::SearchIO::blastxml from
'BlastOutput_query-def' to 'Iteration_query-def', which may do the trick for
you.  I'm a bit reluctant to change this in CVS as it would be better to add
this in when iterations are handled properly by blastxml, and I'm not sure
all BLAST XML varieties have the <Iteration_query-def> tag.
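
If all you need for now is the per-query names, a rough stopgap that does
not touch Bio::SearchIO at all is to read the <Iteration_query-def> values
straight out of the XML. The following is only a sketch, not part of
BioPerl: the sub name iteration_query_defs and the trimmed sample report are
made up for illustration, and it assumes the tag carries no attributes or
nested markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every <Iteration_query-def> element, in document order.
# This is a plain regex scan, not a real XML parser: it assumes the element
# has no attributes or nested markup, and it only unescapes the five
# predefined XML entities.
sub iteration_query_defs {
    my ($xml) = @_;
    my @defs;
    while ($xml =~ m{<Iteration_query-def>(.*?)</Iteration_query-def>}gs) {
        my $def = $1;
        $def =~ s/&lt;/</g;
        $def =~ s/&gt;/>/g;
        $def =~ s/&quot;/"/g;
        $def =~ s/&apos;/'/g;
        $def =~ s/&amp;/&/g;    # last, so "&amp;lt;" ends up as "&lt;"
        push @defs, $def;
    }
    return @defs;
}

# Hypothetical, heavily trimmed report used only to exercise the sub.
my $sample = <<'XML';
<BlastOutput>
  <BlastOutput_query-def>MRDNA_probe</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration><Iteration_query-def>MRDNA_probe</Iteration_query-def></Iteration>
    <Iteration><Iteration_query-def>VDRacterm_probe</Iteration_query-def></Iteration>
    <Iteration><Iteration_query-def>ARalpcterm_probe</Iteration_query-def></Iteration>
  </BlastOutput_iterations>
</BlastOutput>
XML

my @names = iteration_query_defs($sample);
print "$_\n" for @names;
```

The names come back in the order the iterations appear in the file, so they
can be matched up by position with the per-query blocks of hits, provided
every iteration is reported.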

If you want, you can add this to the BioPerl Bugzilla as an enhancement
request to remind us:

http://bugzilla.open-bio.org/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Massimo Ubaldi
> Sent: Wednesday, October 25, 2006 9:29 AM
> To: bioperl-l List
> Subject: [Bioperl-l] blastxml format
> 
> Hi,
> I'm using the script below to parse blastn output for multiple query
> sequences. I got the output from the BLAST web interface by asking for
> XML-formatted output.
> Everything works fine except that I cannot print the name of each input
> sequence (see below).
> That is, using $result->query_description (see the script below) I get
> just the name of the first sequence. In fact, this is the name defined by
> the <BlastOutput_query-def> tag.
> What I really want is to extract the name that is defined by the
> <Iteration_query-def> tag.
> I searched the bioperl mailing list and other sources but did not find
> anything to solve this.
> Can somebody help me?
> Thanks a lot,
> Massimo
> 
> 
> This is an example of the output I get:
> 
> MRDNA_probe
> 46.1    PREDICTED: Danio rerio similar to mineralocorticoid receptor form B (LOC562171), mRNA    68354945    XM_685568
> 81.8    Danio rerio VDR-B mRNA, partial cds    68132043    DQ017633
> PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA    68420187    XM_684078
> 
> This is what I'd like to get:
> MRDNA_probe
> 46.1    PREDICTED: Danio rerio similar to mineralocorticoid receptor form B (LOC562171), mRNA    68354945    XM_685568
> VDRacterm_probe
> 81.8    Danio rerio VDR-B mRNA, partial cds    68132043    DQ017633
> ARalpcterm_probe
> PREDICTED: Danio rerio similar to Rarab protein (LOC560679), mRNA    68420187    XM_684078
> 
> This is the script
> #!/usr/bin/perl
> use strict;
> use Bio::SearchIO;
> # XML reports need the blastxml parser, not the plain-text blast one
> my $in = Bio::SearchIO->new(-format => 'blastxml',
>                             -file   => 'Blastn_danio.bls');
> open OUTFILE, ">parsed_blastn_danio.txt"
>     or die "Could not open parsed_blastn_danio.txt: $!";
> my $result = $in->next_result;
> print OUTFILE $result->algorithm, "\n";
> print OUTFILE $result->database_name, "\n";
> 
> print OUTFILE "Score", "\t", "Description", "\t", "NCBI gi identifiers", "\t", "GenBank Accession", "\n";
> 
> while($result = $in->next_result ) {
>     print OUTFILE $result->query_description, "\n";
>       while( my $hit = $result->next_hit ) {
>            while( my $hsp = $hit->next_hsp ) {
> 
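>                 my $name = $hit->name;
> 
>                 # Pull the gi number and accession out of names shaped like
>                 # gi|<number>|<db>|<accession>.<version>; leave both empty
>                 # when the name does not match, rather than reusing stale
>                 # $1/$2 from an earlier hit.
>                 my ($gi, $accession) = ('', '');
>                 ($gi, $accession) = ($1, $2)
>                     if $name =~ /gi\|(\d+)\|\w+\|(\w+)\.\d/;
> 
>                 print OUTFILE
>                   $hit->raw_score, "\t",    # Score
>                   $hit->description, "\t",  # Description
>                   $gi, "\t", $accession, "\n";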
>          }
>       }
> }


