[Bioperl-l] Remote blast

Roopa Raghuveer rtbio.2009 at gmail.com
Fri Nov 20 10:52:09 EST 2009


Hello everybody,

I have tried to use Remote blast on Trypanasoma brucei sequences and could
get certain hits.But I am unable to retrieve the complete sequence from
where I got hits.
i.e., I am unable to parse the blast output file for getting the complete
sequences of the hits. Here is my code.

#!/usr/bin/perl -w
use Bio::SearchIO;
my $blast_report = new Bio::SearchIO ('-format' => 'blast',
                                      '-file'   => $ARGV[0]);
my $result = $blast_report->next_result;
my $level = $ARGV[1];

while( my $hit = $result->next_hit) {
       print $hit->name;
       push(@arr1,$hit->name);
       while( my $hsp = $hit->next_hsp()) {
        if ($hsp->frac_identical() >= $level) {
            #print $hsp->hit_string, "\n";
            push(@arr,$hsp->hit_string);
        }
    }
}
$k=@arr1;
for($i=0;$i<$k;$i++){
push(@arr2,split(/|/,$arr1[$i]));
#print "$arr[$i]\n";
}
#$t=@arr2;

Here,I am trying to use the blast output file and get the complete sequence
where I found a hit  but  I could not get the complete sequence.

i/p:-
Last login: Mon Nov 16 11:57:22 on console
Welcome to Darwin!
lmbicip-mac1:~ cip$ ssh admin at 141.84.66.66
The authenticity of host '141.84.66.66 (141.84.66.66)' can't be established.
RSA key fingerprint is 2d:4a:09:1d:2e:f3:51:c7:ba:8b:29:37:36:f6:44:db.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '141.84.66.66' (RSA) to the list of known hosts.
Password:
Last login: Fri Nov 20 13:52:57 2009 from 10.153.189.239
Have a lot of fun...
admin at BosLinux:~> clear













admin at BosLinux:~> cd Documents/
admin at BosLinux:~/Documents> clear






















admin at BosLinux:~/Documents> vim blast.pl
admin at BosLinux:~/Documents> clear






















admin at BosLinux:~/Documents> vim nnn.pl
admin at BosLinux:~/Documents> vim other.pl
admin at BosLinux:~/Documents> vim amino.fa
admin at BosLinux:~/Documents> vim Tb09.211.2410.out
admin at BosLinux:~/Documents> vim Tb09.211.2410.out



















             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  661   TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC
720

Query  721   AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG
780
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  721   AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG
780

Query  781   AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT
840
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  781   AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT
840

Query  841   GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG
900
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  841   GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG
900

Query  901   AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT
960
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  901   AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT
960

Query  961   CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG  1005
             |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  961   CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG  1005

>ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic
subunit
isoform 2 (Tb09.211.2360) partial mRNA
Length=1011

 Score = 1622 bits (1798),  Expect = 0.0
 Identities = 944/974 (96%), Gaps = 0/974 (0%)
 Strand=Plus/Plus

Query  32    TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC
91
             |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  38    TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC
97

Query  92    TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT
151
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  98    TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT
157

Query  152   ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA
211
             |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct  158   ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA
217

Query  212   ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT
271
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  218   ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT
277

uery  272   CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT
331
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  278   CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT
337

Query  332   TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG
391
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  338   TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG
397

Query  392   AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC
451
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  398   AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC
457

Query  452   CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA
511
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  458   CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA
517

Query  512   AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG
571
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  518   AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG
577

Query  572   TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT
631
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

It follows like this.

The output I got is
ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG
TTGCATTCCGTGGTTTTTAG

TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG
GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT

TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG

i.e.,It is only giving the region where it could find the best alignment
i.e., the best hit ones.

I want the complete sequence i.e., sequences corresponding to the accession
numbers
XM_822292.1
XM_822286.1
XM_822694.1

Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used
:Trypanasoma brucei.

Can any one please help me in solving this problem

Regards,
Roopa.
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer <rtbio.2009 at gmail.com>wrote:

>
> Hello Roy,
>
> Thanks a lot for your reply.My code is working for my sequence now.
>
> Thanks alot.
>
> Regards,
> Roopa.
>
> On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:
>
>> Hi Roopa,
>>
>> I think that the -Organism parameter that you specify for
>> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it
>> in the documentation:
>>
>> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm<http://search.cpan.org/%7Ecjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm>
>>
>> You have the correct approach in your code - limiting the search to the
>> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If
>> you uncomment the line (and add a semicolon afterwards), the program runs
>> correctly, but no hits are reported below your threshold e-value. If you
>> change the value of $e_val to 10 then some T.brucei hits are reported.
>>
>> Roy.
>>
>> Roopa Raghuveer wrote:
>>
>>> Hello everybody,
>>>
>>> I have a problem. I would like to use remote blast to find sequences
>>> matching for an input sequence.
>>>
>>> Ex:-I would like to search sequences which match Trypanosoma Brucei
>>> sequence.
>>>
>>> I want the output to be only Trypanosoma Brucei sequences matching with
>>> my
>>> query.When i tried to use remoteblast to nr database,I got sequences from
>>> different organisms like E.coli,Pseudomonas etc.,
>>>
>>> Could you please tell me how can this be solved...?
>>>
>>> My code is as follows.
>>>
>>> use Bio::Tools::Run::RemoteBlast;
>>>  use strict;
>>>  my $prog = 'blastn';
>>>  my $db   = 'nr';
>>>  my $e_val= '1e-10';
>>>  my $organism= 'Trypanosoma Brucei';
>>>
>>>  my @params = ( '-prog' => $prog,
>>>         '-data' => $db,
>>>         '-expect' => $e_val,
>>>         '-readmethod' => 'SearchIO',
>>>         '-Organism'   => $organism );
>>>
>>>  my $factory = Bio::Tools::Run::RemoteBlast->
>>> new(@params);
>>>
>>>  #change a paramter
>>>  #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>> brucei[ORGN]'
>>>
>>>  #remove a parameter
>>>  #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
>>>
>>>  my $v = 1;
>>>  #$v is just to turn on and off the messages
>>>
>>>  my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' ,
>>> '-organism' => 'Trypanosoma Brucei' );
>>>
>>>  while (my $input = $str->next_seq()){
>>>    #Blast a sequence against a database:
>>>   my $r = $factory->submit_blast($input);
>>>    #my $r = $factory->submit_blast('amino.fa');
>>>
>>>    print STDERR "waiting..." if( $v > 0 );
>>>    while ( my @rids = $factory->each_rid ) {
>>>      foreach my $rid ( @rids ) {
>>>        my $rc = $factory->retrieve_blast($rid);
>>>        if( !ref($rc) ) {
>>>          if( $rc < 0 ) {
>>>            $factory->remove_rid($rid);
>>>          }
>>>          print STDERR "." if ( $v > 0 );
>>>         sleep 5;
>>>        }
>>>     else {
>>>          my $result = $rc->next_result();
>>>          #save the output
>>>          my $filename = $result->query_name()."\.out";
>>>          $factory->save_output($filename);
>>>          $factory->remove_rid($rid);
>>>          print "\nQuery Name: ", $result->query_name(), "\n";
>>>          while ( my $hit = $result->next_hit ) {
>>>            next unless ( $v > 0);
>>>            print "\thit name is ", $hit->name, "\n";
>>>            while( my $hsp = $hit->next_hsp ) {
>>>              print "\t\tscore is ", $hsp->score, "\n";
>>>            }
>>>          }
>>>        }
>>>      }
>>>    }
>>>  }
>>>
>>> My input sequence is
>>>
>>>  ref|NC_009512.1|:385-1902
>>>>
>>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA
>>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT
>>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT
>>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG
>>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA
>>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG
>>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC
>>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT
>>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC
>>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG
>>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT
>>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC
>>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC
>>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG
>>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG
>>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC
>>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT
>>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC
>>> GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC
>>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT
>>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG
>>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA
>>>
>>> Please mail me regarding any queries.
>>>
>>> Regards,
>>> Roopa.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>



More information about the Bioperl-l mailing list