[Bioperl-l] While loop - SearchIO for BioPerl

Chris Fields cjfields at illinois.edu
Wed Jul 8 21:41:01 EDT 2009


Yep, that's what I was thinking.  The fragment in question is fairly  
short.

Inna, if you want the best HSP you could just grab the one that best  
fits what you expect (best eval, score, whatever).

chris

On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote:

> A lack of low-complexity filtering  (as seems apparent from this  
> report snippet, if
> I understand that concept correctly) could explain the multiple  
> query hits on a
> short (24bp) region of the same subject...
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
> >
> To: "Rytsareva, Inna (I)" <IRytsareva at dow.com>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 08, 2009 9:08 PM
> Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl
>
>
>> I'm curious as to what this report looks like.  The example report  
>> you  posted to the gbrowse list had serious issues (different  
>> problem, 'No  midline' error which I replicated); mainly there were  
>> no blank lines  making it pretty much invalid, so the parser had  
>> issues with it.   Example lines from one HSP:
>>
>> > gnl|DAS|24699 pDAB101580
>>          Length = 12942
>> Score = 50.1 bits (25), Expect = 5e-06
>> Identities = 37/41 (90%)
>> Strand = Plus / Plus
>> Query: 10   ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50
>>            ||||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>> Score = 46.1 bits (23), Expect = 8e-05
>> Identities = 35/39 (89%)
>> Strand = Plus / Plus
>> Query: 13   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51
>>            ||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>> Score = 46.1 bits (23), Expect = 8e-05
>> Identities = 35/39 (89%)
>> Strand = Plus / Plus
>> Query: 14   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52
>>            ||||||||||||| ||| |||||||| ||||| ||||||
>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659
>>
>> ...
>>
>> chris
>>
>>
>>
>>
>> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
>>
>>> Hello,
>>>
>>> I have a follow script to parse the BLAST  report:
>>>
>>> my $in = Bio::SearchIO->new (  -file =>$out_file,
>>> -format =>'blast') or die $!;
>>>
>>> while (my $result = $in->next_result) {
>>> while (my $hit = $result->next_hit)
>>> {
>>> while (my $hsp = $hit->next_hsp) {
>>> $qhit = $hit->name;
>>> $start = $hsp->hit->start;
>>> $end = $hsp->hit->end;
>>> }
>>>
>>>
>>> } print "Hit= ", $qhit,
>>> ",Start = ", $start,
>>> ",End = ", $end,"\n"; }
>>>
>>> Usually, the report has a number of the same hsp for each hit.
>>> Using "print" command it gives me a hit name, start and end  
>>> positions
>>> for each hit, except last on. For last one it prints all the hsps.
>>> Something like this:
>>>
>>> Hit= gnl|DAS|22386,Start = 7578,End = 7601
>>> Hit= gnl|DAS|25627,Start = 2824,End = 2863
>>> Hit= gnl|DAS|25328,Start = 8864,End = 8887
>>> Hit= gnl|DAS|4890,Start = 1896,End = 1919
>>> Hit= gnl|DAS|12191,Start = 1898,End = 1921
>>> Hit= gnl|DAS|4276,Start = 557,End = 580
>>> Hit= gnl|DAS|12959,Start = 801,End = 824
>>> Hit= gnl|DAS|4092,Start = 2266,End = 2304
>>> Hit= gnl|DAS|19740,Start = 13572,End = 13610
>>> Hit= gnl|DAS|12393,Start = 3901,End = 3924
>>> Hit= gnl|DAS|25687,Start = 10415,End = 10438
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>
>>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>>> I don't need these duplicates.
>>> How can I fix that?
>>>
>>> Thanks,
>>> Inna Rytsareva
>>> Discovery Information Management
>>> Dow AgroSciences
>>> Indianapolis, IN
>>> 317-337-4716
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>



More information about the Bioperl-l mailing list