[Bioperl-l] remoteblast xml problem

Chris Fields cjfields at uiuc.edu
Mon Jun 5 14:32:47 EDT 2006


Hubert, 

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
option to save XML was committed relatively recently (last month or so).
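
For reference, a retrieval loop along these lines should write the XML report to disk once you have the CVS version. Treat it as a rough sketch rather than tested code: it assumes that -readmethod => 'xml' selects XML output and that retrieve_blast returns 0 while waiting and a negative value on error, as in the module synopsis. The poll_rid helper, the attempt cap, and the file names 'query.fa' and 'blast_1.xml' are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: poll one RID until a report object comes back.
# $fetch is any code ref returning 0 (waiting), <0 (error) or an object.
sub poll_rid {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;                      # got a report object
        die "server reported an error\n" if $rc < 0;
        sleep $delay if $delay;                     # still waiting
    }
    die "gave up after $max_tries tries\n";         # never loop forever
}

# Real use needs BioPerl and network access, so it only runs on request.
if (@ARGV && $ARGV[0] eq '--run') {
    require Bio::Tools::Run::RemoteBlast;
    require Bio::SeqIO;
    my $factory = Bio::Tools::Run::RemoteBlast->new(
        -prog => 'blastp', -data => 'swissprot', -readmethod => 'xml');
    my $seq = Bio::SeqIO->new(-file => 'query.fa', -format => 'fasta')->next_seq;
    $factory->submit_blast($seq);
    for my $rid ($factory->each_rid) {
        poll_rid(sub { $factory->retrieve_blast($rid) }, 120, 5);
        $factory->save_output('blast_1.xml');       # write the raw report
        $factory->remove_rid($rid);
    }
}
```

The cap on attempts is there so that a server that never sends data ends in an error instead of an endless "waiting" loop.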

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, June 05, 2006 1:18 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
> 
> hi,
> you were right, removing the composition-based statistics solved the
> problem. Now the result is printed to STDOUT, but it isn't saved to the
> output file.
> I have tried reopening the file and writing it to another file,
> but that doesn't work either.
> The strange thing is that if I retrieve text instead of XML output, it
> works without any problem. I don't know why.
> 
> Hubert
> 
> 
> 
> Chris Fields wrote:
> > On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
> >
> >
> >> hi chris,
> >> thanks, but I never intended to run remoteblast with so many sequences,
> >> only a few of them. Actually my goal is to run PHI-BLAST with a
> >> regular expression, so I won't need that
> >> file anymore.
> >>
> >
> > Not a problem.  Just to let you know, I did manage to get the script
> > working, so I'm marking the bug INVALID.  I think the problem isn't
> > that there is an infinite loop so much as setting composition-based
> > statistics causes the search to take much much longer; try removing
> > that line to see what I mean.
> >
> > Just so you know, using $result->query_name doesn't get you what you
> > would expect (it gives you a part of the RID, which you don't want;
> > this is something in the XML output that is beyond our control).  You
> > might want to change it to something else or you'll get filenames
> > with numerical names.
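
One way around that is to build the file name yourself instead of taking it from query_name. A rough sketch follows; report_filename and its arguments are hypothetical, and the label is whatever you already know about the query (for example the ID of the sequence you submitted) plus a running counter. The sample label is a made-up placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: build a report file name from a query label and a
# counter instead of relying on $result->query_name.
sub report_filename {
    my ($label, $count, $ext) = @_;
    $label = 'query' unless defined $label && length $label;
    $label =~ s/[^A-Za-z0-9_.-]+/_/g;   # replace characters unsafe in file names
    $label =~ s/^_+|_+$//g;             # trim leading/trailing underscores
    $label = 'query' unless length $label;
    return sprintf '%s_%04d.%s', $label, $count, $ext;
}

# Placeholder label, not a real database entry.
print report_filename('sp|QUERY1|example seq', 1, 'xml'), "\n";
# prints sp_QUERY1_example_seq_0001.xml
```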
> >
> >
> >> Another question about parsing the XML output: is there an XML
> >> parser available for BLAST XML output, and how do I start?
> >> I have looked at the wiki and at Bio::SearchIO::blastxml on CPAN,
> >> but I'm not sure how to start. Sorry, I guess I'm too stupid.
> >> Is there maybe another introduction or an example?
> >>
> >
> > Bio::SearchIO objects are used to parse BLAST XML output if you have
> > it saved to a file.  For instance:
> >
> > my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
> >
> > while (my $result = $factory->next_result) {
> >    while (my $hit = $result->next_hit) {
> >       while (my $hsp = $hit->next_hsp) {
> >          #do stuff here
> >       }
> >    }
> > }
> >
> > The only thing that changes in parsing a text BLAST report from an
> > XML BLAST report is the -format line (similar to the -readmethod
> > parameter in RemoteBlast).  You shouldn't need to look up any more
> > documentation other than these on the wiki:
> >
> > http://www.bioperl.org/wiki/HOWTO:SearchIO
> >
> > http://www.bioperl.org/wiki/Module:Bio::SearchIO
> >
> > http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
> >
> > Pay attention to the fact you'll need to install XML::SAX (CPAN) and
> > that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
> > up parsing.
> >
> > Chris
> >
> >
> >> thanks
> >> Hubert
> >>
> >>
> >> Chris Fields wrote:
> >>
> >>> Yes, I see the same error you do.  But I have a similar script
> >>> (blastp, XML blast report, XML parsing, similar loop structure)
> >>> that works fine.  I'm trying to dissect the problem but I think
> >>> it may be something logically wrong here (something not so
> >>> obvious) and not a bug...
> >>>
> >>> What I'm trying to say is, when you send sequences using
> >>> remoteblast like this, you are essentially spamming the NCBI
> >>> BLAST server with ~1600 requests.  This script wasn't set up with
> >>> that intent in mind; you should really try to set up your own
> >>> local blast database if possible.  If you can't, try running this
> >>> script in off-hours (10pm-6am EST or something like that).
> >>>
> >>>
> >>> Chris
> >>>
> >>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
> >>>
> >>>
> >>>
> >>>> hi,
> >>>> input database: swissprot
> >>>>         matrix: pam30
> >>>>         count: 1
> >>>>         gapcosts: 9 1
> >>>>
> >>>> I know that there are a lot of sequences, but that doesn't
> >>>> matter; you can delete all of them except one. The number of
> >>>> sequences is not the problem: the script reads one line and
> >>>> submits it, then the second line, and so on. I have also tried
> >>>> it with only one sequence and I got the same result.
> >>>> The script ran that time for more than 20
> >>>> minutes, and that should be enough time to retrieve
> >>>> the results for ONE sequence, I guess.
> >>>>
> >>>> regards
> >>>> Hubert
> >>>>
> >>>>
> >>>>
> >>>> Chris Fields wrote:
> >>>>
> >>>>
> >>>>> You need to add the input conditions as well (you have several
> >>>>> <STDIN> lines which may play a role; I would like to know what
> >>>>> you normally enter for those).
> >>>>>
> >>>>> How long did you let the script run?  I ran a quick check on
> >>>>> your sequences; you have almost 1600, so you have to expect
> >>>>> that you'll run into some problems here!  Most here (including
> >>>>> me) would suggest you try installing a local blast setup for
> >>>>> something like this.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> hi,
> >>>>>> I have submitted the bug -> Bug 2017
> >>>>>> with the script and input file, just start it from command line
> >>>>>>
> >>>>>> thank you very much
> >>>>>> greetings
> >>>>>>
> >>>>>> Hubert
> >>>>>>
> >>>>>> Chris Fields wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Hubert,
> >>>>>>>
> >>>>>>> I have a script that's using blastxml and XML output which
> >>>>>>> seems to work.
> >>>>>>> I'll try looking at it to get a better idea this weekend.
> >>>>>>>
> >>>>>>> Chris
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> >>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
> >>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
> >>>>>>>> 'Sendu  Bala'
> >>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
> >>>>>>>>
> >>>>>>>> hi,
> >>>>>>>> sorry, but I have updated the remoteblast module and I have
> >>>>>>>> run several attempts with the same results as before. It
> >>>>>>>> didn't work. I didn't get any results.
> >>>>>>>>
> >>>>>>>> regards
> >>>>>>>> Hubert
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Chris Fields wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Sendu, Hubert,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
> >>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
> >>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
> >>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
> >>>>>>>>>
> >>>>>>>>> Chris
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> >>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
> >>>>>>>>>> To: bioperl-l at lists.open-bio.org
> >>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
> >>>>>>>>>>
> >>>>>>>>>> Hubert Prielinger wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> hi,
> >>>>>>>>>>> I have the following program and it worked quite well for
> >>>>>>>>>>> retrieving remoteblast results in a text file.
> >>>>>>>>>>> Now I have altered it to XML, and it doesn't work
> >>>>>>>>>>> anymore. It takes all the parameters at the command line and
> >>>>>>>>>>> submits the query, but I don't retrieve any results file anymore.
> >>>>>>>>>>>
> >>>>>>>>>>> It seems that it hangs in an endless loop.
> >>>>>>>>>>> The only output I get is:  $rc is not a ref! over and
> >>>>>>>>>>> over. It doesn't enter the else branch anymore.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>> There is no problem with your code. The problem is with the NCBI
> >>>>>>>>>> server and should be reported to them. You can visit the site and
> >>>>>>>>>> do a blast, requesting xml format, and you will typically get one
> >>>>>>>>>> normal 'waiting' message and the promise that it will be updated
> >>>>>>>>>> in x seconds, but subsequent attempts to get progress information
> >>>>>>>>>> result in an xml error page because the NCBI server doesn't
> >>>>>>>>>> actually send any data.
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately the way that the bioperl code is written, it treats
> >>>>>>>>>> no data as 'waiting' instead of an error. I've offered a patch to
> >>>>>>>>>> fix this at this bug page:
> >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Bioperl-l mailing list
> >>>>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>> Christopher Fields
> >>>>> Postdoctoral Researcher
> >>>>> Lab of Dr. Robert Switzer
> >>>>> Dept of Biochemistry
> >>>>> University of Illinois Urbana-Champaign
