Module Discussion:Bio::Tools::Run::StandAloneBlast


Copied comments over from the module page. Original comments still preserved on that page.--Stewarta 16:07, 14 November 2006 (EST)


Jason Stajich 16:43, 4 November 2005 (EST)

I think this module is too much of a hack and needs a fresh start. It should somehow run without creating tempfiles, but it should also be able to run on as many platforms/OSes as possible (if that means it needs a lot of if ($WINDOWS) { ... } sections, so be it). What about generalizing this to follow the PISE way of defining an XML interface to an application, and allowing that to predefine a set of methods (command-line arguments in this case) that can be set for running an application? Unifying how the _READMETHOD works would be nice too, as would making it a more explicit object method.
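To make the "predefined set of command-line arguments" idea concrete, here is a minimal sketch of a declarative option table driving command-line construction. It is written in Python purely as illustration, not as BioPerl code. The names BLASTALL_SPEC and build_argv are hypothetical, the table is a plain dictionary standing in for a PISE-style XML description, and it covers only a handful of the legacy blastall flags.

```python
# Hypothetical declarative description of a few legacy blastall options.
# In a PISE-style design this table would be loaded from an XML file.
BLASTALL_SPEC = {
    "program":  {"flag": "-p", "required": True},
    "database": {"flag": "-d", "required": True},
    "input":    {"flag": "-i", "required": False},
    "expect":   {"flag": "-e", "required": False},
}


def build_argv(executable, spec, **params):
    """Turn named parameters into an argument list, validated against spec."""
    unknown = sorted(set(params) - set(spec))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    missing = [name for name, opt in spec.items()
               if opt["required"] and name not in params]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    argv = [executable]
    # Emit flags in the order the spec declares them, not the call order.
    for name, opt in spec.items():
        if name in params:
            argv += [opt["flag"], str(params[name])]
    return argv
```

With such a table, supporting another application would mostly mean supplying another spec rather than writing new accessor methods by hand.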

Jason Stajich 15:57, 13 November 2005 (EST)

Have a look at Bio::EnsEMBL::Analysis::Runnable::Blast in the Ensembl runnables and/or ask on the Ensembl list. Will Spooner was the primary person working on this in the past, and he may have some good ideas about how the Ensembl BLAST modules are set up. The Ensembl Runnables are intended to fetch and run analyses where there is also a database, and are somewhat optimized for how things are run in an Ensembl pipeline. I think the BioPerl one will need to be simpler. See also Runnable::Finished::Blast. I'm not sure which of those implementations would serve as the best basis for new development.

Also see the NHGRI::Blastall module for running and parsing BLAST results.

The intention for any BioPerl wrapper for BLAST should be to accept Bio::PrimarySeq sequence(s) as input, create the tempfiles necessary for running and writing (and/or run via streams), and clean up after itself. Some tweaks for speed are nice and should be considered, for example filtering the output (see the Finished::Blast module mentioned above).
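As a rough illustration of the "create the tempfiles and clean up afterwards" requirement, the sketch below writes sequences to a temporary FASTA file and guarantees removal when the caller is done. It is Python rather than BioPerl, the helper name temp_fasta is hypothetical, and plain (identifier, sequence) pairs stand in for Bio::PrimarySeq objects.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_fasta(records):
    """Write (identifier, sequence) pairs to a temp FASTA file.

    Yields the file path and deletes the file on exit, even if the
    code using it raises an exception.
    """
    fd, path = tempfile.mkstemp(suffix=".fasta")
    try:
        with os.fdopen(fd, "w") as handle:
            for seq_id, seq in records:
                handle.write(f">{seq_id}\n{seq}\n")
        yield path
    finally:
        # Cleanup happens here regardless of how the with-block ended.
        if os.path.exists(path):
            os.remove(path)
```

A caller would wrap the run in `with temp_fasta(records) as query_path:` and pass query_path to the BLAST binary as its input file; the file is gone once the block exits.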

As to whether or not it should be started from scratch: if so, I would call it Bio::Tools::Run::BLAST and would allow the following apps (please update if needed):

  • blastall
  • blastcl3 (runs BLAST remotely at NCBI: the command-line app submits the query, and NCBI sends back ASN.1, which gets re-chunked into a BLAST report).
  • blastpgp
  • rpsblast
  • bl2seq
  • wu-blastall and/or all the WUBLAST apps.

Torst 16:38, 21 December 2005 (EST)

Here is my opinion:

  • Having just added rpsblast support to this module, I agree it needs to be rewritten (with a new name, since too much legacy code uses the old one).
  • The use of temp files is not ideal, but it is usually reliable and mostly foolproof. The proper Unix way to do it would be to use IPC::Open3, but this can have deadlock problems if the binaries you are running are not well behaved or deterministic in their behaviour. I'm not sure how this would be done on Windows.
  • Ultimately all this module does is
    1. run a binary on the local system, with an interface to its cmdline parameters and input stream
    2. capture its output stream(s)
    3. convert the output stream(s) into BioPerl objects or throw BioPerl exceptions
  • It should be easy to add support for any new tool which can produce BLAST-compatible output, eg. FSA-BLAST, BLAT, and so on.
  • There is a lot of confusion about how $PATH, $BLASTDIR, $BLASTDATADIR, $BLASTMATDIR, $HOME/.ncbirc, $WUBLASTDIR and so on all interact, which also needs to be clarified. Perhaps a StandAloneBlast HOWTO needs to be written, and I guess I should put my hand up at some stage for that one :-)
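The three steps listed above (run a binary with input, capture its output, convert output or failure into objects or exceptions) can be sketched without temp files as follows. This is a Python illustration, not a proposed BioPerl implementation. ToolError, run_tool and parse_tabular are hypothetical names, and the parser assumes simple tab-separated output rather than a real BLAST report.

```python
import subprocess


class ToolError(RuntimeError):
    """Raised when the external binary exits with a non-zero status."""


def run_tool(argv, stdin_text=""):
    """Run argv, feed stdin_text to it, and return its standard output.

    subprocess.run() uses communicate() internally, which services
    stdin, stdout and stderr together, so a full pipe buffer on one
    stream does not stall the others.
    """
    proc = subprocess.run(argv, input=stdin_text,
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise ToolError(f"{argv[0]} exited with status {proc.returncode}: "
                        f"{proc.stderr.strip()}")
    return proc.stdout


def parse_tabular(text):
    """Split tab-separated output into rows, skipping blanks and # comments."""
    return [line.split("\t") for line in text.splitlines()
            if line and not line.startswith("#")]
```

The deadlock concern comes from reading one pipe to completion while the child blocks writing to another; letting the library service all three streams at once sidesteps that, though it does hold the entire output in memory.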

Andrew Stewart 16:08, 14 November 2006 (EST)

How difficult would it be to include support for mpi-blast?
