[Bioperl-l] Bio::Index::Blast bug
biopython at maubp.freeserve.co.uk
Wed Mar 10 09:40:16 EST 2010
On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote:
> On Mar 10, 2010, at 4:35 AM, Peter wrote:
>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>>> Hi all!
>>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>>> matter what ID I used. The reason is that the Blast indexer seems to use
>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>>> I think however that for the current versions of blastall and blast+
>>> 'Query=' should be used.
>> That fits with changes I had to make in Biopython for breaking
>> up the plain text BLAST output into each query. For a while only
>> the RPS-BLAST report omitted the "header" (the BLAST line
>> and the journal references users should cite) between records,
>> but now all the NCBI BLAST tools do this - forcing us to look
>> for the Query= line.
>> i.e. I can't comment on the BioPerl change itself, but your
>> reasoning about the BLAST output makes sense.
> One side-effect of this is we will be missing the search
> algorithm and a few small odds and ends from all but
> the first report; this trickles down into how we properly
> deal with HSP coordinates, but we can probably wrangle
> some magic there to get things working for the most part.
Yeah - I had similar issues with the Biopython plain
text BLAST parser. The hack/magic I used was to
cache the header text from the first record and then
re-insert it on subsequent records. Nasty, but works.
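
To make the idea concrete, here is a minimal sketch of that header-caching hack in Python. It is not the actual Biopython parser code; the function name split_blast_reports is made up for illustration, and it assumes the whole plain text output fits in memory and that the header appears only once, before the first Query= line:

```python
import re


def split_blast_reports(text):
    """Yield one report per query, each prefixed with the shared header.

    Hypothetical helper for illustration only; assumes the header
    (program line, references) occurs once, before the first Query= line.
    """
    # Offsets of every line that begins with "Query=".
    starts = [m.start() for m in re.finditer(r"^Query=", text, flags=re.MULTILINE)]
    if not starts:
        return  # no queries found, nothing to yield

    # Everything before the first Query= line is the header; cache it once.
    header = text[:starts[0]]

    # Each record runs from its Query= line up to the next one (or end of text).
    ends = starts[1:] + [len(text)]
    for start, end in zip(starts, ends):
        # Re-insert the cached header so each chunk looks like a full report.
        yield header + text[start:end]
```

Each chunk it yields can then be handed to a parser that expects a complete single-query report. Older output that repeats the header before every query would need different handling.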
> This is similar to how XML format is currently dealt with
> (and another reason this format is the easiest to support,
> as it doesn't change based on NCBI's whims).
They may have changed a few things here too - watch out.
> Do we have example reports with multiple queries from
> BLAST+ available? It would be invaluable for the projects;
> if not I can probably generate a few locally.
I've got one example in Biopython's unit tests,