[Bioperl-l] Fwd: blast.pm patch

Peter biopython at maubp.freeserve.co.uk
Mon Dec 21 10:27:47 EST 2009

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza
<robfsouza at gmail.com> wrote:
> Hi Peter,
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of
it "by hand", it would be very tricky to try and parse as a special
case. It really does look like a bug in BLAST to me. The alignment
includes an initial pair, a leading gap in the query (with a coordinate
of zero), plus a residue from the match sequence (with a sensible
coordinate). The alignment statistics include this (extra) pair in
the alignment length.
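
Rather than special-casing the parser, one option is to scan an output
file for the symptom before parsing. The following is only a minimal
Python sketch, not anything in BioPerl or Biopython. It assumes the
pairwise text layout of "Query:"/"Sbjct:" rows (label, start, sequence,
end), and it relies on BLAST coordinates being 1-based, so a coordinate
of zero is never valid. The function name and regular expression are
made up for illustration, and the exact layout of the buggy row is an
assumption.

```python
import re

# One sequence row of a pairwise alignment block, e.g.
# "Query: 12   MKV-LA   17" (legacy) or "Query  12  MKV-LA  17" (BLAST+).
# Assumption: label, start coordinate, sequence, end coordinate.
ROW = re.compile(r"^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def find_zero_coordinate_rows(lines):
    """Return (line_number, label, text) for rows with a coordinate of 0.

    Hypothetical helper: BLAST coordinates are 1-based, so a zero start
    or end coordinate marks a row worth inspecting by hand.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ROW.match(line)
        if not match:
            continue  # header, statistics, midline or blank line
        label, start, _sequence, end = match.groups()
        if int(start) == 0 or int(end) == 0:
            hits.append((number, label, line.rstrip()))
    return hits
```

Passing an open file handle for Ngru1000013938.fa.br as the argument
would list any such rows with their line numbers in that file.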

You said you were using blastpgp version 2.2.18, so I tried this
with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast \
    -num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.
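
For reference, the correspondence I am assuming between the two command
lines above can be written as a small lookup table. This is a Python
sketch covering only the flags used in this thread. The helper name is
made up, and the mapping is my reading of the option names, so please
check it against "psiblast -help". Note that -j maps to
-num_iterations, which I left out of the psiblast command above, and
that -parse_deflines has no counterpart in the blastpgp command.

```python
# Legacy blastpgp flag -> BLAST+ psiblast option (assumed mapping,
# limited to the flags that appear in this thread).
LEGACY_TO_PLUS = {
    "-d": "-db",              # database name
    "-i": "-query",           # query FASTA file
    "-o": "-out",             # output file
    "-a": "-num_threads",     # number of processors / threads
    "-j": "-num_iterations",  # number of PSI-BLAST passes
    "-b": "-num_alignments",  # number of alignments to show
}


def translate_blastpgp_args(args):
    """Translate ['-d', 'nr', ...] into a psiblast argument list.

    Hypothetical helper: expects strictly alternating flag/value pairs
    and raises KeyError for any flag not in the table above.
    """
    translated = ["psiblast"]
    for flag, value in zip(args[0::2], args[1::2]):
        translated += [LEGACY_TO_PLUS[flag], value]
    return translated
```

The resulting list could be handed to subprocess.run() to rerun a
legacy search with the new tool.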

I might therefore expect the NCBI to say "yes, this is a bug in
the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance
release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost
Christmas/New Year so we may not expect an answer until Jan.
