[Bioperl-l] PullHSP bug
cjfields at illinois.edu
Tue Dec 2 21:40:32 EST 2008
On Dec 2, 2008, at 6:25 PM, Sendu Bala wrote:
> Chris Fields wrote:
>> I found a silent bug while fixing some issues with SimpleAlign
>> which are causing blast_pull tests to fail (HSP-related). It
>> appears that the query sequence ID is not being caught
> I don't see this as a bug of PullHSPI. The blastn report being
> parsed in that failing test has no query sequence id, so obviously
> PullHSPI can't give an ID to the query sequence it creates.
It appears the problem boils down to a small inconsistency with the
way blank IDs are handled between the two BLAST parsers. The regular
blast parser explicitly sets this to '', while blast_pull leaves it
undef. When one constructs a LocatableSeq w/o an ID it dies, but ''
works, hence the problem. I don't think using '' is a real solution
though (seems like something added to silence errors).
The short and easy fix is either having get_nse() or blast_pull fall
back to '', but both are technically wrong as there isn't a true ID.
Ideas? Maybe a fallback 'unknown' ID (which seems to be used as a ).
> Why does SimpleAlign now fall over when an added sequence has no id?
> I'd suggest the old behaviour be retained...
I don't think going back to the old behavior would be the correct
thing to do. When adding seqs to a SimpleAlign, the NSE is used to
index sequences and check for conflicts. The index was previously
built from scratch (didn't use LocatableSeq::get_nse) but was recalled
using get_nse, which is inconsistent. If get_nse changes slightly,
such as optionally adding the version to remain consistent with Pfam/
Rfam, this suddenly breaks. That behavior is now fixed to always use
get_nse regardless. Coming up with a fix for the ID so indexing works
correctly would be a better solution.
More information about the Bioperl-l