[Bioperl-l] Bug # 2908: ecoli.nt not available

Dave Messina David.Messina at sbc.su.se
Sun Apr 25 15:59:58 EDT 2010

On Apr 25, 2010, at 3:47 PM, Chris Fields wrote:
> How does this work cross-platform, cross-OS, 32 vs 64 bit, new vs old BLAST versions, etc?  That's my only concern with using a single database.

Good point; I hadn't thought about that. I assumed that since NCBI offers downloadable databases, the BLAST db binary format was "universal" (network byte order, etc.).

> I don't think there is an issue, just can't recall.

I don't know for sure, either. Nor do I know whether the switch from old-style BLAST to BLAST+ changed the db format. One option for us would be to start with a FASTA file of sequences and have our test code build the necessary test dbs with a locally installed formatdb if one is present, and skip those tests otherwise.
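A minimal sketch of that build-or-skip idea follows. It is written in Python purely for illustration (the real tests are Perl), and the function names `formatdb_command` and `build_test_db` are hypothetical. It assumes legacy `formatdb` with its standard `-i` (input FASTA) and `-p F` (nucleotide) options. The lookup and process-runner functions are injectable so the logic can be checked without formatdb installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def formatdb_command(fasta_path: str,
                     which: Callable[[str], Optional[str]] = shutil.which
                     ) -> Optional[List[str]]:
    """Return the formatdb command for a nucleotide FASTA file, or None if
    formatdb is not on the PATH (meaning the BLAST tests should be skipped)."""
    if which("formatdb") is None:
        return None
    # -i: input FASTA file; -p F: sequences are nucleotide, not protein
    return ["formatdb", "-i", fasta_path, "-p", "F"]


def build_test_db(fasta_path: str,
                  which: Callable[[str], Optional[str]] = shutil.which,
                  run: Callable = subprocess.run) -> bool:
    """Build the BLAST db from the FASTA file; return False to signal 'skip'."""
    cmd = formatdb_command(fasta_path, which)
    if cmd is None:
        return False
    # With the default subprocess.run, check=True raises CalledProcessError
    # if formatdb exits with a nonzero status.
    run(cmd, check=True)
    return True
```

In this sketch a test script would call `build_test_db("t/data/ecoli.nt")` during setup and skip every database-dependent test when it returns False. Keeping the skip decision in one place is also what would let a later test that consumes their output be skipped consistently.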

On Apr 25, 2010, at 4:44 PM, Iftekharul Haque wrote:
> If we were to include a database ourselves, where would it reside?

In t/data/

> I imagine you use the word "database" fairly loosely, as in, it could
> be just a flat text file?

If I read the test correctly, the database needed for some of the StandAloneBlast.t tests is a BLAST database, which uses a special binary format rather than a flat text file.

Now, the bug report was about the ecoli.nt FASTA flatfile being missing. I'll admit I'm a little confused about that, because I didn't see anything in StandAloneBlast.t explaining how ecoli.nt gets downloaded or how it gets formatdb'd into a BLAST database.

The tests seem to assume those steps have already been done, which I can't imagine is very often the case, so these tests have probably almost always been skipped (as Chris' comment on the bug report suggests).

Arguably the more important part of the bugfix would be fixing the testing structure such that test 43 doesn't fail due to the absence of output from the skipped tests; presumably it should be skipped too.

That is, if that failure is even still happening. I just ran it and all tests either passed or were skipped. (But I don't have blastall installed on this machine, so the SKIP is triggered in a wider scope for me.)

> I was thinking if other tools need reference sequences to run tests as
> well, if we had a standing set of sequences to test tools against, you
> wouldn't have to add too many sequence files with the distribution
> (helping control download file size).

We sort of have this in the form of the t/data directory. Undoubtedly there's some redundancy and cruft in there that's built up a little over the years, but as far as I know it hasn't been too much of a problem.

One thing that we could do to impose a little more order on that dir is to put the files in t/data in subdirectories to match the existing hierarchy of modules in Bio/ (as the t/ directory of tests itself does for the most part).

But since we'll likely be splitting out tests and test files in the relatively near future as part of the overall decentralization of BioPerl into smaller independent distributions, it may not be worth the time.

