[Bioperl-l] Comparing DB_FILE and SDBM

Josh Lauricha laurichj at bioinfo.ucr.edu
Fri Aug 13 14:00:35 EDT 2004

On Fri 08/13/04 12:38, Mike Muratet wrote:
> On Thu, 12 Aug 2004, Josh Lauricha wrote:
> > On Thu 08/12/04 13:44, Mike Muratet wrote:
> > > Greetings
> > > 
> > > I did a comparison myself of Bio::Index::GenBank between DB_FILE and SDBM
> > > on the latest version of the files from the Genbank primate division using
> > > a Compaq with 376K of memory and a 2.4GHz Pentium 4 Xeon. I used the
> > 
> > Wow, now that's a machine in desperate need of a memory upgrade.
> >
> Well, yes. But it's all I've got.

I was joking about the K vs M mix-up ;) If the machine really does have
only 376K of memory, that's your problem...

> If anyone has any experience(s) to relate regarding the indexing of big
> portions of Genbank, I'd like to hear how they did it. Should it really
> take days?

I've got the ESTs from GenBank indexed, around 60G:

$ time gbfetch -i est.index BI509189 > /dev/null 

real    0m0.361s
user    0m0.300s
sys     0m0.050s

I don't want to go through the full indexing again, but it took quite
some time, several hours I think. Just reading the data takes at least
25 minutes, and this is on a set of disks that will do ~60MB/s sustained
(hdparm -t); a normal IDE drive will do around 20MB/s.
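
As a rough sanity check on those figures, the best-case time to read the
data is just its size divided by the sustained throughput. The sketch
below assumes 60G means 60 decimal gigabytes and that the disk holds its
hdparm rate for the whole read, which a real run will not quite manage.
read_minutes is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# Lower-bound estimate of the time needed to read a data set sequentially.
# read_minutes SIZE_GB RATE_MB_PER_S
#   SIZE_GB        data size in decimal gigabytes (assumption: 1 GB = 1000 MB)
#   RATE_MB_PER_S  sustained throughput, e.g. as reported by hdparm -t
read_minutes() {
    # Integer shell arithmetic: whole minutes, rounded down.
    echo $(( $1 * 1000 / $2 / 60 ))
}

echo "60 GB at 60 MB/s: about $(read_minutes 60 60) minutes"
echo "60 GB at 20 MB/s: about $(read_minutes 60 20) minutes"
```

That works out to roughly 16 minutes at 60MB/s and 50 minutes at 20MB/s
as a floor for one pass over the data, before any indexing work is done.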

The indexing doesn't do a full parse, just enough to get the ID numbers.
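
To illustrate why skipping the full parse keeps things cheap, here is a
toy version of the idea in shell: scan for the LOCUS lines, remember the
byte offset where each entry starts, and later seek straight to that
offset. This is only a sketch of the general technique and not how
Bio::Index::GenBank is implemented. build_index and fetch_entry are
made-up names, and it assumes plain ASCII files with LF line endings and
entries terminated by a "//" line.

```shell
#!/bin/sh
# build_index FILE
#   Prints "ID<TAB>byte-offset" for every entry, using only the LOCUS line.
build_index() {
    LC_ALL=C awk '
        BEGIN    { offset = 0 }
        /^LOCUS/ { print $2 "\t" offset }   # entry starts at this byte
                 { offset += length($0) + 1 }   # +1 for the newline
    ' "$1"
}

# fetch_entry FILE INDEX ID
#   Looks up ID in INDEX, then prints that entry up to its "//" terminator.
fetch_entry() {
    off=$(awk -F '\t' -v id="$3" '$1 == id { print $2; exit }' "$2")
    [ -n "$off" ] || return 1               # ID not in the index
    # tail -c +N starts output at byte N (1-based), hence off + 1.
    tail -c +"$((off + 1))" "$1" | awk '{ print } /^\/\/$/ { exit }'
}

# Usage (file names are placeholders):
#   build_index est.gb > est.idx
#   fetch_entry est.gb est.idx BI509189
```

The index pass still has to read every byte once, so it cannot beat the
disk, but a lookup afterwards touches only the index and one entry.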

I don't think it should take days, but if you've got a slow disk (as
root, run: hdparm -t /dev/hda), that could be it. The scripts I use are
basically just the synopsis, with the default DB.


| Josh Lauricha            | Ford, you're turning    |
| laurichj at bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
