[Bioperl-l] acquiring a local refseq + index

Erik er at xs4all.nl
Sat Dec 30 19:05:16 EST 2006

Hi all,

I downloaded the refseq files (.gbff) and want to index the lot with

It turns out that there are many cases where the SOURCE and ORGANISM lines
are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.
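
To get a list of the affected records without going through Bio::SeqIO at all, a plain-Perl pre-scan along these lines might help. It is only a sketch: the sub names scan_source_lines and is_suspect are made up for illustration, and "suspect" here just means that the SOURCE or ORGANISM text is missing or empty, which is an assumption and surely does not cover every kind of damage.

```perl
use strict;
use warnings;

# Collect the LOCUS name plus the raw SOURCE and ORGANISM text of every
# record that is terminated by a "//" line.
sub scan_source_lines {
    my ($fh) = @_;
    my (@records, $rec);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^LOCUS\s+(\S+)/) {
            $rec = { locus => $1, source => undef, organism => undef };
        }
        elsif ($rec && $line =~ /^SOURCE\s*(.*)$/) {
            $rec->{source} = $1;
        }
        elsif ($rec && $line =~ /^\s+ORGANISM\s*(.*)$/) {
            $rec->{organism} = $1;
        }
        elsif ($rec && $line =~ m{^//}) {
            push @records, $rec;
            $rec = undef;
        }
    }
    return @records;
}

# Assumed heuristic: a record is suspect when its SOURCE or ORGANISM
# text is missing or contains nothing but whitespace.
sub is_suspect {
    my ($rec) = @_;
    for my $key (qw(source organism)) {
        return 1 unless defined $rec->{$key} && $rec->{$key} =~ /\S/;
    }
    return 0;
}

# When run as a script, report suspect records in the files given on the
# command line.
unless (caller) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        for my $rec (grep { is_suspect($_) } scan_source_lines($fh)) {
            printf "%s\t%s\tSOURCE=[%s]\tORGANISM=[%s]\n",
                $file, $rec->{locus},
                $rec->{source} // '', $rec->{organism} // '';
        }
        close $fh;
    }
}
```

Run over the .gbff files, this prints one tab-separated line per suspect record, which should at least show how widespread the problem is before touching the parser.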

I'd like to change Bio::SeqIO::genbank so that parsing gets at least far
enough to make indexing the refseq files possible, and hopefully also to
improve the taxonomic output ($seq->species->binomial is often mutilated
at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off
writing my own indexing scheme.
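
By "my own indexing scheme" I mean something as simple as a byte-offset index that never parses the annotation. The following is a minimal sketch, not a replacement for Bio::DB::Flat: index_offsets and fetch_record are hypothetical names, and it assumes that the name on the LOCUS line is a usable key and that every record ends with a "//" line.

```perl
use strict;
use warnings;

# Map each LOCUS name to the byte offset at which its record starts.
sub index_offsets {
    my ($fh) = @_;
    my %offset;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^LOCUS\s+(\S+)/;
        $pos = tell $fh;    # start of the next line
    }
    return \%offset;
}

# Return the raw text of one record, from its LOCUS line through the
# closing "//" line, or nothing if the name is not in the index.
sub fetch_record {
    my ($fh, $offset, $name) = @_;
    return unless exists $offset->{$name};
    seek $fh, $offset->{$name}, 0 or die "seek failed: $!";
    my $text = '';
    while (my $line = <$fh>) {
        $text .= $line;
        last if $line =~ m{^//};
    }
    return $text;
}
```

The index here lives only in memory; to keep it between runs the hash would have to be saved somewhere, for example with the core Storable module. The fetched text could then be handed to whatever parser copes with it.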

Below is an outline of my indexing program, which uses Bio::DB::Flat with
the BDB index. If anyone knows of a better way to get a locally searchable
refseq flat file index, I would be very interested.

Thanks for your help,


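use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';

# Open (or create) a BerkeleyDB-backed flat-file index in writable mode.
my $db = Bio::DB::Flat->new(
   -directory  => $refseq_dir,
   -dbname     => 'refseq',
   -format     => 'genbank',
   -index      => 'bdb',
   -write_flag => 1,
);

# getfiles() is not shown here; presumably it returns the .gbff files.
my @files = getfiles($refseq_dir);
for my $f (@files) {
   # Loop body missing from the archived post; build_index() per file is an assumption.
   $db->build_index($f);
}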
