[Bioperl-l] acquiring a local refseq + index

Chris Fields cjfields at uiuc.edu
Sat Dec 30 21:33:23 EST 2006


I agree with Hilmar that we need examples.  If you are referring to
your submitted bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2167

we could add this in as long as it passes the tests (I'll try giving it
a workout with my local bacterial seqs tonight or tomorrow).  However,
in the not-too-distant future your patch would likely be rendered
obsolete: any parsing in Bio::SeqIO modules that pertains to
Bio::Species will be deprecated in favor of simple parsing (more
foolproof, less uncertainty) and Bio::Taxon (which has optional db
lookups using NCBI Taxonomy).  Bio::Species and anything related to it
are marked for deprecation.  Fair warning...
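
To give a rough idea of what "simple parsing" could mean here, below is a
minimal sketch in plain Perl.  It is not the planned Bio::SeqIO code; the
function name parse_source_block and the sample record are made up for
illustration.  It assumes the organism name fits on the ORGANISM line
itself and that the indented lines after it hold the lineage.  It keeps
the name as one string and does not try to guess genus, species or
strain from it.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the organism name and lineage out of one
# GenBank record as plain strings, without interpreting the name.
sub parse_source_block {
    my ($record) = @_;
    my ($organism, $lineage) = ('', '');
    my $in_organism = 0;

    for my $line (split /\n/, $record) {
        if ($line =~ /^\s+ORGANISM\s+(.*\S)\s*$/) {
            # Keep the name exactly as written.
            $organism    = $1;
            $in_organism = 1;
        }
        elsif ($in_organism && $line =~ /^\s{12}(\S.*?)\s*$/) {
            # Indented continuation lines are assumed to be lineage.
            $lineage .= ($lineage eq '' ? '' : ' ') . $1;
        }
        elsif ($in_organism) {
            last;    # the next keyword (e.g. REFERENCE) ends the block
        }
    }

    $lineage =~ s/\.$//;                     # drop the trailing period
    my @ranks = split /;\s*/, $lineage;      # one entry per lineage node
    return { organism => $organism, lineage => \@ranks };
}

# Illustrative record fragment (not taken from a real refseq file).
my $record = <<'END';
SOURCE      Escherichia coli K12
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1
END

my $info = parse_source_block($record);
print "organism: $info->{organism}\n";
print "lineage:  ", join(' > ', @{ $info->{lineage} }), "\n";
```

Anything beyond these raw strings (rank names, taxon ids) would then
come from a taxonomy lookup rather than from guessing at the text.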

chris

On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:

> Can you send examples and the resulting error messages? Also, I'm
> assuming you are running the 1.5.2 release of Bioperl; if not, that's
> what I would try first.
>
> 	-hilmar
>
> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>
>> Hi all,
>>
>> I downloaded the refseq files (.gbff) and want to index the lot with
>> Bio::DB::Flat.
>>
>> It turns out that there are many cases where the SOURCE and ORGANISM
>> lines are messed up, sometimes to a degree where the indexing fails
>> on a Bio::SeqIO::genbank error.
>>
>> I'd like to change Bio::SeqIO::genbank so that parsing gets at least
>> far enough to make indexing the refseq files possible, and hopefully
>> to improve the taxonomic output ($seq->species->binomial is often
>> mutilated at the moment).
>>
>> Is it still worthwhile to change parsing modules like
>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If so,
>> I may be better off writing my own indexing scheme.
>>
>> Below is (an outline of) my indexing program, which uses
>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>> searchable refseq flat file index, I would be very interested.
>>
>> Thanks for your help,
>>
>> Erikjan
>>
>>
>> -------------
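>> use strict;
>> use warnings;
>> use Bio::DB::Flat;
>>
>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>
>> # Open (or create) a BerkeleyDB-backed flat-file index named 'refseq'.
>> my $db = Bio::DB::Flat->new(
>>    -directory  => $refseq_dir,
>>    -dbname     => 'refseq',
>>    -format     => 'genbank',
>>    -index      => 'bdb',
>>    -write_flag => 1,
>> );
>>
>> # getfiles() is not shown in this outline; it is assumed to return
>> # the paths of the .gbff files under $refseq_dir.
>> my @files = getfiles($refseq_dir);
>> for my $f (@files) {
>>     $db->build_index($f);
>> }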
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




