[Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures

Chris Fields cjfields at uiuc.edu
Wed Apr 18 15:41:56 EDT 2007


The problem appears to be with NCBI's eutils.  Bare accession numbers
no longer work with esearch (which Bio::DB::Query::GenBank uses).
They still work via efetch, which explains why Bio::DB::GenBank
passes its tests using the same accession/GI mix.

NCBI has added an extra field descriptor specifically for accessions  
in esearch, which means any queries with accessions must look like  
the following (the last is a GI):

'J00522[accession] OR AF303112[accession] OR 2981014'

'J00522[accession] | AF303112[accession] | 2981014' also works.

We could split the IDs into two groups based on the presence of letters
and build the query that way, or define exactly what kind of ID is
acceptable to pass to ids() (GI or accession), or make ids() GI-only
and add a new method for accessions (or vice versa).
Thoughts?
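
As a rough illustration of the first option, here is a minimal sketch,
not tested against the live service. build_esearch_term is a
hypothetical helper, not an existing BioPerl method, and it assumes
that any ID made up only of digits is a GI and everything else is an
accession:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an esearch term from a
# mixed list of GIs and accessions.
# Assumption: an all-digit ID is a GI and is passed through unchanged;
# anything else is treated as an accession and tagged with [accession].
sub build_esearch_term {
    my @ids = @_;
    my @terms = map { /^\d+$/ ? $_ : $_ . '[accession]' } @ids;
    return join ' OR ', @terms;
}

# Prints: J00522[accession] OR AF303112[accession] OR 2981014
print build_esearch_term(qw(J00522 AF303112 2981014)), "\n";
```

The digits-only test is the simplest reading of "presence of letters";
it would need revisiting if some ID type did not fit that rule.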

chris

On Apr 18, 2007, at 1:32 PM, Baik, Ki wrote:

> I have had similar problems in which a couple of accession numbers out
> of a series were not retrieved, even though they exist at NCBI.
>
> Ki Baik
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
> Fields
> Sent: Wednesday, April 18, 2007 10:05 AM
> To: Sendu Bala
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
>
> I can verify on this end.  Not sure why, but the same accessions are
> used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)
> with success.
>
> chris
>
> On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:
>
>> Hi all,
>>
>> t/DB.t is currently failing tests 40 and 41:
>>
>> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>>                                           -verbose => 1);
>>
>> cmp_ok $query->count, '>', 0;
>>
>> You can see that
>> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
>> gives no results, where presumably it used to give 3. Querying on the
>> 3 ids individually works fine. So... what changed and how do we get
>> around it?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




