[Bioperl-l] Batch mode in Bio::DB::GenBank

Hilmar Lapp hlapp at gmx.net
Fri Mar 31 12:43:15 EST 2006


There used to be get_Stream_by_batch(), which apparently is now
deprecated and forwards to get_Stream_by_id(). I therefore assume the
latter is supposed to do the Right Thing depending on its arguments. I
don't know where this is going wrong.

	-hilmar

On Mar 31, 2006, at 8:56 AM, Chris Fields wrote:

>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
>> Sent: Friday, March 31, 2006 8:45 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank
>>
>> Hi,
>> It seems that in the current (CVS of last night) Bio::DB::GenBank
>> implementation it is not possible at all to set the mode to 'batch'
>> instead of the default 'single'. Devel::StackTrace revealed that the
>> mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method.
>> Is that intended?
>> The problem is that in single mode the request is always done with a
>> GET. In most cases (at least in my hands), when you pass a batch of 500
>> IDs the request fails because the URL gets too long. All goes well when
>> the method is overridden so that the mode option is hardcoded to
>> 'batch' and a POST is done.
>
> You're right about the 500 seq limit.  If it's particularly busy (during
> peak hours) it's less, around 200-400.  I have been grabbing them 400 at
> a time using a loop, which works, but batch would be better.
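
The chunked workaround described above can be sketched roughly as follows. This is only an illustration in core Perl, not code from Bio::DB::GenBank: the helper name chunk_ids and the placeholder IDs are made up, and the chunk size of 400 is simply the figure mentioned above. The call to get_Stream_by_id is left as a comment because it assumes a live Bio::DB::GenBank object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of IDs into
# array refs holding at most $size elements each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    while (@ids) {
        # splice removes and returns up to $size leading elements
        push @chunks, [ splice(@ids, 0, $size) ];
    }
    return @chunks;
}

# Placeholder IDs for illustration only.
my @ids = map { "ID$_" } 1 .. 1000;

for my $chunk (chunk_ids(400, @ids)) {
    # In real use, each chunk would be handed to the database object, e.g.
    #   my $stream = $gb->get_Stream_by_id($chunk);
    # where $gb is a Bio::DB::GenBank instance (assumed, not created here).
    printf "would fetch %d ids (%s .. %s)\n",
        scalar @$chunk, $chunk->[0], $chunk->[-1];
}
```

Keeping each request below the size at which the GET URL becomes too long is what makes this work, at the cost of one round trip per chunk.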
>
> I remember asking about this a few years ago and, according to Lincoln,
> we use the approved batch retrieval method.  However, now that you point
> it out, I just don't see it here (no epost).  NCBIHelper has, for some
> reason, this:
>
>     %CGILOCATION = (
>         'batch'   => ['post' => '/entrez/eutils/efetch.fcgi'],
>         'query'   => ['get'  => '/entrez/eutils/efetch.fcgi'],
>         'single'  => ['get'  => '/entrez/eutils/efetch.fcgi'],
>         'version' => ['get'  => '/entrez/eutils/efetch.fcgi'],
>         'gi'      => ['get'  => '/entrez/eutils/efetch.fcgi'],
>     );
>
> Which has batch set to efetch, not epost.
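
To make the effect of that table concrete, here is a small sketch of how a mode name selects the HTTP method and CGI path. It is self-contained core Perl; the helper name request_spec is hypothetical, and this is not the actual lookup code in NCBIHelper, only an illustration built from the table quoted above.

```perl
use strict;
use warnings;

# Copy of the quoted table: mode => [HTTP method => CGI path].
my %CGILOCATION = (
    'batch'   => ['post' => '/entrez/eutils/efetch.fcgi'],
    'query'   => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'single'  => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'version' => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'gi'      => ['get'  => '/entrez/eutils/efetch.fcgi'],
);

# Hypothetical helper: return (method, path) for a retrieval mode.
sub request_spec {
    my ($mode) = @_;
    my $entry = $CGILOCATION{$mode}
        or die "unknown mode '$mode'\n";
    return @$entry;
}

for my $mode (qw(single batch)) {
    my ($method, $path) = request_spec($mode);
    print "$mode: ", uc($method), " $path\n";
}
```

All modes point at the same efetch path; only 'batch' maps to a POST, which is why a hardcoded 'single' always ends up as a GET.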
>
>> I think there are at least 2 possibilities:
>> 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id
>> 2) allow the possibility to pass the mode option when get_Stream_by_id
>> is called using the Bio::DB::GenBank object
>
> I would say the second is the most flexible, though I'm not exactly sure
> why we hardcode 'single' for sequence streams.  It may have something to
> do with the way single sequences are retrieved; it looks like
> get_Seq_by_acc in WebDBSeqI calls get_Stream_by_acc with one sequence
> instead of an array ref; I guess get_Stream_by_id is the same.
>
> Anyway, I'm for it as long as some tests are added for batch retrieval
> and everything passes.
>
>> Any comments/preferences before I actually commit some edits?
>> Regards,
>> Marc
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


