[Bioperl-l] Appending efetch results to a file

Chris Fields cjfields at illinois.edu
Fri May 8 10:22:31 EDT 2009

On May 8, 2009, at 8:42 AM, Mark A. Jensen wrote:

>> I need to request 100's of records, and to avoid stressing the Entrez
>> server I do my fetching inside a loop that increments the -retstart
>> parameter in the factory.
> This raises a question in my mind: should EUtilities use  
> Bio::WebAgent rather
> than LWP::UserAgent directly, and doesn't Bio::WebAgent have
> magical properties that ease the server burden without having to  
> build it into the user code directly?
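The paging approach described in the quoted question can be sketched language-neutrally. The Python below is not BioPerl code. `fetch_batch` and `write` are hypothetical stand-ins for whatever actually issues the efetch request and stores the result. The 0.34 s default pause is an assumption chosen to stay around three requests per second. Only `retstart` and `retmax` are real E-utilities parameter names.

```python
import time
from typing import Callable, Iterator, Tuple


def batch_windows(total: int, retmax: int) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, size) pairs that cover records 0..total-1."""
    if retmax <= 0:
        raise ValueError("retmax must be positive")
    for retstart in range(0, total, retmax):
        # The last window may be smaller than retmax.
        yield retstart, min(retmax, total - retstart)


def fetch_all(total: int, retmax: int,
              fetch_batch: Callable[[int, int], str],
              write: Callable[[str], object],
              delay: float = 0.34) -> int:
    """Fetch records window by window and hand each chunk to `write`.

    `fetch_batch(retstart, retmax)` is a hypothetical stand-in for the
    real efetch call. Returns the number of requests made.
    """
    requests = 0
    for retstart, size in batch_windows(total, retmax):
        write(fetch_batch(retstart, size))
        requests += 1
        if delay > 0:
            # Pause after each request to go easy on the server.
            time.sleep(delay)
    return requests
```

To append results to a file, pass the `write` method of a file opened in append mode. For example, `with open("records.txt", "a") as fh: fetch_all(n, 100, my_fetch, fh.write)`, where `my_fetch` is hypothetical. Mode `"a"` keeps whatever earlier batches or runs already wrote.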

I thought about that originally, but there is a significant difference
between the two agent implementations.  Bio::WebAgent is an
LWP::UserAgent subclass (is-a), whereas Bio::DB::GenericWebAgent and its
ilk contain a user agent instance (has-a).  I chose the latter course b/c
I favor composition over inheritance, and LWP::UserAgent uses
different named parameter handling than BioPerl (no '-');
Bio::WebAgent works around that in its constructor.  I'd rather take
the composition route than risk running into odd parameter issues down
the road.
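The has-a arrangement and the '-' parameter issue can be illustrated with a small Python sketch. `PlainAgent` is a hypothetical stand-in for a user agent that takes plain option names (as LWP::UserAgent does). `strip_dashes` and `WebAgentHasA` are hypothetical too, and none of these names are BioPerl API. The wrapper translates BioPerl-style `-name` keys once and then delegates, instead of subclassing the agent and overriding its constructor.

```python
from typing import Any, Callable, Dict


class PlainAgent:
    """Hypothetical user agent whose options use plain names like 'timeout'."""

    def __init__(self, **opts: Any) -> None:
        self.opts = dict(opts)

    def get(self, url: str) -> str:
        return f"GET {url} (timeout={self.opts.get('timeout')})"


def strip_dashes(params: Dict[str, Any]) -> Dict[str, Any]:
    """Convert BioPerl-style '-name' keys into plain 'name' keys."""
    return {k.lstrip("-"): v for k, v in params.items()}


class WebAgentHasA:
    """Composition: the wrapper *has* a user agent instead of *being* one."""

    def __init__(self, params: Dict[str, Any],
                 ua_class: Callable[..., Any] = PlainAgent) -> None:
        # Translate parameter names once, at the boundary.
        self.ua = ua_class(**strip_dashes(params))

    def get(self, url: str) -> str:
        # Delegate the actual request to the contained agent.
        return self.ua.get(url)
```

Because the agent class is passed in, a different transport with the same `get` interface (a SOAP client, say) could in principle be substituted without touching the wrapper. A subclass of one particular agent does not give you that flexibility as easily.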

Not to mention, I may genericize it more in the future to be capable  
of using SOAP-based methods, so switching out the ua made more sense  
in the long run (still a lot to do on that end).

I haven't discussed this extensively on the list before, but when I  
redesigned EUtilities I wanted to separate out the various tasks, e.g.  
ua, parser, parameter handling, etc.  So, for the specific eutil  
tools, parser = Bio::Tools::EUtilities, parameter =
Bio::Tools::EUtilities::EUtilParameters, ua = LWP::UserAgent.  For  
other DBs one could switch out the relevant bits for DB-specific  
implementations.  Then, Bio::DB::EUtilities basically decorates all  
three, acts as the traffic cop to get the various bits playing well  
together, delegates as needed, etc.
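The division of labour described above can be sketched as a facade over three swappable parts. In this Python sketch, `Parameters` loosely mirrors the role of Bio::Tools::EUtilities::EUtilParameters. The `fetch` callable plays the user agent, and `parser` plays Bio::Tools::EUtilities. All class names and signatures here are hypothetical, not the BioPerl API. The base URL is the public E-utilities endpoint.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


@dataclass
class Parameters:
    """Parameter handling: holds the eutil name and its query arguments."""
    eutil: str
    args: Dict[str, Any] = field(default_factory=dict)

    def to_url(self) -> str:
        # Sort keys so identical parameter sets always give identical URLs.
        query = urlencode(sorted(self.args.items()))
        return f"{EUTILS_BASE}{self.eutil}.fcgi?{query}"


class EUtilsFacade:
    """Coordinates the parts and delegates; it does no HTTP or parsing itself."""

    def __init__(self, params: Parameters,
                 fetch: Callable[[str], str],
                 parser: Callable[[str], List[str]]) -> None:
        self.params = params
        self.fetch = fetch    # user-agent role
        self.parser = parser  # parser role

    def get_response(self) -> str:
        return self.fetch(self.params.to_url())

    def get_ids(self) -> List[str]:
        return self.parser(self.get_response())
```

Swapping `fetch` or `parser` for database-specific versions leaves the facade unchanged, which is the point of keeping the roles separate.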

This'll allow additional components to be added in at later points if  
needed, and the basic tool can be used for retrieving raw data or as a  
souped-up agent for retrieving remote data in a new set of modules  
(Bio::Entrez::*, maybe).  There are some experimental bits in there  
still (repeated requests with the exact same params do not spam  
eutils, for instance, and there is some 'lazy' code in the parser),  
but it seems to largely work, and those bits can be removed fairly  
easily if they prove problematic.
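The "repeated requests with the exact same params do not spam eutils" behaviour can be approximated as follows. This Python sketch assumes the simplest rule: each request is compared with the previous one, and the stored response is reused on a match. It is not the actual Bio::DB::EUtilities implementation. `CachingRequester` and `fetch` are hypothetical names.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Params = Dict[str, Any]
Key = Tuple[Tuple[str, str], ...]


class CachingRequester:
    """Reuse the last response when the parameters have not changed."""

    def __init__(self, fetch: Callable[[Params], str]) -> None:
        self._fetch = fetch
        self._last_key: Optional[Key] = None
        self._last_response: str = ""
        self.calls = 0  # number of real requests made

    @staticmethod
    def _key(params: Params) -> Key:
        # Order-independent, comparable fingerprint of the parameter set.
        return tuple(sorted((k, str(v)) for k, v in params.items()))

    def get(self, params: Params) -> str:
        key = self._key(params)
        if key != self._last_key:
            # Fetch first, so a failed request does not poison the cache.
            self._last_response = self._fetch(params)
            self._last_key = key
            self.calls += 1
        return self._last_response
```

The `calls` counter makes it easy to confirm that a second identical call did not hit the server. A real version would also need to decide when a cached response has gone stale.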

