[Bioperl-l] LWP in Bio::DB::GenBank.pm

Hilmar Lapp hlapp@gmx.net
Sat, 25 Nov 2000 23:51:33 -0800

Jason Stajich wrote:
> Hilmar, have you made any progress on the concept of a single class for DB
> remote access with GenBank and GenPept inheriting off of it?  It would make
> sense to try and do all the changes at the same time, and DB::SwissProt
> might fit into that as well? It is just a different base url.  Perhaps we
> need a class that is Bio::DB::WebDB.

I was proposing something along these lines, namely a NetIO class.
Something more focused like Bio::DB::WebDB is probably the better
starting point. Yes, let's do this. The extent of code duplication in
the Bio::DB classes tends to be embarrassing.
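
To make the shared-base-class idea concrete, here is a minimal sketch
in core Perl. Everything in it is an assumption for illustration: the
Sketch:: package names, the base_url and url_for_id methods, the
example.org URLs and the '?id=' query layout are all hypothetical and
not taken from any existing Bio::DB module.

```perl
use strict;
use warnings;

# Hypothetical base class: logic shared by all web databases lives here.
package Sketch::WebDB;

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}

# Each subclass is expected to override this with its own base URL.
sub base_url { die "base_url must be overridden by a subclass\n" }

# Build the URL for one identifier (the query layout is an assumption).
sub url_for_id {
    my ($self, $id) = @_;
    return $self->base_url . '?id=' . $id;
}

# The subclasses differ only in the base URL (placeholder URLs).
package Sketch::GenBank;
our @ISA = ('Sketch::WebDB');
sub base_url { return 'http://example.org/genbank' }

package Sketch::GenPept;
our @ISA = ('Sketch::WebDB');
sub base_url { return 'http://example.org/genpept' }

package main;

my $gb = Sketch::GenBank->new;
my $gp = Sketch::GenPept->new;
my $gb_url = $gb->url_for_id('ID123');
my $gp_url = $gp->url_for_id('ID123');
print "$gb_url\n$gp_url\n";
```

With this layout, a SwissProt class would only need to supply another
base_url, provided its request format really is the same.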

Regarding streaming through LWP, it is not really clear to me how to
turn a callback into a stream of sequences that client code can process
one after the other. If you look at the LWP code, you may notice that
processing of the stream (reading from the socket) is deeply buried, so
subclassing LWP::UserAgent doesn't seem to help. In contrast to
returning a SeqIO object, which reads from the socket on demand as the
client requests the next sequence, a callback hands control to the LWP
code. By the time $ua->request(), and therefore
$genbank->get_stream_by_XXX(), returns to the client-side caller, all
data have by definition already been read (since Perl is not
multithreaded!). So at present it seems the only ways around this are
to require the client to provide a callback itself, which is a
significant change in the API, or to discourage use of the stream
methods when many or long sequences may be expected. BTW, the stream
methods are not enforced by the interface that DB::GenBank is supposed
to adhere to.
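
The difference between the two control flows can be shown with a small
sketch in core Perl that uses no LWP and no network. Both
fetch_with_callback and make_pull_stream are hypothetical stand-ins
(the first for a callback-driven request, the second for a SeqIO-like
object), and an in-memory filehandle plays the role of the socket.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a callback-driven fetch: the producer owns
# the read loop and calls back once per record.
sub fetch_with_callback {
    my ($fh, $callback) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $callback->($line);
    }
    return;    # by the time we return, the whole input has been read
}

# Hypothetical stand-in for a SeqIO-like stream: the client owns the
# loop, and a record is read only when the returned closure is called.
sub make_pull_stream {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        return undef unless defined $line;
        chomp $line;
        return $line;
    };
}

my $data = "seq1\nseq2\nseq3\n";    # pretend this arrives over a socket

# Callback (push) style: control sits in the fetch routine.
open(my $push_fh, '<', \$data) or die "cannot open in-memory handle: $!";
my @pushed;
fetch_with_callback($push_fh, sub { push @pushed, $_[0] });
my $push_at_eof = eof($push_fh) ? 1 : 0;

# Pull style: control sits with the client, one record per call.
open(my $pull_fh, '<', \$data) or die "cannot open in-memory handle: $!";
my $next_record = make_pull_stream($pull_fh);
my $first       = $next_record->();
my $pull_at_eof = eof($pull_fh) ? 1 : 0;

print "callback style: ", scalar(@pushed), " records read before return\n";
print "pull style: got '$first', input exhausted: $pull_at_eof\n";
```

In the callback case the caller regains control only after the input is
exhausted, so it has to buffer the records or do all its work inside
the callback. In the pull case the remaining records stay unread until
the client asks for them.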

Hmm. Any comments? Am I missing something?


Hilmar Lapp                                email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757