[Bioperl-l] final proposal: Bio::DB::WebSeqDBI
Mon, 11 Dec 2000 18:05:52 -0500 (EST)
The final proposal before I commit the code (all tests pass on my
2 new modules:
Bio::DB::WebSeqDBI - ISA Bio::DB::RandomAccessI
Bio::DB::NCBIHelper - ISA Bio::DB::WebSeqDBI
Rewrites of Bio::DB::GenBank, Bio::DB::GenPept, Bio::DB::SwissProt.
This interface encapsulates the standard data retrieval methods for a
Web sequence database. Implementing classes must implement the method
get_request, which takes as arguments a hash of qualifiers (uids, format,
etc.) with which to query the database and returns an HTTP::Request
object. The WebSeqDBI class manages an LWP::UserAgent for obtaining data
from the web dbs and turning that data stream into a Bio::SeqIO.
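To make the get_request contract a bit more concrete, here is a minimal sketch of what an implementing class might look like. The package name My::DB::Example, the base URL, the qualifier names (-uids, -format) and the placeholder ids are all assumptions made up for illustration and are not the code being committed; only URI and HTTP::Request are real library classes.

```perl
package My::DB::Example;   # hypothetical implementing class
use strict;
use warnings;
use URI;
use HTTP::Request;

# Hypothetical endpoint, for illustration only.
my $BASE_URL = 'http://www.example.org/seqfetch';

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Takes a hash of qualifiers and returns an HTTP::Request object.
sub get_request {
    my ($self, %qualifiers) = @_;
    my $uids   = $qualifiers{-uids} or die "get_request: -uids is required\n";
    my $format = $qualifiers{-format} || 'fasta';

    # Accept either a single id or an array reference of ids.
    my @ids = ref $uids eq 'ARRAY' ? @$uids : ($uids);

    my $uri = URI->new($BASE_URL);
    $uri->query_form(id => join(',', @ids), format => $format);
    return HTTP::Request->new(GET => $uri);
}

package main;

# Placeholder ids, not real accessions.
my $req = My::DB::Example->new->get_request(
    -uids   => ['ACC0001', 'ACC0002'],
    -format => 'fasta',
);
print $req->method, ' ', $req->uri, "\n";
```

With this split, the base class only needs to hand the returned request to its LWP::UserAgent, so each database module is reduced to building the right URL.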
Because of the way LWP works right now, it is not possible to take a data
stream from a web server and transform it directly into a Bio::SeqIO.
Instead, one must read all the data from the server and then either store it
in a tempfile or wrap it in an IO::String, which can be treated as a filehandle.
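The two buffering strategies can be sketched roughly as follows. This is only an illustration under assumptions: the helper name buffered_handle and its mode names are hypothetical, and it uses Perl's built-in in-memory filehandles (open on a scalar reference) as a stand-in for IO::String, with File::Temp for the tempfile case.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: turn fully downloaded data into a readable filehandle.
# $mode is 'tempfile' or 'memory' (names chosen for this sketch).
sub buffered_handle {
    my ($data, $mode) = @_;
    if ($mode eq 'tempfile') {
        # UNLINK => 1 asks File::Temp to remove the file at program exit.
        my ($fh, $filename) = tempfile(UNLINK => 1);
        print {$fh} $data;
        seek $fh, 0, 0 or die "seek failed: $!";   # rewind for reading
        return $fh;
    }
    # In-memory filehandle: reads come straight from the scalar.
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    return $fh;
}

my $fasta = ">seq1\nACGT\n";
for my $mode (qw(tempfile memory)) {
    my $fh     = buffered_handle($fasta, $mode);
    my $header = <$fh>;
    print "$mode: $header";
}
```

Either handle could then be passed to a parser that expects a filehandle; the in-memory variant avoids the cleanup problem but keeps the whole response in RAM.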
Another pain point: the retrieval method from NCBI has some HTML 'contamination',
which needs to be screened out through a method call to postprocess_data.
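As a rough idea of what such a screening step could do, here is a minimal sketch. It is not the actual postprocess_data implementation: the exact HTML that NCBI returns is not described here, so the sketch simply assumes that the contamination consists of ordinary tags plus a few escaped entities.

```perl
use strict;
use warnings;

# Hypothetical postprocess_data: strip tags, then decode a few entities.
sub postprocess_data {
    my ($data) = @_;
    # Only remove things that look like real tags (start with a letter or
    # a slash). This assumes partial-location markers such as <1..>200
    # arrive entity-escaped, so they are not mistaken for tags.
    $data =~ s{</?[A-Za-z][^>]*>}{}g;
    $data =~ s/&lt;/</g;
    $data =~ s/&gt;/>/g;
    $data =~ s/&amp;/&/g;   # decode &amp; last to avoid double decoding
    return $data;
}

my $raw = "<pre>LOCUS       EXAMPLE\n"
        . "     CDS             &lt;1..&gt;200\n"
        . "<a href=\"#\">ORIGIN</a>\n</pre>";
print postprocess_data($raw);
```

A regex pass like this is fragile against anything more complex than simple markup, so a real implementation may well need something more specific to the server's output.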
One issue I am not sure how best to deal with is the temporary file removal
at the end of the life of the Bio::DB::WebSeqDBI object. The following
code illustrates a case where this will remove files too soon.
my $seqdb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
my $seqio = $seqdb->get_Stream_by_id($accession);
undef $seqdb; # this destroys the seqdb object and cleans up the
              # tempfile that was created
my $seq = $seqio->next_seq(); # bombs because the file no longer exists
Anyone with better ideas on this feel free to let me know.
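One possible direction, sketched here only as an assumption and not as what the modules do, is to let the stream own the temporary file instead of the database object. A File::Temp object removes its file when it is destroyed, so holding it inside the stream ties cleanup to the stream's lifetime. The class My::TempStream and its methods are hypothetical stand-ins for the real stream object.

```perl
package My::TempStream;    # hypothetical stand-in for the stream object
use strict;
use warnings;
use File::Temp;

sub new {
    my ($class, $data) = @_;
    # The File::Temp object removes its file when it is destroyed.
    my $tmp = File::Temp->new(UNLINK => 1);
    print {$tmp} $data;
    seek $tmp, 0, 0 or die "seek failed: $!";   # rewind for reading
    # Holding $tmp here ties the file's lifetime to this stream.
    return bless { tmp => $tmp }, $class;
}

sub filename  { $_[0]{tmp}->filename }
sub next_line { my $fh = $_[0]{tmp}; return scalar <$fh> }

package main;

my $stream = My::TempStream->new(">seq1\nACGT\n");
my $file   = $stream->filename;
print $stream->next_line;     # readable regardless of what created it
undef $stream;                # last reference gone, file is removed
print -e $file ? "still there\n" : "cleaned up\n";
```

Under this arrangement, destroying the database object first would no longer matter, because it never tracks the filename itself.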
Since Bio::DB::GenBank and Bio::DB::GenPept are so similar, I wrote a
class that encapsulates all of the common functionality for retrieving
sequence data from these databases.
I'm sure it will all make much more sense once I check the code in. I just
wanted to see if anyone has comments or wants clarification
before I check in major reworks of the current modules.
Is the name WebSeqDBI misleading (i.e., does it look like it would be a DBI
module)? We like to use 'I' at the end of a module name to denote
Center for Human Genetics
Duke University Medical Center