[Bioperl-l] Database Retrieval

Sendu Bala bix at sendu.me.uk
Tue Aug 8 08:44:13 EDT 2006

Sean Davis wrote:
> That is certainly possible--this is perl, right?  I'll think about it, but I
> doubt that I have the time to put together a satisfactory "grand" solution
> that allows arbitrary queries without specifying SQL, returns bioperl
> objects, and doesn't reflect some of the underlying schema.  If one settles
> on a set of objects that one wants to return, the process will be easier,
> but that limits the information that one can get from the database.
> Practically, to have a "table-browser-like" code interface will require
> exposing some of the SQL schema, as column names and table names will need
> to come into it.

Not necessarily. You only have to have a mapping from the conceptual 
purpose of the table to its current name (and likewise for columns). So 
instead of a module called 'refLink' because there is a table called 
'refLink', you might have something called refseq_mrna_links which maps 
to 'refLink'. Oh, and given the sheer number of tables, I don't think it 
would be appropriate to have a module per table.

How about some single module that does the selection of the relevant 
database and table given $db and $table_concept? Perhaps:

# map 'human' to the possible human databases, default 'hgXX'
my $db = Bio::DB::UCSC::Databases('human');

# map 'refseq_mrna_links' to 'refLink' and return a
# Bio::DB::UCSC::Queryable
my $queryable = Bio::DB::UCSC::Table->new($db, 'refseq_mrna_links');

# map mrna_accession method and its args to
# query => [mrnaAcc => {like => 'NM_00002%'}]
my $row_data = $queryable->mrna_accession(-like => 'NM_00002%');
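
To make the mapping idea a bit more concrete, here is a minimal sketch of how a concept-level call could be translated into the schema-level query shown in the comment above. Nothing here is an existing BioPerl or UCSC API: %TABLE_FOR and build_query are hypothetical names, and the only schema facts taken from this message are the table 'refLink' and the column 'mrnaAcc'.

```perl
use strict;
use warnings;

# Hypothetical lookup from table concepts to current table names, and
# from column concepts to column names. Only refLink/mrnaAcc come from
# the message; the structure itself is an assumption.
my %TABLE_FOR = (
    refseq_mrna_links => {
        table   => 'refLink',
        columns => { mrna_accession => 'mrnaAcc' },
    },
);

# Translate a concept-level request into a schema-level query spec.
sub build_query {
    my ($table_concept, $column_concept, %args) = @_;

    my $entry = $TABLE_FOR{$table_concept}
        or die "Unknown table concept '$table_concept'\n";
    my $column = $entry->{columns}{$column_concept}
        or die "Unknown column concept '$column_concept'\n";

    # Strip the leading dash from named args such as -like => '...'
    my %ops;
    for my $key (keys %args) {
        (my $op = $key) =~ s/^-//;
        $ops{$op} = $args{$key};
    }

    return { table => $entry->{table}, query => [ $column => \%ops ] };
}

my $spec = build_query('refseq_mrna_links', 'mrna_accession',
                       -like => 'NM_00002%');
```

If a table or column is renamed upstream, only %TABLE_FOR would need to change; callers would keep using the concept names.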

Even that's not so hot; you still have to know some massive list of
inflexible table-concept names like 'refseq_mrna_links'. Perhaps it
would be even better if it were truly concept-based. You say what you
want and it figures out the correct table:

my $queryable = Bio::DB::UCSC::Table->new($db, 'mrna_accession', 

Sane? Reasonable? Desirable? Possible? I'm just throwing ideas out; you 
may see a better way of achieving similar ends.
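
As a rough illustration of the "figures out the correct table" step, one option is a registry in which each table declares the column concepts it can answer, so that the table is resolved from the concept alone. This is only a sketch under assumptions: %REGISTRY and tables_for_concept are made-up names, and 'some_other_table' with its 'gene_symbol' entry is a placeholder rather than a real UCSC table.

```perl
use strict;
use warnings;

# Hypothetical registry: table name => { column concept => column name }.
# Only refLink/mrnaAcc come from the message; the second entry is a
# placeholder to show that unrelated tables are skipped.
my %REGISTRY = (
    refLink          => { mrna_accession => 'mrnaAcc' },
    some_other_table => { gene_symbol    => 'symbol'  },
);

# Return every [table, column] pair that can answer a column concept.
sub tables_for_concept {
    my ($concept) = @_;
    my @hits;
    for my $table (sort keys %REGISTRY) {
        my $column = $REGISTRY{$table}{$concept};
        push @hits, [ $table, $column ] if defined $column;
    }
    return @hits;
}

my @hits = tables_for_concept('mrna_accession');
die "No table provides 'mrna_accession'\n" unless @hits;
die "Concept 'mrna_accession' is ambiguous\n" if @hits > 1;

my ($table, $column) = @{ $hits[0] };
```

The awkward case is a concept that more than one table can answer; the sketch simply refuses, but a real implementation would need some rule for choosing between them.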

> Taking such an approach, either based on RDBO or with
> hand-coded SQL management, precludes returning bioperl-type objects.   On the
> other hand, if one wants only bioperl-type objects returned, the information
> that can be returned is quite limited and the query structure (from a perl
> point of view) will need to be limited to a set of fields that can
> ultimately be used to look up only the information associated with bioperl
> objects.  I think the table-browser-like approach is the better way to go to
> start; let the user deal with making bioperl objects as he/she sees fit once
> the data is back.  As a second round of development, one could certainly
> build a compatibility layer that uses the primary query engine to pull out
> information for constructing key bioperl objects, but I don't think that
> should be the primary goal, but a secondary one.

Yes, that's the way it should be done, but the interface for the primary
query engine ought still to be independent of the table structure.
