[Bioperl-l] DB2 driver for BioPerl

Florian Mittag florian.mittag at uni-tuebingen.de
Mon Jul 6 12:08:18 EDT 2009


On Saturday 04 July 2009 12:39, Hilmar Lapp wrote:
> On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote:
> > We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL
> > to fill
> > our DB2 database with taxonomy data
> Would you mind posting to the BioSQL list which changes you had to
> make to make the script work with DB2?

No problem, I will post the diff sometime this week, since there are a few 
changes not necessary anymore, e.g., the new DB2 Express-C version 9.7 
supports the "TRUNCATE TABLE" command, which it previously didn't.

> More generally, is there some kind of comprehensive documentation on
> what is different in DB2 from standard SQL92? The
> load_ncbi_taxonomy.pl script should in principle work with any SQL92-
> compliant RDBMS ... Have you found that not to be the case (which
> would be a bug), or is DB2 in some ways not SQL92-compliant?

I don't know, I haven't looked for this kind of documentation, but the two 
things that annoyed me most were:
1) DB2 doesn't support UNIQUE on columns that allow for NULL values.
Solution: create triggers that ensure UNIQUEness and create an INDEX.

2) Columns of type CLOB do not allow to be compared through "=", but only 
through "LIKE", which leads to problems with BioJava's Hibernate queries.
Solution: currently none

I want to discuss these problems in more detail on the other mailinglists, 
since they do not really belong here.

> > , but loading the gene ontology with BioPerl's "load_ontology.pl" is
> > somewhat harder.
> The ontology as well as the sequence loader are really just front-ends
> to the Bioperl-db object-relational mappers (ORMs). So I would start
> there, rather than looking at errors the script does or does not throw
> (you don't want to run all combinations of command line parameters
> that would exercise each and every feature of the script).
> In order to create DB2 driver support in Bioperl-db, you need to add
> two things. First, you need to create a module Bio/DB/DBI/DB2.pm that
> overrides the methods from base.pm according to DB2. The fact that you
> didn't report any errors about that module not having been found
> suggests that you've done this already.

Correct ;-)

> The second step is as you say to create a package Bio/DB/BioSQL/DB2
> with at least BasePersistenceAdaptorDriver.pm	as module in it, and
> starting with a copy of the existing ones is indeed the best way to
> get started on this. Unless you also created the DB2 database DDL
> scripts from the Oracle ones, I wouldn't necessarily copy from Oracle
> though, but maybe rather from Pg. And rather than looking for errors
> of one of the scripts, I'd just go systematically through the files
> and make sure the SQL in there is DB2 compliant.

Okay, I'll do that, but that will take some time and I'll probably turn to 
this mailings for further assistance with more specific questions.

> > [...]
> > It first ran a few minutes processing the file and then died after the
> > following SQL-command was prepared and executed:
> >
> > "SELECT term.term_id, term.identifier, term.name, term.definition,
> > term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier
> > = ?"
> Could you post the full error message? It is rather difficult to
> diagnose what's going on w/o the error message and stack trace.

Right now, unfortunately not, because this error message won't appear again. 
I'm not sure is this is because of the database now containing data or 
because of some other changes I've made, but I will see this in the process 
of rewriting the DDL scripts.

> I'd be surprised BTW if DB2 were indeed offended by the NULL in the
> above statement - I'm pretty sure that "SELECT NULL FROM
> sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you
> sure that if you execute such a statement at a SQL prompt it results
> in an error?
> Since I can hardly believe that DB2 doesn't support selecting
> constants (NULL is as much a constant as 1 is), maybe what it wants
> though is aliasing the column. So if
> SELECT NULL FROM bioentry;
> yields an error, does
> SELECT NULL AS colAlias FROM bioentry;
> work fine?

Well, it is like this with version 9.5 of DB2 Express-C:


 SQL0206N  "NULL" is not valid in the context where it is used.  

But if I do:

SELECT cast(NULL AS VARCHAR(255)) FROM bioentry;

it returns the correct result without error. Thew new version 9.7 claims to 
have changed this behavior, so that the first query would run fine, but I 
didn't have time to test the new version, yet.


> > I don't know if the "NULL" column is supposed to be there
> It is. The code in BaseDriver.pm that you were looking at should not
> need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is
> supposed to override any method that needs to be adapted to DB2.) The
> way the ORM works is by trying to map all properties of a BioPerl
> object that are persistent to a column of a table in the database. If
> it can't map a property (for whatever reason) its value is simply
> always undef (or NULL in SQL). I.e., NULL columns are the placeholder
> for a column that failed to be mapped to a property. You can't simply
> remove them or all subsequent columns are shifted.

It ran fine without the NULL column, but that isn't necessarily a sign of 
correctness. My problem was that (as stated above) the old version of DB2 
requires you to cast the NULL value to a data type, which I wasn't able to 
determine from the code. With the new version, it should work, so I'll have 
to rerun my tests again and see if the problem is still there.

I will keep you updated on the Perl issues and hope to have some useful 
results by the end of the week. And I hope you excuse me for posting things 
here that are hardly related to BioPerl, but the some problems are a complex 
entanglement of issues with BioSQL, BioPerl and BioJava, so it's hard to 
decide where to post it ;-)


More information about the Bioperl-l mailing list