David J. States
Wed, 30 Dec 1998 11:57:05 -0600
We have been thinking about XML applications in bioinformatics. From some
of the work we have been doing on automated interface generation for
databases, it appears that it would be straightforward to write a PERL module
that would automatically generate XML from a generic stored procedure. We
have already contributed scripts to CPAN (RDBAL.pm and PQEDIT.pm) that
will produce a query-by-example HTML form by reverse engineering database
schemas. Extending this to XML, a developer would only need to write a
stored procedure, and the rest of the work needed to transform this into an
Internet XML data source would be automatic. Generating the RDF to
describe the full database or any stored procedure would also be part of
this module. We are also moving to XML wrappers for the tools such as MSA
and FASTA that we provide as Internet services.
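To make the stored-procedure-to-XML idea concrete, here is a minimal sketch. It is written in Python rather than Perl and is not the interface of RDBAL.pm or PQEDIT.pm. It relies only on a DB-API cursor whose `description` attribute supplies the column names. SQLite has no stored procedures, so an ordinary SELECT stands in for one. The function name `result_set_to_xml` and the element names `resultset` and `row` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def result_set_to_xml(cursor, root_tag="resultset", row_tag="row"):
    """Serialize the rows of an executed DB-API cursor as an XML string.

    Element names are taken from cursor.description, so no per-query
    mapping is written by hand. Assumes every column name is a valid
    XML element name; root_tag and row_tag are hypothetical defaults.
    """
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for values in cursor.fetchall():
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, values):
            field = ET.SubElement(row, name)
            # SQL NULL becomes an empty element; other values are stringified.
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # An in-memory table with made-up contents stands in for the
    # result set a stored procedure would return.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seq (accession TEXT, length INTEGER)")
    conn.execute("INSERT INTO seq VALUES ('ABC123', 1200)")
    cur = conn.execute("SELECT accession, length FROM seq")
    print(result_set_to_xml(cur))
    # prints <resultset><row><accession>ABC123</accession><length>1200</length></row></resultset>
```

Because the element names come from the result-set metadata, a new procedure requires no new XML code. This is the property that makes the approach "automatic".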
Which brings me to the question of bio data type definitions. It seems to
me there is already something of a proliferation, with BIOML and BSML drafts
out as well as natural mappings from NCBI ASN.1, ACEDB, GenBank flat file,
and EMBL flat file formats to XML. One could also use the XML PERL modules
to automatically generate XML from bio PERL objects. Finally, the
automated database interfaces proposed above would result in each database
and tool generating its own data type definition. Does anyone see a way to
motivate standards generation out of this mess? Note that "motivate" is the
operative verb, since we obviously have no means to enforce compliance.
Is BIOML under active development? While the draft spec seems relatively
simple, there are lots of things that it leaves unresolved. For example,
you can specify a domain on DNA or protein sequences, but the type of
domain appears to be free text. I agree with Lincoln Stein's "kitchen sink"
assessment of BSML. The sheer complexity of NCBI's ASN.1 definitions was
probably more of an impediment to broad utilization than ASN.1 itself
was, and this seems like a real concern for BSML.
Is anyone aware of plans on the part of the database organizations to serve
An alternative to agreement on a bio DTD is to push the burden of data
resolution issues onto the client. In writing an applet for a specific
display function, you would need to know the relationship of the various
fields in the data sources that you were referencing. This seems less
desirable, but at least it is a way forward.
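Here is a minimal sketch of what that burden looks like on the client, written in Python. It assumes that each source's records arrive as flat key/value pairs. It also assumes the client carries its own hypothetical `FIELD_MAPS` table, which relates source fields to the names its display uses. The GenBank keywords and EMBL line codes are real. The common names `accession` and `description` and the function `normalize` are illustrative choices.

```python
# Hypothetical client-side field maps: a display applet has to know how
# each source's field names relate to the fields it actually shows.
FIELD_MAPS = {
    "genbank": {"ACCESSION": "accession", "DEFINITION": "description"},
    "embl": {"AC": "accession", "DE": "description"},
}


def normalize(source, record):
    """Rename a record's fields to the client's names; drop unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


print(normalize("embl", {"AC": "X00001", "DE": "example entry", "SQ": "acgt"}))
# prints {'accession': 'X00001', 'description': 'example entry'}
```

Every new data source means another entry in a table like this, maintained separately by every client. That duplication is what makes the approach less desirable than agreeing on a shared DTD.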
Thoughts or suggestions?
David J. States, M.D., Ph.D.
Associate Professor and Director
Institute for Biomedical Computing
Washington University in St. Louis
700 S. Euclid Ave.
St. Louis, MO 63110
tel: 314 362 2134
fax: 314 362 0234
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc: