Bioperl: XML/BioPerl

David J. States
Wed, 30 Dec 1998 11:57:05 -0600


We have been thinking about XML applications in bioinformatics.  From some 
of the work we have been doing on automated interface generation for 
databases, it appears that it would be straightforward to write a Perl 
module that would automatically generate XML from a generic stored 
procedure.  We 
have already contributed scripts to CPAN that 
will produce a query-by-example HTML form by reverse engineering database 
schemas.  Extending this to XML, a developer would only need to write a 
stored procedure, and the rest of the work needed to transform this into an 
Internet XML data source would be automatic.  Generating the RDF to 
describe the full database or any stored procedure would also be part of 
this module.  We are also moving to XML wrappers for the tools such as MSA 
and FASTA that we provide as Internet services.
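
To make the idea concrete, here is a minimal sketch of turning a generic 
result set into XML.  It is written in Python rather than as the Perl module 
discussed above, and it assumes that the stored procedure's output can be 
read through an ordinary database cursor and that the column names are 
valid XML element names.  The function name, the element names "resultset" 
and "row", and the demo table are all hypothetical; SQLite stands in for a 
real database server.

```python
import sqlite3
import xml.etree.ElementTree as ET


def cursor_to_xml(cursor, root_tag="resultset", row_tag="row"):
    """Serialize every row of an executed DB-API cursor as XML text."""
    # cursor.description holds one tuple per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor.fetchall():
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            field = ET.SubElement(row, name)
            # NULL values become empty elements; everything else is stringified.
            field.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def demo():
    # Hypothetical table standing in for the output of a stored procedure.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seq (accession TEXT, length INTEGER)")
    conn.execute("INSERT INTO seq VALUES ('X00001', 1200)")
    cur = conn.execute("SELECT accession, length FROM seq")
    return cursor_to_xml(cur)
```

Because the element names come straight from the column names, each query 
ends up defining its own vocabulary, which is the source of the data type 
question raised next.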

Which brings me to the question of bio data type definitions.  It seems to 
me there is already somewhat of a proliferation with BIOML and BSML drafts 
out as well as natural mappings from NCBI ASN.1, ACEDB, GenBank flat file, 
and EMBL flat file formats to XML.  One could also use the XML Perl modules 
to automatically generate XML from Bioperl objects.  Finally, the 
automated database interfaces proposed above would result in each database 
and tool generating its own data type definition.  Does anyone see a way to 
motivate standards generation out of this mess?  Note that motivate is the 
operative verb since we obviously have no means to enforce compliance.

Is BIOML under active development?  While the draft spec seems relatively 
simple, there are lots of things that it leaves unresolved.  For example, 
you can specify a domain on a DNA or protein sequence, but the type of 
domain appears to be free text.  I agree with Lincoln Stein's kitchen sink 
assessment of BSML.  The sheer complexity of NCBI's ASN.1 definitions was 
probably more of an impediment to broad utilization than ASN.1 itself 
was, and this seems like a real concern for BSML.

Is anyone aware of plans on the part of the database organizations to serve 

An alternative to agreement on a bio DTD is to push the burden of data 
resolution issues onto the client.  In writing an applet for a specific 
display function, you would need to know the relationship of the various 
fields in the data sources that you were referencing.  This seems less 
desirable, but at least it is a way forward.
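
As a rough illustration of what pushing resolution onto the client means, 
the Python sketch below keeps a per-source table that maps each data 
source's field names onto names the client has chosen.  The source names, 
field names, and the normalize function are all hypothetical; the point is 
only that every client would have to maintain a table like this for every 
source it reads.

```python
# Hypothetical field names for two data sources, mapped onto a vocabulary
# chosen by the client.  None of these names come from a real database.
FIELD_MAP = {
    "source_a": {"acc": "accession", "seq_len": "length"},
    "source_b": {"AccessionNumber": "accession", "Length": "length"},
}


def normalize(source, record):
    """Rename the fields of one record using the table for its source."""
    mapping = FIELD_MAP[source]
    # Fields the client does not know about are dropped.
    return {mapping[key]: value for key, value in record.items() if key in mapping}
```

With this in place, normalize("source_a", {"acc": "X00001", "seq_len": 1200}) 
and normalize("source_b", {"AccessionNumber": "X00001", "Length": 1200}) 
yield the same dictionary, so display code can be written once against the 
common names.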

Thoughts or suggestions?


David J. States, M.D., Ph.D.
Associate Professor and Director
Institute for Biomedical Computing
Washington University in St. Louis
700 S. Euclid Ave.
St. Louis, MO   63110

tel: 314 362 2134
fax: 314 362 0234
