CCL: Re: XML for Bioinformatics Data




Gerald Loeffler <Gerald.Loeffler@vienna.at> wrote,
> I know that XML is currently being used for chemistry-related data (CML,
> see http://www.xml-cml.org/), but I haven't heard of any efforts in the
> area of Bioinformatics.

See http://bio.perl.org/Projects/XML/
and the discussion at
http://www.uni-bielefeld.de/mailinglists/BCD/vsns-bcd-perl/9901/index.html

> So please view this message as targeted towards
> the Bioinformatics community that is not served by CML. (CML has a
> DNA/protein sequence tag.)

(Note that I'm not subscribed to CCL; please Cc me if you respond
to this mail.)

best wishes,
  georg
fuellen@alum.mit.edu
Univ. Bielefeld, Research Group in Practical Comp. Science
http://www.techfak.uni-bielefeld.de/~fuellen/
<<<<<<<<<<<<

> With the permission of Mr. Loeffler:
> 
> -----Original Message-----
> >From: Gerald Loeffler <Gerald.Loeffler@vienna.at>
> >To: Computational Chemistry Mailing List <chemistry@infomeister.osc.edu>
> >Date: Friday, April 30, 1999 4:00 AM
> >Subject: CCL:XML for Bioinformatics Data
> 
> >Hi!
> >
> >Recently, I've been working a lot with XML (see http://www.w3c.org/xml/
> >and e.g. http://www.ibm.com/xml/), which is a standard, human-readable,
> >extensible markup language that is rapidly becoming _the_ method of
> >choice for exchange and storage of any kind of data and documents. It
> >seems to me that XML would simply be _perfect_ for data exchange and
> >maybe even data storage in bioinformatics (see end of message for a note
> >on chemistry and CML).
> >
> >E.g. (off the top of my head), a DNA/protein sequence similarity search
> >engine (e.g. NCBI's BLAST server) might return its search results in the
> >form of an XML document that could look like this:
> >
> ><seq-sim-search-results>
> >  <query>
> >    <type>                         protein     </type>
> >    <seq name="My stupid peptide"> GAVLIFYWSTQ </seq>
> >    <algorithm>                    FASTA3      </algorithm>
> >    <db>                           SwissProt   </db>
> >    <gap-open>                    -12          </gap-open>
> >    <gap-extension>               -2           </gap-extension>
> >  </query>
> >  <hits>
> >    <hit>
> >      <accession>       HPS_HUMAN    </accession>
> >      <organism>        homo sapiens </organism>
> >      <overlap>         11           </overlap>
> >      <overlapping-seq> GAEVLFYWTDQ  </overlapping-seq>
> >      <z-score>         129.3        </z-score>
> >    </hit>
> >    <hit>
> >      <accession>       PA24_MOUSE   </accession>
> >      <organism>        mus musculus </organism>
> >      <overlap>         8            </overlap>
> >      <overlapping-seq> VFIFYWTT     </overlapping-seq>
> >      <z-score>         133.3        </z-score>
> >    </hit>
> >  </hits>
> ></seq-sim-search-results>
> >
> >There are several important points here:
> >
> >1) Without knowing what this XML document is about, a program can verify
> >that it is well-formed! These programs exist, are free and are
> >applicable to all XML documents!
> >
> >2) The rules for the nesting and naming of the tags in XML documents of
> >this type can be formally defined in XML. The above document would be of
> >type "seq-sim-search-results" and you could easily write a formal
> >definition (in a DTD file) that says that such a document must contain a
> >"query" and a "hits" tag; the "query" tag in turn must contain exactly
> >one of each "type", "seq", ... The "hits" tag in turn may contain 0 or
> >more "hit" tags which in turn ...
> >
> >3) Having a formal definition of documents of this type, a program can
> >verify that our above XML document complies with the formal definition
> >(is valid). These programs exist, are free and are applicable to all XML
> >documents!
> >
> >4) Free utilities exist (e.g. IBM's xml4j) that can programmatically
> >write and read (parse) any XML document and thus give a program access
> >to the structure and content of the document!! (No more Perl parsers for
> >BLAST output!!)
> >
> >5) This file is human-readable! (in contrast to a CORBA struct or a
> >serialized Java object!)
> >
> >6) Modern WWW-browsers can (if a style-sheet is supplied) directly
> >display this XML document. For old browsers, the XML document can easily
> >be converted to HTML for display.
> >
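
A minimal Python sketch of points 1, 2 and 4 above, using only the
standard-library xml.etree.ElementTree module. The embedded document is an
abbreviated copy of the example; the function names (is_well_formed,
read_hits, structure_errors) are hypothetical. structure_errors is a
hand-coded stand-in for a few of the rules a DTD would state, not real DTD
validation, which ElementTree does not perform. "Exactly one" query and hits
element is an assumption; the message only says both must be present.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example document (some child elements omitted).
DOC = """\
<seq-sim-search-results>
  <query>
    <type> protein </type>
    <seq name="My stupid peptide"> GAVLIFYWSTQ </seq>
  </query>
  <hits>
    <hit>
      <accession> HPS_HUMAN </accession>
      <overlap> 11 </overlap>
      <z-score> 129.3 </z-score>
    </hit>
    <hit>
      <accession> PA24_MOUSE </accession>
      <overlap> 8 </overlap>
      <z-score> 133.3 </z-score>
    </hit>
  </hits>
</seq-sim-search-results>
"""


def is_well_formed(text):
    """Point 1: True if text parses as XML, whatever its vocabulary."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_hits(text):
    """Point 4: return one dict per <hit>, with padding whitespace removed."""
    root = ET.fromstring(text)
    hits = []
    for hit in root.findall("./hits/hit"):
        hits.append({
            "accession": hit.findtext("accession").strip(),
            "overlap": int(hit.findtext("overlap")),
            "z_score": float(hit.findtext("z-score")),
        })
    return hits


def structure_errors(text):
    """Point 2 stand-in: hand-coded checks, NOT validation against a DTD."""
    root = ET.fromstring(text)
    errors = []
    if root.tag != "seq-sim-search-results":
        errors.append("unexpected root element: " + root.tag)
    for name in ("query", "hits"):  # assumption: exactly one of each
        if len(root.findall(name)) != 1:
            errors.append("expected exactly one <%s>" % name)
    query = root.find("query")
    if query is not None:
        for name in ("type", "seq"):  # only the children named in the text
            if len(query.findall(name)) != 1:
                errors.append("<query> needs exactly one <%s>" % name)
    return errors


if __name__ == "__main__":
    print(is_well_formed(DOC))
    print(structure_errors(DOC))
    for h in read_hits(DOC):
        print(h["accession"], h["overlap"], h["z_score"])
```

None of the three functions knows anything about sequence searches beyond
the tag names passed to it; the parsing itself is generic, which is the
point made in 1) and 4).
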
> >I think you get the idea.
> >
> >Does such an XML-based approach sound reasonable?
> >What does this approach leave to be desired?
> >Are efforts underway in this direction?
> >Wouldn't it be a better world if we all used XML? (-:
> >
> >I know that XML is currently being used for chemistry-related data (CML,
> >see http://www.xml-cml.org/), but I haven't heard of any efforts in the
> >area of Bioinformatics. So please view this message as targeted towards
> >the Bioinformatics community that is not served by CML. (CML has a
> >DNA/protein sequence tag.)
> >
> >        cheers,
> >        gerald
> >--
> > Gerald Loeffler
> > Email: Gerald.Loeffler@vienna.at
> > Smail: Apollo Imaging, Marchettigasse 7, A-1060 Vienna, Austria
> > Phone: +43 676 3289588 (+43 1 5952333 27)
> > Fax:   +43 1 5952333 20
> > Keywords: Java, CORBA, OOA&D, Databases, Bioinformatics,
> >           Computational Biology, Computational Biophysics
> -----
>  //=\   Vicki Brown <vlb@deltagen.com>
>  \=//    Journeyman Sourcerer: Scripts & Philtres
>   //=\    (Mac)Perl, awk, sed, *sh..., occasional C
>   \=//     A little web-gardening on the weekends
>    //=\
>    \=//      Deltagen, Inc.
>     //=\     1031 Bing St, San Carlos, CA 94070
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 
