[Bioperl-l] Genomics Gets a New Code: GEML

Eugene.Leitl@lrz.uni-muenchen.de Eugene.Leitl@lrz.uni-muenchen.de
Wed, 13 Dec 2000 00:52:29 +0100 (CET)


Genomics Gets a New Code: GEML 
by Kristen Philipkoski 

2:00 a.m. Dec. 12, 2000 PST 

The Internet uses HTML, and soon perhaps genomics will use GEML. 

At least that's what Rosetta Inpharmatics (RSTA), the creator of the
Gene Expression Markup Language, or GEML, is hoping. The
prestigious science journal Nature adopted the language on Monday,
which should give a significant boost to its acceptance in the
scientific community.

Standardization of data is a big worry for genetic researchers at the
moment, with the unprecedented glut of information generated by the
Human Genome Project, an effort to locate every human gene. A working
draft of the map was completed in June.

The project has spawned over 400 individual databases at companies and
academic institutions, containing information about the jobs that
genes and proteins perform -- data that researchers need to share and
exchange in order to make discoveries that will benefit human health.

"It's not who's got the best technology, but who knows best how to
share the information," said Friedrich von Bohlen, CEO of Lion
Bioscience (LEON) at a conference in October. "We have to integrate
all of the types of data in the world and in the end bring
intelligence to the system."

GEML is a standardized format that helps scientists do just that.

In November, Rosetta launched the GEML community, a group of
organizations -- including Harvard University, Agilent Technologies,
Spotfire, and Europroteome -- that will develop and promote the
language.

"Standardization of gene expression data sets is necessary for both
the exchange and publication of genomic research," said Annett Thomas,
managing director at the Nature Publishing Group, in a
statement. Nature and its sister publication Nature Genetics have
published some of the most cutting-edge genetic research.

The GEML format is designed to consistently label genetic information
coming from biochips -- chips that can show researchers tens of
thousands of genes at a time, and point out which are
active. Companies like Affymetrix and Agilent have developed biochips
that can look at up to 60,000 genes at a time.

Other companies have their own solutions to the standardization
problem. Lion Bioscience has its own standardization platform, and
IBM's life sciences unit is working on a product called DiscoveryLink
-- a virtual database that will allow scientists to mine information
from different types of files, from graphic to database to text, to
find genetic or protein information.

Physiome Sciences has developed a similar technology using an
XML-based language called CellML. It helps researchers create models
of living systems to predict which drugs will work before they begin
clinical trials. Scientists can create a mathematical representation
of any type of cell -- from heart, to lung, to kidney -- and perform
simulations to test drugs.

According to Metcalfe's law, named for 3Com founder Robert Metcalfe,
the value of a network grows roughly with the square of the number of
its users. And since researchers will now be required to submit papers
using GEML, the value of the language should increase rapidly.