[Bioperl-l] GO ontology browser module available

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Wed, 31 Jan 2001 09:25:51 -0600

Ewan Birney wrote:

> > > Wouldn't it make sense to add it to bioperl-gui?
> > >
> > >     Hilmar
> > >
> > Inasmuch as it is completely separate from SeqCanvas, and we are still
> > thinking bioperl-gui=SeqCanvas, no; but since bioperl-gui could be greater
> > than SeqCanvas, maybe.  Mark?  I think it would be okay.
> Sounds like the right place to me....

indeed - that was where I intended to put it when it was a little more
"polished"... I am just hesitant to use the BioPerl CVS repository to store my
half-baked code.

There are several things which "don't work right" (tm).  I think a lot of this
has to do with the fact that I cannot get my hands on the GO.dtd.  It isn't
available on the GO website, though all of the XML files are (and those same
XML files reference the DTD).  Nor have I received a response to inquiries
sent to the consortium e-mail address.

The consequence is that XML::Parser doesn't know what to do with the HTML-like
formatting tags that they are using in some of their "free text", and in some
cases tries to treat them as sub-level tags (for example, what should be a
subscript or superscript will become a sub-element of the preceding word, so
Carbon<down>14</down> parses as $GO->{Carbon}->{14}... which is ridiculous of
course....).  In addition they use HTML designations for the Greek alpha, beta,
gamma, and so on, preceded by an ampersand and ending with a semicolon.  These
cannot be parsed by XML::Parser *at all* unless it is specifically told that
these are going to be #CDATA elements... which requires a DTD.... which I don't
have.
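
As a rough illustration of the two failure modes described above, here is a
small sketch in Python rather than Perl.  Python's standard-library
ElementTree sits on the same expat library that XML::Parser wraps, so the
symptoms are similar, though the exact behaviour in XML::Parser may differ.
The sample fragments, the <name> wrapper element and the helper names are
assumptions made up for the example; only the <down> tag comes from the post.

```python
import xml.etree.ElementTree as ET

def inline_children(xml_text):
    """Return (leading text, child tag names) for the parsed root element."""
    root = ET.fromstring(xml_text)
    return root.text, [child.tag for child in root]

def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Hypothetical fragment: the inline <down> tag becomes a child element,
# splitting "Carbon" from "14" instead of staying part of the free text.
print(inline_children("<name>Carbon<down>14</down></name>"))

# An HTML-style entity with no DTD to declare it is a fatal parse error.
print(parse_error("<name>&alpha;-tubulin</name>"))
```

The first call shows the free text being split around the formatting tag; the
second shows that, without a declaration for the entity, the parser gives up
on the document rather than passing the text through.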

So, GO_Browser (for the time being) hacks away at the XML in its first parsing
pass, replacing these tags and entities with things that will not break
XML::Parser, and then reads from this hacked data.  As a result, what you get
is not the "strict" GO ontology, but a slightly modified version of the
same.... which effectively defeats the purpose of GO, which is that everyone
should use a consensus nomenclature.  :-(
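
A minimal sketch of what such a clean-up pass could look like, again in
Python and not the actual GO_Browser code.  The post does not say what the
tags and entities are replaced with, so everything here is an assumption: the
tag list (only <down> is named above), the entity list, the choice to drop
the tags while keeping their contents, and the choice to spell the Greek
letters out as plain words.

```python
import re
import xml.etree.ElementTree as ET

# Assumed inline formatting tags; only <down> is named in the post.
INLINE_TAGS = ("down", "up")
# Assumed entity names; the post says only "alpha, beta, gamma, and so on".
GREEK = ("alpha", "beta", "gamma")

def sanitize(raw):
    """Rewrite markup the parser trips over into plain text."""
    # Drop the opening and closing inline tags but keep what they enclose.
    tag_pattern = r"</?(?:%s)>" % "|".join(INLINE_TAGS)
    text = re.sub(tag_pattern, "", raw)
    # Spell out the listed Greek entities; built-in ones like &amp; are untouched.
    entity_pattern = r"&(%s);" % "|".join(GREEK)
    return re.sub(entity_pattern, r"\1", text)

raw = "<name>&alpha;-Carbon<down>14</down> &amp; more</name>"
root = ET.fromstring(sanitize(raw))
print(root.text)
```

The trade-off is the one complained about above: the cleaned text parses, but
the term no longer matches the published GO string character for character.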

In any case, after all that griping, I am perfectly willing to cvs add this
module to bioperl-gui, so long as I am not judged too harshly for it - I know
it's a hack!!   :-)

I'll get on to that later this afternoon.

b.t.w. If anyone can assist me in getting ahold of a GO.dtd please speak up!  It
would make my miserable life a bit brighter!!

Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK