Bioperl: Parsing Medline Docs

Paul Gordon
Tue, 28 Sep 1999 14:34:31 -0300 (ADT)

> The MEDLARS format is documented in the MEDLINE chapter of the NLM's
> Online Services Reference Manual at: 
> It's quite hard to find otherwise.
> A while ago, I started working on a python module to parse it, but never
> quite finished...
> Jeff
> > Hi
> > Does any one know of perl scripts/modules that parse Medline documents
> > in Medlars format?  Surely someone has attempted this before.

It seems to be a common trend...  I too started to write a MedlarsII
parser in Perl, but didn't finish it because we ended up not using the

There is a distinction to be made here for those who aren't familiar.
Medline as you normally see it is a nice, text-based format, but NLM
ships it to you in MedlarsII format, which is a very complicated binary
format with embedded EBCDIC text.  If that is what you have, I have never
seen a Perl parser for it (though Boulder::Medline may have Medlars
support; I haven't checked it out).
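
To give a feel for the EBCDIC part only, here is a minimal Python sketch
of turning EBCDIC bytes into readable text.  Everything in it is an
assumption for illustration: the function name decode_ebcdic is made up,
the sample bytes are hand-built rather than taken from a real file, and
IBM code page 037 is just one common EBCDIC variant.  Which code page the
NLM files actually use, and how the surrounding binary records are laid
out, is not covered here.

```python
# Hypothetical sketch: decode a run of EBCDIC bytes into a string.
# Assumption: the text is in IBM code page 037 (Python codec "cp037").
# The real MedlarsII code page and record structure are not known here.

def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC-encoded bytes using the given code page."""
    return raw.decode(codepage)

# Hand-built sample, NOT real MedlarsII data: "MEDLINE" in cp037.
sample = bytes([0xD4, 0xC5, 0xC4, 0xD3, 0xC9, 0xD5, 0xC5])
text = decode_ebcdic(sample)
print(text)
```

The hard part of a real parser would be locating the text fields inside
the binary records; the decoding step itself is the easy bit.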

