[Bioperl-l] More on PDB and chains...

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Fri Sep 15 12:14:57 EDT 2006


Bernd Web wrote:
> Is anyone else on this list using StructureIO::pdb at all?

I don't :(

The problems with writing a PDB parser as I see them are:

(1) Legacy data - for every rule that you try to rely on, there is at
least one PDB file that breaks it! Even today, new files are not fully
validated before release.

(2) Different uses and ambiguity - depending on the use to which
somebody wants to put the data, the right interpretation may differ.
Does a file with just C-alpha positions contain residues? It all
depends. Do the last few observed atoms constitute a residue? It all
depends. Does one always accept the authors' labelling? etc etc.
Compatibility with specific tools, and consistency with other analyses
of the same data, also influence how the files are interpreted.
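To make the two points concrete, here is a minimal sketch in Python for brevity. It is not BioPerl code and not any existing parser's API, and all names in it are hypothetical. It reads ATOM/HETATM records by their fixed columns, tolerates lines that stop short of column 80, and skips records it cannot parse instead of dying. What counts as a residue is left as an explicit, caller-chosen parameter rather than baked in.

```python
from typing import AbstractSet, Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    i_code: str
    xyz: Tuple[float, float, float]


def col(line: str, start: int, end: int) -> str:
    """Return a fixed-column field (0-based, end-exclusive), stripped.

    Slicing past the end of a short line yields '' rather than raising,
    so records that stop before column 80 are tolerated.
    """
    return line[start:end].strip()


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse an ATOM/HETATM record; return None for anything unusable."""
    if col(line, 0, 6) not in ("ATOM", "HETATM"):
        return None
    try:
        res_seq = int(col(line, 22, 26))
        xyz = (float(col(line, 30, 38)),
               float(col(line, 38, 46)),
               float(col(line, 46, 54)))
    except ValueError:
        return None  # malformed record: skip it instead of aborting the file
    return Atom(col(line, 12, 16), col(line, 17, 20), col(line, 21, 22),
                res_seq, col(line, 26, 27), xyz)


# Hypothetical policy: a residue needs its N, CA and C backbone atoms.
BACKBONE = frozenset({"N", "CA", "C"})


def residues(atoms: Iterable[Atom],
             require: AbstractSet[str]) -> List[Tuple[str, int, str]]:
    """Group atoms by (chain, resSeq, iCode); keep groups that contain every
    atom name in `require`. Which set to require is the caller's decision."""
    groups: Dict[Tuple[str, int, str], Tuple[str, Set[str]]] = {}
    for a in atoms:
        key = (a.chain, a.res_seq, a.i_code)
        groups.setdefault(key, (a.res_name, set()))[1].add(a.name)
    return [(chain, seq, res_name)
            for (chain, seq, _icode), (res_name, present) in groups.items()
            if require <= present]


if __name__ == "__main__":
    # Records truncated right after the z coordinate (column 54).
    pdb = [
        "ATOM      1  N   MET A   1      38.198  19.582  28.696",
        "ATOM      2  CA  MET A   1      37.935  20.960  28.304",
        "ATOM      3  C   MET A   1      36.463  21.262  28.073",
        "ATOM      4  CA  LYS A   2      35.584  20.437  28.649",
    ]
    atoms = [a for a in map(parse_atom_line, pdb) if a is not None]
    print(residues(atoms, BACKBONE))  # [('A', 1, 'MET')]
    print(residues(atoms, {"CA"}))    # [('A', 1, 'MET'), ('A', 2, 'LYS')]
```

The same four atoms give one residue under a backbone policy and two under a C-alpha policy. Neither answer is "the" right one, which is exactly the ambiguity described above.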

So I think many people write their own parser, with just those tweaks
and interpretations that they require for their application. In mine, I
have over 600 lines just correcting simple errors in various PDB files
so my parser is able to read them. And I don't even read the ATOM
records! I'm just reading header lines. I did say people had different
use cases :)
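One way to keep a pile of such corrections manageable is to apply them as a list of small line-rewriting functions before any fixed-column parsing happens. The sketch below assumes that design and is not a description of my own parser. `upper_record_name` is a made-up example of a fix, not a claim about any particular PDB entry, and joining TITLE continuation lines with single spaces is itself an interpretation choice.

```python
from typing import Callable, Iterable, List, Sequence

# A "fix" rewrites one raw line before any fixed-column parsing happens.
Fix = Callable[[str], str]


def strip_eol(line: str) -> str:
    """Drop trailing newline and carriage-return characters."""
    return line.rstrip("\r\n")


def upper_record_name(line: str) -> str:
    """Hypothetical fix: accept record names written in lower case."""
    return line[:6].upper() + line[6:]


def read_title(lines: Iterable[str],
               fixes: Sequence[Fix] = (strip_eol, upper_record_name)) -> str:
    """Join the text of TITLE records (columns 11-80), applying each fix in order."""
    parts: List[str] = []
    for raw in lines:
        line = raw
        for fix in fixes:
            line = fix(line)
        if line[0:6].rstrip() == "TITLE":
            text = line[10:80].strip()
            if text:
                parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    header = [
        "HEADER    HYDROLASE\n",
        "TITLE     CRYSTAL STRUCTURE OF\n",
        "title    2 A HYPOTHETICAL PROTEIN\n",
    ]
    print(read_title(header))            # CRYSTAL STRUCTURE OF A HYPOTHETICAL PROTEIN
    print(read_title(header, fixes=()))  # CRYSTAL STRUCTURE OF
```

Passing a different list of fixes is how another application would get its own tweaks without touching the record parsing.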

I would suggest looking at existing solutions to benefit from the
knowledge built into them. The mmCIF/XML data model and the MSD schema
can suggest object and data structures. Andrew Dalke's work on the
Biopython module reflects a lot of hard-won experience, I believe. See,
for example:
<http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc56>
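As an illustration of borrowing a data model, here is a minimal Python sketch of a Structure/Model/Chain/Residue/Atom hierarchy (the arrangement Biopython's Bio.PDB uses). It keeps both the author numbering and the sequential numbering, mirroring mmCIF's auth_seq_id and label_seq_id items. The class and method names are hypothetical and are not Biopython's API. Keeping both identifiers lets the caller decide whether to accept the authors' labelling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Atom:
    name: str
    xyz: Tuple[float, float, float]


@dataclass
class Residue:
    res_name: str
    auth_seq_id: int              # author's numbering, as in mmCIF auth_seq_id
    ins_code: str                 # author insertion code ("" if none)
    label_seq_id: Optional[int]   # sequential numbering, as in mmCIF label_seq_id;
                                  # None where it is undefined (e.g. waters)
    atoms: Dict[str, Atom] = field(default_factory=dict)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)

    def find(self, seq_id: int, ins_code: str = "",
             use_author: bool = True) -> Optional[Residue]:
        """Look a residue up by author numbering (default) or by label numbering."""
        for r in self.residues:
            if use_author:
                if (r.auth_seq_id, r.ins_code) == (seq_id, ins_code):
                    return r
            elif r.label_seq_id == seq_id:
                return r
        return None


@dataclass
class Model:
    serial: int
    chains: Dict[str, Chain] = field(default_factory=dict)


@dataclass
class Structure:
    structure_id: str
    models: List[Model] = field(default_factory=list)


if __name__ == "__main__":
    # Hypothetical chain whose author numbering starts at 100, with an insertion.
    chain = Chain("A", [Residue("MET", 100, "", 1),
                        Residue("GLY", 100, "A", 2),
                        Residue("LYS", 101, "", 3)])
    print(chain.find(100, "A").res_name)             # GLY
    print(chain.find(3, use_author=False).res_name)  # LYS
```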

Cheers, Dave

