Back (for now)
Steven E. Brenner
Thu, 3 Jul 1997 18:10:29 +0900 (JST)
Thanks for your detailed thoughts about the relationships between
modules for 2D and 3D structure. [My comments deal with your overall
design; I agree with all the issues, so I haven't repeated the text.]
I like the general outline; I think I agree with you that 2D structure
is a module which should somehow be applicable to both 3D and 1D
structure. One consideration is that every (modern) 3D protein structure
has a known 1D structure (i.e., sequence). So, perhaps an easy way to
implement all of this would be that a 3D-structure has a 1D-structure,
and a 1D-structure has a 2D structure.
I use 'has a' as one sort of relationship, though I haven't figured out
if that is best. Comments appreciated! One reason for this approach is
that 1D-structure and 2D-structure are both discrete and linear.
3D-structure is neither; any atom can be in any place (though there are
obviously some correlations), and the atomic geometry is not linear. So,
2D-structure maps more neatly onto 1D-structure; since we do need to link
the 1D and 3D structure, we might as well use that link to get to the 2D
structure.
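The 'has a' chain described above could be pictured roughly as follows. This is only a minimal sketch: the package names (Sketch::Struct3D, Sketch::Seq1D, Sketch::SecStruct2D) and all of their methods are hypothetical, invented for illustration, and are not existing Bio:: modules.

```perl
use strict;
use warnings;

# Hypothetical classes, for illustration only.
package Sketch::SecStruct2D;
sub new {
    my ($class, %args) = @_;
    # One state per residue, e.g. 'H' (helix), 'E' (strand), 'C' (coil).
    return bless { states => $args{states} }, $class;
}
sub states { return $_[0]{states} }

package Sketch::Seq1D;
sub new {
    my ($class, %args) = @_;
    return bless {
        residues   => $args{residues},
        sec_struct => $args{sec_struct},
    }, $class;
}
sub residues   { return $_[0]{residues} }
sub sec_struct { return $_[0]{sec_struct} }   # 1D-structure "has a" 2D-structure

package Sketch::Struct3D;
sub new {
    my ($class, %args) = @_;
    return bless { coords => $args{coords}, seq => $args{seq} }, $class;
}
sub seq { return $_[0]{seq} }                 # 3D-structure "has a" 1D-structure

# The 2D structure is reached through the 1D link.
sub sec_struct { return $_[0]->seq->sec_struct }

package main;

my $ss     = Sketch::SecStruct2D->new(states => 'CHHHHC');
my $seq    = Sketch::Seq1D->new(residues => 'MKTAYI', sec_struct => $ss);
my $struct = Sketch::Struct3D->new(coords => [], seq => $seq);

print $struct->sec_struct->states, "\n";
```

Because both the sequence and the secondary structure are discrete and linear, they can share one residue index; the coordinates need no such ordering.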
I like your thoughts about folds (e.g., 4-helix-bundle), as a
description of the 3D structure; I had not previously considered this.
However, these describe a domain as a whole rather than any particular
details of either the secondary or tertiary structure. Perhaps we should
have a DomainDescription module which is sort of like the 2D-structure
module. Where 2D-structure contains secondary structure elements,
DomainDescriptions have folds. A tricky caveat here is that folds can be
discontinuous in sequence.
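To make the caveat about discontinuous folds concrete, here is a rough sketch of how such a description might hold several sequence segments. The package name Sketch::DomainDescription, its methods, and the residue ranges are all hypothetical; nothing here reflects an existing module or a real protein.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
package Sketch::DomainDescription;
sub new {
    my ($class, %args) = @_;
    # segments: list of [start, end] residue ranges (1-based, inclusive).
    # More than one range means the domain is discontinuous in sequence.
    return bless { fold => $args{fold}, segments => $args{segments} }, $class;
}
sub fold { return $_[0]{fold} }

sub is_continuous {
    my ($self) = @_;
    return scalar(@{ $self->{segments} }) == 1 ? 1 : 0;
}

# True if residue position $pos falls inside any segment of the domain.
sub contains {
    my ($self, $pos) = @_;
    for my $seg (@{ $self->{segments} }) {
        return 1 if $pos >= $seg->[0] && $pos <= $seg->[1];
    }
    return 0;
}

package main;

# Made-up ranges: one domain built from two stretches of the chain.
my $domain = Sketch::DomainDescription->new(
    fold     => '4-helix-bundle',
    segments => [ [1, 60], [120, 180] ],
);

print $domain->fold, " continuous? ", $domain->is_continuous, "\n";
```

A single start/end pair, which would be enough for a secondary structure element, is not enough here; that is the main way this differs from the 2D-structure case.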
> However, there's one case where I can see some overlap between 3D and 2D
> structural issues: circular dichroism (CD) experiments. Using CD you can
> estimate the overall percentage of helix, sheet, and coil in a protein
I think that these data are not archived anywhere and are basically not
much trusted. They can be useful and we should keep the possibility of
using them open. However, I don't think that they are of sufficient
import that they should play a large role in building the hierarchy.
> One more point: my hypothetical Bio::Struct.pm module doesn't know
> anything about 3D structures but delegates this task to Bio::Struct::PDB.pm.
> Similarly, there could be another module that handles strictly 2D issues.
Naming is more of a philosophical and political question than a technical
one. On these grounds, I think that it is important that the object which
knows about coordinates be Bio::Struct. The reason is that the thing most
people will want to do most often is parse in a PDB file and do something
with it -- this "jumble of coordinates" will be the "currency" for
structures just as "Bio::Seq" will be the corresponding one for sequences.
To reduce the learning curve and to make things appear as simple as possible,
I think that having a 'Bio::Seq' and a 'Bio::Struct' which are
more-or-less capable of appearing to do everything necessary is important.
> I decided to go ahead and create a scop module since I knew I
> would be doing a lot of work with scop data. scop_dict.cf is a little
> dictionary I created for converting between class/fold number to class/fold
> name. You probably already have such a thing, but it was easy enough to
> create. Here's a snippet:
I see. We do have a similar type of thing which uses cdb files. (It's
just a set of functions. For various historical and performance reasons,
scop is not very OO). As an aside, cdb files are great!
> > I have no objection to this, but curious to know why you want to
> > be able to do slices for revcom, etc.
> I needed to process sequences for all genes on a yeast chromosome. It
> seemed easiest to create a big PreSeq object for the chromosomal sequence
> and then extract sub-sequences for each gene as needed. Since some genes
> are on the complementary strand, I needed revcom() to work like str().
> See, for example:
Ok; this makes sense. I had forgotten about revcom's current
implementation. One idea was that it would modify the existing object;
another idea was that it would return a modified object. Right now it
seems to be roughly in-between. :)
My suggested modification (probably can't show up until Bio::Seq) would be
for revcom to return an object with the required modification. Probably
my preferred calling sequence would be:
    $mybackgene = new Bio::PreSeq ($mychromosome->str($end,$beg));
    $mygene = $mybackgene->revcom();
    print $mygene->str(), "\n";
Or, maybe we should add another method like getseq to return a sequence
object of a slice:
    $mybackgene = $mychromosome->get_seq_obj($end,$beg);
    # ick! get_seq_obj is a horrible method name!
    $mygene = $mybackgene->revcom();
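
For what it's worth, the "return a modified object" behaviour suggested above might look something like the following. This is a self-contained sketch, not the Bio::PreSeq implementation: the class Sketch::Seq, the method name subseq (standing in for the slice-returning method), the 1-based inclusive coordinates, and the example sequence are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only; not Bio::PreSeq.
package Sketch::Seq;
sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

# Return the whole sequence, or residues $beg..$end (1-based, inclusive).
sub str {
    my ($self, $beg, $end) = @_;
    return $self->{seq} unless defined $beg;
    return substr($self->{seq}, $beg - 1, $end - $beg + 1);
}

# Return a new sequence object holding just the slice.
sub subseq {
    my ($self, $beg, $end) = @_;
    return ref($self)->new($self->str($beg, $end));
}

# Return a NEW object with the reverse complement; the receiver is unchanged.
sub revcom {
    my ($self) = @_;
    my $rc = reverse $self->{seq};
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return ref($self)->new($rc);
}

package main;

my $chromosome = Sketch::Seq->new('ATGGCCTTAACG');   # made-up sequence
my $backgene   = $chromosome->subseq(4, 9);          # slice on the forward strand
my $gene       = $backgene->revcom();                # new object; $backgene untouched

print $gene->str(), "\n";       # prints TAAGGC
print $backgene->str(), "\n";   # prints GCCTTA
```

Since revcom never touches its receiver, the big chromosome object can be sliced repeatedly for each gene without any risk of one extraction disturbing the next.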