Back (for now)
Thu, 3 Jul 1997 12:00:53 +0000 (GMT)
Steve Brenner wrote,
> Dear Steve,
> Thanks for your detailed thoughts about the relationships between
> modules for 2D and 3D structure. [My comments deal with your overall
> design; I agree with all the issues, so I haven't repeated the text.]
> I like the general outline; I think I agree with you that 2D structure
> is a module which should somehow be applicable to both 3D and 1D
> structure. One consideration is that every (modern) 3D protein structure
> has a known 1D structure (i.e., sequence). So, perhaps an easy way to
> implement all of this would be that a 3D-structure has a 1D-structure,
> and a 1D-structure has a 2D structure.
Sounds very elegant!
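To make sure I read the 'has a' chain correctly, here is a minimal sketch of it in Perl. All of the Sketch:: class names, constructor arguments, and the example data are made up for illustration; none of them are existing Bio:: modules.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only.

package Sketch::Struct2D;
sub new {
    my ($class, %args) = @_;
    # elements: one state per residue, e.g. H = helix, E = strand, C = coil
    return bless { elements => $args{elements} }, $class;
}
sub elements { $_[0]{elements} }

package Sketch::Seq1D;
sub new {
    my ($class, %args) = @_;
    # A 1D-structure has a 2D-structure.
    return bless { seq => $args{seq}, struct2d => $args{struct2d} }, $class;
}
sub str      { $_[0]{seq} }
sub struct2d { $_[0]{struct2d} }

package Sketch::Struct3D;
sub new {
    my ($class, %args) = @_;
    # A 3D-structure has a 1D-structure (plus its coordinates).
    return bless { coords => $args{coords}, seq => $args{seq} }, $class;
}
sub seq { $_[0]{seq} }
# The 2D structure is reached through the 1D link, not stored twice.
sub struct2d { $_[0]->seq->struct2d }

package main;

my $ss  = Sketch::Struct2D->new(elements => 'CHHHHC');
my $seq = Sketch::Seq1D->new(seq => 'MKLVEA', struct2d => $ss);
my $s3d = Sketch::Struct3D->new(coords => [], seq => $seq);

print $s3d->seq->str, "\n";
print $s3d->struct2d->elements, "\n";
```

The point of the delegation in Sketch::Struct3D::struct2d is that the secondary structure lives in exactly one place, attached to the sequence.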
> I use 'has a,' as one sort of relationship, though I haven't figured out
> if that is best. Comments appreciated! One reason for this approach is
> that 1D-structure and 2D-structure are both discrete and linear.
Actually, the 2D-structure of RNA may be non-linear -- it can have
pseudo-knots, etc. But that info could be stored separately.
> 3D-structure is neither; any atom can be in any place (though there are
> obviously some correlations), and the atomic geometry is not linear. So,
> 2D-structure more neatly maps onto 1D-structure; since we do need to link
> the 1D and 3D structure, we might as well use that link to get to the 2D
> as well.
> I like your thoughts about folds (e.g., 4-helix-bundle), as a
> description of the 3D structure; I had not previously considered this.
> However, these describe a domain as a whole rather than any particular
> details of either the secondary or tertiary structure. Perhaps we should
> have a DomainDescription module which is sort of like the 2D-structure
> module. Where 2D-structure contains secondary structure elements,
> DomainDescriptions have folds. A tricky caveat here is that folds can be
> discontinuous in sequence.
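One way to cope with the discontinuity might be to store a fold together with a list of sequence segments rather than a single range. A rough sketch, where the record layout, the in_domain helper, and the residue numbers are all invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical record: a fold name plus the sequence segments it covers.
# Segments are 1-based and inclusive; the numbers are made up.
my %domain = (
    fold     => '4-helix-bundle',
    segments => [ [ 1, 60 ], [ 120, 180 ] ],
);

# Return 1 if residue $pos falls inside any segment of the domain, else 0.
sub in_domain {
    my ($dom, $pos) = @_;
    for my $seg (@{ $dom->{segments} }) {
        return 1 if $pos >= $seg->[0] && $pos <= $seg->[1];
    }
    return 0;
}

for my $pos (30, 90, 130) {
    printf "%d: %s\n", $pos, in_domain(\%domain, $pos) ? 'in domain' : 'outside';
}
```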
> > However, there's one case where I can see some overlap between 3D and 2D
> > structural issues: circular dichroism (CD) experiments. Using CD you can
> > estimate the overall percentage of helix, sheet, and coil in a protein
> I think that these data are not archived anywhere and are basically not
> much trusted. They can be useful and we should keep the possibility of
> using them open. However, I don't think that they are of sufficient
> import that they should play a large role in building the hierarchy.
> > One more point: my hypothetical Bio::Struct.pm module doesn't know
> > anything about 3D structures but delegates this task to Bio::Struct::PDB.pm.
> > Similarly, there could be another module that handles strictly 2D issues.
> Naming is more of a philosophical and political question than a technical
> one. On these grounds, I think that it is important that the object which
> knows about coordinates be Bio::Struct. The reason is that the thing most
> people will want to do most often is parse in a PDB file and do something
> with it -- this "jumble of coordinates" will be the "currency" for
> structures just as "Bio::Seq" will be the corresponding one for sequences.
> To reduce the learning curve and to make things appear as simple as possible,
> I think that having a 'Bio::Seq' and a 'Bio::Struct' which are
> more-or-less capable of appearing to do everything necessary is important.
> > I decided to go ahead and create a scop module since I knew I
> > would be doing a lot of work with scop data. scop_dict.cf is a little
> > dictionary I created for converting from class/fold number to class/fold
> > name. You probably already have such a thing, but it was easy enough to
> > create. Here's a snippet:
> I see. We do have a similar type of thing which uses cdb files. (It's
> just a set of functions. For various historical and performance reasons,
> scop is not very OO). As an aside, cdb files are great!
> > > I have no objection to this, but am curious to know why you want to
> > > be able to do slices for revcom, etc.
> > I needed to process sequences for all genes on a yeast chromosome. It
> > seemed easiest to create a big PreSeq object for the chromosomal sequence
> > and then extract sub-sequences for each gene as needed. Since some genes
> > are on the complementary strand, I needed revcom() to work like str().
> > See, for example:
> > http://genome-www.stanford.edu/~sac/perlOOP/bioperl/lib/Bio/Gene/Seq.pm
> Ok; this makes sense. I had forgotten about revcom's current
> implementation. One idea was that it would modify the existing object;
> another idea was that it would return a modified object. Right now it
> seems to be roughly in-between. :)
In UnivAln.pm, there's the 'inplace' flag; if it is set, the existing object
is modified (if that makes sense).
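Roughly, the idea looks like the sketch below. This is not the UnivAln.pm code; the Sketch::Seq class and the way the 'inplace' option is passed are assumptions made just to show the two behaviours side by side.

```perl
use strict;
use warnings;

package Sketch::Seq;   # hypothetical class, for illustration only

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

sub str { $_[0]{seq} }

# revcom(inplace => 1) modifies the existing object and returns it;
# without the flag, a new object is returned and the original is untouched.
sub revcom {
    my ($self, %opt) = @_;
    my $rc = scalar reverse $self->{seq};   # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;           # complement each base
    if ($opt{inplace}) {
        $self->{seq} = $rc;
        return $self;
    }
    return ref($self)->new($rc);
}

package main;

my $orig = Sketch::Seq->new('ATGC');
my $copy = $orig->revcom();          # new object; $orig is unchanged
print $orig->str, ' ', $copy->str, "\n";

$orig->revcom(inplace => 1);         # now $orig itself is modified
print $orig->str, "\n";
```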
> My suggested modification (probably can't show up until Bio::Seq) would be
> for revcom to return an object with the required modification. Probably
> my preferred calling sequence would be:
> $mybackgene = new Bio::PreSeq ($mychromasome->str($end,$beg));
> $mygene = $mybackgene->revcom();
> print $mygene->str(), "\n";
> Or, maybe we should add another method like getseq to return a sequence
> object of a slice:
> $mybackgene = $mychromasome->get_seq_obj($end,$beg);
> # ick! get_seq_obj is a horrible method name!
> $mygene = $mybackgene->revcom();
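For discussion, here is a small self-contained sketch of that second calling style, where the slice is returned as an object and revcom returns a new object. Sketch::Chrom and slice_obj are hypothetical names, and the 1-based, inclusive coordinates with $beg <= $end are my assumption, not a statement about how PreSeq handles ranges.

```perl
use strict;
use warnings;

package Sketch::Chrom;   # hypothetical stand-in for a sequence object

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

sub str { $_[0]{seq} }

# Hypothetical slice method: returns a new object for residues $beg..$end
# (1-based, inclusive, $beg <= $end).
sub slice_obj {
    my ($self, $beg, $end) = @_;
    die "bad range $beg..$end\n"
        if $beg < 1 || $end > length($self->{seq}) || $beg > $end;
    return ref($self)->new(substr($self->{seq}, $beg - 1, $end - $beg + 1));
}

# Returns a new reverse-complemented object; the receiver is not modified.
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->{seq};
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return ref($self)->new($rc);
}

package main;

my $chrom = Sketch::Chrom->new('AAACCGGTTT');
my $gene  = $chrom->slice_obj(2, 5)->revcom();   # slice 'AACC', then revcom
print $gene->str, "\n";
```

Because both methods return objects, the calls chain, and the chromosome object itself is never altered.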