New Bio::Seq and Bio::Seq::Parse (.025 BETA)

Georg Fuellen fuellen@dali.Mathematik.Uni-Bielefeld.DE
Tue, 18 Mar 1997 19:38:01 +0000 (GMT)

Steve wrote,

about ``always have valid objects''
> I'm not sure this is always possible.  When it is possible, it may require
> doing non-intuitive things.  What if someone initializes a bio::seq to be
> DNA but puts in illegal bases? 

I would carp; then the user knows that further warnings, or even fatal
errors, are possible later on.
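For illustration, a minimal Perl sketch of "carp, but keep the object". The helper name check_dna is hypothetical (not a Bio::Seq method), and the accepted alphabet (A, C, G, T, N in either case) is an assumption:

```perl
use strict;
use warnings;
use Carp;

# Hypothetical validator: check_dna is not part of Bio::Seq, and the
# accepted alphabet (A, C, G, T, N, either case) is an assumption.
sub check_dna {
    my ($seq) = @_;
    if ($seq =~ /([^ACGTNacgtn])/) {
        # Warn about the first offending base but do not die,
        # so the caller still ends up with an object.
        carp "illegal base '$1' in DNA sequence";
        return 0;
    }
    return 1;
}

check_dna('ACGTNacgt');   # passes silently
check_dna('ACGUX');       # carps about the first illegal base, 'U'
```

The return value lets the caller decide whether to go on; the warning alone tells the user that trouble may follow.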

> > > > The problem w/ comma-separated is that according to our current
> > > > specs, comma is a legal component of an ID; we only carp on whitespace.
> > > > In other words, ``Mus,musculus'' is a legal ID.
> > > > Since non-whitespace characters are also legal in filenames on many
> > > > systems, I believe, I'd like to keep the convention.
> > > 
> > > I thought ID's had to be in '\s'; if not, maybe they should be.  Further,
> > 
> > Do you mean ``\S'', i.e. everything but space and ``\t\n\r\f''  ??
> Ooops.  Meant \w.  Sorry!!!

I strongly believe we should support ids made of ``\S'' characters (anything
but whitespace); I've seen such ids in Fasta files, Nexus files, etc, etc.
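For illustration, the practical difference between the two conventions can be checked with a short Perl sketch. The helper names ok_word and ok_nonspace are hypothetical, not Bio::Seq code; `\w` matches `[A-Za-z0-9_]`, `\S` matches any non-whitespace character:

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two proposed id rules.
# \A and \z anchor to the whole string, so a trailing newline fails too.
sub ok_word     { return $_[0] =~ /\A\w+\z/ ? 1 : 0 }
sub ok_nonspace { return $_[0] =~ /\A\S+\z/ ? 1 : 0 }

for my $id ('Mus_musculus', 'Mus,musculus', 'gi|12345', 'Mus musculus') {
    printf "%-14s \\w: %d   \\S: %d\n", $id, ok_word($id), ok_nonspace($id);
}
```

Only ids containing punctuation such as ``,'' or ``|'' separate the two rules: they pass `\S` but fail `\w`, while an id containing a space fails both.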

> > > whitespace is a legal component of most filesystems.  (It is on Unix,
> > > Macintosh, and Windows, for example). 
> > 
> > Space (`` '') may be OK, but newline (``\n'') certainly not ?!
> On unix, at least, \n certainly is valid.  Typically the only illegal
> characters are "\0" and "/".  Some filesystems even allow those.

I guess I failed to say that I'm talking about filenames. In a lot of
cases, the filename will give rise to the (default) id and vice versa.

> We discussed this before with Fasta/FastA/FASTA/fasta.  Changing all of
> these to lower case follows the same rationale  (DNA/dna/Dna?)
> OtherSeq/Otherseq/otherseq.  Everything should be kept consistent, and
> lower case is easy for this.

Ok, ok, will use dna,rna,protein in the future...

> > Hm. What about ids that we inherit from somewhere ? E.g. from a file ?
> > On a parallel machine, this won't work either I think. What about other
> > distributed computation; CORBA may offer solutions, but it's another
> > big can of worms although I feel that we'll have to open it at some time -
> > does anyone know more about CORBA ? (I've just heard rumors! :)
> Why would you inherit ids?  These ids are ONLY for setting names of
> bio::seq's.  I don't see how parallel programs and/or CORBA have anything
> to do with it.  We only need to guarantee that id's are unique within a
> given program.

For me (and in the current code, e.g. parse_fasta), the ids are the identifiers 
you find in files, etc, etc. It seems that you're introducing a new notion of
id, the merit of which is rather unclear to me.

best wishes,