[Bioperl-l] Seq(IO) documentation thoughts

Ewan Birney birney@ebi.ac.uk
Mon, 18 Dec 2000 10:34:00 +0000 (GMT)

On Mon, 18 Dec 2000, Hilmar Lapp wrote:

> Kris Boulez wrote:
> > 
> > One thing I want to ask here. How far do we want to go in having BioPerl
> > (SeqIO) being a format convertor ? I've played around a bit with
> > converting one sequence format (a) to another (b) and using (b) as
> > input for another round. It turns out that after some rounds (mostly <10)
> > BioPerl isn't -w clean anymore ('use of uninitialized value ...') or
> > just throws an error. Is it worth investigating this, or do we just say
> > that we only support one conversion.
> > 
> We had a similar discussion some months ago. Quoting myself :O) from
> http://bioperl.org/pipermail/bioperl-l/2000-September/001282.html:
> --- quote on
> The point I'd like to make may be best illustrated by comparing with
> automated language translators that are around (like babelfish;
> babelfish.altavista.com). Try to translate an only slightly complicated
> sentence from one language into another, which already screws it up
> half-way, and then translate the result into a third. I think it is
> pointless for BioPerl to aim at clean and complete conversion from any
> rich format into another rich format for sequences.
> The only way this could be achieved with a reasonable effort is by
> mapping languages to a common meta-representation, like XML or ASN.1 (and
> anything the meta-format doesn't cover will still be lost).
> --- quote off
> This more or less was approved consensus, at least to my humble
> understanding. I can easily imagine that there are still parsing bugs
> making things worse, and these obviously need to be eliminated.

I agree with Hilmar here as far as transfer between rich formats goes, but
I do think that multiple read->write cycles of one format should have
minimal, if not zero, information loss. We could also try for an
embl->genbank->embl loop being close to zero.

The sort of thing I am OK with "losing" is whitespace formatting in the
comments. Maintaining this inside the objects effectively means keeping
the file as it was read inside an object. Not nice in my view...
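
A minimal sketch of what such a round-trip check might look like, written
in Python so that it stays self-contained. Everything in it is an
assumption made for illustration: the ID/CC/SQ toy format and the names
Record, parse, write and round_trips are hypothetical and are not BioPerl's
API. It shows the property argued for above: comment whitespace is
normalised on the first read, and after that every further read->write
cycle of the same format should reproduce the same output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Toy in-memory sequence object (hypothetical, not a BioPerl class)."""
    seq_id: str
    comment: str
    residues: str


def parse(text: str) -> Record:
    """Read a toy flat file whose lines are tagged ID, CC or SQ."""
    seq_id = ""
    comment_words = []
    residues = []
    for line in text.splitlines():
        tag, _, body = line.partition(" ")
        if tag == "ID":
            seq_id = body.strip()
        elif tag == "CC":
            # Whitespace layout of comments is deliberately not preserved.
            comment_words.extend(body.split())
        elif tag == "SQ":
            residues.append("".join(body.split()))
    return Record(seq_id, " ".join(comment_words), "".join(residues))


def write(rec: Record, width: int = 10) -> str:
    """Write a record back out in a single canonical layout."""
    lines = [f"ID {rec.seq_id}"]
    if rec.comment:
        lines.append(f"CC {rec.comment}")
    for i in range(0, len(rec.residues), width):
        lines.append(f"SQ {rec.residues[i:i + width]}")
    return "\n".join(lines) + "\n"


def round_trips(text: str, cycles: int = 10) -> list:
    """Return the file text produced after each read->write cycle."""
    outputs = []
    for _ in range(cycles):
        text = write(parse(text))
        outputs.append(text)
    return outputs


original = "ID X1\nCC  some   comment\nCC more\nSQ ACGT ACGT\nSQ ACGTAC\n"
outputs = round_trips(original)

# No information held in the object is lost across ten cycles ...
assert parse(outputs[-1]) == parse(original)
# ... and the written file stops changing after the first cycle.
assert all(out == outputs[0] for out in outputs)
```

A check for an embl->genbank->embl loop could take the same shape, with
two parse/write pairs and a comparison of the objects at each end, but
which fields would survive such a loop is not something this sketch can
tell you.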

> 	Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420