[Bioperl-l] Re: Gene Structure / GenScan

Ewan Birney birney@ebi.ac.uk
Tue, 1 Aug 2000 14:30:58 +0100 (GMT)

On Tue, 1 Aug 2000 hilmar.lapp@pharma.novartis.com wrote:

> > The Ensembl Genscan parser Ewan sent yesterday seems to be a good
> > starting point. However, I'd prefer to have a gene structure represented
> > optionally independent of the/an underlying sequence (object), that is,
> > as a feature which may or may not have a sequence attached. In addition,
> > a parser should not need to rely on being provided with the source
> > sequence, and the resulting gene structure representation can be attached
> > to the pertaining source sequence by the client.
> >
> > I'd propose the following:
> > Bio::SeqFeature::GeneStructure is-a Bio::SeqFeature::Generic (or just a
> > Bio::SeqFeatureI ?)
> > and offers specific support for gene structure related things, like
> [...]
> Aha. Now you want the appropriate Ensembl gene objects, not the genscan
> parser. Look at
>   Bio::EnsEMBL::Gene
>               ::Transcript
>               ::Translation
> Look at
> http://www.ensembl.org/Docs/Pdoc/ensembl/modules/Bio/EnsEMBL/modules.html
> Again, I would be happy if these moved "across" to bioperl.
> You will want to add additional stuff to the Gene object to handle
> promoters (or perhaps the transcript object). Don't forget about
> alternative splicing.
>      Well, that's not really what I was aiming at. I thought about a
>      representation of the _data_ which make up a gene structure, as, e.g.
>      people find it or programs predict it. IMHO all that _interpretation_
>      of the data (features in this case) belongs to separate classes,

This is a fine distinction to make; I certainly would not object to your
making these feature classes, but if they are "just" to do with gene
prediction, or even more specifically just with Genscan, put them in
a namespace that indicates this, i.e.

   Bio::SeqFeature::Gene # bad 

# good namespaces.




>      either derived ones, or within another hierarchy (you could think of a
>      GeneTranscriber who knows about alternative splicing). So, the modules
>      I proposed shouldn't do much with actual sequences apart from maybe
>      very basic things. They're just features, which in the first place is
>      all you need to represent e.g. GenScan results. And they should be
>      rich enough to allow other modules to make real stuff like protein
>      sequences out of it. So, lightweight, but heavy enough.

I worry about us rewriting things (e.g., exons), but I do feel confident that
splitting out Genscan "output" from Genscan "interpretation" is a good
idea.

Go for it.
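
To make the "output" versus "interpretation" split concrete, here is a minimal sketch, written in Python purely for illustration. None of these names (ExonFeature, GeneStructureFeature, spliced_sequence) are BioPerl or Ensembl API; they are hypothetical. The feature class only holds coordinates and may or may not have a sequence attached, while a separate function does the interpretation once the client has attached one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExonFeature:
    """Hypothetical exon record: coordinates only, no sequence needed."""
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # +1 or -1


@dataclass
class GeneStructureFeature:
    """Hypothetical 'output' class: just what a predictor reported."""
    exons: List[ExonFeature] = field(default_factory=list)
    seq: Optional[str] = None  # attached later by the client, if at all

    def attach_seq(self, seq: str) -> None:
        self.seq = seq


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def spliced_sequence(gene: GeneStructureFeature) -> str:
    """Hypothetical 'interpretation' step: join exons into one transcript.

    Assumes all exons of the gene lie on the same strand.
    """
    if gene.seq is None:
        raise ValueError("no sequence attached to this gene structure")
    ordered = sorted(gene.exons, key=lambda e: e.start)
    joined = "".join(gene.seq[e.start - 1:e.end] for e in ordered)
    if ordered and ordered[0].strand < 0:
        # Reverse-complement the joined exons for minus-strand genes.
        joined = joined.translate(_COMPLEMENT)[::-1]
    return joined
```

A parser would only ever build GeneStructureFeature objects; anything that needs real sequence (transcripts, translations, alternative splicing) would live in separate code like spliced_sequence.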

>      Am I missing something?
>           Hilmar

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420