[Bioperl-l] Re: Gene Structure / GenScan

Ewan Birney birney@ebi.ac.uk
Tue, 1 Aug 2000 12:54:08 +0100 (GMT)

On Tue, 1 Aug 2000 hilmar.lapp@pharma.Novartis.com wrote:

> The Ensembl Genscan parser Ewan sent yesterday seems to be a good starting
> point. However, I'd prefer a gene structure to be representable
> independently of any underlying sequence (object), that is, as a feature
> which may or may not have a sequence attached. In addition, a parser should
> not need to be given the source sequence; the client can attach the
> resulting gene structure representation to the corresponding source
> sequence afterwards.
> I'd propose the following:
> Bio::SeqFeature::GeneStructure is-a Bio::SeqFeature::Generic (or just a
> Bio::SeqFeatureI ?)
> and offers specific support for gene structure related things, like
>    $gene->promoter();
>    $gene->initial_exon();
>    $gene->exon($which);
>    $gene->intron($which);
>    $gene->all_exons();
>    $gene->terminal_exon();
>    $gene->poly_adenylation();
> All of the above would return a SeqFeature::Generic object. The following
> can only work if a sequence (and the correct one!) is attached:
>    $gene->cds();          # returns a string (exons concatenated, phase to be taken into account)
>    $gene->translation();  # ditto
> The problem with the latter is that there appears to be this phase
> ambiguity for the exons if the prediction is not complete (i.e., initial
> exon is missing). As a first guess, I'd suspect that at least GenScan
> would not predict a CDS (= concatenated exons) that contains stops within
> the correct phase. So, ideally I'd check which of the three frames does not
> yield an intervening stop, and take that. I guess the Ensembl people will
> have checked for this, and probably it wasn't as easy.  Any experiences out
> there?
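The three-frame check described above can be sketched in a few lines of Perl. This is a minimal sketch under stated assumptions, not how GenScan or Ensembl actually resolve phase: `open_frames` is a hypothetical helper, the CDS is assumed to be the exon sequences already concatenated on the forward strand, and only the standard stop codons TAA, TAG and TGA are considered. A stop is tolerated as the last complete codon of a frame, since that may be the genuine terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: return the frame offsets (0, 1, 2) of a concatenated
# CDS that contain no stop codon before their last complete codon.
sub open_frames {
    my ($cds) = @_;
    $cds = uc $cds;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);
    my @open;
    for my $offset (0 .. 2) {
        # Collect the complete codons read in this frame.
        my @codons;
        for (my $i = $offset; $i + 3 <= length($cds); $i += 3) {
            push @codons, substr($cds, $i, 3);
        }
        # A stop in the last codon may be the genuine terminator, so drop it.
        pop @codons if @codons && $is_stop{ $codons[-1] };
        push @open, $offset unless grep { $is_stop{$_} } @codons;
    }
    return @open;
}

print join(",", open_frames("ACTGACTAGCCA")), "\n";   # prints "1"
```

On short or partial predictions more than one frame can come out clean (e.g. `open_frames("ATGAAATAA")` returns offsets 0 and 2), so this check narrows the ambiguity rather than always resolving it.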

Aha. Now you want the appropriate Ensembl gene objects, not the genscan
parser. Look at


Look at


Again, I would be happy if these moved "across" to bioperl.

You will want to add extra methods to the Gene object (or perhaps the
transcript object) to handle promoters. Don't forget about alternative
splicing.
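To make the point about alternative splicing concrete, here is a minimal sketch of a gene that holds several transcripts, each a sorted list of exon coordinates, with no sequence required until `cds()` is called. The package names `Sketch::Gene` and `Sketch::Transcript` and all their methods are hypothetical, chosen for illustration; they are not the Ensembl or BioPerl API. It assumes 1-based inclusive coordinates on the forward strand and ignores phase.

```perl
use strict;
use warnings;

# Hypothetical transcript: exon coordinates only, sequence optional.
package Sketch::Transcript;

sub new {
    my ($class, %args) = @_;
    # Exons are [start, end] pairs, 1-based inclusive, forward strand only.
    my @exons = sort { $a->[0] <=> $b->[0] } @{ $args{exons} || [] };
    return bless { exons => \@exons, seq => $args{seq} }, $class;
}

sub all_exons { return @{ $_[0]{exons} } }
sub exon      { my ($self, $i) = @_; return $self->{exons}[$i] }

# Introns are derived from the gaps between consecutive exons.
sub introns {
    my ($self) = @_;
    my @ex = $self->all_exons;
    return map { [ $ex[$_ - 1][1] + 1, $ex[$_][0] - 1 ] } 1 .. $#ex;
}

# The client attaches the genomic sequence whenever it has one.
sub attach_seq { my ($self, $seq) = @_; $self->{seq} = $seq; return $self }

# Concatenate exon subsequences; only possible once a sequence is attached.
sub cds {
    my ($self) = @_;
    die "no sequence attached\n" unless defined $self->{seq};
    return join '',
        map { substr($self->{seq}, $_->[0] - 1, $_->[1] - $_->[0] + 1) }
        $self->all_exons;
}

# Hypothetical gene: a container for alternatively spliced transcripts.
package Sketch::Gene;

sub new            { my ($class) = @_; return bless { transcripts => [] }, $class }
sub add_transcript { my ($self, $tx) = @_; push @{ $self->{transcripts} }, $tx; return $self }
sub transcripts    { return @{ $_[0]{transcripts} } }

package main;

my $gene = Sketch::Gene->new;
# Two alternatively spliced transcripts; the second skips the middle exon.
$gene->add_transcript(Sketch::Transcript->new(exons => [[13, 15], [1, 3], [7, 9]]));
$gene->add_transcript(Sketch::Transcript->new(exons => [[1, 3], [13, 15]]));

# The structure exists without any sequence; attach one afterwards.
my $genomic = "ATGCCCAAAGGGTAA";
$_->attach_seq($genomic) for $gene->transcripts;
print join(" ", map { $_->cds } $gene->transcripts), "\n";   # prints "ATGAAATAA ATGTAA"
```

Keeping exons on the transcript rather than on the gene is what lets two isoforms share exons without duplicating the gene; promoter or poly-A features could hang off either level.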

> What do you think? And please let me know what I'm duplicating here from
> what other people have already written.
>      Hilmar

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420