Bioperl-guts: Embl parsing suggestions
Mon, 20 Mar 2000 10:51:33 +0000 (GMT)
On Mon, 20 Mar 2000, Ewan Birney wrote:
> On 19 Mar 2000, Keith James wrote:
> > Hi all,
> > I've had a look at the parsing of features & qualifiers from EMBL
> > files and made some fixes (just on my local copy - nothing is cvs
> > committed yet).
> > Unquoted qualifiers (e.g. /codon_start=2) are now split into
> > qualifier/value correctly.
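For the archive, a minimal sketch of that kind of split in plain Perl.
This is only my guess at the approach, not Keith's actual code, and
split_qualifier is a made-up name rather than anything in Bioperl:

```perl
use strict;
use warnings;

# Hypothetical helper: split one qualifier such as /codon_start=2 or
# /gene="abc" into (name, value). Value-less qualifiers like /pseudo
# come back with an undefined value. Returns an empty list if the text
# does not look like a qualifier at all.
sub split_qualifier {
    my ($text) = @_;
    my ($name, $value) = $text =~ m{^/(\w+)(?:=(.*))?$}s
        or return;
    # Strip one pair of surrounding quotes, then unescape doubled quotes.
    if (defined $value and $value =~ s/^"(.*)"$/$1/s) {
        $value =~ s/""/"/g;
    }
    return ($name, $value);
}

my ($name, $value) = split_qualifier('/codon_start=2');
print "$name => $value\n";
```

Quoted and unquoted values go through the same code path, so the
caller does not need to know which form the entry used.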
> > Locations like 1234^1235 etc. are still ignored.
> Right. I guess those should go to
> $feature->start() = 1234
> $feature->end() = 1235
> $feature->has_tag_value('point_feature') = 1;
Or rather: $feature->has_tag_value('point_feature', 1);
> or something similar. Not so pretty...
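Roughly what I have in mind, as a sketch in plain Perl with a hash
standing in for the feature object. The parse_between_location name
and the hash keys are invented for illustration and are not Bioperl
API:

```perl
use strict;
use warnings;

# Hypothetical helper: turn an EMBL "between bases" location such as
# 1234^1235 into start/end plus a point_feature tag.
# Returns undef for anything that is not of that form.
sub parse_between_location {
    my ($loc) = @_;
    return undef unless $loc =~ /^(\d+)\^(\d+)$/;
    return {
        start => $1,
        end   => $2,
        tags  => { point_feature => 1 },
    };
}

my $feat = parse_between_location('1234^1235');
print "$feat->{start}..$feat->{end}\n" if $feat;
```

Anything else (ranges, joins and so on) falls through as undef, so
the normal location parsing would still have to deal with it.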
> > Apparently Ace can output feature tables where a "" is split, leaving
> > a lone quote at the end of a line and causing premature termination of
> > a multiline qualifier. This is handled now.
Keith, if you're using the embl output from acedb
I'd be a bit careful; we've had problems with
partial genes on the reverse strand being labelled
pseudogenes when they're not.
> > Locations with < and/or > cause a roff_l and/or roff_r tag to be added
> > to the feature, indicating that it runs off that end. I was going
> > to indicate which end (5- or 3-prime) was present, but couldn't rule out
> > strand 0 features where this wouldn't make sense.
> As EMBL does not have the concept of a strand 0 feature, I think we can
> put in the 5/3' ness of the uncertainty.
I guess that for reverse strand genes
"5_prime_missing" would indicate that the start of
the gene is missing? Would it be less ambiguous
if we had "start_not_found" and "end_not_found"?
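To make that concrete, here is a rough sketch in plain Perl. It
assumes only simple locations of the form 1..200, optionally wrapped
in complement(), and fuzzy_end_tags is a made-up name, so treat it as
an illustration of the naming rather than a proposed implementation:

```perl
use strict;
use warnings;

# Hypothetical helper: given a simple EMBL location, possibly wrapped in
# complement(), return a hash ref of the tags suggested above.
# Returns undef if the location is not of the simple form handled here.
sub fuzzy_end_tags {
    my ($loc) = @_;
    my $reverse = $loc =~ s/^complement\((.*)\)$/$1/ ? 1 : 0;
    my ($left, $right) = $loc =~ /^(<?)\d+\.\.(>?)\d+$/
        or return undef;
    my %tags;
    # On the reverse strand the left-hand end is the end of the gene.
    $tags{ $reverse ? 'end_not_found'   : 'start_not_found' } = 1 if $left;
    $tags{ $reverse ? 'start_not_found' : 'end_not_found'   } = 1 if $right;
    return \%tags;
}

my $tags = fuzzy_end_tags('complement(<1..200)');
print join(',', sort keys %$tags), "\n";
```

The tag then describes the gene itself rather than the coordinates,
so nobody reading it has to check the strand first.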
James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Cambridge Tel: 01223 494906
CB10 1SA Fax: 01223 494919
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org
For info about how to (un)subscribe, where messages are archived, etc: