Bioperl-guts: Embl parsing suggestions

James Gilbert jgrg@sanger.ac.uk
Mon, 20 Mar 2000 10:51:33 +0000 (GMT)



On Mon, 20 Mar 2000, Ewan Birney wrote:

> On 19 Mar 2000, Keith James wrote:
> 
> > 
> > Hi all,
> > 
> > I've had a look at the parsing of features & qualifiers from EMBL
> > files and made some fixes (just on my local copy - nothing is
> > committed to CVS yet).
> > 
> > Unquoted qualifiers (e.g. /codon_start=2) are now split into
> > qualifier/value correctly.
> 
> Cool.
> 
> > 
> > Locations like 1234^1235 etc. are still ignored.
> > 
> 
> Right. I guess those should go to 
> 
> 	$feature->start() = 1234
>  	$feature->end()   = 1235
> 	$feature->has_tag_value('point_feature') = 1;

Or rather: $feature->has_tag_value('point_feature', 1); 
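
As a rough sketch of the idea only: parse_between and the hash it
returns are made up for illustration and are not existing Bioperl
code. It just shows a location like 1234^1235 being turned into a
start, an end and a point_feature tag.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): recognise an EMBL
# "between bases" location such as 1234^1235 and return its start,
# its end and a point_feature tag. Returns undef for anything else.
sub parse_between {
    my ($loc) = @_;
    return unless $loc =~ /^(\d+)\^(\d+)$/;
    return {
        start => $1,
        end   => $2,
        tags  => { point_feature => 1 },
    };
}

my $f = parse_between('1234^1235');
print "$f->{start}..$f->{end} point_feature=$f->{tags}{point_feature}\n";
```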

> or something similar. Not so pretty...
> 
> > Apparently Ace can output feature tables where a "" is split, leaving
> > a lone quote at the end of a line and causing premature termination of
> > a multiline qualifier. This is handled now.

Keith, if you're using the embl output from acedb
I'd be a bit careful; we've had problems with
partial genes on the reverse strand being labelled
pseudogenes when they're not.

> > Locations with < and/or > cause a roff_l and/or roff_r tag to be added
> > to the feature, indicating that it runs off that end. I was going to
> > indicate which end (5- or 3-prime) was present, but couldn't rule out
> > strand 0 features where this wouldn't make sense.
> 
> As EMBL does not have the concept of a strand 0 feature, I think we can
> put in the 5/3' ness of the uncertainty.

I guess that for reverse strand genes
"5_prime_missing" would indicate that the start of
the gene is missing?  Would it be less ambiguous
if we had "start_not_found" and "end_not_found"?
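
Something along these lines, perhaps. This is only a sketch:
fuzzy_tags is a made-up name, it copes only with simple a..b
locations (optionally wrapped in complement()), and it assumes that
a < on a complement location marks the end of the gene rather than
the start.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): turn the < and > markers
# of a simple EMBL location into strand-aware tags. Handles only
# "<a..b", "a..>b", "<a..>b" and the same inside complement(...).
# Returns a hash ref of tags, or undef if the location is not understood.
sub fuzzy_tags {
    my ($loc) = @_;
    my $reverse = ($loc =~ s/^complement\((.*)\)$/$1/) ? 1 : 0;
    my ($left, $right) = $loc =~ /^(<?)\d+\.\.(>?)\d+$/
        or return;
    my %tags;
    # On the reverse strand the left-hand coordinate is the end of the gene.
    if ($left)  { $tags{ $reverse ? 'end_not_found'   : 'start_not_found' } = 1 }
    if ($right) { $tags{ $reverse ? 'start_not_found' : 'end_not_found'   } = 1 }
    return \%tags;
}

for my $loc ('<100..500', 'complement(<100..500)') {
    my $tags = fuzzy_tags($loc);
    print "$loc: ", join(',', sort keys %$tags), "\n";
}
```

The tag names then say the same thing on either strand, which
roff_l/roff_r and 5_prime_missing do not.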

	James

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl-guts.html
====================================================================