Bioperl-guts: Embl parsing suggestions
Keith James
kdj@sanger.ac.uk
19 Mar 2000 18:00:44 +0000
Hi all,
I've had a look at the parsing of features & qualifiers from EMBL
files and made some fixes (just on my local copy - nothing is
committed to CVS yet).
Unquoted qualifiers (e.g. /codon_start=2) are now split into
qualifier and value correctly.
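For anyone following along, here is a rough sketch of the splitting rule in Python rather than Perl. It is not the code in Bio::SeqIO::embl.pm; split_qualifier and QUALIFIER_RE are hypothetical names, and I'm assuming the input is a single qualifier that has already been reassembled from its FT lines.

```python
import re

# Hypothetical helper, not Bioperl code: split one EMBL qualifier such as
# '/codon_start=2' or '/gene="lacZ"' into (name, value).
# The name stops at the first '='; value-less qualifiers like '/pseudo' give None.
QUALIFIER_RE = re.compile(r'^/([^=\s]+)(?:=(.*))?$')

def split_qualifier(text):
    m = QUALIFIER_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a qualifier: {text!r}")
    name, value = m.group(1), m.group(2)
    if value is not None and len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        # Quoted value: drop the outer quotes and turn escaped "" back into ".
        value = value[1:-1].replace('""', '"')
    return name, value
```

Unquoted values such as the 2 in /codon_start=2 simply pass through unchanged, which is the case that was previously mishandled.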
Locations like 1234^1235 etc. are still ignored.
Apparently Ace can output feature tables where an escaped "" is split
across two lines, leaving a lone quote at the end of a line and causing
premature termination of a multiline qualifier. This case is now handled.
Locations with < and/or > now cause a roff_l and/or roff_r tag to be
added to the feature, indicating that it runs off that end. I was going
to indicate which end (5' or 3') was present, but couldn't rule out
strand 0 features, where this wouldn't make sense.
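To make the tagging concrete, here is a minimal Python sketch. It only handles a simple range like <1..>450, optionally wrapped in complement(...), and parse_range and RANGE_RE are hypothetical names, not anything in Bioperl. The tags are deliberately in coordinate terms (left = lower coordinate, right = higher coordinate) so they don't depend on the strand.

```python
import re

# Hypothetical sketch, not Bioperl's location parser.
# Matches 'start..end' with an optional '<' before start and '>' before end.
RANGE_RE = re.compile(r'^(<)?(\d+)\.\.(>)?(\d+)$')

def parse_range(location):
    strand = 1
    loc = location.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    m = RANGE_RE.match(loc)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    tags = set()
    if m.group(1):
        tags.add("roff_l")  # runs off the left (lower-coordinate) end
    if m.group(3):
        tags.add("roff_r")  # runs off the right (higher-coordinate) end
    return int(m.group(2)), int(m.group(4)), strand, tags
```

Turning roff_l/roff_r into 5'/3' would need the strand, and for strand 0 there is no sensible answer, which is why the sketch stops at left/right.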
All the edits I've made are in Bio::SeqIO::embl.pm and Bio::SeqIO.pm.
In order to accept feature qualifiers where a terminal quote doesn't
necessarily mean the end of the qualifier, I needed to read the
following line ahead of time and buffer it somewhere.
I've added a _pushback subroutine to Bio::SeqIO.pm where the line can
be stored in the object hash. Now _readline checks this buffer first,
before getting a new line from its filehandle.
make test still passes.
I've never written a Perl module and a lot of the code makes no sense
to me. I'm sure someone will intervene if I'm about to shoot someone
else in the foot.
Keith
--
Keith James -- kdj@sanger.ac.uk -- http://www.sanger.ac.uk/Users/kdj
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA