[Bioperl-l] changes to GenBank (and other) parsing in 0.6.2?

Hilmar Lapp hlapp@gmx.net
Sat, 07 Oct 2000 17:41:19 +0200

Mark Wilkinson wrote:
> Just a quick question re: whether there are any changes to the parsing
> of GenBank (or others) to create a SeqI object in the 0.6.2 release of
> BioPerl.  Given that the module that we are writing re-parses this
> parsed output any changes to the output will have repercussions on our
> code and I would like to be as up-to-date as possible when we finally
> release the module.

I guess the general parsing strategy, as I described it in my response
to your reference entry, will stay for some (long?) time. It basically
boils down to the parser doing no interpretation whatsoever of the
values of keys or tags; such interpretation would have been necessary to
resolve the difficulties you had with the parsing result.
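
To illustrate what "no interpretation" means, here is a minimal sketch
in Python. It is not Bioperl code; the function name split_qualifier and
the choice to strip the surrounding quotes are assumptions made purely
for illustration. It splits a GenBank feature qualifier into tag and
value and keeps the value as an opaque string.

```python
import re

# Hypothetical helper, not part of Bioperl: split one GenBank feature
# qualifier line such as /gene="abc" into a (tag, value) pair.
QUALIFIER = re.compile(r'^/(?P<tag>\w+)(?:=(?P<value>.*))?$')

def split_qualifier(line):
    """Return (tag, value) without interpreting the value in any way."""
    match = QUALIFIER.match(line.strip())
    if match is None:
        raise ValueError("not a qualifier line: %r" % line)
    value = match.group("value")
    if value is None:
        # Flag-style qualifier with no value, e.g. /pseudo
        value = ""
    elif len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Assumption: drop the surrounding quotes, nothing more.
        value = value[1:-1]
    # The value stays a plain string: "taxon:9606" is not split into a
    # database name and an identifier, and "1" is not turned into a number.
    return match.group("tag"), value
```

With this, '/db_xref="taxon:9606"' gives ("db_xref", "taxon:9606") and
'/codon_start=1' gives ("codon_start", "1"). Anything that needs the
parts of such a value has to take it apart itself.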

Changing this strategy would be a major change, and I doubt that there
are many in favour of it. It would make the code much more fragile (and
it already is fragile). What do people think? Is there consensus?

There may be other changes leading to a semantically more correct feature
table representation: if the underlying ASN.1 model can capture the
'real' structure much better, switching to ASN.1 or XML as input may
substantially change the resulting hierarchy of seqFeatures. I haven't
yet checked a feature table in ASN.1 (or XML) against the same in flat
text (has anyone done so?).

That's how I see it so far.

BTW, if there are things in your re-parsing that correct for syntactic
errors (i.e., bugs) of the Bioperl parser, please let us know.


Hilmar Lapp                                email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics             phone: +43 1 86634 631
A-1235 Vienna                                fax: +43 1 86634 727