[Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL
cjfields at illinois.edu
Thu Jan 21 08:34:12 EST 2010
The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:
This is where the text output is derived from. It's a bit of a heavyweight solution to the problem, but it can round-trip the DE data, and it parses that data into an approachable structure. We could probably abstract out the serialization backend there and allow either a pure BioPerl solution or the current one as a fallback.
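The idea of abstracting out the serialization backend can be illustrated with a minimal Python sketch. This is not BioPerl's TagTree API. The sketch makes three assumptions. First, it expects the post-2008 DE layout: RecName:/AltName:/Flags: blocks, with indented continuation lines. Second, it ignores Includes:/Contains: sub-sections and evidence tags. Third, all names (`parse_de`, `to_xml`, `BACKENDS` and so on) are hypothetical. Each backend is a dump/load pair, so the same nested structure can be round-tripped through XML or JSON interchangeably.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative DE lines in the post-2008 SwissProt layout (sample data only).
SAMPLE_DE = """\
DE   RecName: Full=Cytochrome c oxidase subunit 1;
DE            EC=1.9.3.1;
DE   AltName: Full=Cytochrome c oxidase polypeptide I;
DE   Flags: Precursor;"""


def parse_de(text):
    """Parse DE lines into [(category, [(key, value), ...]), ...].

    Simplified sketch: no Includes:/Contains: nesting, no evidence tags,
    no values wrapped across several lines.
    """
    blocks = []
    for line in text.splitlines():
        if not line.startswith("DE   "):
            continue  # skip anything that is not a DE line
        body = line[5:]  # drop the "DE   " prefix
        content = body.strip().rstrip(";")
        if not body.startswith(" "):
            # Unindented line starts a new block, e.g. "RecName: Full=..."
            category, _, content = content.partition(": ")
            blocks.append((category, []))
        if not content or not blocks:
            continue
        if "=" in content:
            key, _, value = content.partition("=")  # e.g. "EC=1.9.3.1"
        else:
            key, value = "value", content  # bare token, e.g. a flag
        blocks[-1][1].append((key, value))
    return blocks


def to_xml(blocks):
    root = ET.Element("description")
    for category, fields in blocks:
        node = ET.SubElement(root, category)
        for key, value in fields:
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    return [(node.tag, [(child.tag, child.text) for child in node]) for node in root]


def to_json(blocks):
    return json.dumps(blocks)


def from_json(text):
    # JSON turns tuples into lists; convert back so round-trips compare equal.
    return [(cat, [tuple(f) for f in fields]) for cat, fields in json.loads(text)]


# Pluggable backends: name -> (dump, load).
BACKENDS = {"xml": (to_xml, from_xml), "json": (to_json, from_json)}

if __name__ == "__main__":
    dump, load = BACKENDS["xml"]
    print(dump(parse_de(SAMPLE_DE)))
```

Swapping `"xml"` for `"json"` changes only the storage format. The parsed structure and the calling code stay the same, which is the point of keeping the backend separate from the parser.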
If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).
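Conforming to UniProt XML could mean reusing its element names for the same hierarchy. The sketch below assumes the following correspondence: RecName maps to recommendedName, AltName to alternativeName and SubName to submittedName. Within each name, Full/Short/EC map to fullName/shortName/ecNumber. That mapping should be checked against the published uniprot.xsd before relying on it. The sketch also omits the XML namespace and skips Flags, Includes and Contains. The function name `de_to_uniprot_protein` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from DE tokens to UniProt XML element names.
CATEGORY_TAGS = {
    "RecName": "recommendedName",
    "AltName": "alternativeName",
    "SubName": "submittedName",
}
FIELD_TAGS = {"Full": "fullName", "Short": "shortName", "EC": "ecNumber"}


def de_to_uniprot_protein(blocks):
    """Build a namespace-free <protein> element from parsed DE blocks.

    `blocks` is a list of (category, [(key, value), ...]) pairs.
    Categories and keys without a mapping are skipped in this sketch.
    """
    protein = ET.Element("protein")
    for category, fields in blocks:
        tag = CATEGORY_TAGS.get(category)
        if tag is None:
            continue  # e.g. Flags: not mapped here
        name_el = ET.SubElement(protein, tag)
        for key, value in fields:
            field_tag = FIELD_TAGS.get(key)
            if field_tag is not None:
                ET.SubElement(name_el, field_tag).text = value
    return protein


if __name__ == "__main__":
    example = [
        ("RecName", [("Full", "Cytochrome c oxidase subunit 1"), ("EC", "1.9.3.1")]),
        ("AltName", [("Full", "Cytochrome c oxidase polypeptide I")]),
    ]
    print(ET.tostring(de_to_uniprot_protein(example), encoding="unicode"))
```

Reusing UniProt's element names means a tree parsed from plain-text "swiss" files and one read from "uniprot" XML could share a single stored representation.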
On Jan 21, 2010, at 6:33 AM, Peter wrote:
> Hi all,
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
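The quoted question about storing the serialised tree in BioSQL can be illustrated with a small sqlite3 sketch. It assumes the XML text is stored as a single value in a qualifier table, modelled loosely on BioSQL's `term` and `bioentry_qualifier_value` tables. The schema here is a simplified stand-in rather than the real BioSQL DDL. The term name `de_tree_xml` is hypothetical, since the thread does not say which term BioPerl actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified stand-ins for BioSQL tables; the real schema has more
# columns and constraints (ontology_id, bioentry foreign keys, etc.).
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0
);
"""

DE_TERM = "de_tree_xml"  # hypothetical term name for the serialised DE tree


def store_de_xml(conn, bioentry_id, xml_text):
    """Store one XML-serialised DE tree as a qualifier value."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (DE_TERM,))
    (term_id,) = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (DE_TERM,)
    ).fetchone()
    conn.execute(
        "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, value, rank) "
        "VALUES (?, ?, ?, 0)",
        (bioentry_id, term_id, xml_text),
    )


def load_de_xml(conn, bioentry_id):
    """Return the stored DE tree as an Element, or None if absent."""
    row = conn.execute(
        "SELECT v.value FROM bioentry_qualifier_value v "
        "JOIN term t ON v.term_id = t.term_id "
        "WHERE v.bioentry_id = ? AND t.name = ?",
        (bioentry_id, DE_TERM),
    ).fetchone()
    return ET.fromstring(row[0]) if row else None
```

Storing the whole tree as one text value keeps the BioSQL schema unchanged. The trade-off is that the individual names are not queryable in SQL without parsing the XML.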