[Bioperl-l] xml sequence download from ncbi

Ewan Birney birney@ebi.ac.uk
Thu, 24 Aug 2000 14:21:04 +0100 (GMT)

On Thu, 24 Aug 2000, Geer, Lewis (NLM) wrote:

> Hi, 
> Sequence download using an xml format derived from our asn.1 standard format
> is now available from Entrez.  For an example, try
> http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi?cmd&save=on&view=xml&val=1827915
> where val is the sequence gi number.  Note that this xml output is based
> on our asn.1 records which are both complete and complex -- we may end up
> making a genbank flatfile-like version, especially since there are small
> mismatches between the asn.1 and xml languages that make the xml a bit more
> complex than if xml was our native format.
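The download URL above is just a fixed query string with the gi number substituted into `val`. A minimal sketch of building it, assuming the endpoint and parameters exactly as quoted (the `entrez_xml_url` helper is hypothetical, and this 2000-era viewer.cgi interface may well not be served any more):

```python
# Hypothetical helper: assemble the Entrez XML download URL quoted above.
# Assumption: endpoint and parameters are exactly those in the example URL;
# this viewer.cgi interface may no longer exist.
BASE_URL = "http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi"


def entrez_xml_url(gi: int) -> str:
    """Return the XML download URL for the sequence with GI number `gi`."""
    if gi <= 0:
        raise ValueError("GI numbers are positive integers")
    # 'cmd' appears as a bare flag with no value in the example URL, so the
    # query string is built by hand; urlencode() would emit 'cmd=' instead.
    return f"{BASE_URL}?cmd&save=on&view=xml&val={gi}"


print(entrez_xml_url(1827915))
```

Building the string by hand keeps the bare `cmd` flag identical to the example rather than normalising it.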

Very interesting. Great stuff. I'm going to forward this on to the bioxml
lists as well.

The DTD is clearly very ASN.1-based (lots of nesting). How stable are parts
of this XML? What is NCBI's "going forward" view of XML (is there one?)?
Hmmmm.

I would actually suggest that, if possible, there not be a tight coupling
between the internal ASN.1 model and the XML that is dumped; I'm trying to
head off the foreseeable problems when NCBI wants to change its ASN.1
model. However, of course, someone has to write and maintain the code for
that mapping, and there are arguments that this is better done in, say,
bioperl, with NCBI just providing a clear, easy-to-parse data dump...

Any volunteers to write the bioperl parser for this? I suspect we won't
really know what comments we have on this until someone starts bashing
out a parser for it.
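One way to start bashing on deeply nested, ASN.1-derived XML is to flatten it into tag-path/value pairs and see which leaves actually carry data. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `flatten` helper is hypothetical, and the sample document, its element names, and its values are made up for illustration and are not taken from the NCBI DTD:

```python
import xml.etree.ElementTree as ET

# Made-up fragment imitating ASN.1-style nesting; element names and values
# are illustrative only, not copied from the NCBI DTD.
SAMPLE = """
<Seq-entry>
  <Seq-entry_seq>
    <Bioseq>
      <Bioseq_id>
        <Seq-id>
          <Seq-id_gi>12345</Seq-id_gi>
        </Seq-id>
      </Bioseq_id>
      <Bioseq_inst>
        <Seq-inst>
          <Seq-inst_length>10</Seq-inst_length>
        </Seq-inst>
      </Bioseq_inst>
    </Bioseq>
  </Seq-entry_seq>
</Seq-entry>
"""


def flatten(elem, path=()):
    """Yield (slash-joined tag path, text) for every leaf element with text."""
    path = path + (elem.tag,)
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if text:
            yield "/".join(path), text
    # Recurse into children; wrapper elements contribute only to the path.
    for child in children:
        yield from flatten(child, path)


root = ET.fromstring(SAMPLE.strip())
for tag_path, value in flatten(root):
    print(tag_path, "=", value)
```

The long paths make the nesting cost visible: a parser would then map a handful of those paths onto sequence-object fields, which is exactly where a loose coupling to the ASN.1 model would help.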

> We'd be interested in seeing comments!

Thanks for letting us comment. It is great to see an NIH person on this
list.

> Lewis
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420