[Bioperl-l] Entrez Gene ASN parsers

Liu, Mingyi Mingyi.Liu at gpc-biotech.com
Sat Mar 12 23:12:37 EST 2005

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Saturday, March 12, 2005 7:33 PM
> To: Liu, Mingyi
> Cc: Stefan Kirov; bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers
> I kind of like this approach, i.e., have a general purpose low-level 
> parser that you have reasonable confidence in will never be the 
> bottleneck, and then build a bioperl parser on top of it that now can 
> focus its code on assembling the desired data structure as opposed to 
> the file format itself.

That was my intention too.  I have seen plenty of requests for NCBI to release Entrez Gene in XML format.  But suppose NCBI did release XML-formatted Entrez Gene files.  To build bioperl objects from those files, one could take one of two approaches:
1. Write a module that parses the XML tags directly and codes everything, object instantiation included, alongside the parsing code.

Or, more likely,
2. Write a module that uses an XML parser: let the parser do its work and build a data structure, then create all objects from that data structure.  This way there is a clear separation of code, and one only needs to worry about the data, not the parsing.
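To make approach 2 concrete, here is a rough sketch of the two layers. It is in Python purely for illustration; it is not my parser and not bioperl code. The brace-delimited toy format, the names parse_record, build_gene and Gene, and the sample values are all made up for this example, and real ASN.1 text is considerably richer.

```python
import re
from dataclasses import dataclass

# Layer 1: a generic low-level parser for a made-up toy format
# (NOT real ASN.1). It knows nothing about genes; it only turns
# text into nested dicts.
TOKEN = re.compile(r'\s*(?:([{},])|"([^"]*)"|([^\s{},"]+))')


def tokenize(text):
    """Split the text into (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"cannot tokenize at offset {pos}")
        punct, string, word = m.groups()
        if punct:
            tokens.append(("punct", punct))
        elif string is not None:
            tokens.append(("str", string))
        else:
            tokens.append(("word", word))
        pos = m.end()
    return tokens


def parse_value(tokens, i):
    """Parse a nested block, a quoted string, or a bare word/integer."""
    kind, val = tokens[i]
    if (kind, val) == ("punct", "{"):
        return parse_block(tokens, i)
    if kind == "str":
        return val, i + 1
    if kind == "word":
        return (int(val) if val.isdigit() else val), i + 1
    raise ValueError(f"unexpected token {val!r}")


def parse_block(tokens, i):
    """Parse '{ key value, key value, ... }' into a dict."""
    if tokens[i] != ("punct", "{"):
        raise ValueError("expected '{'")
    i += 1
    result = {}
    while tokens[i] != ("punct", "}"):
        kind, key = tokens[i]
        if kind != "word":
            raise ValueError(f"expected a key, got {key!r}")
        result[key], i = parse_value(tokens, i + 1)
        if tokens[i] == ("punct", ","):
            i += 1
    return result, i + 1


def parse_record(text):
    """Entry point of layer 1: text in, plain data structure out."""
    data, _ = parse_block(tokenize(text), 0)
    return data


# Layer 2: object assembly. It only looks at the data structure,
# never at the file format.
@dataclass
class Gene:
    gene_id: int
    symbol: str
    organism: str


def build_gene(data):
    return Gene(gene_id=data["id"],
                symbol=data["symbol"],
                organism=data["source"]["organism"])


# Sample record with invented values.
record = '{ id 42, symbol "ABC1", source { taxon 1, organism "Example species" } }'
data = parse_record(record)   # users can stop here and pick fields directly
gene = build_gene(data)       # or go on and build objects
```

If the file format changes, only layer 1 needs to change; build_gene stays the same as long as the data structure keeps its shape.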

My parser does for NCBI's ASN.1 Entrez Gene files what an XML parser would do for a yet-to-exist XML-formatted Entrez Gene file (or does better than that, if NCBI decides to encode Entrez Gene in the XML format that Eutils provides).  It also performs better than XML parsers.  So I really don't think there is any need for an XML file from NCBI.

> And of course assembling that data structure will slow things down a 
> lot but hey, either you want an object hierarchy in (bio-)perl or you 
> don't.

I also agree.  With an external parser, users can choose what they like: (bio)perl objects containing the Entrez Gene data, or the data structure itself, used directly to pick and choose data.  That is more flexible for both developers and users.

Just my two cents.

