[Bioperl-l] ASN.1 and BioPerl ?

Peter.Robinson at t-online.de Peter.Robinson at t-online.de
Sat Feb 12 16:37:56 EST 2005

On Sat, Feb 12, 2005 at 01:20:30PM -0800, Hilmar Lapp wrote:
> The ASN.1 parser would be very useful, in particular for implementing
> the NCBI Gene parser I suppose.
> I do suggest though that you publish this as a separate module on CPAN, 
> as supposedly it is (or meant to be?) generically useful, so I 
> completely agree with Chris on this.

I also agree that it would be better to have the module on CPAN; if you
have been inspired to use the module to incorporate Entrez Gene into
BioPerl, I would be happy to help out as I can. My initial experiences
with this suggest it will not be easy.

> I need an NCBI Gene parser implemented in the Bio::SeqIO framework 
> returning compatible Bio::SeqI objects within the next few weeks. The 
> speed needs to be at least several records per second, ideally 10/s or 
> higher.
> My understanding is that Peter has a grammar-based parser in Java 
> (speed I don't know), and Steve has a Parse::RecDescent-based parser in 
> perl (not bioperl) which is (expectedly) slow.
> I've seen Graham Barr's module on CPAN but haven't tried it yet; it 
> seemed to me that you need the ASN model definition to start with, 
> which I haven't seen at any obvious or not-so-obvious place on the NCBI 
> ftp site, so I either missed something or you have to download the 
> entire toolkit or something else.

You might want to take a look at this


Note that there appear to be some inconsistencies between some Entrez
Gene records and this specification (or I have misunderstood something).

After having played around with Perl, BioPerl, lex/yacc and more
recently ANTLR, I have the impression that this is a doable task using
ANTLR and a modest amount of Java code. (Doable meaning that it is
possible to extract the information one wants from a species-specific
ASN.1 Gene file.) Given my schedule I don't know when I will be able to
finish this, but I will send the list a mail when I do, presuming there
is no BioPerl tool to do this by then.

