[Bioperl-l] ASN1 and BioPerl (and XML too)
pierre_rioux at yahoo.com
Thu Feb 17 16:47:44 EST 2005
Thanks for all the replies. Here's my plan, based on your
comments. There won't be much arguing happening in this
thread, by the way, because I agree with everything that was said.
- You've pointed out that the ASN.1 parser I was proposing
should probably be an independent CPAN module; I agree.
I am so used to manipulating ASN.1 files containing
biological data that I'd forgotten it's not a data format
specific to biology.
- The code I have right now works but it's quite primitive; it
used to be Perl 4 (FOUR!) code I wrote back in 1996, and
I spent some time last week upgrading it somewhat, but
its origins still show. I'll work a little bit more on it
before I think it's suitable for release. The parser is custom
and so is the API; the ideal solution, of course, would be to look
at what exists in the Java world and implement an object model
with identical method names. But for sure, the first release
will be nowhere near complete: it will JUST read ASN.1 text and
provide a way to access the data fields. ASN.1 purists will
dislike it, I'm sure. :-)
- Eventual integration with BioPerl will probably wait until
some more design and cleanup work is performed. I am
willing to do this in my spare time (it's a nice project), but
I'm not sure how quickly I'll be able to progress.
- Hilmar asked me for the code I have right now; I'm about to
send it to him personally (with some doc and examples). If anyone
else is interested, don't hesitate to write to me.
- Some of you mentioned XML. Personally I like XML a whole
lot better than ASN.1. NCBI *does* provide a way to transform
any ASN.1 record into XML automatically. Their tool "asntool"
can be used as a data transformer, and it's quite straightforward.
For the curious, here's a way a filehandle can be opened to return
an XML version of an ASN.1 document (here, 'gc.prt'):
use IO::File;
# Read asntool's XML output (-x stdout) through a pipe.
my $fh = IO::File->new("asntool -m asn.all -v gc.prt -x stdout |")
    or die "Can't open pipe to asntool: $!\n";
- Some of your replies mentioned the large number of .asn module
files that NCBI provides and that are often needed to deal with ASN.1
documents and applications. I'd like to point out that NCBI
also provides all the module definitions in a single file, which makes
handling ASN.1 documents much easier. As shown in the code snippet
above, the file with all the module definitions is called "asn.all".
I personally never bother figuring out which one I need for
the -m argument of asntool; I always supply "asn.all" and concentrate
on the other args.
- Yet another XML comment. Actually a quick plug. I worked for a
genomics company that has since shut down. There was some XML-handling
software I wrote while there that I felt would be useful to the
scientific community, but management would not release it because
they were afraid of liabilities. I rewrote the whole library (it's
a Perl module) and it's available on sourceforge. It's not a CPAN
module because I don't know how to package a CPAN module (yet). If
you like to design applications which create, load and manipulate
XML data, have a look: http://pirobject.sourceforge.net
It's fast and, unlike most XML software layers out there, its basic
data model philosophy aims for simplicity and elegance. The XML
looks good (IMNSHO), unlike NCBI's XML (which is awful).
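
Coming back to the asntool pipe shown earlier: here is a minimal sketch,
under stated assumptions, of reading the whole XML output into a string and
checking that the command actually succeeded. The helper name slurp_command
is made up for illustration (it is not part of asntool or BioPerl). The
asntool arguments are the same ones used above, and the commented-out call
assumes asntool is on the PATH with asn.all and gc.prt in the current
directory.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return everything it writes to
# stdout. The list form of a piped open bypasses the shell, so file names
# need no quoting.
sub slurp_command {
    my @cmd = @_;
    open(my $fh, '-|', @cmd)
        or die "Can't open pipe to $cmd[0]: $!\n";
    local $/;                  # slurp mode: read the whole stream at once
    my $output = <$fh>;
    # close() on a pipe waits for the child and returns false if it failed
    close($fh)
        or die "$cmd[0] failed: ", ($! ? $! : "exit status " . ($? >> 8)), "\n";
    return defined $output ? $output : '';
}

# Assumed usage, with the same arguments as the snippet earlier in this post:
# my $xml = slurp_command(qw(asntool -m asn.all -v gc.prt -x stdout));
```

Checking the return value of close() matters here because a piped open can
succeed even when the command later exits with an error; without the check,
a failed conversion would just look like an empty or truncated XML document.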
Thanks everyone. Have a great day!