Mon, 04 Sep 2000 11:29:29 -0400
Hilmar Lapp wrote:
> reported through warn() instead of throw() in the genbank parser. Some of
> you may object to this, but I'm myself tired of being thrown out of loops
> over chunks of entries just because of one single misformatting. This
> raises again the issue of a switch that can be user-enabled and causes
> The only way this could be achieved with a reasonable effort is by
> mapping languages to a common meta-representation, like XML or ASN.1 (and
> anything the meta-format doesn't cover will still be lost).
I'm a lurker in this list and not really committed to BioPerl as an overall
approach, but I feel compelled to deposit my two cents here. I've been
surprised at times by the willingness to make compromises to get results.
Possibly, this is because I'm using bioinformatics for hypothesis-
testing, which requires a different approach to data quality issues
and systematic biases than when one is using bioinformatics to generate
clues to guide drug discovery or experimentation.
Anyway, I would very much like to see a discussion about taking the
long view on developing rigorous data models. In response to NCBI's
data model, a lot of people in bioinformatics have said that ASN.1 was not
a language that humans could read and that this made the whole exercise
pointless, because primary data entries had to be easily human-readable.
Furthermore, the complexity of NCBI's software toolbox in C is daunting.
So instead of using NCBI's API-- which effectively gives you totally
portable and direct access to the data model-- most people in bioinformatics
download GenBank flat-text files and then parse them with Perl scripts.
Are projects like BioPerl headed toward the daunting complexity of
NCBI's toolbox, but without the same level of rigor? And has the
scale of data interchange become so vast that meta-descriptions of data
in languages like ASN.1 and XML are the only way to prevent the Babel
effect? Is it time for a change in approach? Just five years ago most
people doing bioinformatics were self-taught as programmers, and this
made it easy to justify taking the path of least resistance when it
came to development. But now it seems that we have lots of highly
trained computing talent out there, and lots of ambition and enthusiasm
about systematic projects like BioPerl. Is it time to bite the bullet,
before things get worse?
Arlin Stoltzfus (email@example.com)
CARB (www.carb.nist.gov), 9600 Gudelsky Dr., Rockville, Md 20850
ph. 301 738-6208; fax 301 738-6255; www.molevol.org/camel