[Bioperl-l] genpept/swiss

Arlin Stoltzfus arlin@carb.nist.gov
Mon, 04 Sep 2000 11:29:29 -0400

Hilmar Lapp wrote:
> reported through warn() instead of throw() in the genbank parser. Some of
> you may object to this, but I'm myself tired of being thrown out of loops
> over chunks of entries just because of one single misformatting. This
> raises again the issue of a switch that can be user-enabled and causes
> The only way this could be achieved with a reasonable effort is by
> mapping languages to a common meta-representation, like XML or ASN.1 (and
> anything the meta-format doesn't cover will still be lost).

I'm a lurker in this list and not really committed to BioPerl as an overall 
approach, but I feel compelled to deposit my two cents here.  I've been
surprised at times by the willingness to make compromises to get results.  
Possibly, this is because I'm using bioinformatics for hypothesis-
testing, which requires a different approach to data quality issues 
and systematic biases than when one is using bioinformatics to generate 
clues to guide drug discovery or experimentation.

Anyway, I would very much like to see a discussion about taking the 
long view on developing rigorous data models.  In response to NCBI's 
data model, a lot of people in bioinformatics have said that ASN.1 was not 
a language that humans could read and that this made the whole exercise 
pointless, because primary data entries had to be easily human-readable.  
Furthermore, the complexity of NCBI's software toolbox in C is daunting.
So instead of using NCBI's API-- which effectively gives you totally 
portable and direct access to the data model-- most people in bioinformatics 
download GenBank flat-text files and then parse them with Perl scripts.  

Are projects like BioPerl headed toward the daunting complexity of 
NCBI's toolbox, but without the same level of rigor?  And has the
scale of data interchange become so vast that meta-descriptions of data 
in languages like ASN.1 and XML are the only way to prevent the Babel 
effect?  Is it time for a change in approach?  Just five years ago most 
people doing bioinformatics were self-taught as programmers, and this 
made it easy to justify taking the path of least resistance when it 
came to development.   But now it seems that we have lots of highly 
trained computing talent out there, and lots of ambition and enthusiasm 
about systematic projects like BioPerl.  Is it time to bite the bullet, 
before things get worse?

Arlin Stoltzfus (arlin@carb.nist.gov)
CARB (www.carb.nist.gov), 9600 Gudelsky Dr., Rockville, Md 20850
ph. 301 738-6208; fax 301 738-6255; www.molevol.org/camel