[Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis

Andreas Kahari ak at ebi.ac.uk
Thu Mar 10 04:43:29 EST 2005

I'm not quite sure what this has to do with bioperl...

On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
> In a private mail, Richard Copley wrote:

Forwarding private emails to mailing lists, are we?

> >Amir Karger wrote:
> >> I was thinking it would be useful to have a toolkit of outrageously simple
> >> Perl one-liners.  Here's one:
> >> 
> >>     # Merge two lists, removing duplicates (logical OR)
> >>     perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
> >
> >sort -u file1 file2
>
> I know that many of the tasks proposed for the Scriptome can be done with
> grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort,
> join, and lots of others. But how many experimental biologists are familiar
> with Unix cut? How many bother to learn even the least fancy Excel
> functions?  I think not many, because they have other things to worry about.

Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.

The philosophy of Unix utilities is to do only one thing,
but to do it very well.  In the case with the 'sort' utility
for example, it will most likely use an out-of-core sorting
algorithm to cope with files larger than the available memory
of the machine, and will probably be a fair bit quicker and
more flexible than your own implementation.
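
As a minimal sketch of the 'sort -u' alternative quoted above: the
sample gene names and the temporary directory are made up for
illustration, and only the file names file1, file2 and outfile come
from the quoted one-liner.  LC_ALL=C is set here only to make the
ordering predictable across locales.

```shell
# Hypothetical sample data; the contents are placeholders.
tmpdir=$(mktemp -d)
printf 'geneB\ngeneA\ngeneC\n' > "$tmpdir/file1"
printf 'geneC\ngeneD\ngeneA\n' > "$tmpdir/file2"

# Merge both lists and drop duplicate lines (logical OR).
# Unlike printing the keys of a Perl hash, the result comes out sorted.
LC_ALL=C sort -u "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/outfile"

cat "$tmpdir/outfile"
```

With these inputs, outfile should hold the four distinct names, one
per line, in sorted order.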

> One reason so many people have created integrated toolboxes is so that
> biologists only need to learn how to use one tool, rather than learning 30
> or whatever Unix commands.  The goal of Scriptome is that they only need to
> learn one tool AND that the learning curve for that tool is very small. And
> we make the learning curve small by using an extremely lightweight interface
> (most of solving a problem involves searching on a website) rather than by
> trying to create an intuitive GUI.  After all, how many folks  other than
> Apple have created GUIs that are intuitive for more than a small subset of
> people?

The reason why so many people are creating integrated toolboxes
(really, are they?) is probably just because so many people
before them didn't do it right.  Mind you, doing it "right" is
not possible.

I do understand that there is a need for integrated utilities
with easy-to-press buttons, and I won't try to put you off
working on that kind of project, but...

What would an experimental biologist, who is not familiar with
'sort', 'cut' or 'join', do with a Perl script that implemented
those functionalities?  Wouldn't it be better to provide a
high-level interface to common tasks, like parsing the output
from various programs and providing simple ways of accessing
and manipulating sequence features etc.?  If you find ways to
expand the application area of BioPerl, or if you rationalize
and improve existing BioPerl code, then I'm sure the BioPerl
maintainers would be happy to consider committing your code to
the project.


Andreas Kähäri

