[Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis

Stefan Kirov skirov at utk.edu
Tue Mar 8 13:20:04 EST 2005

I like this idea a lot.
First, my answers to your first two questions: no and no.
But I bet many biologists would scream in pain just hearing the word
"console" (as you mentioned). So I propose a step 0 (bait to get them to
learn a little UNIX).
Imagine a simple web form that is hooked up to the Perl interpreter (this
might be tricky from a security point of view, but it could be restricted in
several ways) and (amazingly) does what the biologist types in. This would
have to include file uploads/downloads as well. Of course the capabilities
would be quite restricted, but, as some say, appetite comes with eating, and
suddenly the console might not seem like such a bad idea (and Mac shares
would go up :-) ).

Amir Karger wrote:

>I've gotten the impression - in my short time in bioinformatics - that
>biologists get very frustrated with data formatting and analysis tasks.
>Which is too bad, because many of these tasks are trivial for someone with a
>bit of Perl knowledge. Then again, we can't force them to learn Perl, even
>if it would be For Their Own Good.
>I was thinking it would be useful to have a toolkit of outrageously simple
>Perl one-liners.  Here's one:
>    # Merge two lists, removing duplicates (logical OR)
>    perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>A biologist (call her Sue) would look through a website containing a bunch
>of (searchable, categorized, etc.) scripts, copy & paste the Perl from the
>website into a Unix shell, backspace over the filenames, type in her own
>filenames, and end up with something like this on the command line:
>myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 >
>The biologist hits return & voilà! Instant data munging!
>Of course, I'm not the first one to identify this problem or try to solve
>it.  But I think I'm working on a slightly different problem than previous
>solutions, and my (complete lack of) interface is different too.  Here's the
>"prior art" I've seen in this area, compared and contrasted with my idea.
>- EMBOSS et al.: solving harder bioinformatics problems; Interface is Unix
>- Bioperl's bioscripts: harder problems; Perl executables
>- Taverna / myGrid: fancy GUI interface (but I do think of my scripts as
>I'm really aiming for the lowest of low-hanging fruit here. I don't want
>scripts that run Blast or do fancy analysis. Rather, we'll have scripts like
>the above to merge lists, or get the standard deviation of column 7 of
>tabular data, or get the GenBank IDs of the top 10 hits from a BLAST output,
>or whatever. These are all tasks that are trivial in (Bio)Perl - and some you
>can even do in Excel - but most biologists won't know either Perl or fancy
>Excel.  Think of it as pipelining software for your VT100.
>Why one-liners?
>- really, really fast development of new tools (especially compared with GUI
>- no installation necessary, no dependencies (except Perl)
>- no download necessary; just cut and paste a tool from the web page
>- biologist doesn't need to learn an interface
>- if a biologist learns just a bit of Perl, they can tweak the one-liners:
>much easier than writing from scratch, yet it makes the tools much more flexible
>- take advantage of existing tools' APIs: perl -MBio::Perl -e '...'
>Potential problems:
>- psychological barrier to using the command line (I figure I'll aim first
>at the Unix-aware subset of biologists, and leave complete World
>Domination to Phase 2.)
>- we can't fit error-handling into one-liners. Caveat scriptor
>So my questions for you bioperlers (finally!):
>- Are there other projects that have tried to solve this niche of problems
>i.e., allowing biologists to do simple formatting & analysis of biological
>or tabular data?
>- Are there at least discussions of this issue that I could read somewhere
>for ideas (e.g., bioperl-l archive)?
>- Does anyone have any free advice (positive or negative or both) to offer
>for this project?
>- Are there any other lists I should post these questions to?
>The working name for my toolbox of bio scripts is "Scriptome".  If it ever
>gets off the ground (and anyone cares), I'll post more info about it, along
>with a request for more advice, I'm sure.  
>-Amir Karger
>akarger at cgr.harvard.edu
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org

Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"
