[Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis

Malay mbasu at mail.nih.gov
Thu Mar 10 10:41:16 EST 2005

Hello Amir:

Without going into any arguments, I'll put my two cents into it. The
mentality of trying to help biologists is a fundamental mistake. Most of
the biologists who come into this field already know the tricks of the
game; if not, they hire someone who does. But toolmakers in the field
believe they have to help biologists, which is why there are too many
non-specialized tools around.

Toolmakers should now concentrate on tools for specialists. That is
where the main dearth is, and it takes a great effort to actually
satisfy experts in the field. Create tools for the experts if you can.


Amir Karger wrote:
> [snipped throughout for "brevity"]
>>From: Andreas Kahari [mailto:ak at ebi.ac.uk] 
>>I'm not quite sure what this has to do with bioperl...
> 1. From http://www.bioperl.org: "The Bioperl server provides an online
> resource for modules, scripts, and web links for developers of Perl-based
> software for life science research." I assumed bioperl-l was for discussions
> of doing Bio with Perl.  
> 2. I asked in my original mail: "Are there any other lists I should post
> these questions to?" but no one has suggested any lists or newsgroups yet.
> 3. My original mail also said, "take advantage of existing tools' APIs:
> perl -MBio::Perl -e '...'"
>>On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
>>>>Amir Karger wrote:
>>>>>I was thinking it would be useful to have a 
>>>>>toolkit of outrageously simple
>>>>>Perl one-liners.  Here's one:
> How many biologists who don't use Perl will read the Perl Cookbook? Or were
> you just making a suggestion of where I could take scripts from?
> Actually, looking through the table of contents, I see only a few recipes
> that would fit.  In any case, writing the scripts is not the hard part; it's
> knowing which scripts will be useful and helping biologists find the right
> ones to solve their particular problems.
>>>I know that many of the tasks proposed for the Scriptome can be done with
>>>grep, sed, cut, Word, or Excel. But how many experimental biologists are
>>>familiar with Unix cut? I think not many, because they have other things
>>>to worry about.
>>Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.
> I'm not comparing the quality of sed vs. Find/Replace. Most biologists (at
> least here) prefer Windows. They already use Excel to look at their data.
> Excel has functions to do simple data analysis, but my impression is that
> few biologists use those functions.
>>The philosophy of Unix utilities is to do only one thing,
>>but to do it very well.  In the case with the 'sort' utility
>>for example, it will most likely use an out-of-core sorting
>>algorithm to cope with files larger than the available memory
>>of the machine, and will probably be a fair bit quicker and
>>more flexible than your own implementation.
> The Scriptome is not aiming at sorting gigabyte files; does a biologist want
> to sort an entire GenBank file? I think much more often they'll want to sort
> < 10 MB lists of genes or whatever.  On small files, the sorting algorithm
> doesn't matter. If they do try to sort too big a file, the script will
> break, and they'll need to try a different tool. I'm not claiming that my
> solution will solve every conceivable task, just the easy ones. 
>>I do understand that there is a need for integrated utilities
>>with easy-to-press buttons, and I won't try to put you off
>>working on those kinds of projects, but...
>>What would an experimental biologist, who is not familiar with
>>'sort', 'cut' or 'join', do with a Perl script that implemented
>>those functionalities?
> sort, cut, or join files! I don't think I understand your question.
> An experimental biologist who knows just a little Unix can take a sorting
> script, paste it to the command line, and use it.  We're talking about use
> cases where the biologist knows exactly what they want to do - sort a file,
> merge files together, pull out the 8th column from the data into a new file,
> etc. - but not how to implement a solution.
> Who knows? Maybe eventually we'll decide to put "sort -u file1 file2" as a
> "script". But we wouldn't want to use *only* Unix commands because that
> ignores all the stuff Unix can't (easily) do.  
>> Wouldn't it be better to provide a
>>high-level interface to common tasks, like parsing the output
>>from various programs and providing simple ways of accessing
>>and manipulating sequence features etc.
> That's exactly what I want to do. My interface is searching for a tool on a
> website and pasting it onto the Unix command line.  
>> If you find ways to
>>expand the application area of BioPerl, or if you rationalize
>>and improve existing BioPerl code, then I'm sure the BioPerl
>>maintainers would be happy to consider committing your code to
>>the project.
> I believe my project is complementary to Bioperl's bioscripts, but it aims
> at a different set of tasks, namely, tasks that are so simple that
> Bioperlers haven't bothered to commit the scripts to CVS. If I want to count
> how many microarray hits have names and how many just have CG numbers, I'll
> do it in a Perl one-liner that takes 3 minutes to write and maybe 10 for
> debugging and formatting. Why bother committing that to CVS? Well, an
> experimental biologist in my group gave me that exact example, and told me
> she spent 20 minutes counting and double-checking. If she had had 1000 hits
> instead of 100, she would have needed hours to count.  More likely, she
> would have just given up.
> To put it another way, I'm aiming to make hard things possible -
> specifically things that are hard for biologists who aren't programmers.
> Bioperl, on the other hand, is focusing on things that are hard (or hard to
> do right, or at least annoying) even for programmers.
> I am making at least a couple of assumptions about the niche I'm aiming for:
> people who know how to use the command line but don't know Perl.
> 1. There are many such people (or enough to care about)
> 2. They will be able to put the "atomic" scripts together to solve real
> problems (first join two files with a script, sort with another script,
> remove duplicates with a third)
> I may be wrong about either of these.  It may be that even with the
> Scriptome tools, you have to "think like a programmer" to do these sorts of
> tasks, and that many biologists' brains just don't work that way. But I
> think it's worth trying.
> -Amir
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
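
For a sense of scale, here is a minimal sketch of the counting task from Amir's example above (how many microarray hits have gene names and how many have only CG numbers). It is written in Python rather than as a Perl one-liner, and it assumes a tab-delimited hit list with the gene identifier in one column, and that an unnamed hit shows up as a bare "CG" followed by digits. The function name count_named_vs_cg is hypothetical and not part of the Scriptome or Bioperl.

```python
import re
from typing import Iterable, Tuple

# Assumption: an unnamed hit is reported by its bare annotation ID,
# written as "CG" followed only by digits (e.g. "CG10000").
CG_NUMBER = re.compile(r"^CG\d+$")


def count_named_vs_cg(lines: Iterable[str], column: int = 0) -> Tuple[int, int]:
    """Return (named, cg_only) counts for gene IDs in one tab-separated column.

    `column` is 0-based, so column=7 reads the 8th column. Lines that are
    blank, or that have no value in that column, are skipped.
    """
    named = cg_only = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if column >= len(fields) or not fields[column].strip():
            continue  # nothing to classify on this line
        gene = fields[column].strip()
        if CG_NUMBER.match(gene):
            cg_only += 1  # only an annotation ID, no gene name
        else:
            named += 1  # anything else is treated as a real gene name
    return named, cg_only
```

To run it on a file, pass an open file handle, e.g. count_named_vs_cg(open("hits.txt"), column=7) to classify the 8th column (the file name here is only illustrative). The same loop handles 100 hits or 1000 hits in the same time it takes to type the command.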
