[Bioperl-l] clustering algorithms in BioPerl

Frank Gibbons francis_gibbons@hms.harvard.edu
Fri, 23 Feb 2001 11:42:56 -0500


I've been lurking for about a month. I've checked out the BioPerl homepage, 
including the list of projects. I notice that the bias is heavily towards 
sequence analysis (naturally).

Right now I'm working on implementing a few clustering algorithms (priority #3 
on the list of projects) in Perl, for use with DNA microarray data (priority 
#4 on the list!). The algorithms themselves are quite general, and have been 
around for a while, but I can find few references to implementations of them 
in Perl. (I have seen mention of Jong Park's Geanfammer package, as a possible 
source for clustering, but as far as I can see he implements only 
single-linkage clustering there.) I think such implementations would be quite 
useful to the Perl community as a whole, and I would like to write them as 
generically as possible. That is why I'm writing to the list now, having 
implemented only one algorithm, before I write any more.

So, my questions are:

* Is this appropriate for BioPerl in the first place? Would it be more 
suitable for CPAN? The algorithms are general, but my focus is on 

* If so, does anyone know of other work in this area that I could build on 
or integrate with?

* Do you have any suggestions? I'm thinking in terms of
	- Naming schemes
	- Particular algorithms which should be implemented as a priority
	- Other possible applications, which I should keep in mind
	- Pitfalls I should look out for

Thanks for your input,

Frank Gibbons

PhD, Computational Biologist, Harvard Medical School
Dept of Biological Chemistry and Molecular Pharmacology
240 Longwood Avenue, C-125, Boston, MA 02115