[Bioperl-l] seq_word and pattern counts

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Tue Feb 28 16:45:16 EST 2006


Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers over whatever letters occur,
and that it does so case-insensitively, i.e. CAT, Cat and cat are counted as
the same N-mer.
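
To make that behaviour concrete, here is a rough sketch in Python (not the
BioPerl code itself). The function name count_kmers and its overlap flag are
made up for illustration and are not part of Bio::Tools::SeqWords.

```python
from collections import Counter

def count_kmers(seq, k, overlap=True):
    """Count k-letter words in seq, ignoring case.

    Hypothetical helper for illustration only. Every letter is treated
    literally, so an "N" is just another character in the word.
    """
    seq = seq.upper()               # CAT, Cat and cat become the same word
    step = 1 if overlap else k      # slide by 1, or jump a whole word at a time
    return Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, step))

counts = count_kmers("CATcatCat", 3, overlap=False)
print(counts["CAT"])                # 3
```

Note that with this approach "GGNNCC" is simply counted as its own 6-mer and
never as a match for, say, GGATCC.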

So no, it does not handle ambiguity symbols in any special way.
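
If what you are after is a pattern search such as GGnnCC, one common approach
is to turn the pattern into a regular expression. The following is a minimal
Python sketch under that assumption; count_pattern is a hypothetical name, and
the table is the standard IUPAC nucleotide code.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def count_pattern(seq, pattern):
    """Count (possibly overlapping) matches of an IUPAC pattern in seq.

    Hypothetical helper for illustration. Ambiguity symbols are expanded
    in the pattern only; an ambiguous base in seq itself will not match.
    """
    regex = "".join(IUPAC[ch] for ch in pattern.upper())
    # A lookahead lets matches that overlap each other all be counted.
    return len(re.findall(f"(?=({regex}))", seq.upper()))

print(count_pattern("ttGGatCCaa", "GGnnCC"))   # 1
```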

What would you like it to do?
If an N-mer has one "N" in it, should it count towards each of the 4
possible N-mers it could be?
If it has two "N"s in it, should it count towards all 16 possible
non-ambiguous N-mers?
And so on?
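
As one possible answer to those questions, here is a Python sketch in which
every ambiguous word adds a full count to each N-mer it could stand for. That
policy is an assumption (a fractional weight of 1/4, 1/16, ... would be
another choice), the names expand_word and count_expanded are hypothetical,
and only "N" is expanded.

```python
from collections import Counter
from itertools import product

def expand_word(word):
    """Return every unambiguous word that `word` could stand for.

    Only "N" is expanded (to A, C, G, T); other letters are kept as-is.
    """
    choices = ["ACGT" if ch == "N" else ch for ch in word.upper()]
    return ["".join(p) for p in product(*choices)]

def count_expanded(seq, k):
    """Count overlapping k-mers, crediting each expansion of an ambiguous word."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for word in expand_word(seq[i:i + k]):
            counts[word] += 1       # assumed policy: full count per expansion
    return counts

print(expand_word("CNT"))           # ['CAT', 'CCT', 'CGT', 'CTT']
print(len(expand_word("NNT")))      # 16
```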

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

