[Bioperl-l] alignable portion of a genome

Smithies, Russell Russell.Smithies at agresearch.co.nz
Mon May 11 23:55:39 EDT 2009

Perfect matches is easy:

	$seq = "atcgacgatcgaacgatcga";

	foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
	foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
	print $singles/$h;

Could probably be done with map as well.
Counting the miss-matches might take a bit more thinking....
Any ideas MAJ?


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of fadista
> Sent: Monday, 11 May 2009 9:32 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] alignable portion of a genome
> Hi,
> I would like to know of a good and fast way that could help me calculate the
> alignable portion of a genome (not human), given a reference sequence.
> When I say alignable portion I mean that I want to know all the positions of
> the genome that can be covered uniquely by reads of 36 bp and up to 2
> mismatches.
> Some have advised me to work with Perl using the following strategy but I am
> not a Perl user so if someone has already a script for this function, it
> would be nice:
> "you could approach it by walking along the genome in a sliding window of
> 36 nt, and hash the frequency of each 36 nt sequence that you encounter.
> Then count how many of the 36 nt sequences had a frequency of exactly
> one. Divide this by the total number of 36nt windows visited. This
> should be do-able in about 20 lines of Perl."
> Best regards and thanks in advance
> --
> View this message in context: http://www.nabble.com/alignable-portion-of-a-
> genome-tp23480025p23480025.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.

More information about the Bioperl-l mailing list