[Bioperl-l] Shuffling sequences

Jason Stajich jason at cgt.duhs.duke.edu
Tue May 25 15:16:13 EDT 2004

(untested code, but should work)

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::PrimarySeq;

my $in = new Bio::SeqIO(-format => 'fasta', -file => 'fastafile.fa');
my $seq = $in->next_seq;
my @seq_as_array = split(//,$seq->seq);
my @randomseqs;
for ( 1..1000 ) {
  my @temp = @seq_as_array;      # copy, so the original order is kept
  fy_shuffle(\@temp);            # shuffle the copy in place
  push @randomseqs, join('', @temp);
}

my $out = new Bio::SeqIO(-format => 'fasta', -file =>">shuffled.fa");
my $i = 1;
for my $s ( @randomseqs ) {
  my $newseq = new Bio::PrimarySeq(-display_id => "rand.$i",
                                   -seq        => $s);
  $out->write_seq($newseq);      # write each shuffled sequence as FASTA
  $i++;
}

# randomizer (Fisher-Yates shuffle)
sub fy_shuffle {
    my $array = shift;
    my $i;
    for( $i = @$array; $i--; ) {
        my $j = int rand($i+1);
        next if $i==$j;
        @$array[$i,$j] = @$array[$j,$i];   # swap elements $i and $j
    }
}

The randomizer code is from the Perl Cookbook.  You could probably make it
faster by avoiding the string -> array -> string round trip and using
substr to operate directly on the string.
If someone wants to do that and add it to SeqUtils or somewhere else in
Bioperl, that would be good.
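
A minimal sketch of the substr idea above, for anyone who wants to try it.
It is the same Fisher-Yates loop, but it swaps single characters in the
string itself instead of going through an array.  The name shuffle_string
and the example sequence are made up for illustration; nothing like this
exists in Bioperl as far as this post is concerned, and I have not timed it
against the array version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fisher-Yates shuffle applied directly to a string via lvalue substr.
# shuffle_string is a hypothetical helper name, not a Bioperl method.
sub shuffle_string {
    my $str = shift;    # works on a copy; the caller's string is untouched
    for ( my $i = length($str); $i--; ) {
        my $j = int rand($i + 1);
        next if $i == $j;
        # swap the single characters at positions $i and $j
        my $tmp = substr($str, $i, 1);
        substr($str, $i, 1) = substr($str, $j, 1);
        substr($str, $j, 1) = $tmp;
    }
    return $str;
}

my $protein  = 'MKTAYIAKQRQISFVKSHFSRQ';    # made-up example sequence
my $shuffled = shuffle_string($protein);
print "$shuffled\n";
```

In the script above you would then push shuffle_string($seq->seq) onto
@randomseqs inside the 1..1000 loop and drop the split/join steps.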

On Tue, 25 May 2004, KHOUEIRY pierre wrote:

> Hi all,
> I'm searching for the Bioperl method that shuffles/randomizes a given
> protein sequence. I need to shuffle my FASTA sequence 1000 times to run
> a statistical test on it.
> thanks in advance

Jason Stajich
Duke University
jason at cgt.mc.duke.edu

