[Bioperl-l] How can I pull out all instances of a motif from a genome sequence and output them as a BED file?

John Cumbers johncumbers at gmail.com
Wed Jun 13 20:20:42 EDT 2007


I have a simple problem: I'm trying to search a genome sequence for a motif,
and I then want to output a BED file to display all the locations of this motif
on the UCSC Genome Browser.  I could not find a script to do this, so I
started to write my own.  I'm new to Perl, and the code below is my attempt
to read the sequence string and output the index (in bp) of the start of each
motif.  With this I could build the BED file myself, which requires the start
and end base pairs of each feature.

For the first motif I can output the start index, but when I try to read
the next one off the sequence it does not work.  Instead I just get a list
of 1's as output.  I realise that this is more a request for some
simple Perl help, but any help would be much appreciated.

Best wishes,

use Bio::Perl;   # provides read_sequence()

# Turn my FASTA file into a seq object.
$seq_object = read_sequence("Drosophila.Chr3.test.AE014296.fasta");
$sequence_as_a_string = $seq_object->seq();  # turn it into a string

# Search $sequence_as_a_string for the motif AAA as an example;
# if found, report the index it is found at.
while ($sequence_as_a_string =~ m/AAA/g) {
  # '.' and '+' have equal precedence and associate left, so the addition
  # needs its own parentheses; without them each iteration prints just 1.
  print "Found '$&'.  Next attempt at character " .
    (pos($sequence_as_a_string) + 1) . "\n";
}
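
For the BED step itself, something along the following lines might work. It is only a minimal sketch under a few assumptions: the helper name motif_to_bed is made up for illustration, the chromosome label 'chr3L' is a placeholder that would have to match the name the UCSC browser uses for this sequence, and a short toy string stands in for $seq_object->seq(). It relies on Perl's built-in @- and @+ arrays, which hold the 0-based start offset and the end offset (one past the last matched character) of the most recent match. That is the same convention BED uses for chromStart and chromEnd.

```perl
use strict;
use warnings;

# Return one BED line (chrom, chromStart, chromEnd, name) per match.
# BED coordinates are 0-based with an exclusive end, which is what
# the match-offset arrays @- and @+ already provide.
sub motif_to_bed {
    my ($chrom, $sequence, $motif) = @_;
    my @lines;
    # \Q...\E treats the motif as a literal string; /i also matches
    # lower-case (soft-masked) sequence.
    while ($sequence =~ /\Q$motif\E/gi) {
        my ($start, $end) = ($-[0], $+[0]);
        push @lines, join("\t", $chrom, $start, $end, $motif);
    }
    return @lines;
}

# Toy sequence standing in for $seq_object->seq();
# 'chr3L' is a placeholder chromosome name.
my $sequence = 'CCAAAGGAAAATT';
print "$_\n" for motif_to_bed('chr3L', $sequence, 'AAA');
```

On the toy sequence this prints two tab-separated lines, one for positions 2 to 5 and one for positions 7 to 10. A /g loop like this reports non-overlapping matches only; if overlapping occurrences matter, wrapping the motif in a lookahead with a capture group and reading $-[1] and $+[1] is one option.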

John Cumbers,  Graduate Student
Biology and Medicine
Brown University, Box G-W
Providence, Rhode Island, 02912, USA
Tel USA: +1 401 523 8190,  Fax: +1 401 863-2166
UK to USA: 0207 617 7824
