[Bioperl-l] Pattern recognition

Djodja j.s.soares at gmail.com
Fri Oct 19 09:10:18 EDT 2007

Hi all,

This is my first post. I am very puzzeled about a script that i wrote last
year, that was working fine, and all of sudden, with no change to the code
it just stopped doing what it was supposed to do. Basically I'm searching
for patterns on a whole genome database.

The pattern that I'm searching for at the moment is AACAAAG.

My script finds the pattern, no problem on regexps or anything. It also
searches the whole genome divided in fasta headers for each gene. The
problem is that I need to know the position of the pattern, relative to the
start of each gene. This is where the script goes weird. It doesn't reset
the counter to the start of each gene. It instead continues to count
patterns as if the genes aren't separated at all. Also I need the
information on how many repetitions of the pattern there are in each gene,
as above, it does not reset the counting of repetitions and instead gives me
the repitions in the whole genome.

here is the code:

#!/usr/lib/perl -w

use Bio::Perl;
use Bio::Seq ();
use Bio::SeqIO ();
use Bio::Tools::SeqPattern ();
use strict;
use warnings;

print 'Where is the file that you want to analise?';
chomp (my $Fasfile = <>) or die "No such file";
print "Where do you want to store the rough file?";
chomp (my $rough = <>);
print "Where do you want to store the final file?";
chomp (my $final = <>);

my ($pat, $patlen, $count_of_pat, $position) = ("AACAAAG", 7, 0, 0);
my $pattern = new Bio::Tools::SeqPattern(-SEQ => $pat, -TYPE =>'Dna');

if(my $in = Bio::SeqIO->new(-file => $Fasfile, -format => 'fasta' )) {
   	open (MOTIFCOUNT,">> $rough") or die "The gene list wasn't created";

        while (my $seq_obj = $in->next_seq) {

                #my $id1 = $seq_obj->display_id();
				my $seq_length = length($seq_obj);
				#print "$seq_length\n";
				foreach ($seq_obj->seq =~ m/$pat/g){
								my $id1 = $seq_obj->display_id();
								print $seq_obj->seq, "\n";							
								print MOTIFCOUNT "$id1\t$count_of_pat\tposition ",
								$position + $-[0] + 1, "\n";
								$position = $position + length($seq_obj) - $patlen + 1;
								#print "$id1\n";
								#print "$count_of_pat\n";
								#print "$position\n";
					#print "$seq_length\n";
					print $seq_obj->seq, "\n";
        close MOTIFCOUNT;

        open (SYM, "+< $rough") or die "The file doesn't exist";

        open (TAB, ">> $final") or die "The gene list wasn't transformed";

	print TAB "Motif: $pat\n";
        while (my $line = <SYM>){

        	foreach ($line){
                	$line =~ s/\|/\t/g;
        		print TAB "$line";
        close (SYM);

        print "The gene list was created and transformed successfully.\n\a"

Can anyone help me out? This is kind of in the way of my PhD progression.

All the best,

View this message in context: http://www.nabble.com/Pattern-recognition-tf4652923.html#a13293737
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

More information about the Bioperl-l mailing list