[Bioperl-l] Pattern recognition

Djodja j.s.soares at gmail.com
Sat Oct 20 03:11:17 EDT 2007


I will have to give you the credit here as well!

THank you so much Chris.

I have declared both $position and $count_of_pat inside the while loop.

Will rearrange the script and I'm going to make it a subroutine as I have
many other patterns to search for.

Thank you thank you thankyou.

Also, I am in England in Exeter. If you want I could look on ways to get you
here for a nice talk for you to expose your work.

Let me know.

Djodja

Chris Hemmerich wrote:
> 
> 
> 
> Djoja,
> 
> Perhaps this is a problem with scope. Since you declare $position outside 
> of
> 
>          while (my $seq_obj = $in->next_seq) {
> 
> it will not be reset for each sequence.  The line
> 
>  		$position = $position + length($seq_obj) - $patlen + 1;
> 
> will continue to increase $position for each sequence processed. From a 
> quick look, I don't think you need $position at all in this script and can 
> just print $-[0]+1.
> 
> Hope this helps,
> 
>   Chris
> 
> 
> On Fri, 19 Oct 2007, Djodja wrote:
> 
>>
>> Hi all,
>>
>> This is my first post. I am very puzzeled about a script that i wrote
>> last
>> year, that was working fine, and all of sudden, with no change to the
>> code
>> it just stopped doing what it was supposed to do. Basically I'm searching
>> for patterns on a whole genome database.
>>
>> The pattern that I'm searching for at the moment is AACAAAG.
>>
>> My script finds the pattern, no problem on regexps or anything. It also
>> searches the whole genome divided in fasta headers for each gene. The
>> problem is that I need to know the position of the pattern, relative to
>> the
>> start of each gene. This is where the script goes weird. It doesn't reset
>> the counter to the start of each gene. It instead continues to count
>> patterns as if the genes aren't separated at all. Also I need the
>> information on how many repetitions of the pattern there are in each
>> gene,
>> as above, it does not reset the counting of repetitions and instead gives
>> me
>> the repitions in the whole genome.
>>
>> here is the code:
>>
>> #!/usr/lib/perl -w
>>
>> use Bio::Perl;
>> use Bio::Seq ();
>> use Bio::SeqIO ();
>> use Bio::Tools::SeqPattern ();
>> use strict;
>> use warnings;
>>
>>
>>
>>
>> print 'Where is the file that you want to analise?';
>> chomp (my $Fasfile = <>) or die "No such file";
>> print "Where do you want to store the rough file?";
>> chomp (my $rough = <>);
>> print "Where do you want to store the final file?";
>> chomp (my $final = <>);
>>
>> my ($pat, $patlen, $count_of_pat, $position) = ("AACAAAG", 7, 0, 0);
>> my $pattern = new Bio::Tools::SeqPattern(-SEQ => $pat, -TYPE =>'Dna');
>>
>> if(my $in = Bio::SeqIO->new(-file => $Fasfile, -format => 'fasta' )) {
>>   	open (MOTIFCOUNT,">> $rough") or die "The gene list wasn't created";
>>
>>
>>        while (my $seq_obj = $in->next_seq) {
>>
>>                #my $id1 = $seq_obj->display_id();
>> 				my $seq_length = length($seq_obj);
>> 				#print "$seq_length\n";
>> 				foreach ($seq_obj->seq =~ m/$pat/g){
>> 								++$count_of_pat;
>>
>> 								my $id1 = $seq_obj->display_id();
>> 								print $seq_obj->seq, "\n";
>> 								print MOTIFCOUNT "$id1\t$count_of_pat\tposition ",
>> 								$position + $-[0] + 1, "\n";
>> 								$position = $position + length($seq_obj) - $patlen + 1;
>> 								#print "$id1\n";
>> 								#print "$count_of_pat\n";
>> 								#print "$position\n";
>>
>> 								}
>>
>> 					#print "$seq_length\n";
>> 					print $seq_obj->seq, "\n";
>> 					}
>>
>>
>>        close MOTIFCOUNT;
>>
>>        open (SYM, "+< $rough") or die "The file doesn't exist";
>>
>>        open (TAB, ">> $final") or die "The gene list wasn't transformed";
>>
>> 	print TAB "Motif: $pat\n";
>>        while (my $line = <SYM>){
>>
>>        	foreach ($line){
>>                	$line =~ s/\|/\t/g;
>>        		print TAB "$line";
>>        	}
>>        }
>>        close (SYM);
>>        close(TAB);
>>
>>        print "The gene list was created and transformed
>> successfully.\n\a"
>> }
>> exit;
>>
>>
>>
>> Can anyone help me out? This is kind of in the way of my PhD progression.
>>
>> All the best,
>>
>> Djodja
>> -- 
>> View this message in context:
>> http://www.nabble.com/Pattern-recognition-tf4652923.html#a13293737
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/Pattern-recognition-tf4652923.html#a13307407
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



More information about the Bioperl-l mailing list