[Bioperl-l] Distance between residues
jurgen.pletinckx at algonomics.com
Fri Apr 30 08:47:40 EDT 2004
Is that linear distance (along the sequence, and measured in
residues) or 3D distance (across a structure, in angstrom)?
In either case, do you just need the minimum distance, a list
of all the distances, or some other metric?
If all you need to know is whether the sequence matches 'X
less than n residues distant from Z', regular expressions
will be the quickest solution. If, on the other hand, you
need to know the actual distance, some more work will be needed.
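
For the yes/no case, a minimal sketch of the regular expression approach might look like this. The helper name within_distance and the test string are made up for illustration, and it assumes that adjacent residues count as 1 apart and that the two sets are given as plain strings of one-letter codes:

```perl
use strict;
use warnings;

# Hypothetical helper: true if some residue from $set1 lies fewer than
# $n positions from some residue in $set2 (adjacent residues are 1 apart).
sub within_distance {
    my ($string, $set1, $set2, $n) = @_;
    return 0 if $n < 2;      # two distinct residues are at least 1 apart
    my $gap = $n - 2;        # most residues allowed in between
    # try both orders: set1 before set2, and set2 before set1
    my $re = qr/[$set1].{0,$gap}[$set2]|[$set2].{0,$gap}[$set1]/;
    return $string =~ $re ? 1 : 0;
}

# K (position 3) and H (position 6) are 3 apart
print within_distance("AAKGGHAA", "RPKT", "HCDE", 4), "\n";   # prints 1
print within_distance("AAKGGHAA", "RPKT", "HCDE", 3), "\n";   # prints 0
```

This only answers whether such a pair exists; it does not report the distance itself.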
This is a working example for minimum linear distance:
use strict;
use warnings;
use Bio::SeqIO;

my $in = Bio::SeqIO->new('-file' => "all_proteins.txt",
                         '-format' => 'fasta');
while (my $seq = $in->next_seq)
{
    my $string = $seq->seq;
    # pos() is the offset just past each match, i.e. the 1-based position
    my (@pos1, @pos2);
    push @pos1, pos($string) while $string =~ /R|P|K|T/g;
    push @pos2, pos($string) while $string =~ /H|C|D|E/g;
    # you may want to do something specific when either
    # set is completely absent...
    next unless @pos1 and @pos2;
    # start from the first pair, then compare every pair
    my $minimum = abs($pos1[0] - $pos2[0]);
    for my $p1 (@pos1)
    {
        for my $p2 (@pos2)
        {
            my $d = abs($p1-$p2);
            $minimum = $d if $d < $minimum;
        }
    }
    print $minimum, "\n";
}
Optimisation may be necessary - this takes 17 seconds (on my creaky
SGI machine) to process 800 sequences. Fortunately, there's an
obvious optimisation to be done: check first for the common cases
where residues from your sets occur next to each other, or with one
other residue in between.
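
A rough sketch of that optimisation, untested against the 800-sequence set: try the two common cases with a single regular expression each, and only fall back to the pairwise comparison when neither matches. The helper name min_distance is made up for illustration, and the sketch assumes the two residue sets do not overlap (as with the sets above):

```perl
use strict;
use warnings;

# Hypothetical helper: minimum linear distance between any residue in
# $set1 and any residue in $set2; returns undef if either set is absent.
# Assumes the two sets share no residue.
sub min_distance {
    my ($string, $set1, $set2) = @_;

    # Fast paths: direct neighbours (1), then one residue in between (2).
    return 1 if $string =~ /[$set1][$set2]|[$set2][$set1]/;
    return 2 if $string =~ /[$set1].[$set2]|[$set2].[$set1]/;

    # Otherwise fall back to the full pairwise comparison.
    my (@pos1, @pos2);
    push @pos1, pos($string) while $string =~ /[$set1]/g;
    push @pos2, pos($string) while $string =~ /[$set2]/g;
    return unless @pos1 and @pos2;

    my $minimum = abs($pos1[0] - $pos2[0]);
    for my $p1 (@pos1) {
        for my $p2 (@pos2) {
            my $d = abs($p1 - $p2);
            $minimum = $d if $d < $minimum;
        }
    }
    return $minimum;
}

print min_distance("AAKHAA",  "RPKT", "HCDE"), "\n";   # prints 1
print min_distance("AAKGHAA", "RPKT", "HCDE"), "\n";   # prints 2
print min_distance("KAAAAH",  "RPKT", "HCDE"), "\n";   # prints 5
```

How much this saves depends on how often the sets really do occur close together in your sequences.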