[Bioperl-l] Distance between residues

Jurgen Pletinckx jurgen.pletinckx at algonomics.com
Fri Apr 30 08:47:40 EDT 2004

Is that linear distance (along the sequence, and measured in
residues) or 3D distance (across a structure, in angstrom)?

In either case, do you just need the minimum distance, a list 
of all the distances, or some other metric?

If all you need to know is whether the sequence matches 'X
less than n residues distant from Z', regular expressions
will be the quickest solution. If, on the other hand, you
need to know the actual distance, some more work will be needed.
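
As a rough sketch of the regular expression route: the helper below
(within_distance is a made-up name, not part of BioPerl) reports whether
a residue from one set lies fewer than n positions from a residue of the
other set, in either order. It assumes the sets are given as plain
strings of one-letter codes, and that 'distance' means the difference in
sequence position, so adjacent residues are at distance 1.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Returns 1 if some residue from $set1 and some residue from $set2 are
# fewer than $n positions apart in $string, and 0 otherwise.
sub within_distance {
    my ($string, $set1, $set2, $n) = @_;

    # A position difference d < n leaves at most n - 2 residues
    # strictly between the two matching residues.
    my $gap = $n - 2;
    return 0 if $gap < 0;

    # Try both orders: set1 before set2, and set2 before set1.
    my $re = qr/[$set1].{0,$gap}[$set2]|[$set2].{0,$gap}[$set1]/;
    return $string =~ $re ? 1 : 0;
}
```

For example, within_distance($seq->seq, 'RPKT', 'HCDE', 4) would be true
for a sequence containing 'RAAH', since R and H are 3 positions apart.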

This is a working example for minimum linear distance:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

my $in = Bio::SeqIO->new('-file' => "all_proteins.txt",
                         '-format' => 'fasta');

while (my $seq = $in->next_seq) {
        my $string = $seq->seq;

        # positions (end offsets of each match) of the first residue set
        my @pos1;
        push @pos1, pos($string) while $string =~ /R|P|K|T/g;

        # positions of the second residue set
        my @pos2;
        push @pos2, pos($string) while $string =~ /H|C|D|E/g;

        # you may want to do something specific when either
        # set is completely absent...
        next unless @pos1 and @pos2;

        my $minimum = abs($pos1[0] - $pos2[0]);

        # compare every position in one set with every position in the other
        for my $p1 (@pos1) {
                for my $p2 (@pos2) {
                        my $d = abs($p1-$p2);
                        $minimum = $d if $d < $minimum;
                }
        }

        print $string,"\n";
        print $minimum, "\n";
}

Optimisation may be necessary - this takes 17 seconds (on my creaky
SGI machine) to process 800 sequences. Fortunately, there's an
obvious optimisation to be done: check first for the common cases
where residues from your sets occur next to each other, or with one
other residue in between.
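
One way that optimisation might look, as an untested-on-real-data
sketch: the function below (min_linear_distance is a made-up name) tries
two cheap regular expressions for the distance 1 and distance 2 cases
before falling back to the full pairwise comparison. It assumes the two
residue sets are passed as strings of one-letter codes and do not
overlap.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Returns the minimum linear distance between any residue in $set1 and
# any residue in $set2, or nothing if either set is absent.
sub min_linear_distance {
    my ($string, $set1, $set2) = @_;

    # Fast paths: residues adjacent, or with one other residue in between.
    return 1 if $string =~ /[$set1][$set2]|[$set2][$set1]/;
    return 2 if $string =~ /[$set1].[$set2]|[$set2].[$set1]/;

    # General case: collect all positions, then compare every pair.
    my (@pos1, @pos2);
    push @pos1, pos($string) while $string =~ /[$set1]/g;
    push @pos2, pos($string) while $string =~ /[$set2]/g;
    return unless @pos1 and @pos2;

    my $minimum = abs($pos1[0] - $pos2[0]);
    for my $p1 (@pos1) {
        for my $p2 (@pos2) {
            my $d = abs($p1 - $p2);
            $minimum = $d if $d < $minimum;
        }
    }
    return $minimum;
}
```

Inside the loop over sequences this would be called as
min_linear_distance($seq->seq, 'RPKT', 'HCDE'). How much time it saves
depends on how often the close cases actually occur in your data.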

Jurgen Pletinckx
AlgoNomics NV 

