[Bioperl-l] [RFC] Interolog::Walk

Giuseppe Gallone G.Gallone at sms.ed.ac.uk
Wed Aug 18 10:57:01 EDT 2010

Hello BioPerl community - I've written a new module called 
Interolog::Walk that I'm planning to put on CPAN. I would be grateful if 
you could take a look at the brief description below and tell me 
what you think. I'll be more than happy to post further details should 
the module be of interest to anyone.

Also, I am not sure I have chosen the right name for it. This 
is my first module and it would be great if you could advise on naming 
it appropriately. Hopefully the following description will give an idea 
of what it does.


     Interolog::Walk - Retrieve, score and visualize putative 
Protein-Protein Interactions through the orthology-walk method

     A common activity in computational biology is to mine 
protein-protein interactions from publicly available databases in order 
to build Protein-Protein Interaction (PPI) datasets.
In many instances, however, experimentally obtained, annotated PPIs are 
scarce, and it would be helpful to enrich the experimental dataset with 
high-quality, computationally inferred PPIs. Such computationally 
obtained datasets can extend, support or enrich experimental PPI 
datasets, and are of crucial importance in high-throughput gene 
prioritization studies, where they help to drive hypotheses and to 
reduce the dimensionality of many gene function discovery problems.
This Perl module, Interolog::Walk, is aimed at building putative PPI 
datasets on the basis of a number of comparative biology paradigms: the 
module implements a collection of computational biology algorithms based 
on the concept of "orthology projection". If interacting proteins A and 
B in organism X have orthologs A' and B' in organism Y, under certain 
conditions one can assume that the interaction will be conserved in 
organism Y, i.e. the A-B interaction can be "projected through the 
orthologies" to obtain a putative A'-B' interaction. The two 
interactions (A-B) and (A'-B') are called "interologs" (see for instance 
[1] and [2]).
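
To make the projection idea concrete, here is a minimal sketch in Python. 
It is not the module's Perl code or API: the function name, the toy 
identifiers and the in-memory dictionaries are all assumptions made for 
illustration, and it ignores the "certain conditions" mentioned above.

```python
from itertools import product


def project_interactions(interactions, orthologs):
    """Project A-B interactions in organism X onto organism Y.

    interactions: iterable of (a, b) pairs in organism X.
    orthologs: dict mapping an X protein to its orthologs in Y
               (hypothetical data structure, for illustration only).
    Returns a set of putative (a', b') pairs, each stored in sorted order.
    """
    projected = set()
    for a, b in interactions:
        # Every combination of an ortholog of A with an ortholog of B
        # is a candidate interolog of the A-B interaction.
        for a_prime, b_prime in product(orthologs.get(a, ()), orthologs.get(b, ())):
            projected.add(tuple(sorted((a_prime, b_prime))))
    return projected


# Toy data: C has no known ortholog, so A-C cannot be projected.
interactions_x = [("A", "B"), ("A", "C")]
orthologs_x_to_y = {"A": ["A'"], "B": ["B'"]}

putative_y = project_interactions(interactions_x, orthologs_x_to_y)
print(putative_y)
```

An interaction is only projected when both partners have at least one 
ortholog; one-to-many orthologies yield one putative pair per combination.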

Interolog::Walk collects, analyses and collates gene orthology data 
provided by the Ensembl Consortium (www.ensembl.org) as well as PPI data 
provided by EBI IntAct (http://www.ebi.ac.uk/intact/). It lets the 
user rate the quality and reliability of the putative interactions 
collected by means of confidence scores, and optionally outputs network 
representations of the datasets that are compatible with Cytoscape, a 
widely used platform for visualizing biological networks.
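
As an illustration of what Cytoscape-compatible output can look like, the 
sketch below (Python, not the module's code) writes interactions in 
Cytoscape's Simple Interaction Format (SIF), where each line reads 
"source, relationship type, target". Whether Interolog::Walk uses SIF or 
another format is not stated here, so the format choice, the function 
name and the "pp" relationship label are assumptions.

```python
def to_sif(interactions, relation="pp"):
    """Render (source, target) pairs as tab-delimited SIF text.

    'pp' is the label conventionally used for protein-protein edges;
    confidence scores are not representable in plain SIF and would
    need a separate edge attribute table.
    """
    return "".join(f"{a}\t{relation}\t{b}\n" for a, b in sorted(interactions))


# Toy putative interactions (hypothetical identifiers).
sif_text = to_sif([("B'", "C'"), ("A'", "B'")])
print(sif_text, end="")
```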

In order to carry out an interolog walk we start with a set of gene 
identifiers in one organism of interest. We query those ids against a 
number of comparative biology databases to retrieve a list of 
orthologues for each gene id of interest, in one or more species.
In the next step we rely on PPI databases to retrieve the list of 
available interactors for the protein ids obtained. The output at this 
stage consists of a list of interactors of the orthologues of the 
initial gene set, plus several fields of ancillary data.
In the last step of the process we project the interactions - again 
using orthology data - back to the original species of interest. The 
output of the process is a list of PUTATIVE INTERACTORS of the initial 
gene set, plus several fields of ancillary data.
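
The three steps above can be sketched as follows. This is Python rather 
than Perl and is not the module's interface: plain dictionaries stand in 
for the orthology and PPI database queries, and every name and 
identifier is hypothetical. The ancillary data fields are omitted.

```python
def interolog_walk(gene_ids, forward_orthologs, ppi, backward_orthologs):
    """Return putative interactors, in the species of interest, per gene id.

    forward_orthologs: gene id -> orthologues in other species (step 1).
    ppi: orthologue id -> its known interactors (step 2).
    backward_orthologs: interactor id -> orthologues back in the
                        species of interest (step 3).
    """
    putative = {}
    for gene in gene_ids:
        found = set()
        # Step 1: orthologues of the gene of interest.
        for orthologue in forward_orthologs.get(gene, ()):
            # Step 2: experimentally known interactors of each orthologue.
            for interactor in ppi.get(orthologue, ()):
                # Step 3: project each interactor back to the original species.
                found.update(backward_orthologs.get(interactor, ()))
        putative[gene] = found
    return putative


# Toy data standing in for database lookups.
forward = {"geneA": ["orthA"]}
known_ppi = {"orthA": ["orthB", "orthD"]}
backward = {"orthB": ["geneB"]}  # orthD has no orthologue back in the species

walk_result = interolog_walk(["geneA", "geneE"], forward, known_ppi, backward)
print(walk_result)
```

A gene with no orthologues, or whose orthologues have no recorded 
interactors, simply ends up with an empty set of putative interactors.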


Given the scope and the focus of the project, I would imagine that 
viable alternatives for the namespace might be


or maybe

As far as I can see there are no similar projects, so there should be 
no risk of a namespace clash. Still, I would love to hear your 
informed opinion about it.


[1] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, 
Vidal M, Gerstein M. Annotation transfer between genomes: 
protein-protein interologs and protein-DNA regulogs. Genome Research 
2004 Jun;14(6):1107-18.

[2] Wiles AM, Doderer M, Ruan J, Gu T-T, Ravi D, Blackman BA, Bishop AJR. 
Building and Analyzing Protein Interactome Networks by Cross-species 
Comparisons. BMC Systems Biology 2010;4:36. PMID: 20353594.

