[Bioperl-l] Generalized reciprocal blast
mmokrejs at ribosome.natur.cuni.cz
Mon Dec 7 15:33:48 EST 2009
I just stumbled across this older posting ... maybe you want to exploit
SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has
a remote API available.
Robert Bradbury wrote:
> I would like to know whether or not anyone has attempted to create a
> "generalized" reciprocal blast component for BioPerl?
> One sees papers all the time where they discuss running reciprocal blasts to
> compare a new species to an old "standard" species or a set of species or
> running an all-to-all set of comparisons to match up all of the "known"
> proteins from species and determine which are outliers (and therefore
> "novel"). There are also accumulating merged sets in NCBI HomoloGene (which
> seems to be a strict subset of perhaps a dozen "well sequenced" genomes)
> and Ensembl (which seems to be working with a much larger set of 40-50
> genomes some of which may be somewhat incomplete and are certainly poorly
> I have, I believe, seen code "fragments" from various authors, perhaps some
> on the BioPerl list, which perform some major subset of a typical
> "reciprocal blast".
> Now what I am looking for is a relatively generalizable some-to-some
> reciprocal blast utility. I want to be able to specify the genes (or gene
> family), e.g. some of the ~150 known DNA repair genes. It would be helpful
> to also specify how "tolerant" the blast "true reciprocal" criteria are.
> There are some genes where there is a very strict 1-to-1 relationship across
> many genomes. But for genes which involve relatively standard domains, e.g.
> "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for
> example it's more like 5-to-5 and it would be really nice to be able to
> specify the strictness or quality level for "matching" genes (and even
> which genes are to be excluded because they are known to be false
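A minimal sketch of the "tolerant" reciprocal criterion described above, in Python rather than BioPerl. Everything here is an assumption for illustration: hits are taken to be already parsed into (query, subject, bit score) tuples, `tolerance` is a hypothetical fraction of the best score within which a hit still counts, and `exclude` is a hypothetical set of known false positives. With tolerance 0.0 this reduces to the strict 1-to-1 reciprocal best hit.

```python
from collections import defaultdict


def top_hits(hits, tolerance=0.0):
    """Map each query to the subjects scoring within `tolerance`
    (a fraction; 0.0 keeps only the best hit) of its best bit score."""
    by_query = defaultdict(list)
    for query, subject, score in hits:
        by_query[query].append((subject, score))
    kept = {}
    for query, scored in by_query.items():
        best = max(score for _, score in scored)
        cutoff = best * (1.0 - tolerance)
        kept[query] = {subj for subj, score in scored if score >= cutoff}
    return kept


def reciprocal_pairs(forward, reverse, tolerance=0.0, exclude=frozenset()):
    """Return sorted (a, b) pairs where b is a retained hit of a in the
    forward search and a is a retained hit of b in the reverse search.
    Identifiers listed in `exclude` are skipped entirely."""
    fwd = top_hits(forward, tolerance)
    rev = top_hits(reverse, tolerance)
    pairs = set()
    for a, subjects in fwd.items():
        if a in exclude:
            continue
        for b in subjects:
            if b not in exclude and a in rev.get(b, ()):
                pairs.add((a, b))
    return sorted(pairs)


# Made-up toy hits (identifiers and scores are illustrative only).
forward = [("A1", "B1", 650.0), ("A1", "B2", 400.0),
           ("A2", "B2", 640.0), ("A2", "B1", 410.0)]
reverse = [("B1", "A1", 648.0), ("B1", "A2", 405.0),
           ("B2", "A2", 642.0), ("B2", "A1", 398.0)]

strict = reciprocal_pairs(forward, reverse)                # 1-to-1
loose = reciprocal_pairs(forward, reverse, tolerance=0.4)  # many-to-many
```

Raising the tolerance lets a gene family with shared domains come out as a many-to-many group instead of being forced into single pairs.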
> Then to top this off I want to be able to combine known public (e.g.
> HomoloGene / UniGene / Ensembl) databases with perhaps local private
> databases or database subsets (e.g. emerging or specialized genomes).
> The goal here, of course, is to determine the precise phylogenetic relationships
> between all of the DNA repair genes and how there may be gain / loss /
> evolution of function that can be related to species characteristics (size,
> longevity, etc.).
> Is there a generalized reciprocal blast component in BioPerl? Or is it a
> "build-it-yourself" situation (that I have to believe has been built
> probably a few dozen times by various researchers / organizations /
> Robert Bradbury
> 1. This would be handled in BioPerl with a customizable user function which
> could be tailored to handle specific cases -- for example a function which
> when handed a set of 100 potential "matches" could go through those 100
> matches, identify common domains, and then "re-rate" matches based on
> considerations such as the type and number of common domains, domains being
> in the same order, etc. I.e. criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.
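The user-supplied "re-rate" function in the footnote could look something like the following Python sketch. It is only one possible reading: it assumes each candidate match already carries an ordered list of domain names (the names below are hypothetical labels, not taken from any database), counts shared domains, and uses the longest common subsequence as a stand-in for "domains being in the same order".

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rerate(query_domains, candidates):
    """Re-rank candidate matches by domain agreement with the query.

    candidates maps a candidate name to its ordered list of domains.
    Returns (name, shared, in_order) tuples, best first: more domains
    in the same order wins, then more shared domains, then name.
    """
    rated = []
    for name, domains in candidates.items():
        shared = len(set(query_domains) & set(domains))
        in_order = lcs_length(query_domains, domains)
        rated.append((name, shared, in_order))
    rated.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rated


# Hypothetical domain architectures, for illustration only.
query = ["helicase_N", "helicase_C", "HRDC"]
candidates = {
    "cand1": ["helicase_C", "helicase_N"],          # shared, wrong order
    "cand2": ["helicase_N", "helicase_C", "HRDC"],  # identical layout
    "cand3": ["kinase"],                            # nothing in common
}
ranking = rerate(query, candidates)
```

Passing such a function in as a callback would keep the gene-family-specific judgment outside the generic reciprocal-blast code.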