[Bioperl-l] A perl regex query
cjfields at uiuc.edu
Tue Sep 18 11:24:52 EDT 2007
On Sep 18, 2007, at 8:26 AM, Roy Chaudhuri wrote:
>> My actual problem is a bit more complicated.
>> It is not just one string, but lakhs of them; they are actually
>> names of chemical compounds.
>> The problem is there are 2 different data sources. I need to match
>> compound names between them, but although the compounds may
>> be the same in the two, they use different naming formats for them.
> Unless you can define in simple and precise terms exactly which
> parts of the string you need, there is no way that you will be
> able to code a solution in Perl.
> Maybe you could look for a database that contains the synonyms for
> your molecule? A quick Google finds ChEBI (http://www.ebi.ac.uk/chebi),
> which is available to download as flat files.
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.
D'oh! Roy beat me to it; that's what I was going to suggest. I
agree; don't trust simple word munging to always get you the correct
answer in this case. It's just too complicated to try to catch every
case.
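
To make the point concrete, here is a rough sketch of the synonym-table approach Roy describes, as opposed to string munging alone. It is only an illustration under stated assumptions: the compound IDs ("CMPD:1", "CMPD:2"), the function names, and the in-memory dictionary are all made up for the example, and a real run would load the synonym groups from a curated source such as the ChEBI flat files rather than hard-coding them. It is written in Python for brevity; the same logic maps directly onto Perl hashes.

```python
import re


def normalize(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def build_synonym_index(synonym_groups):
    """Map every normalized synonym to the ID of the compound it names."""
    index = {}
    for compound_id, names in synonym_groups.items():
        for name in names:
            index[normalize(name)] = compound_id
    return index


def match_names(source_a, source_b, index):
    """Pair up names from the two sources that resolve to the same ID."""
    names_by_id = {}
    for name in source_b:
        compound_id = index.get(normalize(name))
        if compound_id is not None:
            names_by_id.setdefault(compound_id, []).append(name)

    matches = []
    for name in source_a:
        compound_id = index.get(normalize(name))
        # Names missing from the index resolve to None and match nothing.
        for other in names_by_id.get(compound_id, []):
            matches.append((name, other))
    return matches


# Hypothetical synonym groups; the IDs are placeholders, not real accessions.
synonym_groups = {
    "CMPD:1": ["aspirin", "acetylsalicylic acid", "2-acetoxybenzoic acid"],
    "CMPD:2": ["ethanol", "ethyl alcohol"],
}
index = build_synonym_index(synonym_groups)

source_a = ["Aspirin", "Ethanol", "unknown compound"]
source_b = ["Acetylsalicylic-acid", "ethyl alcohol"]
for a_name, b_name in match_names(source_a, source_b, index):
    print(a_name, "<->", b_name)
```

Note that normalize() on its own only smooths over case and punctuation differences; it would never pair "aspirin" with "acetylsalicylic acid". That pairing comes entirely from the curated synonym groups, which is why the database matters more than the regex. Stripping punctuation this aggressively can also merge names that differ only in punctuation, so treat the matches as candidates to check rather than final answers.
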
ChEBI is a good choice; Stefan's suggestion of OpenBabel is also a
good one. I would also try not to reinvent the wheel; there may be
some modules available via CPAN which do what you need, such as these: