[Bioperl-l] string comparision mismatches and matches
Mark A. Jensen
maj at fortinbras.us
Thu Feb 11 09:22:07 EST 2010
You are right Bernd-- I have used this without pack before,
and I'm sure that's faster than a call to pack(),
but for some strange reason my perl was not behaving;
I was getting $mask == 0 for $in ^ $tgt, as if the operation
were logical and not bitwise.
...further investigation reveals that my cygwin perl (5.10) is
evidently buggy in this regard, for ActiveState perl performs
the bitwise xor directly on the strings as you describe.
(yes, I know...the solution is to get a real operating system....)
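
For reference, here is a minimal sketch of the pack-free version Bernd describes. The sub name count_matches and the length check are hypothetical additions for illustration. The quoted operands ("$in" ^ "$tgt") are also an addition: perlop documents that ^ works bytewise only when both operands are strings, and quoting is the documented way to force that form. Whether that has anything to do with the behaviour seen above is an open guess, not a diagnosis.

```perl
use strict;
use warnings;

# Hypothetical helper: count positions at which two equal-length strings agree.
sub count_matches {
    my ($in, $tgt) = @_;
    die "strings must be the same length\n" unless length($in) == length($tgt);
    # Quoting forces the string (bytewise) form of ^, even if a scalar
    # was previously used as a number.
    my $mask = "$in" ^ "$tgt";
    # Identical bytes XOR to NUL; tr in scalar context counts them.
    return ($mask =~ tr/\x00//);
}

my $in  = 'ACCTCCTCCTCGAGTATGTG';
my $tgt = 'TATCTTGCGCCGGAGATAAT';
print count_matches($in, $tgt), "\n";
```

With no replacement list and no /d modifier, tr leaves $mask unchanged and only returns the count.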
----- Original Message -----
From: "Bernd Web" <bernd.web at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Torsten Seemann" <torsten.seemann at infotech.monash.edu.au>;
<bioperl-l at lists.open-bio.org>; "Roopa Raghuveer" <rtbio.2009 at gmail.com>
Sent: Thursday, February 11, 2010 8:59 AM
Subject: Re: [Bioperl-l] string comparision mismatches and matches
> Hi Mark,
> Indeed nice.
> Just one question
> Why is pack used? Is it faster? ^ works on strings too.
> $mask = $in ^ $tgt;
> $matches = $mask =~ tr/\x0/\x0/;
> (btw I had to remove the "" around \x0 in tr)
> On Thu, Feb 11, 2010 at 2:43 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> Perfectly described, Torsten. Yes, I confess a certain pride in this
>> Roopa reports that it sped up her script 3X. cheers MAJ
>> ----- Original Message ----- From: "Torsten Seemann"
>> <torsten.seemann at infotech.monash.edu.au>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Roopa Raghuveer" <rtbio.2009 at gmail.com>; <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, February 11, 2010 6:52 AM
>> Subject: Re: [Bioperl-l] string comparision mismatches and matches
>>>> $in = 'ACCTCCTCCTCGAGTATGTG';
>>>> $tgt = 'TATCTTGCGCCGGAGATAAT';
>>>> $mask = pack("A*",$in)^pack("A*",$tgt);
>>>> $matches = $mask =~ tr/"\x0"/"\x0"/;
>>> Impressive! Not often you see pack() let alone exclusive-or with a
>>> scalar context tr// thrown in for good measure!
>>> For those who don't follow what it is doing, here is my (possibly
>>> wrong) interpretation: The pack() is converting each of the two (equal
>>> length) strings into a byte set. A bit-wise exclusive-or (XOR) is
>>> performed between these two byte sets. This will create bytes of value
>>> zero (0) where they were the same, and non-zero where they were
>>> different. The tr// then counts how many of the bytes were zero (\x0
>>> is the NUL byte, value zero, not the character '0').
>>> I'll just assume it is more efficient than for/substr/eq :-)
>>> --Torsten Seemann
>>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
>>> University, AUSTRALIA
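
The steps Torsten describes can be checked on a short input. This is only a sketch: the four-letter strings and the variable names are made up for illustration, and the substr/eq loop is included purely as a cross-check, not as a benchmark.

```perl
use strict;
use warnings;

my $in  = 'ACCT';
my $tgt = 'ACGT';

# Bytewise XOR: equal characters give byte 0, different ones give non-zero.
my $mask  = "$in" ^ "$tgt";
my @bytes = unpack('C*', $mask);    # view the mask as unsigned byte values
print join(' ', @bytes), "\n";

# Count zero bytes (matches); the remainder are mismatches.
my $matches    = ($mask =~ tr/\x00//);
my $mismatches = length($mask) - $matches;

# Cross-check against the plain substr/eq loop.
my $loop_matches = 0;
for my $i (0 .. length($in) - 1) {
    $loop_matches++ if substr($in, $i, 1) eq substr($tgt, $i, 1);
}
print "$matches matches, $mismatches mismatches (loop: $loop_matches)\n";
```

Only the third position differs (C versus G), so the mask has a single non-zero byte there.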