Approximate On-line Palindrome Recognition, and Applications
Amihood Amir, Benny Porat
Moskva River
Confluence of 4 Streams
Palindrome Recognition
Approximate Matching
Interchange Matching
Online Algorithms
CPM 2014
Palindrome Recognition
- Voz'mi-ka slovo ropot, - govoril Cincinnatu ego shurin, ostriak, -- I prochti obratno. A? Smeshno poluchaetsia?
Vladimir Nabokov, Invitation to a Beheading (1)
"Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [→ topor: the axe]
A palindrome is a string that is the same whether read from right to left or from left to right. Examples:
доход (dokhod, "income")
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!
Palindrome Example
Ibn Ezra:
Medieval Jewish philosopher, poet, Biblical commentator, and mathematician.
Was asked:
"אבי אל חי שמך למה מלך משיח לא יבא"
[My Father, the Living God, why does the king messiah not arrive?]
His response:
"דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד"
[Know you from your Father that I will not be delayed. I will return to you when the time will come.]
Palindromes in Computer Science
A great programming exercise in CS 101.
An example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine (it can be done in linear time by a 2-tape TM).
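The CS 101 exercise can be sketched in Python as a two-pointer scan: a single linear pass that needs random access to both ends of the string, which a RAM has and a 1-tape Turing machine lacks. The helper name `is_palindrome` and the choice to ignore case and non-letters (so that the Panama sentence qualifies) are assumptions for illustration.

```python
def is_palindrome(text: str) -> bool:
    """Two-pointer palindrome test in one linear pass.

    Assumption: case and non-letter characters are ignored."""
    s = [ch.lower() for ch in text if ch.isalpha()]
    i, j = 0, len(s) - 1
    while i < j:                  # move both pointers toward the middle
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


print(is_palindrome("A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!"))  # True
print(is_palindrome("доход"))  # True
print(is_palindrome("ropot"))  # False: read backwards it is "topor"
```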
Palindrome Concatenation
We may be interested in finding out whether a string is a concatenation of palindromes of length > 1.
Example:
ABCCBABBCCBCAACB = ABCCBA | BB | CC | BCAACB
Why would we be interested in such a funny problem? We'll soon see.
Exercise: Do this in linear time…
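For comparison, here is a simpler quadratic-time sketch, assuming a standard dynamic program over prefixes: `prev[e]` records where the last palindrome block of a valid split of the first e characters starts. The function name `palindrome_concatenation` is hypothetical, and this is deliberately not the linear-time solution the exercise asks for.

```python
from typing import List, Optional


def palindrome_concatenation(s: str) -> Optional[List[str]]:
    """Split s into palindromes of length > 1, or return None if impossible.

    Quadratic-time dynamic programming (not the linear-time exercise)."""
    n = len(s)
    # pal[i][j] is True iff s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            pal[i][j] = s[i] == s[j] and (length <= 2 or pal[i + 1][j - 1])
    # prev[e]: start of the last block in a valid split of s[:e]; -1 if none.
    prev = [-1] * (n + 1)
    prev[0] = 0                               # the empty prefix is valid
    for end in range(2, n + 1):
        for start in range(end - 1):          # block length end - start >= 2
            if prev[start] != -1 and pal[start][end - 1]:
                prev[end] = start
                break
    if prev[n] == -1:
        return None
    blocks, end = [], n
    while end > 0:                            # walk the split backwards
        start = prev[end]
        blocks.append(s[start:end])
        end = start
    return blocks[::-1]


print(palindrome_concatenation("ABCCBABBCCBCAACB"))  # ['ABCCBA', 'BB', 'CC', 'BCAACB']
print(palindrome_concatenation("AB"))  # None
```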
Stream 2 - Approximations
As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, give a string that is a concatenation of palindromes of length > 1.
Example:
ABCCBCBBCCBCABCB
Fixing two characters (positions 6 and 14, both changed to A) gives ABCCBABBCCBCAACB = ABCCBA | BB | CC | BCAACB.
For Hamming distance: A-Porat [ISAAC 13]: an algorithm with $O(n^2)$ time.
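Under Hamming distance the blocks are independent: making one block a palindrome costs exactly its number of mismatched mirror pairs, so a dynamic program over block boundaries gives the minimum. This is a straightforward $O(n^2)$-time, $O(n^2)$-space sketch with a hypothetical function name, not a reconstruction of the A-Porat [ISAAC 13] algorithm.

```python
import math


def min_fixes_palindrome_concat(s: str) -> float:
    """Fewest character substitutions that turn s into a concatenation of
    palindromes of length > 1 (math.inf if len(s) == 1)."""
    n = len(s)
    # cost[i][j]: mismatched mirror pairs in s[i..j], i.e. the fewest
    # substitutions that make s[i..j] a palindrome.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            inner = cost[i + 1][j - 1] if length > 2 else 0
            cost[i][j] = inner + (s[i] != s[j])
    # best[e]: fewest substitutions for the prefix s[:e].
    best = [math.inf] * (n + 1)
    best[0] = 0
    for end in range(2, n + 1):
        for start in range(end - 1):          # block s[start:end], length >= 2
            best[end] = min(best[end], best[start] + cost[start][end - 1])
    return best[n]


print(min_fixes_palindrome_concat("ABCCBCBBCCBCABCB"))  # 2
print(min_fixes_palindrome_concat("ABCCBABBCCBCAACB"))  # 0
```

For the example, one fix is not enough: no error-free palindrome starts at the leading A, and the only error-free split of a suffix is the final BCB, which would leave ABCCBCBBCCBCA as a single block with two mismatched pairs.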
Stream 3 - Reversals
Why is this funny problem interesting?
Sorting by reversals:
In the evolutionary process a substring may “detach” and “reconnect” in reverse:
ABCA BCDAABC BAD → ABCA CBAADCB BAD
Sorting by Reversals
What is the minimum number of reversals that, when applied to string A, result in string B?
History:
Introduced: Bafna & Pevzner [95]
NP-hard: Caprara [97]
Approximations:
Christie [98]
Berman, Hannenhalli, Karpinski [02]
Hartman [03]
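Sorting by reversals is NP-hard, as noted above, but the definition itself can be checked on tiny strings by breadth-first search over all single-substring reversals. The sketch below, with the hypothetical name `reversal_distance_bruteforce`, is exponential and only meant to make the definition concrete; the first call uses the evolutionary example of the previous slide.

```python
from collections import deque


def reversal_distance_bruteforce(a: str, b: str) -> int:
    """Minimum number of substring reversals turning a into b, found by BFS
    over all strings reachable from a; -1 if b is unreachable.
    Exponential: tiny inputs only."""
    if sorted(a) != sorted(b):
        return -1                      # reversals never change letter counts
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        s, d = queue.popleft()
        if s == b:
            return d
        for i in range(len(s)):
            for j in range(i + 2, len(s) + 1):   # reverse s[i:j], length >= 2
                t = s[:i] + s[i:j][::-1] + s[j:]
                if t not in seen:
                    seen.add(t)
                    queue.append((t, d + 1))
    return -1


print(reversal_distance_bruteforce("ABCABCDAABCBAD", "ABCACBAADCBBAD"))  # 1
print(reversal_distance_bruteforce("ABC", "BCA"))  # 2
```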
Sorting by Reversals – Polynomial time Relaxations
Signed reversals:
Hannenhalli & Pevzner [99]
Kaplan, Shamir, Tarjan [00]
Tannier & Sagot [04]
. . .
Disjointness:
Swap Matching: Muthu [96]
Two constraints:
The length of the reversed substring is limited to 2.
All swaps are disjoint.
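The two constraints make an exact check easy: scanning left to right, each position either matches directly or is swapped with its right neighbour. Greedily preferring a direct match is safe, because when both options apply, all four characters involved are equal. A minimal sketch with the hypothetical name `swap_matches`:

```python
def swap_matches(p: str, t: str) -> bool:
    """True iff p becomes t by swapping disjoint pairs of adjacent
    characters, i.e. by non-overlapping reversals of length exactly 2."""
    if len(p) != len(t):
        return False
    i = 0
    while i < len(p):
        if p[i] == t[i]:
            i += 1                                 # no swap needed here
        elif i + 1 < len(p) and p[i] == t[i + 1] and p[i + 1] == t[i]:
            i += 2                                 # swap p[i], p[i+1]; skip both
        else:
            return False
    return True


print(swap_matches("ABCD", "BACD"))  # True
print(swap_matches("ABC", "CBA"))    # False: needs a length-3 reversal
```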
Pattern Matching with Disjoint Reversals
Reversal Distance (RD): The RD between $s_1$ and $s_2$ is the minimum number k such that there exists a string $s_2'$ with $\mathrm{HAM}(s_2, s_2') = k$ and $s_1$ reversal-matches $s_2'$, i.e. $s_2'$ is obtained from $s_1$ by reversing disjoint substrings.
(Figure: example strings $S_1$ and $S_2$ with $\mathrm{RD}(S_1, S_2) = 2$.)
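The definition can be read as a dynamic program: $s_1$ is cut into consecutive blocks, each block is reversed, and a block costs the number of positions where its reversal disagrees with $s_2$ (exactly the characters of $s_2$ that must change to obtain $s_2'$). The quadratic sketch below assumes equal-length strings; `reversal_distance_ham` is a hypothetical name, and this is not claimed to be the paper's algorithm.

```python
def reversal_distance_ham(s1: str, s2: str) -> int:
    """RD(s1, s2): fewest substitutions in s2 so that s1 reversal-matches it.
    O(m^2) dynamic program over block boundaries."""
    m = len(s1)
    if len(s2) != m:
        raise ValueError("strings must have equal length")
    # cost[i][j]: Hamming distance between reverse(s1[i..j]) and s2[i..j].
    cost = [[0] * m for _ in range(m)]
    for length in range(1, m + 1):
        for i in range(m - length + 1):
            j = i + length - 1
            if length == 1:
                cost[i][j] = int(s1[i] != s2[i])   # a block left unreversed
            else:
                inner = cost[i + 1][j - 1] if length > 2 else 0
                cost[i][j] = inner + (s1[i] != s2[j]) + (s1[j] != s2[i])
    # best[e]: RD of the first e characters; m + 1 acts as "infinity".
    best = [0] + [m + 1] * m
    for end in range(1, m + 1):
        for start in range(end):
            best[end] = min(best[end], best[start] + cost[start][end - 1])
    return best[m]


print(reversal_distance_ham("ADCBAEDBA", "CDAABABDE"))  # 0
print(reversal_distance_ham("ABC", "ABD"))  # 1
```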
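Connection between Reversal Matching and Palindrome Matching
Interleave the strings:
$S_1$: A D C B A E D B A
$S_2$: C D A A B A B D E
Interleaved: A C D D C A B A A B E A D B B D A E
$S_1$ reversal-matches $S_2$ exactly when the interleaved string is a concatenation of even-length palindromes, here ACDDCA | BAAB | EADBBDAE.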
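The connection can be checked directly: interleave the two strings and test whether the result splits into even-length palindromes. A block of the interleaved string running from $S_1[i]$ to $S_2[j]$ is a palindrome exactly when reversing $S_1[i..j]$ yields $S_2[i..j]$. The helper names below are hypothetical, and the palindrome test is naive (cubic time overall).

```python
def interleave(s1: str, s2: str) -> str:
    """s1[0] s2[0] s1[1] s2[1] ... for equal-length strings."""
    return "".join(a + b for a, b in zip(s1, s2))


def is_even_palindrome_concat(u: str) -> bool:
    """Can u be cut into even-length palindromes? Simple DP over even cuts."""
    n = len(u)
    ok = [False] * (n + 1)
    ok[0] = True
    for end in range(2, n + 1, 2):
        for start in range(0, end, 2):
            block = u[start:end]
            if ok[start] and block == block[::-1]:
                ok[end] = True
                break
    return n % 2 == 0 and ok[n]


def reversal_matches(s1: str, s2: str) -> bool:
    return len(s1) == len(s2) and is_even_palindrome_concat(interleave(s1, s2))


print(interleave("ADCBAEDBA", "CDAABABDE"))        # ACDDCABAABEADBBDAE
print(reversal_matches("ADCBAEDBA", "CDAABABDE"))  # True
```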
On-line Input
Suppose that we get the input a byte at a time:
For the palindrome problem:
(Figure: the text arrives one character at a time.)
For the reversal problem:
(Figure: the two strings arrive as aligned character pairs, such as AC, CA, BA, AB, one pair at a time.)
Main Idea – Palindrome Fingerprint
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
The Rabin Karp fingerprint: $\Phi(S) = r^{1}s_0 + r^{2}s_1 + \dots + r^{m}s_{m-1} \bmod p$
The reversal fingerprint: $\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \dots + r^{-m}s_{m-1} \bmod p$
If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.
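A direct (offline) sketch of the two fingerprints and the palindrome test, assuming character codes `ord(c)` as the values $s_i$, the Mersenne prime $p = 2^{61}-1$ and a random $r$; these parameter choices and the function names are illustrative assumptions. `pow(r, -1, p)` (Python 3.8+) computes $r^{-1} \bmod p$.

```python
import random
from typing import Tuple

P = (1 << 61) - 1   # assumed modulus p: the Mersenne prime 2^61 - 1


def fingerprints(s: str, r: int, p: int = P) -> Tuple[int, int]:
    """Return (Phi(S), Phi^R(S)), using ord(c) as the value of each s_i."""
    r_inv = pow(r, -1, p)                    # r^{-1} mod p (Python 3.8+)
    phi = sum(pow(r, i + 1, p) * ord(c) for i, c in enumerate(s)) % p
    phi_r = sum(pow(r_inv, i + 1, p) * ord(c) for i, c in enumerate(s)) % p
    return phi, phi_r


def probably_palindrome(s: str, r: int, p: int = P) -> bool:
    """The test r^{m+1} * Phi^R(S) == Phi(S) (mod p)."""
    phi, phi_r = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_r % p == phi


r = random.randrange(2, P - 1)
print(probably_palindrome("ABCBA", r))  # True (always, for a palindrome)
print(probably_palindrome("ABCBB", r))
```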
Palindrome Fingerprint
If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome (w.h.p.).
Example: S = A B C B A, so m = 5:
$r^{6}\Phi^R(S) = r^{6}\left(\frac{1}{r}A + \frac{1}{r^2}B + \frac{1}{r^3}C + \frac{1}{r^4}B + \frac{1}{r^5}A\right) = r^5A + r^4B + r^3C + r^2B + rA = \Phi(S)$
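The example generalizes. Substituting $k = m-1-i$ shows that the scaled reversal fingerprint is the Rabin-Karp fingerprint of the reversed string $S^R$:

```latex
r^{m+1}\Phi^R(S) \equiv \sum_{i=0}^{m-1} r^{m+1} r^{-(i+1)} s_i
                 = \sum_{i=0}^{m-1} r^{m-i} s_i
                 = \sum_{k=0}^{m-1} r^{k+1} s_{m-1-k}
                 = \Phi(S^R) \pmod p
```

So the test compares $\Phi(S)$ with $\Phi(S^R)$ and always passes when S is a palindrome. If $S \neq S^R$ (and character values are smaller than p), $\Phi(S) - \Phi(S^R)$ is a nonzero polynomial in r of degree at most m, which has at most m roots modulo the prime p, so a uniformly random r fools the test with probability at most m/p.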
Simple Online Algorithm for Finding a Palindrome in a Text
Text: $t_1, t_2, t_3, \ldots, t_i, t_{i+1}, t_{i+2}, \ldots, t_{i+m-1}, t_{i+m}, \ldots, t_n$
For the current candidate $t_i, \ldots, t_{i+m-1}$ of length m:
$\Phi^R = r^{-1}t_i + r^{-2}t_{i+1} + \dots + r^{-m}t_{i+m-1} \bmod p$
$\Phi = r^{1}t_i + r^{2}t_{i+1} + \dots + r^{m}t_{i+m-1} \bmod p$
If $r^{m+1}\Phi^R = \Phi$, there is a palindrome starting at the i-th position.
If not, then for the next position, read $t_{i+m}$ and update (after which m grows by 1):
$\Phi = \Phi + r^{m+1}t_{i+m} \bmod p$
$\Phi^R = \Phi^R + r^{-(m+1)}t_{i+m} \bmod p$
Note: This algorithm finds online whether a prefix of the text is a palindrome. To find online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
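The online update can be sketched as a small class that keeps $\Phi$, $\Phi^R$, $r^m$ and $r^{-m}$ and spends a constant number of modular operations per character. It tests prefixes of the text (the case i = 1); the class name, the modulus $2^{61}-1$ and the use of `ord(c)` as character values are assumptions.

```python
import random
from typing import Optional


class OnlinePalindromePrefix:
    """After each character, report whether the prefix read so far is a
    palindrome, using the fingerprint test r^{m+1} Phi^R == Phi (w.h.p.)."""

    def __init__(self, p: int = (1 << 61) - 1, r: Optional[int] = None) -> None:
        self.p = p
        self.r = r if r is not None else random.randrange(2, p - 1)
        self.r_inv = pow(self.r, -1, p)      # r^{-1} mod p (Python 3.8+)
        self.r_pow = 1                       # r^m for the current length m
        self.r_inv_pow = 1                   # r^{-m}
        self.phi = 0                         # Phi of the prefix
        self.phi_r = 0                       # Phi^R of the prefix

    def feed(self, ch: str) -> bool:
        p, x = self.p, ord(ch)
        # The new character gets exponent m+1: advance r^m and r^{-m} first.
        self.r_pow = self.r_pow * self.r % p
        self.r_inv_pow = self.r_inv_pow * self.r_inv % p
        self.phi = (self.phi + self.r_pow * x) % p
        self.phi_r = (self.phi_r + self.r_inv_pow * x) % p
        # r_pow is now r^m for the new length m, so r^{m+1} = r_pow * r.
        return self.r_pow * self.r % p * self.phi_r % p == self.phi


detector = OnlinePalindromePrefix(r=1234567)
print([detector.feed(c) for c in "ABCBA"])  # [True, False, False, False, True]
```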
Palindrome with mismatches
We start with the 1-mismatch case.
1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
Choose l prime numbers $q_1, \ldots, q_l < m$ such that $q_1 q_2 \cdots q_l > m$.
1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
For each $q_i$, construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ consists of all elements of S whose index is congruent to j mod $q_i$.
Examples, $q_1 = 2$, $q_2 = 3$:
mod 2: $S_{2,0} = s_0, s_2, s_4, \ldots$ and $S_{2,1} = s_1, s_3, s_5, \ldots$
mod 3: $S_{3,0} = s_0, s_3, s_6, \ldots$, $S_{3,1} = s_1, s_4, s_7, \ldots$ and $S_{3,2} = s_2, s_5, s_8, \ldots$
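The partition is just a strided slice. A sketch with a hypothetical helper, applied to symbolic labels for m = 6:

```python
from typing import List, Sequence


def residue_subsequences(s: Sequence, q: int) -> List[list]:
    """S_{q,j} for j = 0..q-1: the elements of s whose index is congruent to j mod q."""
    return [list(s[j::q]) for j in range(q)]


labels = [f"s{i}" for i in range(6)]
print(residue_subsequences(labels, 2))  # [['s0', 's2', 's4'], ['s1', 's3', 's5']]
print(residue_subsequences(labels, 3))  # [['s0', 's3'], ['s1', 's4'], ['s2', 's5']]
```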
Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$
mod 2: $S_{2,0} = s_0, s_2, s_4$ and $S_{2,1} = s_1, s_3, s_5$
mod 3: $S_{3,0} = s_0, s_3$, $S_{3,1} = s_1, s_4$ and $S_{3,2} = s_2, s_5$
1-Mismatch
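We need to compare $s_0, s_1, s_2, \ldots, s_{m-2}, s_{m-1}$ with $s_{m-1}, s_{m-2}, s_{m-3}, \ldots, s_1, s_0$.
We prove that, on the partition strings, this comparison becomes $S_{q,j} = S^R_{q,(m-1-j) \bmod q}$, where $S^R_{q,k}$ denotes the reverse of $S_{q,k}$.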
Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$ and $S^R = s_5, s_4, s_3, s_2, s_1, s_0$
$S_{2,0} = s_0, s_2, s_4$ is compared with $S^R_{2,1} = s_5, s_3, s_1$
$S_{3,0} = s_0, s_3$ is compared with $S^R_{3,2} = s_5, s_2$
$S_{3,1} = s_1, s_4$ is compared with $S^R_{3,1} = s_4, s_1$
Exact Matching
Lemma: $S = S^R \iff S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$.
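The lemma can be checked mechanically, with exact string comparison standing in for fingerprints. The index map $i \mapsto m-1-i$ sends class j onto class $(m-1-j) \bmod q$ in reverse order, so the comparisons for any single q already cover every mirror pair $s_i, s_{m-1-i}$. The function name is hypothetical.

```python
def palindrome_via_partitions(s: str, q: int) -> bool:
    """Exact-matching lemma for one modulus q: S_{q,j} must equal the reverse
    of S_{q,(m-1-j) mod q} for every 0 <= j < q. Exact string comparison
    stands in for the fingerprint comparison."""
    m = len(s)
    parts = [s[j::q] for j in range(q)]      # parts[j] is S_{q,j}
    return all(parts[j] == parts[(m - 1 - j) % q][::-1] for j in range(q))


for word in ["ABCBA", "ABCCBA", "ABCA"]:
    print(word, [palindrome_via_partitions(word, q) for q in (2, 3, 5)])
```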
1-Mismatch
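Lemma: S is a palindrome with 1 mismatch $\iff$ for each q, there is exactly one reversal pair $j, (m-1-j) \bmod q$ such that:
$\Phi(S_{q,j}) \neq r^{|S_{q,j}|+1}\,\Phi^R(S_{q,(m-1-j) \bmod q})$
(the right-hand side is the fingerprint of $S^R_{q,(m-1-j) \bmod q}$).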
1-Mismatch
Lemma: There is exactly one mismatch $\iff$ in each group, exactly one pair of subpatterns does not match. (Proof: by the C.R.T.)
Chinese Remainder Theorem
Let n and m be two positive integers.
In our case: if two indices, i and j, both have an error, and for every q only one subsequence is erroneous, then $i \equiv j \pmod q$ for every q; since the product of all q's is > m, it follows that i = j.
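Written out, the step uses only that distinct primes are pairwise coprime, so their product divides any common multiple:

```latex
\begin{aligned}
&i \equiv j \pmod{q_t} \ \text{for } t = 1,\dots,l
  \;\Longrightarrow\; q_1 q_2 \cdots q_l \mid (i - j),\\
&0 \le i, j < m < q_1 q_2 \cdots q_l
  \;\Longrightarrow\; |i - j| < q_1 q_2 \cdots q_l
  \;\Longrightarrow\; i = j .
\end{aligned}
```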
Complexity
There exists a constant c such that, for any x < m, there are at least x/log m prime numbers between x and cx.
Therefore, choose prime numbers between log m and c log m.
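A sketch of the prime selection, assuming base-2 logarithms and c = 4 (the slide leaves c unspecified): take the primes in $[\lceil \log_2 m \rceil, c\lceil \log_2 m \rceil]$ in increasing order until their product exceeds m, as the 1-mismatch algorithm requires. The function name is hypothetical.

```python
import math
from typing import List


def choose_primes(m: int, c: int = 4) -> List[int]:
    """Primes from [ceil(log2 m), c * ceil(log2 m)], in increasing order,
    until their product exceeds m."""
    lo = max(2, math.ceil(math.log2(max(m, 2))))
    hi = c * lo
    # Sieve of Eratosthenes up to hi.
    is_prime = [True] * (hi + 1)
    is_prime[0] = is_prime[1] = False
    for a in range(2, math.isqrt(hi) + 1):
        if is_prime[a]:
            for b in range(a * a, hi + 1, a):
                is_prime[b] = False
    chosen, product = [], 1
    for q in range(lo, hi + 1):
        if is_prime[q]:
            chosen.append(q)
            product *= q
            if product > m:
                return chosen
    raise ValueError("not enough primes in range; try a larger c")


print(choose_primes(10**6))  # [23, 29, 31, 37, 41]
```

For $m = 10^6$ the first four primes multiply to 765,049, which is still below m, and the fifth brings the product to 31,367,009.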
Complexity
For each $q_i$ we compute $2q_i$ different fingerprints.
Overall space:
For each $q_i$, each character participates in exactly two fingerprints (the regular and the reverse).
Overall time:
Online
All fingerprint calculations can be done online. We know m at every input character, so we can compute the comparisons.
Conclusion: our algorithm is online.
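Putting the pieces together, the following sketch maintains online, for every chosen prime q and every class j, the fingerprints $\Phi(S_{q,j})$ and $\Phi^R(S_{q,j})$, and after each character compares every $S_{q,j}$ with the reverse of $S_{q,k}$, $k = (m-1-j) \bmod q$, via $r^{|S_{q,k}|+1}\Phi^R(S_{q,k})$. Each character updates two fingerprints per prime; the comparisons cost O(q) per prime per character in this naive version. The class name, the verdict strings and the fixed modulus are illustrative assumptions, not the paper's implementation.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


class OnlineOneMismatch:
    """After each character, classify the prefix read so far as "match",
    "1-mismatch" or "more" via per-partition fingerprints (w.h.p.)."""

    def __init__(self, primes: List[int], p: int = (1 << 61) - 1,
                 r: Optional[int] = None) -> None:
        self.primes = primes
        self.p = p
        self.r = r if r is not None else random.randrange(2, p - 1)
        self.r_inv = pow(self.r, -1, p)       # r^{-1} mod p (Python 3.8+)
        self.m = 0
        # For each q and class j: Phi(S_{q,j}), Phi^R(S_{q,j}) and |S_{q,j}|.
        self.fwd: Dict[int, List[int]] = {q: [0] * q for q in primes}
        self.rev: Dict[int, List[int]] = {q: [0] * q for q in primes}
        self.size: Dict[int, List[int]] = {q: [0] * q for q in primes}

    def feed(self, ch: str) -> str:
        p, x, t = self.p, ord(ch), self.m     # t: 0-based index of the new character
        self.m += 1
        for q in self.primes:                 # two fingerprint updates per prime
            j = t % q
            self.size[q][j] += 1
            e = self.size[q][j]               # 1-based position inside S_{q,j}
            self.fwd[q][j] = (self.fwd[q][j] + pow(self.r, e, p) * x) % p
            self.rev[q][j] = (self.rev[q][j] + pow(self.r_inv, e, p) * x) % p
        counts = [len(self._failing_pairs(q)) for q in self.primes]
        if all(c == 0 for c in counts):
            return "match"
        if all(c == 1 for c in counts):
            return "1-mismatch"
        return "more"

    def _failing_pairs(self, q: int) -> Set[Tuple[int, int]]:
        """Reversal pairs {j, (m-1-j) mod q} whose fingerprint test fails."""
        p, m = self.p, self.m
        failing = set()
        for j in range(q):
            k = (m - 1 - j) % q
            # Phi(reverse(S_{q,k})) = r^{|S_{q,k}|+1} * Phi^R(S_{q,k})
            rev_fp = pow(self.r, self.size[q][k] + 1, p) * self.rev[q][k] % p
            if self.fwd[q][j] != rev_fp:
                failing.add((min(j, k), max(j, k)))
        return failing


det = OnlineOneMismatch([2, 3, 5], r=1234567)
print([det.feed(c) for c in "ABCDA"])
# ['match', '1-mismatch', '1-mismatch', 'more', '1-mismatch']
```

On the prefix ABCD, the prime 2 alone sees a single failing pair, because both mismatched mirror pairs land in the same reversal pair; the prime 3 separates them, which is why several primes are needed.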
k-Mismatches
Use group testing…
k-Mismatches
Group Testing
Given n items, some of which are positive, identify all positive ones with a small number of tests.
Each test is on a subset of items.
The test outcome is positive iff there is a positive item in the subset.
k-Mismatch
Group: a partition of the text.
Test: distinguish (using the 1-mismatch algorithm) between:
match
1-mismatch
more than 1 mismatch
k-Mismatches
$S = s_0, s_1, s_2, \ldots, s_{m-1}$ is partitioned as before (mod 2: $S_{2,0}, S_{2,1}$; mod 3: $S_{3,0}, S_{3,1}, S_{3,2}$; …).
This is similar to the 1-mismatch algorithm, just with more prime numbers…
Each $S_{q,j}$ is a group in our group testing.
Our tests
We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$.
Each partition is “tested against” its reversal pair.
Correctness
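$s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
For any group of k characters $i_1, i_2, \ldots, i_k$, and for each $s_j$ among them, there exists a partition where $s_j$ appears alone (by the C.R.T.).
(Figure: example positions $i_2, i_5, i_7, i_9$.)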
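The combinatorial claim behind this step, that each of a few positions ends up alone in its residue class for some prime, can be checked directly. The sketch below (hypothetical helper name) reports, for each index, the first prime that isolates it; it tests only this claim, not the full k-mismatch procedure.

```python
from typing import Dict, List, Optional


def isolating_prime(indices: List[int], primes: List[int]) -> Dict[int, Optional[int]]:
    """For each index, the first prime q such that no other index in the
    group has the same residue mod q (None if no listed prime does this)."""
    result: Dict[int, Optional[int]] = {}
    for i in indices:
        result[i] = next(
            (q for q in primes if all(i % q != j % q for j in indices if j != i)),
            None,
        )
    return result


print(isolating_prime([2, 5, 7, 9], [2, 3, 5, 7]))  # {2: 2, 5: 5, 7: 3, 9: 3}
```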
Correctness
$s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
If $s_j$ causes a mismatch, we will catch it in the partition where it appears alone.
Complexity
Overall space:
Overall time:
Approximate Reversal Distance
Using the algorithm for palindromes with up to k mismatches, this can be solved in … time and … space.
Thank you (спасибо)