KMP algorithm - PowerPoint Presentation

gabriella (@gabriella)
Uploaded On 2023-07-23




Presentation Transcript

1. KMP algorithm

2. KMP algorithm

    public class KMP {
        private final int R;        // alphabet size (radix)
        private final String pat;   // the pattern
        private final int[][] dfa;  // dfa[c][j] = next state from state j on character c

        public KMP(String pat) {
            this.R = 256;
            this.pat = pat;

            // build DFA from pattern
            int m = pat.length();
            dfa = new int[R][m];
            dfa[pat.charAt(0)][0] = 1;
            for (int x = 0, j = 1; j < m; j++) {
                for (int c = 0; c < R; c++)
                    dfa[c][j] = dfa[c][x];      // Copy mismatch cases.
                dfa[pat.charAt(j)][j] = j + 1;  // Set match case.
                x = dfa[pat.charAt(j)][x];      // Update restart state.
            }
        }

        public int search(String txt) {
            // simulate operation of DFA on text
            int m = pat.length();
            int n = txt.length();
            int i, j;
            for (i = 0, j = 0; i < n && j < m; i++) {
                j = dfa[txt.charAt(i)][j];
            }
            if (j == m) return i - m;  // found
            return n;                  // not found
        }
    }

3. General idea
Avoid backing up in the text string on a mismatch.
For example:
    Text:    00000000000000000000000000000000000001
    Pattern: 000000001
When we find a mismatch, how can we move forward in the text?
Is there a cleverer way than brute force?
How should we analyze the pattern?
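To make the cost of backing up concrete, here is a minimal brute-force sketch that counts character comparisons. The class name `BruteForceSketch`, the helper `zerosThenOne`, and the run length of 30 zeros are assumptions chosen for illustration; they are not taken from the slides.

```java
class BruteForceSketch {
    // Number of character comparisons made by the most recent call to search().
    static long comparisons;

    // Hypothetical helper: a run of zeros followed by a single '1'.
    static String zerosThenOne(int zeros) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < zeros; k++) sb.append('0');
        return sb.append('1').toString();
    }

    // Brute force: on a mismatch, back up in the text and retry one position later.
    static int search(String pat, String txt) {
        int m = pat.length(), n = txt.length();
        comparisons = 0;
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m) {
                comparisons++;
                if (txt.charAt(i + j) != pat.charAt(j)) break;
                j++;
            }
            if (j == m) return i;  // found at offset i
        }
        return n;                  // not found (same convention as the slides)
    }

    public static void main(String[] args) {
        String txt = zerosThenOne(30);  // assumed length, for illustration only
        String pat = "000000001";
        System.out.println("offset = " + search(pat, txt));
        System.out.println("comparisons = " + comparisons + ", text length = " + txt.length());
    }
}
```

With this input almost every starting position compares all 9 pattern characters before failing, so the comparison count is several times the text length. The DFA approach reads each text character once.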

4. How? Build a DFA
DFA: deterministic finite-state automaton. DFA = states + transitions.
States:
    For a pattern with m characters, there are (m + 1) states in the DFA.
    "At state j" means the first j characters of the pattern (pat[0..j-1]) are matched.
    The last state indicates ACCEPT (AC), i.e. all characters in the pattern are matched, but we do not allocate an entry for this state.

5. How? Build a DFA
DFA: deterministic finite-state automaton. DFA = states + transitions.
Transitions:
    At each state there are R possible transitions, where R is the number of all possible characters.
    Formalize transitions as dfa[next_char][current_state] = next_state.

6. How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Suppose we are now at current_state. If we see that the next character is next_char, then we should transition to next_state.
Therefore dfa[R][m] is a 2-dimensional table that exhaustively enumerates all possible cases.
The second dimension is m rather than m + 1 because we do not allocate an entry for the accept state.

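7. How? Build a DFA
Explanation: dfa[next_char][current_state] = next_state
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)
2D array representation (rows: next character c, columns: current state j; k(X) is state k, reached by matching X):
    c \ j   0     1(A)  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1     1     3     1     5     1
    B       0     2     0     4     0     4
    C       0     0     0     0     0     6
Directed graph representation: (figure not preserved in the transcript)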
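As a small illustration of reading the table, the sketch below hard-codes the ABABAC table and looks up single transitions. The class name `DfaTableSketch` and the mapping of 'A', 'B', 'C' to rows 0, 1, 2 are assumptions made for this example; the slides' code indexes rows by the character code with R = 256.

```java
class DfaTableSketch {
    // Rows are the characters A, B, C (R = 3); columns are states 0..5.
    // State 6 (accept) has no column.
    static final int[][] DFA = {
        {1, 1, 3, 1, 5, 1},  // next state on 'A'
        {0, 2, 0, 4, 0, 4},  // next state on 'B'
        {0, 0, 0, 0, 0, 6},  // next state on 'C'
    };

    // Assumed mapping for this sketch: 'A' -> row 0, 'B' -> row 1, 'C' -> row 2.
    static int next(char nextChar, int currentState) {
        return DFA[nextChar - 'A'][currentState];
    }

    public static void main(String[] args) {
        // In state 5 ("ABABA" matched) only 'C' advances to the accept state.
        System.out.println(next('A', 5));  // prints 1
        System.out.println(next('B', 5));  // prints 4
        System.out.println(next('C', 5));  // prints 6
    }
}
```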

8. How to use DFA?
Example:
    Text:    ABCABABABACA
    Pattern: ABABAC

9. How to use DFA?
Start with state 0. Text: ABCABABABACA.
Reading txt[0] = 'A': go to state 1.
Each step is one iteration of the loop in search (slide 2): j = dfa[txt.charAt(i)][j].

10. How to use DFA?
Current state 1. Text: ABCABABABACA.
Reading txt[1] = 'B': go to state 2.

11. How to use DFA?
Current state 2. Text: ABCABABABACA.
Reading txt[2] = 'C': go to state 0.

12. How to use DFA?
Current state 0. Text: ABCABABABACA.
Reading txt[3] = 'A': go to state 1.

13. How to use DFA?
Current state 1. Text: ABCABABABACA.
Reading txt[4] = 'B': go to state 2.

14. How to use DFA?
Current state 2. Text: ABCABABABACA.
Reading txt[5] = 'A': go to state 3.

15. How to use DFA?
Current state 3. Text: ABCABABABACA.
Reading txt[6] = 'B': go to state 4.

16. How to use DFA?
Current state 4. Text: ABCABABABACA.
Reading txt[7] = 'A': go to state 5.

17. How to use DFA?
Current state 5. Text: ABCABABABACA.
Reading txt[8] = 'B': go to state 4.

18. How to use DFA?
Current state 4. Text: ABCABABABACA.
Reading txt[9] = 'A': go to state 5.

19. How to use DFA?
Current state 5. Text: ABCABABABACA.
Reading txt[10] = 'C': go to state 6.

20. How to use DFA?
Current state 6 (ACCEPT). Text: ABCABABABACA.
j == m: we are now at the (6+1)th state, so the loop stops with i = 11 and search returns i - m = 5.
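The walk through the text can be reproduced with a short sketch that records the state after each character. The class name `KmpTraceSketch` and the method `trace` are assumptions for illustration; `buildDfa` repeats the construction from the slides and assumes a non-empty pattern with character codes below 256.

```java
import java.util.ArrayList;
import java.util.List;

class KmpTraceSketch {
    static final int R = 256;  // alphabet size used on the slides

    // Same construction as the KMP(String pat) constructor on the slides.
    static int[][] buildDfa(String pat) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;                      // set match case
            x = dfa[pat.charAt(j)][x];                          // update restart state
        }
        return dfa;
    }

    // Runs the DFA over txt and records the state after each character,
    // stopping once the accept state (pat.length()) is reached.
    static List<Integer> trace(String pat, String txt) {
        int[][] dfa = buildDfa(pat);
        List<Integer> states = new ArrayList<>();
        int j = 0;
        for (int i = 0; i < txt.length() && j < pat.length(); i++) {
            j = dfa[txt.charAt(i)][j];
            states.add(j);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(trace("ABABAC", "ABCABABABACA"));
        // prints [1, 2, 0, 1, 2, 3, 4, 5, 4, 5, 6]
    }
}
```

The recorded list has 11 entries, one per text character consumed, which is why the match offset is 11 - 6 = 5.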

21. How to build DFA?
If we can match the next character: when we see the expected character, go to the next state.
Pattern: ABABAC (assume R = 3 and the only characters are A, B, C)
Match transitions only ("-" means not filled in yet):
    c \ j   0     1(A)  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1     -     3     -     5     -
    B       -     2     -     4     -     -
    C       -     -     -     -     -     6
Directed graph: 0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6
We only need dfa[R][m] since there is no transition information for the last state.

22. How to build DFA?
If we can match the next character:
Directed graph: 0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6
Relevant lines of the constructor (slide 2):
    dfa[pat.charAt(0)][0] = 1;
    dfa[pat.charAt(j)][j] = j+1; // Set match case.

23. How to build DFA?
If we fail to match the next character:
    Copy data from column x: dfa[c][j] = dfa[c][x] for every character c.
    This mimics the transitions of state x, as if to say "I am now in state x" or "restart from state x". x is the restart state.
Update restart state x: x = dfa[pat.charAt(j)][x].
About state x:
    It is the restart state: if we fail to match the j-th character, we restart from state x.
    How do we restart? Since we copied the entries from column x for the failed cases, the DFA behaves as if it had restarted from x.
    At the very beginning, x is one state behind the DFA building process (x = 0 when j = 1).
    x is updated based on the partially built DFA. It extracts information from the pattern itself.
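One way to gain confidence in the copy-from-x rule is to compare it against a direct definition of each transition: from state j on character c, the next state is the length of the longest prefix of the pattern that is a suffix of pat[0..j-1] followed by c. The sketch below does that comparison. The names `RestartCheckSketch`, `nextByDefinition`, and `agrees` are assumptions for illustration and are not part of the slides.

```java
class RestartCheckSketch {
    static final int R = 256;

    // Same construction as the KMP(String pat) constructor on the slides.
    static int[][] buildDfa(String pat) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;                      // set match case
            x = dfa[pat.charAt(j)][x];                          // update restart state
        }
        return dfa;
    }

    // Direct (slow) definition: longest prefix of pat that is a suffix of pat[0..j-1] + c.
    static int nextByDefinition(String pat, int j, char c) {
        String seen = pat.substring(0, j) + c;
        for (int len = Math.min(seen.length(), pat.length()); len > 0; len--) {
            if (seen.endsWith(pat.substring(0, len))) return len;
        }
        return 0;
    }

    // True if every entry of the incrementally built table matches the direct definition.
    static boolean agrees(String pat) {
        int[][] dfa = buildDfa(pat);
        for (int j = 0; j < pat.length(); j++)
            for (char c = 0; c < R; c++)
                if (dfa[c][j] != nextByDefinition(pat, j, c)) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees("ABABAC"));  // prints true
    }
}
```

The direct definition costs far more per entry than the incremental build; it is only useful here as a cross-check.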

24. How to build DFA? j=0
An example (ABABAC). j (current state): 0.
Relevant line of the constructor (slide 2): dfa[pat.charAt(0)][0] = 1;
Table so far (rows: character c, columns: state j):
    c \ j   0     1(A)  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1
    B       0
    C       0

25. How to build DFA? j=1
An example (ABABAC). j (current state): 1. x (restart state): 0.
Process:
    Copy dfa[][0] to dfa[][1].
    dfa['B'][1] ← 2
    x ← dfa['B'][0] = 0
Table so far ([x] marks the restart column):
    c \ j   0[x]  1(A)  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1     1
    B       0     2
    C       0     0

26. How to build DFA? j=1
An example (ABABAC). Understand restart state x:
You are actually at state 1, but if you see that the next character is 'A' or 'C', just suppose you are currently at state 0. Recall the meaning of states in the DFA: state 0 means you have matched nothing.

27. How to build DFA? j=2
An example (ABABAC). j (current state): 2. x (restart state): 0.
Process:
    Copy dfa[][0] to dfa[][2].
    dfa['A'][2] ← 3
    x ← dfa['A'][0] = 1
Table so far:
    c \ j   0[x]  1(A)  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1     1     3
    B       0     2     0
    C       0     0     0

28. How to build DFA? j=2
An example (ABABAC). j (current state): 2. x (restart state): 0.
Understand restart state x:
At state 2 you have matched "AB", but if you see that the next character is 'B' or 'C', you have to start from the very beginning (state 0).

29. How to build DFA? j=2
An example (ABABAC). j (current state): 2. x (restart state): 0.
Understand restart state x:
x ← dfa['A'][0] = 1. Why? At current state 2 the expected character is 'A', which means that if we fail to match at the next state 3, we do not need to start from the very beginning, since we have at least 'A' matched (x = 1).

30. How to build DFA? j=3
An example (ABABAC). j (current state): 3. x (restart state): 1.
Process:
    Copy dfa[][1] to dfa[][3].
    dfa['B'][3] ← 4
    x ← dfa['B'][1] = 2
Table so far:
    c \ j   0     1(A)[x]  2(B)  3(A)  4(B)  5(A)  6(C)
    A       1     1        3     1
    B       0     2        0     4
    C       0     0        0     0

31. How to build DFA? j=3
An example (ABABAC). j (current state): 3. x (restart state): 1.
Understand restart state x:
We restart from state 1 if we fail to match the expected 'B'. The reason is that we know we have at least an 'A' already matched (x = 1). Restart state x was set in the previous step.

32. How to build DFA? j=3
An example (ABABAC). j (current state): 3. x (restart state): 1.
Understand restart state x:
x ← dfa['B'][1] = 2. Why? At current state 3 the expected character is 'B', and restart state 1 tells us 'A' is already matched in the pattern. Thus if we fail at the next state 4, "AB" is already matched, i.e. we can update restart state x to 2.

33. How to build DFA? j=4
An example (ABABAC). j (current state): 4. x (restart state): 2.
Process:
    Copy dfa[][2] to dfa[][4].
    dfa['A'][4] ← 5
    x ← dfa['A'][2] = 3
Table so far:
    c \ j   0     1(A)  2(B)[x]  3(A)  4(B)  5(A)  6(C)
    A       1     1     3        1     5
    B       0     2     0        4     0
    C       0     0     0        0     0

34. How to build DFA? j=4
An example (ABABAC). j (current state): 4. x (restart state): 2.
Understand restart state x:
At state 4 you have already matched "ABAB". If you fail to match the next 'A', you assume you have still matched "AB", since the restart state is 2. This assumption is achieved by copying column 2 for the failed cases ('B' and 'C').

35. How to build DFA? j=5
An example (ABABAC). j (current state): 5. x (restart state): 3.
Process:
    Copy dfa[][3] to dfa[][5].
    dfa['C'][5] ← 6
    x ← dfa['C'][3] = 0
Table so far:
    c \ j   0     1(A)  2(B)  3(A)[x]  4(B)  5(A)  6(C)
    A       1     1     3     1        5     1
    B       0     2     0     4        0     4
    C       0     0     0     0        0     6

36. How to build DFA? j=5
An example (ABABAC). j (current state): 5. x (restart state): 3.
Understand restart state x:
We have already matched "ABABA". If we fail to match the expected 'C', we assume we have matched "ABA", since the restart state is 3.
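The step-by-step construction for ABABAC can be replayed with a sketch that logs, for each j, the restart state x that was copied from and the resulting column. The class name `DfaStepsSketch`, the method names, and the log format are assumptions for illustration; the construction itself follows the constructor from the slides.

```java
import java.util.ArrayList;
import java.util.List;

class DfaStepsSketch {
    static final int R = 256;

    // Formats column j of the table, restricted to the characters in alphabet.
    static String describe(int[][] dfa, int j, String xLabel, String alphabet) {
        StringBuilder sb = new StringBuilder("j=" + j + " x=" + xLabel);
        for (char c : alphabet.toCharArray()) {
            sb.append(' ').append(c).append(':').append(dfa[c][j]);
        }
        return sb.toString();
    }

    // Builds the DFA as on the slides and logs one line per column.
    // The logged x is the restart state used for the copy, before it is updated.
    static List<String> steps(String pat, String alphabet) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        List<String> log = new ArrayList<>();
        dfa[pat.charAt(0)][0] = 1;
        log.add(describe(dfa, 0, "-", alphabet));  // no restart state is used for column 0
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;                      // set match case
            log.add(describe(dfa, j, String.valueOf(x), alphabet));
            x = dfa[pat.charAt(j)][x];                          // update restart state
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : steps("ABABAC", "ABC")) System.out.println(line);
        // j=0 x=- A:1 B:0 C:0
        // j=1 x=0 A:1 B:2 C:0
        // j=2 x=0 A:3 B:0 C:0
        // j=3 x=1 A:1 B:4 C:0
        // j=4 x=2 A:5 B:0 C:0
        // j=5 x=3 A:1 B:4 C:6
    }
}
```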

37. Understand state x
Updating x while we build the DFA is similar to the state transition when we match the pattern in the text. Compare the two lines from the code on slide 2:
    x = dfa[pat.charAt(j)][x];   // constructor: update restart state
    j = dfa[txt.charAt(i)][j];   // search: advance the DFA on the text

38. Understand state x
The transition of state x: match the pattern against itself using the partially constructed DFA table.
To build the next state of the DFA, we need to know the restart state x.
An example (ABABAC):
    x ← 0
    x ← dfa['B'][0] = 0
    x ← dfa['A'][0] = 1
    x ← dfa['B'][1] = 2
    x ← dfa['A'][2] = 3
    x ← dfa['C'][3] = 0
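The sequence of x values can be checked with a sketch that records x during construction and then feeds the pattern, minus its first character, through the finished DFA. The class name `RestartStateSketch`, the extra `xs` parameter, and `selfMatch` are assumptions introduced for this illustration.

```java
import java.util.Arrays;

class RestartStateSketch {
    static final int R = 256;

    // Builds the DFA as on the slides. xs must have length pat.length();
    // xs[0] keeps the initial x = 0 and xs[j] receives x after the update in iteration j.
    static int[][] buildDfa(String pat, int[] xs) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        xs[0] = 0;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) dfa[c][j] = dfa[c][x];  // copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;                      // set match case
            x = dfa[pat.charAt(j)][x];                          // update restart state
            xs[j] = x;
        }
        return dfa;
    }

    // Runs pat[1..] through the finished DFA; states[i] is the state after reading pat[i].
    static int[] selfMatch(int[][] dfa, String pat) {
        int[] states = new int[pat.length()];
        for (int i = 1, s = 0; i < pat.length(); i++) {
            s = dfa[pat.charAt(i)][s];
            states[i] = s;
        }
        return states;
    }

    public static void main(String[] args) {
        String pat = "ABABAC";
        int[] xs = new int[pat.length()];
        int[][] dfa = buildDfa(pat, xs);
        System.out.println(Arrays.toString(xs));                   // prints [0, 0, 1, 2, 3, 0]
        System.out.println(Arrays.toString(selfMatch(dfa, pat)));  // prints [0, 0, 1, 2, 3, 0]
    }
}
```

Both arrays agree because x is exactly the state the DFA reaches on the pattern shifted by one position.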

39. Conclusion
Updating x while we build the DFA is similar to the state transition when we match the pattern in the text.
The process of building the DFA is the same as matching the pattern against itself.
By analyzing the pattern, we know how to move forward in the text when a character fails to match.