Problem Statement: You are given a string “s” and a pattern ‘p’. You need to find if the pattern is present in the string “s”.
Usually we can solve this by brute force approach. i.e comparing one letter after another, till we find the sub pattern or we reach end of the string.
This can be visualized as below:
The above approach is not efficient. Hence we choose an efficient way to solve this problem.
There are total of 3 pattern matching algorithms. KMP is the first algorithm in them.
In this tutorial we shall see how to solve using KMP algorithm.
KMP algorithm is bit complex/difficult to understand, when compared to next 2 algorithms. I have made sure that the explanation is simple to understand and follow.
KMP algorithm has 2 parts:
1. Partial Match table
2. String Matching
High level working of the algorithm:
By some mechanism [we shall look at it next] we create a partial match table. This table will help us to skip the number of characters when there is a mismatch. Thus eliminating, checking of all the characters one by one.
1. Creating a Partial Match Table
Partial match table is the length of characters of longest proper prefix and proper suffix.
Now what is proper prefix and proper suffix?
We shall have a look at it below:
Proper Prefix:
It is combination of all the characters except the last character. Prefix will be taken from left to right order.
For example:
For the string “ABCD”,
Proper prefix will be:
A
AB
ABC
We cannot take “D” making it “ABCD”, essentially making it as original string.
Proper Suffix:
It is the combination of all the characters except the first character. Suffix will be taken from right to left order.
For example:
For the string “ABCD”
Proper suffix will be:
D
CD
BCD
Similarly, we cannot take “A” making it “ABCD”, essentially making it as original string.
Now that we have understood what is proper prefix and proper suffix, we shall build a Partial Match Table [PMT].
PMT will be created for the “pattern” not for the “string”.
Consider the pattern “ABABAB”.
As there are 6 elements, the length of our PMT will be 6.
Now let’s fill a[0], for a[0] we concentrate on “a”.
As there is only 1 element, proper prefix = 0, proper suffix = 0. Hence a[0] = 0.
Now let’s fill a[1], for a[1] we concentrate on “ab”.
Here proper prefix = a, proper suffix = b. As there are no matching characters, a[1] = 0.
Now let’s fill a[2], for a[2] we concentrate on “aba”.
Here proper prefix = a, proper suffix = a. As there is 1 match, length is 1 characters, a[2] = 1
Now let’s fill a[3], for a[3] we concentrate on “abab”.
Here proper prefix = a, ab, aba, proper suffix = b, ab, bab. As there is 1 match, and the length is 2 characters, a[3] = 2.
Now let’s fill a[4], for a[4] we concentrate on “ababa”.
Here proper prefix = a, ab, aba, abab proper suffix = b, ba, aba, baba. As there is 1 match, and the length is 3 characters, a[4] = 3.
Now let’s fill a[5], for a[5] we concentrate on “ababab”.
Here proper prefix = a, ab, aba, abab, ababa proper suffix = b, ab, bab, abab, babab. As there is 1 match, and the length is 4 characters, a[5] = 4.
Hence our final PMT will be as below:
Let’s come to 2ndpart of the algorithm. Searching pattern in the string.
To search the string, we follow below steps:
Step 1: Take 2 variables i and j
i = string [str[0]]
j = pattern[0]
Step 2: Compare str[i] with pattern[j+1] <- important
1. If match is found, increment the index of both I and j.
2. If there is a mismatch, move j to the location as per PMT.
3. If j = 0, then increment i index.
Now we shall test the above steps with help of an example:
String = ababcacabababacad
Here if you observe, we are starting the index of string from 1 and also pattern index is 1.
We shall see step by step working of algorithm.
Initial table will be as below:
As per the algorithm, match str[i] with pattern [j+1].
i.e str[1] with pattern[0+1] as shown below:
It is a match. Hence increment “i” and “j”.
Now str[2] pattern[2] it is a match, move forward.
Now str[3] pattern[3] it is a match, move forward.
Now str[4] pattern[4] it is a match, move forward.
Now str[5] pattern[5] it is NOT a match.
Now we shall look at PMT[5], it is 4. Hence move j to 4th location again there is a mismatch.
Now again see PMT at index 4, at index 4, the value is 2. Hence move “j” to 2 index, again there is a mismatch.
Now we check the value at index 2 in PMT. It is 0.
Hence move “j” to 0 and move “i” by 1 as below:
Again str[6] matches with pattern[1]
Str[7] does not match with pattern[2]
Now check the value at index 2 of PMT, it is 0.
Here again move “j = 0” and i to next value.
Now check again:
Str[8] == pattern[1]
Str[9] == pattern[2]
Str[10] == pattern[3]
Str[11] == pattern[4]
Str[12] == pattern[5]
Str[13] == pattern[6]
Hence we got our substring.
Implementation of KMP algorithm:
#include <iostream>
#include <string>
using namespace std;
//creates partial match table base on pattern
void partialMatchTable(string p, int pi[])
{
int m = p.length();
int q;
pi[0] = -1;
for (int i = 1; i < m; ++i)
{
q = pi[i-1];
while (q >= 0)
{
if (p[q] == p[i-1])
break;
else
q = pi[q];
}
pi[i] = q + 1;
}
//print partial match table
cout << "PMT: ";
for (int i = 0; i < m; ++i)
{
cout << pi[i] << " ";
}
cout << endl;
}
//runs KMP algorithm pattern p on target t, returns a boolean if or not in target
bool KMP(string p, string t)
{
int n = t.length();
int m = p.length();
int pi[m];
partialMatchTable(p, pi);
int i = 0;
int q = 0;
while (i < n)
{
if (q == -1)
{
++i;
q = 0;
}
else if (t[i] == p[q])
{
++i;
++q;
if (q == m)
return true;
}
else
q = pi[q];
}
return false;
}
int main()
{
string target = "abac";
string pattern = "abababajdcdjabac";
if ( KMP(pattern, target))
cout << "\n Target found in pattern";
else
cout<<"\n Did not find target in pattern"
}