Boyer Moore algorithm

Introduction:

Boyer Moore algorithm is used for pattern searching inside a string. This is the 3rdalgorithm in this pattern search series.

This algorithm is very easy to understand. You need to remember below 3 points only.

  1. Given a string ‘s’ and pattern ‘p’ we start searching from right most element.
  2. When there is a mismatch occurred, one of below 2 rules to be followed:
  3. If the mismatch character is not in the pattern p, then we skip the whole word in the string s.
  4. If the mismatched character is present in pattern ‘p’, then we skip till the character in pattern ‘p’ matches a character in string ‘s’.

Don’t worry if you are not able to relate or visualize. We shall understand with help of an example.

Consider the string S = “prodeveloper”
 
And pattern P = lope.

We start searching from right most character as highlighted in the image below

Boyer Moore algorithm

If there is mismatch and the character ‘d; from the string ‘s’ is not present in the pattern ‘p’, we shall skip till ‘d’. Hence in the next pass will look as below

Boyer Moore algorithm

Again there is a mismatch. But the letter ‘l’ from the string is present in the pattern ‘p’. Hence we skip till the letter ‘l’ from the string. Hence in the next pass will look like below:

Boyer Moore algorithm

Now again from right most, check ‘e’ from the pattern ‘p’ and string ‘s’. It is a match. Similarly we do it for ‘p’ ‘o’ ‘l’ all are match, Hence we got our result, and the substring is present in string s.

Implementation of Boyer Moore Algorithm

#include <bits/stdc++.h> 
using namespace std; 
# define NO_OF_CHARS 256  

void badCharHeuristic( string str, int size,  
                        int badchar[NO_OF_CHARS])  
{  
    int i;  
  
    for (i = 0; i < NO_OF_CHARS; i++)  
        badchar[i] = -1;  
  
    for (i = 0; i < size; i++)  
        badchar[(int) str[i]] = i;  
}  
  
void search( string txt, string pat)  
{  
    int m = pat.size();  
    int n = txt.size();  
  
    int badchar[NO_OF_CHARS];  
  
    badCharHeuristic(pat, m, badchar);  
  
    int s = 0; 
    while(s <= (n - m))  
    {  
        int j = m - 1;  
  
        while(j >= 0 && pat[j] == txt[s + j])  
            j--;  
  
        if (j < 0)  
        {  
            cout << "pattern occurs at shift = " <<  s << endl;  
  
            s += (s + m < n)? m-badchar[txt[s + m]] : 1;  
  
        }
        else
            s += max(1, j - badchar[txt[s + j]]);  
    }  
}  
  
int main()  
{  
    string txt= "ABAAABCD";  
    string pat = "ABC";  
    search(txt, pat);  
    return 0;  
}
Write a Comment

Leave a Comment

Your email address will not be published. Required fields are marked *