KMP Algorithm for Pattern Searching

Last Updated on August 3, 2023 by Mayank Dham

The KnuthMorrisPratt, KMP algorithm is a linear time string matching technique developed by Donald Knuth, Vaughan Pratt, and James H. Morris. It deals with the problem of locating instances of a pattern ‘p’ within a larger string ‘S.’ By avoiding unnecessary comparisons with elements of ‘S’ that have already been involved in comparisons with the pattern ‘p,’ the algorithm achieves a matching time complexity of O(n). This eliminates the need for backtracking on the string ‘S’ and improves efficiency significantly.

Components of the KMP Algorithm

1. The Prefix Function (Π)
The prefix function, denoted as, contains information about how the pattern ‘p’ matches its shifted versions. This information is critical for avoiding redundant pattern shifts during the matching process, which prevents backtracking on the string ‘S.’

2. The KMP Matcher
The KMP Matcher uses ‘S,’ ‘p,’ and the prefix function ” as inputs to find occurrences of ‘p’ in ‘S’ and returns the number of shifts of ‘p’ required to find these occurrences.

The Prefix Function (Π)

The prefix function is computed using the following pseudocode:

The running time analysis reveals that the prefix function takes O(m) time to compute, where m is the length of the pattern ‘p.’

String Matching of KMP Algorithm

We will look at an example of how the KMP algorithm works in the case of string matching patterns.

Problem Statement
Write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[] given a text txt[0… N1] and a pattern pat[0… M1]. You may assume that N > M.

For example:
txt = “AABACDEFG” and pat = “AABA”

Output
Pattern found at index 0

Algorithm for KMP Algorithm

Preprocessing (Building the LPS array):
Create an array LPS (Longest Prefix Suffix) of size n (length of the pattern).
Initialize two pointers, i and j, where i starts from 0 and j starts from 1.
Building the LPS array:
Set LPS[0] to 0 since there is no proper prefix or suffix for a single character.
Repeat the following steps while j < n:
If pattern[i] == pattern[j], set LPS[j] = i + 1, increment both i and j.
If pattern[i]!= pattern[j]:
If i is not at the beginning (i > 0), set i = LPS[i1].
If i is at the beginning (i = 0), set LPS[j] = 0, increment j.
Pattern Matching:
Initialize two pointers, i (index for the text) and j (index for the pattern), both starting from 0.
Repeat the following steps while i < m (length of the text) and j < n (length of the pattern):
If text[i] == pattern[j], increment both i and j.
If j == n, a match is found at index i j in the text, and j = LPS[j1].
If text[i]!= pattern[j]:
If j is not at the beginning (j > 0), set j = LPS [j1].
If j is at the beginning (j = 0), increment i.
Repeat step 3 until i reach the end of the text or a match is found.

The KMP algorithm optimizes pattern matching by utilizing the information in the LPS array, which represents the longest proper prefix of the pattern that is also a proper suffix. This allows the algorithm to avoid unnecessary comparisons during the search, making it more efficient than naive pattern matching algorithms.

Code Implementation of KMP Algorithm

#include <iostream>
#include <vector>
using namespace std;

// Function to build the longest prefix suffix (lps) array for the pattern
void computeLPSArray(const string& pattern, vector<int>& lps) {
    int patternLen = pattern.length();
    int len = 0; // Length of the previous longest prefix suffix

    lps[0] = 0; // lps[0] is always 0

    int i = 1;
    while (i < patternLen) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Function to perform pattern searching using the KMP algorithm
void KMPSearch(const string& text, const string& pattern) {
    int textLen = text.length();
    int patternLen = pattern.length();

    // Create lps array to store the longest prefix suffix values
    vector<int> lps(patternLen, 0);

    // Preprocess the pattern to build the lps array
    computeLPSArray(pattern, lps);

    int i = 0; // Index for text[]
    int j = 0; // Index for pattern[]

    while (i < textLen) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }

        if (j == patternLen) {
            cout << "Pattern found at index " << i - j << endl;
            j = lps[j - 1];
        } else if (i < textLen && pattern[j] != text[i]) {
            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
}

int main() {
    string text, pattern;
    cout << "Enter the text: ";
    cin >> text;

    cout << "Enter the pattern to search: ";
    cin >> pattern;

    KMPSearch(text, pattern);

    return 0;
}

Input :
Text: "ABABDABACDABABCABAB"
Pattern: "ABABCABAB"

Output

Pattern found at index 10

Conclusion
We use the KMP Algorithm to determine whether ‘P’ appears in ‘T.’ The pattern ‘P’ is discovered to occur in the string ‘T’ after 14 iterations. The total number of shifts required to find the match is 13 7 = 6 shifts. Finally, when compared to naive approaches, the KnuthMorrisPratt algorithm provides an efficient and powerful solution to the string matching problem. The KMP algorithm allows for fast and accurate string matching in a variety of applications by utilizing the prefix function and avoiding unnecessary comparisons.

Frequently Asked Questions (FAQs)

Here are some of the frequently asked questions about the KMP algorithm

Q1: What is the KMP algorithm used for?
The KMP algorithm is a string matching algorithm used to find occurrences of a pattern within a text efficiently. It is particularly useful when you want to find multiple occurrences of the same pattern in a large text without unnecessary comparisons, making it more efficient than naive pattern matching algorithms.

Q2: How does the KMP algorithm achieve better efficiency than naive pattern matching?
The KMP algorithm achieves better efficiency by utilizing the information stored in the Longest Prefix Suffix (LPS) array. This array is precomputed based on the pattern and helps the algorithm avoid redundant character comparisons by skipping over known non-matching regions, leading to improved performance.

Q3: What is the time complexity of the Knuth-Morris-Pratt algorithm?
The time complexity of the KMP algorithm is O(m + n), where ‘m’ is the length of the text and ‘n’ is the length of the pattern. It is a linear time algorithm, making it very efficient even for large texts and patterns.

Q4: When to consider using the KMP algorithm over other string matching techniques?
You should consider using the KMP algorithm when you need to search for a specific pattern multiple times within a large text. It excels in scenarios where efficiency is crucial because of its linear time complexity, making it more suitable for larger datasets compared to naive approaches.

KMP Algorithm for Pattern Searching

Components of the KMP Algorithm

The Prefix Function (Π)

String Matching of KMP Algorithm

Algorithm for KMP Algorithm

Code Implementation of KMP Algorithm

Code Implementation of KMP Algorithm

Frequently Asked Questions (FAQs)

Leave a Reply Cancel reply

Difference between Big Oh, Big Omega, and Big Theta

Basics of Analysis of Algorithms

Types of Algorithm Analysis

Why Analysis of Algorithms is important?

What is meant by Algorithm Analysis?

Design and Analysis of Algorithms

Sign in to your account

Login via OTP

Login via OTP

Register with PrepBytes

Components of the KMP Algorithm

The Prefix Function (Π)

String Matching of KMP Algorithm

Algorithm for KMP Algorithm

Code Implementation of KMP Algorithm

Code Implementation of KMP Algorithm

Frequently Asked Questions (FAQs)

Leave a Reply Cancel reply