Minimum Window Substring

Strings, Hashing

Hard

Problem Statement (Simplified):

Find the minimum size of the substring of string S which contains all the character from a given string T. Print the substring with minimum length.

See original problem statement here

Test Case

Input:
1
prepbytes
pyte
ccbabcab
bbab

Output:
pbyte
babcab

Explanation:
Case 1: All characters in string 'pyte' are present in 'pbyte' and this is the smallest substring.
Case 2: All characters in string 'bbab' are present in strings 'babcab' and this is the smallest substring containing all characters.

Solving Approach :

Bruteforce Approach :

1) We can generate all substrings of string S.
2) For each substring we check if it contains all character from the string T.
3) We print the string with minimum length.
4) You can see this approach is not efficient and takes O(T\times{N^{2}}) time as it takes O(N^{2}) time for calculating all substrings and takes O(T) time for checking if all characters from T exists or not in the substring.
5) Do you have any other approach which is efficient than this approach? Try that out, if that one goes beyond the time limit, proceed to the following efficient approach.

In the above approach, we found all substring containing all characters from our pattern string, we can also observe that all discarded substring contain or minimum window substring, and the largest such substring starts from 0. Hence we can find a substring from the start of the string containing all characters from the pattern, and then discard characters from beginning one by one, until we reach a condition that violates our requirements. We discuss this approach our efficient approach

Efficient Approach:

1) First refer some websites to learn coding and check if the length of string S is less than the length of the given string T, if yes print Not Found.
2) If no, then store the occurrences of characters from string T in a hash table.
3) Start matching the characters of string T with the characters of string i.e. increment count if a character matches.
4) Check if the count is equal to the length of string T, if yes we have found a window containing all characters from the string T.
5) If such a window is found, try to minimize it by removing extra characters from the beginning of the current window.
6) Update the minimum length on each character removal.
7) Print the window with minimum length when no character can be removed.

Example

• Let us take an example and understand above approach, let string S be 'qabaaab' and pattern P be 'baaab'.
• We maintain a hash table hash_str to store characters from string. We also maintain a hash table hash_pat to store characters from pattern string.
• We find all appearances of characters in hash_pat, so final hash_pat for pattern is [3,2] where index 0 corresponds for character a and index 1 corresponds for character b.
• We now find the largest window containing all character from pattern string, this can be found by starting window from 0 and increasing end of window one by one until whole window contains all chracters from pattern string.
Hash table for string is initialised with 0, so default hash table is [0,0,0] where index 0 corresponds to 'a', index 1 corresponds to 'b' and index 2 corresponds to 'q'.

Finding Window :

We also maintain a counter count, which when equal to length of pattern, we exit because that is last point of window.

For end = 0:
> * Current character is 'q', hence hash_str becomes [0,0,1], 'q' does not appear in pattern, so we don't increase count.

For end = 1:
> * Current character is 'a', hence hash_str becomes [1,0,1], hash_str value is less than hash_pat value of 'a', so we increase counter by 1, so count=1.

For end = 2:
> * Current character is 'b', hence hash_str becomes [1,1,1], hash_str value is less than hash_pat value of 'a', so we increase counter by 1, so count=2.

For end = 3:
> * Current character is 'a', hence hash_str becomes [2,1,1], hash_str value is less than hash_pat value of 'a', so we increase counter by 1, so count=3.

For end = 4:
> * Current character is 'a', hence hash_str becomes [3,1,1], hash_str value is equal to hash_pat value of 'a', so we increase counter by 1, so count=4.

For end = 5:
> * Current character is 'a', hence hash_str becomes [4,1,1], hash_str value is greater than hash_pat value of 'a', so we dont increase counter, so count=4.

For end = 6:
> * Current character is 'b', hence hash_str becomes [4,2,1], hash_str value is less than hash_pat value of 'a', so we increase counter by 1, so count=5.

• Hence count is now equal to length of pattern, so we have found the window of maximum length, now we need to short this window for minimum size of window.

Minimizing The Window :

Here, we move start pointer of window to minimize the size, we decrease the count of hash_str value of current character if hash_str value is greater than hash_pat value of current character. If hash_str value becomes becomes less to hash_pat value of current character, we stop iterating, current window is the minimum window.

For start = 0:
> * Current character is 'q', hence hash_str becomes [4,2,0], hash_str value is not in pattern so we move further.

For start = 1:
> * Current character is 'a', hence hash_str becomes [3,2,0], hash_str value of 'a' is greater than hash_pat value, so we move window further.

For start = 2:
> * Current character is 'b', hence hash_str becomes [2,2,0], hash_str value of 'a' is equal to hash_pat value, so we have minimized the window.

• We print the final window starting from start to end pointer. Hence we print 'baaab'.

Solutions

#include<stdio.h>
#include<string.h>
int main()
{
int test;
scanf("%d",&test);
while(test--){
char str[1000000], pat[1000000];
scanf("%s%s",str,pat);

int len1 = strlen(str);
int len2 = strlen(pat);

if (len1 < len2)
{
}else{
int hash_pat[256] = {0};
int hash_str[256] = {0};

for (int i = 0; i < len2; i++)
hash_pat[pat[i]]++;

int start = 0, start_index = -1, min_len = 9999999;

int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str[j]]++;

if (hash_pat[str[j]] != 0 &&
hash_str[str[j]] <= hash_pat[str[j]] )
count++;

if (count == len2)
{
while ( hash_str[str[start]] > hash_pat[str[start]]
|| hash_pat[str[start]] == 0)
{

if (hash_str[str[start]] > hash_pat[str[start]])
hash_str[str[start]]--;
start++;
}

int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}

if (start_index == -1)
else{
for(int i=start_index; i<min_len+start_index; i++)
printf("%c",str[i]);
printf("\n");
}
}
}
}

#include<bits/stdc++.h>
using namespace std;

const int no_of_chars = 256;

int main()
{
int test;
cin>>test;
while(test--){
string str, pat;
cin>>str>>pat;

int len1 = str.length();
int len2 = pat.length();

if (len1 < len2)
{
}else{

int hash_pat[no_of_chars] = {0};
int hash_str[no_of_chars] = {0};

for (int i = 0; i < len2; i++)
hash_pat[pat[i]]++;

int start = 0, start_index = -1, min_len = INT_MAX;

int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str[j]]++;

if (hash_pat[str[j]] != 0 &&
hash_str[str[j]] <= hash_pat[str[j]] )
count++;

if (count == len2)
{
while ( hash_str[str[start]] > hash_pat[str[start]]
|| hash_pat[str[start]] == 0)
{

if (hash_str[str[start]] > hash_pat[str[start]])
hash_str[str[start]]--;
start++;
}

int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}

if (start_index == -1)
else{
for(int i=start_index; i<min_len+start_index; i++)
cout<<str[i];
cout<<endl;
}
}
}
}

import java.util.*;
import java.io.*;
import java.lang.Math;

public class Main {
public static void main(String args[]) throws IOException {

Scanner sc= new Scanner(System.in);
int test = sc.nextInt();
while(test != 0){

String str = sc.next();
String pat= sc.next();

int len1 = str.length();
int len2 = pat.length();

if (len1 < len2)
{
}else{

int hash_pat[] = new int[256];
int hash_str[] = new int[256];

for (int i = 0; i < len2; i++)
hash_pat[pat.charAt(i) - 'a']++;

int start = 0, start_index = -1, min_len = 9999999;

int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str.charAt(j) - 'a']++;

if (hash_pat[str.charAt(j) - 'a'] != 0 &&
hash_str[str.charAt(j) - 'a'] <= hash_pat[str.charAt(j) - 'a'] )
count++;

if (count == len2)
{
while ( hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a']
|| hash_pat[str.charAt(start) - 'a'] == 0)
{

if (hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a'])
hash_str[str.charAt(start) - 'a']--;
start++;
}

int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}

if (start_index == -1)
else{
for(int i=start_index; i<min_len+start_index; i++)
System.out.print(str.charAt(i));
System.out.println();
}
}

test--;
}
}
}


Space Complexity of this approach would be O(1).