Find the shortest substring of s1 that contains s2 as a subsequence using dynamic programming.
Problem Restatement
We are given two strings, s1 and s2.
We need to return the shortest contiguous substring of s1 such that s2 is a subsequence of that substring.
A subsequence means the characters appear in the same order, but they do not need to be adjacent.
If multiple shortest substrings exist, return the one with the left-most starting index. If no valid substring exists, return an empty string.
Input and Output
| Item | Meaning |
|---|---|
| Input | Two strings s1 and s2 |
| Output | The minimum window substring of s1 |
| Valid window | A substring where s2 appears as a subsequence |
| Tie rule | Return the left-most shortest window |
| No answer | Return "" |
Example function shape:
def minWindow(s1: str, s2: str) -> str:
...Examples
Example 1
s1 = "abcdebdde"
s2 = "bde"The substring "bcde" contains "bde" as a subsequence:
b c d e
| | |
b d eThe substring "bdde" also contains "bde" as a subsequence.
Both have length 4, but "bcde" starts earlier.
So the answer is:
"bcde"Example 2
s1 = "jmeqksfrsdcmsiwvaovztaqenprpvnbstl"
s2 = "u"The character "u" does not appear in s1.
So the answer is:
""First Thought: Brute Force
The direct solution is to try every substring of s1.
For each substring, check whether s2 is a subsequence.
If it is valid, update the best answer when the substring is shorter.
This works, but it is too slow.
There are O(n^2) substrings, and checking each substring can take O(n) time.
So the brute force cost is:
O(n^3)We need to reuse work across overlapping subsequence matches.
Key Insight
We scan s1 from left to right.
For every prefix of s2, we store where the best matching window starts.
Let:
dp[j]mean:
the start index of the shortest window found so far
that matches s2[0:j+1] as a subsequence
and ends at the current position when updatedIf s1[i] == s2[j], then this character can extend a match for:
s2[0:j]into a match for:
s2[0:j+1]So we can set:
dp[j] = dp[j - 1]For j == 0, the first character of s2 starts a new window:
dp[0] = iWe update j from right to left so the same character in s1 cannot be used more than once in the same subsequence.
Algorithm
Let n = len(s1) and m = len(s2).
Create an array:
dp = [-1] * mEach value stores a start index.
Then scan every character s1[i].
For each i, scan s2 backward from m - 1 to 0.
If s1[i] == s2[j]:
- If
j == 0, setdp[0] = i. - Otherwise, if
dp[j - 1] != -1, setdp[j] = dp[j - 1].
Whenever we update dp[m - 1], we have found a valid window ending at i.
The window is:
s1[dp[m - 1] : i + 1]Update the answer if this window is shorter than the best one.
For ties, do not update. Since we scan left to right, keeping the earlier answer preserves the left-most rule.
Correctness
The array dp records valid subsequence matches by prefix length.
For j == 0, when s1[i] == s2[0], the one-character prefix of s2 can start at index i. Therefore dp[0] = i is correct.
For j > 0, when s1[i] == s2[j], this character can serve as the last character of the prefix s2[0:j+1]. The earlier prefix s2[0:j] must already be matched before this character. If dp[j - 1] != -1, then such a prefix match exists, and its start index is dp[j - 1]. Therefore assigning dp[j] = dp[j - 1] creates a valid match for the longer prefix.
The backward update order is necessary. It ensures dp[j - 1] comes from characters before i, not from the current character. So one character in s1 cannot match two positions in s2.
Whenever dp[m - 1] != -1 is updated, all of s2 has been matched as a subsequence inside s1[dp[m - 1]:i+1]. Therefore this substring is a valid window.
The algorithm checks every position where a valid window can end. It keeps the shortest valid window found. Since it only replaces the answer when the new window is strictly shorter, equal-length ties keep the earlier window. This satisfies the left-most tie rule.
Complexity
Let n = len(s1) and m = len(s2).
| Metric | Value | Why |
|---|---|---|
| Time | O(n * m) | For each character in s1, we scan s2 backward |
| Space | O(m) | The DP array stores one value per character in s2 |
The constraints allow this approach because s2.length <= 100.
Implementation
class Solution:
def minWindow(self, s1: str, s2: str) -> str:
m = len(s2)
dp = [-1] * m
best_start = -1
best_len = float("inf")
for i, ch in enumerate(s1):
for j in range(m - 1, -1, -1):
if ch != s2[j]:
continue
if j == 0:
dp[0] = i
elif dp[j - 1] != -1:
dp[j] = dp[j - 1]
if j == m - 1 and dp[j] != -1:
start = dp[j]
length = i - start + 1
if length < best_len:
best_start = start
best_len = length
if best_start == -1:
return ""
return s1[best_start:best_start + best_len]Code Explanation
We store one DP value for each prefix of s2:
dp = [-1] * mA value of -1 means that prefix has not been matched yet.
We track the best window separately:
best_start = -1
best_len = float("inf")Then we scan s1:
for i, ch in enumerate(s1):For each character, we scan s2 backward:
for j in range(m - 1, -1, -1):This avoids reusing the same character for multiple positions in s2.
If the characters do not match, nothing changes:
if ch != s2[j]:
continueIf this is the first character of s2, it starts a new possible window:
if j == 0:
dp[0] = iOtherwise, the current character extends a previous prefix match:
elif dp[j - 1] != -1:
dp[j] = dp[j - 1]When j == m - 1, we just matched the final character of s2.
That gives us a full valid window:
start = dp[j]
length = i - start + 1We update only when the new window is shorter:
if length < best_len:
best_start = start
best_len = lengthUsing < instead of <= preserves the left-most answer when two valid windows have the same length.
Testing
def run_tests():
s = Solution()
assert s.minWindow("abcdebdde", "bde") == "bcde"
assert s.minWindow("jmeqksfrsdcmsiwvaovztaqenprpvnbstl", "u") == ""
assert s.minWindow("abc", "abc") == "abc"
assert s.minWindow("abc", "ac") == "abc"
assert s.minWindow("abcbcde", "bde") == "bcde"
assert s.minWindow("bbbbde", "bde") == "bde"
assert s.minWindow("a", "a") == "a"
assert s.minWindow("a", "b") == ""
print("all tests passed")
run_tests()| Test | Why |
|---|---|
"abcdebdde", "bde" | Standard case with left-most tie |
Long string with "u" | No valid window |
"abc", "abc" | Whole string is the window |
"abc", "ac" | Subsequence skips characters |
"abcbcde", "bde" | Later start gives shorter window |
"bbbbde", "bde" | Repeated first character |
"a", "a" | Minimum valid input |
"a", "b" | Minimum invalid input |