A detailed explanation of matching a full string against a simplified regular expression with dot and star using dynamic programming.
Problem Restatement
We are given two strings:
s
ps is the input string.
p is the pattern.
The pattern supports two special characters:
| Pattern symbol | Meaning |
|---|---|
. | Matches any single character |
* | Matches zero or more of the preceding element |
The match must cover the entire input string, not only part of it. This is the key rule in the problem statement.
For example:
s = "aa"
p = "a"This returns false, because "a" matches only one character, not the whole string "aa".
But:
s = "aa"
p = "a*"returns true, because a* can match "aa".
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s and a pattern p |
| Output | true if p matches the whole string s, otherwise false |
. | Matches any one character |
* | Matches zero or more copies of the previous pattern element |
| Match type | Full-string match |
Example function shape:
def isMatch(s: str, p: str) -> bool:
...Examples
Example 1:
s = "aa"
p = "a"Output:
falseThe pattern only matches one a.
The string has two characters.
Example 2:
s = "aa"
p = "a*"Output:
truea* means zero or more a characters.
So it can match "aa".
Example 3:
s = "ab"
p = ".*"Output:
true. can match any single character.
.* can match any sequence of characters, including "ab".
First Thought: Compare Characters One by One
If the pattern had only normal characters and ., the problem would be simple.
We could compare s[i] and p[j].
They match if:
s[i] == p[j] or p[j] == "."Then move both pointers forward.
But * changes the problem.
For example:
s = "aaa"
p = "a*"The pattern element a* can match:
""
"a"
"aa"
"aaa"So one pattern position can match many string positions.
That branching is why we need dynamic programming.
Key Insight
Think of the problem as matching suffixes.
Let:
dp(i, j)mean:
Does s[i:] match p[j:]?Then the answer is:
dp(0, 0)At each state, we inspect the current pattern character p[j].
There are two main cases.
Case 1: No Star After Current Pattern Character
Suppose the next pattern character is not *.
Then the current characters must match once.
They match if:
i < len(s) and (s[i] == p[j] or p[j] == ".")If they match, continue with:
dp(i + 1, j + 1)If they do not match, the answer is false.
Case 2: Star After Current Pattern Character
Suppose:
p[j + 1] == "*"Then p[j] * gives us two choices.
First, use it zero times.
That means we skip this pattern block:
dp(i, j + 2)Second, use it one or more times.
This is allowed only if the current string character matches p[j].
Then we consume one character from s, but keep the same pattern position:
dp(i + 1, j)We keep j unchanged because * may match more copies.
So the recurrence is:
dp(i, j) = dp(i, j + 2) or (first_match and dp(i + 1, j))This is the core of the solution.
Visual Walkthrough
Use:
s = "aa"
p = "a*"Start:
dp(0, 0)Pattern block:
a*There are two choices.
Choice 1: match zero a characters.
dp(0, 2)But pattern is exhausted while string still has "aa", so this is false.
Choice 2: match one a.
dp(1, 0)Again with a*.
Choice 1 from here: match zero more a.
dp(1, 2)Pattern exhausted while string still has "a", so false.
Choice 2: match one more a.
dp(2, 0)Now the string is exhausted.
From here, a* can match zero more a.
dp(2, 2)Both string and pattern are exhausted.
So this is true.
Therefore:
isMatch("aa", "a*") == trueAlgorithm
Use recursion with memoization.
For a state dp(i, j):
- If
j == len(p), return whetheri == len(s). - Compute whether the first current character matches.
- If
j + 1 < len(p)andp[j + 1] == "*", return:- zero occurrences:
dp(i, j + 2) - or one or more occurrences:
first_match and dp(i + 1, j)
- zero occurrences:
- Otherwise, return:
first_match and dp(i + 1, j + 1)
Memoization stores each (i, j) state so repeated branches do not recompute the same suffix match.
Correctness
Define dp(i, j) as whether s[i:] matches p[j:].
If j == len(p), no pattern remains. The match succeeds exactly when no string remains, so the base case is correct.
For a normal pattern character, the pattern must match exactly one string character. The algorithm checks that the current characters match, then recursively matches the remaining suffixes. This directly follows the meaning of normal characters and ..
For a starred pattern block p[j]*, there are only two possible categories:
- It matches zero copies of
p[j]. - It matches at least one copy of
p[j].
The first case skips the block with dp(i, j + 2).
The second case requires the current string character to match p[j], consumes one string character, and keeps the same pattern block with dp(i + 1, j).
These two cases cover every valid use of *.
Since the recurrence exactly follows the definition of the pattern language, and since the answer is dp(0, 0), the algorithm returns true exactly when the pattern matches the entire string.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(mn) | There are at most m * n suffix states |
| Space | O(mn) | Memoization stores states |
Here:
m = len(s)
n = len(p)Strictly, the number of states is (m + 1) * (n + 1).
Implementation
from functools import cache
class Solution:
def isMatch(self, s: str, p: str) -> bool:
@cache
def dp(i: int, j: int) -> bool:
if j == len(p):
return i == len(s)
first_match = (
i < len(s)
and (s[i] == p[j] or p[j] == ".")
)
if j + 1 < len(p) and p[j + 1] == "*":
return (
dp(i, j + 2)
or (first_match and dp(i + 1, j))
)
return first_match and dp(i + 1, j + 1)
return dp(0, 0)Code Explanation
The memoized function:
dp(i, j)means:
s[i:] matches p[j:]The base case:
if j == len(p):
return i == len(s)If the pattern is exhausted, the string must also be exhausted.
The current character match check:
first_match = (
i < len(s)
and (s[i] == p[j] or p[j] == ".")
)This handles both ordinary characters and ..
The star case:
if j + 1 < len(p) and p[j + 1] == "*":The zero-occurrence branch:
dp(i, j + 2)The one-or-more branch:
first_match and dp(i + 1, j)The normal case:
return first_match and dp(i + 1, j + 1)Finally:
return dp(0, 0)Testing
def run_tests():
s = Solution()
assert s.isMatch("aa", "a") is False
assert s.isMatch("aa", "a*") is True
assert s.isMatch("ab", ".*") is True
assert s.isMatch("aab", "c*a*b") is True
assert s.isMatch("mississippi", "mis*is*p*.") is False
assert s.isMatch("", "c*") is True
assert s.isMatch("", ".") is False
assert s.isMatch("ab", ".*c") is False
assert s.isMatch("aaa", "ab*a*c*a") is True
print("all tests passed")
run_tests()| Test | Why |
|---|---|
"aa", "a" | Pattern must match the whole string |
"aa", "a*" | * matches multiple characters |
"ab", ".*" | Dot with star can match any sequence |
"aab", "c*a*b" | Star can match zero or more copies |
"mississippi", "mis*is*p*." | Complex false case |
"", "c*" | Star can match empty string |
"", "." | Dot needs one character |
"ab", ".*c" | Full match prevents partial success |
"aaa", "ab*a*c*a" | Mixed zero and nonzero star use |