LeetCode 10: Regular Expression Matching

Problem Restatement

We are given two strings:

s
p

s is the input string.

p is the pattern.

The pattern supports two special characters:

Pattern symbol	Meaning
`.`	Matches any single character
`*`	Matches zero or more of the preceding element

The match must cover the entire input string, not only part of it. This is the key rule in the problem statement.

For example:

s = "aa"
p = "a"

This returns false, because "a" matches only one character, not the whole string "aa".

But:

s = "aa"
p = "a*"

returns true, because a* can match "aa".

Input and Output

Item	Meaning
Input	A string `s` and a pattern `p`
Output	`true` if `p` matches the whole string `s`, otherwise `false`
`.`	Matches any one character
`*`	Matches zero or more copies of the previous pattern element
Match type	Full-string match

Example function shape:

def isMatch(s: str, p: str) -> bool:
    ...

Examples

Example 1:

s = "aa"
p = "a"

Output:

false

The pattern only matches one a.

The string has two characters.

Example 2:

s = "aa"
p = "a*"

Output:

true

a* means zero or more a characters.

So it can match "aa".

Example 3:

s = "ab"
p = ".*"

Output:

true

. can match any single character.

.* can match any sequence of characters, including "ab".

First Thought: Compare Characters One by One

If the pattern had only normal characters and ., the problem would be simple.

We could compare s[i] and p[j].

They match if:

s[i] == p[j] or p[j] == "."

Then move both pointers forward.

But * changes the problem.

For example:

s = "aaa"
p = "a*"

The pattern element a* can match:

""
"a"
"aa"
"aaa"

So one pattern position can match many string positions.

That branching is why we need dynamic programming.

Key Insight

Think of the problem as matching suffixes.

Let:

dp(i, j)

mean:

Does s[i:] match p[j:]?

Then the answer is:

dp(0, 0)

At each state, we inspect the current pattern character p[j].

There are two main cases.

Case 1: No Star After Current Pattern Character

Suppose the next pattern character is not *.

Then the current characters must match once.

They match if:

i < len(s) and (s[i] == p[j] or p[j] == ".")

If they match, continue with:

dp(i + 1, j + 1)

If they do not match, the answer is false.

Case 2: Star After Current Pattern Character

Suppose:

p[j + 1] == "*"

Then p[j] * gives us two choices.

First, use it zero times.

That means we skip this pattern block:

dp(i, j + 2)

Second, use it one or more times.

This is allowed only if the current string character matches p[j].

Then we consume one character from s, but keep the same pattern position:

dp(i + 1, j)

We keep j unchanged because * may match more copies.

So the recurrence is:

dp(i, j) = dp(i, j + 2) or (first_match and dp(i + 1, j))

This is the core of the solution.

Visual Walkthrough

Use:

s = "aa"
p = "a*"

Start:

dp(0, 0)

Pattern block:

a*

There are two choices.

Choice 1: match zero a characters.

dp(0, 2)

But pattern is exhausted while string still has "aa", so this is false.

Choice 2: match one a.

dp(1, 0)

Again with a*.

Choice 1 from here: match zero more a.

dp(1, 2)

Pattern exhausted while string still has "a", so false.

Choice 2: match one more a.

dp(2, 0)

Now the string is exhausted.

From here, a* can match zero more a.

dp(2, 2)

Both string and pattern are exhausted.

So this is true.

Therefore:

isMatch("aa", "a*") == true

Algorithm

Use recursion with memoization.

For a state dp(i, j):

If j == len(p), return whether i == len(s).
Compute whether the first current character matches.
If j + 1 < len(p) and p[j + 1] == "*", return:
- zero occurrences: dp(i, j + 2)
- or one or more occurrences: first_match and dp(i + 1, j)
Otherwise, return:
- first_match and dp(i + 1, j + 1)

Memoization stores each (i, j) state so repeated branches do not recompute the same suffix match.

Correctness

Define dp(i, j) as whether s[i:] matches p[j:].

If j == len(p), no pattern remains. The match succeeds exactly when no string remains, so the base case is correct.

For a normal pattern character, the pattern must match exactly one string character. The algorithm checks that the current characters match, then recursively matches the remaining suffixes. This directly follows the meaning of normal characters and ..

For a starred pattern block p[j]*, there are only two possible categories:

It matches zero copies of p[j].
It matches at least one copy of p[j].

The first case skips the block with dp(i, j + 2).

The second case requires the current string character to match p[j], consumes one string character, and keeps the same pattern block with dp(i + 1, j).

These two cases cover every valid use of *.

Since the recurrence exactly follows the definition of the pattern language, and since the answer is dp(0, 0), the algorithm returns true exactly when the pattern matches the entire string.

Complexity

Metric	Value	Why
Time	`O(mn)`	There are at most `m * n` suffix states
Space	`O(mn)`	Memoization stores states

Here:

m = len(s)
n = len(p)

Strictly, the number of states is (m + 1) * (n + 1).

Implementation

from functools import cache

class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        @cache
        def dp(i: int, j: int) -> bool:
            if j == len(p):
                return i == len(s)

            first_match = (
                i < len(s)
                and (s[i] == p[j] or p[j] == ".")
            )

            if j + 1 < len(p) and p[j + 1] == "*":
                return (
                    dp(i, j + 2)
                    or (first_match and dp(i + 1, j))
                )

            return first_match and dp(i + 1, j + 1)

        return dp(0, 0)

Code Explanation

The memoized function:

dp(i, j)

means:

s[i:] matches p[j:]

The base case:

if j == len(p):
    return i == len(s)

If the pattern is exhausted, the string must also be exhausted.

The current character match check:

first_match = (
    i < len(s)
    and (s[i] == p[j] or p[j] == ".")
)

This handles both ordinary characters and ..

The star case:

if j + 1 < len(p) and p[j + 1] == "*":

The zero-occurrence branch:

dp(i, j + 2)

The one-or-more branch:

first_match and dp(i + 1, j)

The normal case:

return first_match and dp(i + 1, j + 1)

Finally:

return dp(0, 0)

Testing

def run_tests():
    s = Solution()

    assert s.isMatch("aa", "a") is False
    assert s.isMatch("aa", "a*") is True
    assert s.isMatch("ab", ".*") is True
    assert s.isMatch("aab", "c*a*b") is True
    assert s.isMatch("mississippi", "mis*is*p*.") is False
    assert s.isMatch("", "c*") is True
    assert s.isMatch("", ".") is False
    assert s.isMatch("ab", ".*c") is False
    assert s.isMatch("aaa", "ab*a*c*a") is True

    print("all tests passed")

run_tests()

Test	Why
`"aa"`, `"a"`	Pattern must match the whole string
`"aa"`, `"a*"`	`*` matches multiple characters
`"ab"`, `".*"`	Dot with star can match any sequence
`"aab"`, `"cab"`	Star can match zero or more copies
`"mississippi"`, `"misisp*."`	Complex false case
`""`, `"c*"`	Star can match empty string
`""`, `"."`	Dot needs one character
`"ab"`, `".*c"`	Full match prevents partial success
`"aaa"`, `"abac*a"`	Mixed zero and nonzero star use