A clear explanation of solving Distinct Subsequences II using dynamic programming and last occurrence tracking.
Problem Restatement
We are given a string:
sWe need to count how many distinct non-empty subsequences exist.
A subsequence is formed by deleting zero or more characters without changing the order of the remaining characters.
For example, from:
"abc"the subsequences include:
"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"The empty subsequence does not count.
Since the answer can become large, return it modulo:
10**9 + 7The official statement asks for the number of distinct non-empty subsequences of s, modulo 10^9 + 7. (leetcode.com)
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s |
| Output | Number of distinct non-empty subsequences |
| Empty subsequence | Not counted |
| Modulo | 10^9 + 7 |
Function shape:
class Solution:
def distinctSubseqII(self, s: str) -> int:
...Examples
Example 1:
s = "abc"Distinct non-empty subsequences:
"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"Count:
7Example 2:
s = "aba"Distinct subsequences:
"a"
"b"
"aa"
"ab"
"ba"
"aba"Count:
6Notice that subsequence "a" appears multiple ways, but it counts only once.
Example 3:
s = "aaa"Distinct subsequences:
"a"
"aa"
"aaa"Count:
3First Thought: Generate Every Subsequence
A string of length n has:
2^npossible subsequences.
One direct solution is:
- Generate every subsequence.
- Insert each subsequence into a set.
- Return the set size minus one for the empty subsequence.
This works only for very small strings.
The number of subsequences grows exponentially, so we need dynamic programming.
Key Insight
Suppose we process the string left to right.
When we read a new character c, every existing subsequence can either:
- Ignore
c - Append
c
So the number of subsequences seems to double.
But duplicates appear when the same character occurred before.
Example:
s = "aba"After processing "ab":
"", "a", "b", "ab"Now process the final "a".
Appending "a" creates:
"a", "aa", "ba", "aba"But "a" already existed.
We need to remove duplicates created by earlier occurrences of the same character.
DP Interpretation
Let:
dpbe the number of distinct subsequences including the empty subsequence.
Initially:
dp = 1because only the empty subsequence exists.
When reading a character:
call current subsequences can append c.
So the total doubles:
new_dp = 2 * dpBut if c appeared before, some subsequences are duplicated.
We subtract the number of subsequences that existed before the previous occurrence of c.
Algorithm
Maintain:
| Variable | Meaning |
|---|---|
dp | Current count including empty subsequence |
last[c] | DP value before the previous occurrence of c |
Initialize:
dp = 1
last = {}For each character c:
- Save the old value:
old_dp = dp- Double the subsequence count:
dp = dp * 2- If
cappeared before:
dp -= last[c]- Record:
last[c] = old_dpAt the end, subtract one to remove the empty subsequence.
Walkthrough
Use:
s = "aba"Start:
dp = 1The empty subsequence exists.
Process 'a':
double: 1 -> 2Subsequences:
"", "a"Store:
last["a"] = 1Process 'b':
double: 2 -> 4Subsequences:
"", "a", "b", "ab"Store:
last["b"] = 2Process final 'a':
double: 4 -> 8But duplicates are created.
Subtract:
last["a"] = 1So:
8 - 1 = 7Including empty subsequence:
"", "a", "b", "ab", "aa", "ba", "aba"Remove empty subsequence:
7 - 1 = 6Answer:
6Correctness
Assume before processing a character c, there are dp distinct subsequences including the empty subsequence.
Every existing subsequence can either:
- Stay unchanged
- Append
c
So naively the count doubles.
However, if c appeared before, appending c now recreates subsequences that were already created when the previous occurrence of c was processed.
Suppose the previous occurrence of c happened when the DP count before processing that character was last[c].
All subsequences that existed at that earlier moment already generated appended versions ending in c. Appending the current c to those same earlier subsequences creates duplicates.
Therefore, subtracting last[c] removes exactly the duplicated subsequences.
The algorithm stores the correct DP value before each character occurrence, so duplicate removal is exact.
By induction, after processing the entire string, dp equals the number of distinct subsequences including the empty subsequence.
Subtracting one removes the empty subsequence, leaving exactly the number of distinct non-empty subsequences.
Complexity
Let:
n = len(s)| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character is processed once |
| Space | O(1) or O(k) | k distinct characters |
For lowercase English letters, space is effectively constant.
Implementation
class Solution:
def distinctSubseqII(self, s: str) -> int:
MOD = 10**9 + 7
dp = 1
last = {}
for c in s:
old_dp = dp
dp = (dp * 2) % MOD
if c in last:
dp = (dp - last[c]) % MOD
last[c] = old_dp
return (dp - 1) % MODCode Explanation
dp stores the number of distinct subsequences including the empty subsequence:
dp = 1last[c] stores the DP value before the previous occurrence of c:
last = {}Before changing dp, we save its old value:
old_dp = dpThen every subsequence either ignores or appends the current character:
dp = (dp * 2) % MODIf the character already appeared, duplicates exist:
if c in last:
dp = (dp - last[c]) % MODFinally, update the stored value for this character:
last[c] = old_dpThe final subtraction removes the empty subsequence:
return (dp - 1) % MODTesting
def run_tests():
s = Solution()
assert s.distinctSubseqII("abc") == 7
assert s.distinctSubseqII("aba") == 6
assert s.distinctSubseqII("aaa") == 3
assert s.distinctSubseqII("a") == 1
assert s.distinctSubseqII("ab") == 3
assert s.distinctSubseqII("abab") == 11
print("all tests passed")
run_tests()| Test | Why |
|---|---|
"abc" | All subsequences unique |
"aba" | Duplicate handling |
"aaa" | Many repeated duplicates |
"a" | Smallest non-empty case |
"ab" | Simple doubling behavior |
"abab" | Multiple repeated characters |