LeetCode 940: Distinct Subsequences II

Problem Restatement

We are given a string:

We need to count how many distinct non-empty subsequences exist.

A subsequence is formed by deleting zero or more characters without changing the order of the remaining characters.

For example, from:

"abc"

the subsequences include:

"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"

The empty subsequence does not count.

Since the answer can become large, return it modulo:

10**9 + 7

The official statement asks for the number of distinct non-empty subsequences of s, modulo 10^9 + 7. (leetcode.com)

Input and Output

Item	Meaning
Input	A string `s`
Output	Number of distinct non-empty subsequences
Empty subsequence	Not counted
Modulo	`10^9 + 7`

Function shape:

class Solution:
    def distinctSubseqII(self, s: str) -> int:
        ...

Examples

Example 1:

s = "abc"

Distinct non-empty subsequences:

"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"

Count:

Example 2:

s = "aba"

Distinct subsequences:

"a"
"b"
"aa"
"ab"
"ba"
"aba"

Count:

Notice that subsequence "a" appears multiple ways, but it counts only once.

Example 3:

s = "aaa"

Distinct subsequences:

"a"
"aa"
"aaa"

Count:

First Thought: Generate Every Subsequence

A string of length n has:

2^n

possible subsequences.

One direct solution is:

Generate every subsequence.
Insert each subsequence into a set.
Return the set size minus one for the empty subsequence.

This works only for very small strings.

The number of subsequences grows exponentially, so we need dynamic programming.

Key Insight

Suppose we process the string left to right.

When we read a new character c, every existing subsequence can either:

Ignore c
Append c

So the number of subsequences seems to double.

But duplicates appear when the same character occurred before.

Example:

s = "aba"

After processing "ab":

"", "a", "b", "ab"

Now process the final "a".

Appending "a" creates:

"a", "aa", "ba", "aba"

But "a" already existed.

We need to remove duplicates created by earlier occurrences of the same character.

DP Interpretation

Let:

dp

be the number of distinct subsequences including the empty subsequence.

Initially:

dp = 1

because only the empty subsequence exists.

When reading a character:

all current subsequences can append c.

So the total doubles:

new_dp = 2 * dp

But if c appeared before, some subsequences are duplicated.

We subtract the number of subsequences that existed before the previous occurrence of c.

Algorithm

Maintain:

Variable	Meaning
`dp`	Current count including empty subsequence
`last[c]`	DP value before the previous occurrence of `c`

Initialize:

dp = 1
last = {}

For each character c:

Save the old value:

old_dp = dp

Double the subsequence count:

dp = dp * 2

If c appeared before:

dp -= last[c]

Record:

last[c] = old_dp

At the end, subtract one to remove the empty subsequence.

Walkthrough

Use:

s = "aba"

Start:

dp = 1

The empty subsequence exists.

Process 'a':

double: 1 -> 2

Subsequences:

"", "a"

Store:

last["a"] = 1

Process 'b':

double: 2 -> 4

Subsequences:

"", "a", "b", "ab"

Store:

last["b"] = 2

Process final 'a':

double: 4 -> 8

But duplicates are created.

Subtract:

last["a"] = 1

So:

8 - 1 = 7

Including empty subsequence:

"", "a", "b", "ab", "aa", "ba", "aba"

Remove empty subsequence:

7 - 1 = 6

Answer:

Correctness

Assume before processing a character c, there are dp distinct subsequences including the empty subsequence.

Every existing subsequence can either:

Stay unchanged
Append c

So naively the count doubles.

However, if c appeared before, appending c now recreates subsequences that were already created when the previous occurrence of c was processed.

Suppose the previous occurrence of c happened when the DP count before processing that character was last[c].

All subsequences that existed at that earlier moment already generated appended versions ending in c. Appending the current c to those same earlier subsequences creates duplicates.

Therefore, subtracting last[c] removes exactly the duplicated subsequences.

The algorithm stores the correct DP value before each character occurrence, so duplicate removal is exact.

By induction, after processing the entire string, dp equals the number of distinct subsequences including the empty subsequence.

Subtracting one removes the empty subsequence, leaving exactly the number of distinct non-empty subsequences.

Complexity

Let:

n = len(s)

Metric	Value	Why
Time	`O(n)`	Each character is processed once
Space	`O(1)` or `O(k)`	`k` distinct characters

For lowercase English letters, space is effectively constant.

Implementation

class Solution:
    def distinctSubseqII(self, s: str) -> int:
        MOD = 10**9 + 7

        dp = 1
        last = {}

        for c in s:
            old_dp = dp

            dp = (dp * 2) % MOD

            if c in last:
                dp = (dp - last[c]) % MOD

            last[c] = old_dp

        return (dp - 1) % MOD

Code Explanation

dp stores the number of distinct subsequences including the empty subsequence:

dp = 1

last[c] stores the DP value before the previous occurrence of c:

last = {}

Before changing dp, we save its old value:

old_dp = dp

Then every subsequence either ignores or appends the current character:

dp = (dp * 2) % MOD

If the character already appeared, duplicates exist:

if c in last:
    dp = (dp - last[c]) % MOD

Finally, update the stored value for this character:

last[c] = old_dp

The final subtraction removes the empty subsequence:

return (dp - 1) % MOD

Testing

def run_tests():
    s = Solution()

    assert s.distinctSubseqII("abc") == 7
    assert s.distinctSubseqII("aba") == 6
    assert s.distinctSubseqII("aaa") == 3

    assert s.distinctSubseqII("a") == 1
    assert s.distinctSubseqII("ab") == 3
    assert s.distinctSubseqII("abab") == 11

    print("all tests passed")

run_tests()

Test	Why
`"abc"`	All subsequences unique
`"aba"`	Duplicate handling
`"aaa"`	Many repeated duplicates
`"a"`	Smallest non-empty case
`"ab"`	Simple doubling behavior
`"abab"`	Multiple repeated characters