# LeetCode 940: Distinct Subsequences II

## Problem Restatement

We are given a string:

```python
s
```

We need to count how many distinct non-empty subsequences exist.

A subsequence is formed by deleting zero or more characters without changing the order of the remaining characters.

For example, from:

```text
"abc"
```

the subsequences include:

```text
"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"
```

The empty subsequence does not count.

Since the answer can become large, return it modulo:

```python
10**9 + 7
```

The official statement asks for the number of distinct non-empty subsequences of `s`, modulo `10^9 + 7`. ([leetcode.com](https://leetcode.com/problems/distinct-subsequences-ii/?utm_source=chatgpt.com))

## Input and Output

| Item | Meaning |
|---|---|
| Input | A string `s` |
| Output | Number of distinct non-empty subsequences |
| Empty subsequence | Not counted |
| Modulo | `10^9 + 7` |

Function shape:

```python
class Solution:
    def distinctSubseqII(self, s: str) -> int:
        ...
```

## Examples

Example 1:

```python
s = "abc"
```

Distinct non-empty subsequences:

```text
"a"
"b"
"c"
"ab"
"ac"
"bc"
"abc"
```

Count:

```python
7
```

Example 2:

```python
s = "aba"
```

Distinct subsequences:

```text
"a"
"b"
"aa"
"ab"
"ba"
"aba"
```

Count:

```python
6
```

Notice that subsequence `"a"` appears multiple ways, but it counts only once.

Example 3:

```python
s = "aaa"
```

Distinct subsequences:

```text
"a"
"aa"
"aaa"
```

Count:

```python
3
```

## First Thought: Generate Every Subsequence

A string of length `n` has:

```text
2^n
```

possible subsequences.

One direct solution is:

1. Generate every subsequence.
2. Insert each subsequence into a set.
3. Return the set size minus one for the empty subsequence.

This works only for very small strings.

The number of subsequences grows exponentially, so we need dynamic programming.

## Key Insight

Suppose we process the string left to right.

When we read a new character `c`, every existing subsequence can either:

1. Ignore `c`
2. Append `c`

So the number of subsequences seems to double.

But duplicates appear when the same character occurred before.

Example:

```text
s = "aba"
```

After processing `"ab"`:

```text
"", "a", "b", "ab"
```

Now process the final `"a"`.

Appending `"a"` creates:

```text
"a", "aa", "ba", "aba"
```

But `"a"` already existed.

We need to remove duplicates created by earlier occurrences of the same character.

## DP Interpretation

Let:

```python
dp
```

be the number of distinct subsequences including the empty subsequence.

Initially:

```python
dp = 1
```

because only the empty subsequence exists.

When reading a character:

```python
c
```

all current subsequences can append `c`.

So the total doubles:

```text
new_dp = 2 * dp
```

But if `c` appeared before, some subsequences are duplicated.

We subtract the number of subsequences that existed before the previous occurrence of `c`.

## Algorithm

Maintain:

| Variable | Meaning |
|---|---|
| `dp` | Current count including empty subsequence |
| `last[c]` | DP value before the previous occurrence of `c` |

Initialize:

```python
dp = 1
last = {}
```

For each character `c`:

1. Save the old value:

```python
old_dp = dp
```

2. Double the subsequence count:

```python
dp = dp * 2
```

3. If `c` appeared before:

```python
dp -= last[c]
```

4. Record:

```python
last[c] = old_dp
```

At the end, subtract one to remove the empty subsequence.

## Walkthrough

Use:

```python
s = "aba"
```

Start:

```python
dp = 1
```

The empty subsequence exists.

Process `'a'`:

```text
double: 1 -> 2
```

Subsequences:

```text
"", "a"
```

Store:

```python
last["a"] = 1
```

Process `'b'`:

```text
double: 2 -> 4
```

Subsequences:

```text
"", "a", "b", "ab"
```

Store:

```python
last["b"] = 2
```

Process final `'a'`:

```text
double: 4 -> 8
```

But duplicates are created.

Subtract:

```python
last["a"] = 1
```

So:

```text
8 - 1 = 7
```

Including empty subsequence:

```text
"", "a", "b", "ab", "aa", "ba", "aba"
```

Remove empty subsequence:

```python
7 - 1 = 6
```

Answer:

```python
6
```

## Correctness

Assume before processing a character `c`, there are `dp` distinct subsequences including the empty subsequence.

Every existing subsequence can either:

1. Stay unchanged
2. Append `c`

So naively the count doubles.

However, if `c` appeared before, appending `c` now recreates subsequences that were already created when the previous occurrence of `c` was processed.

Suppose the previous occurrence of `c` happened when the DP count before processing that character was `last[c]`.

All subsequences that existed at that earlier moment already generated appended versions ending in `c`. Appending the current `c` to those same earlier subsequences creates duplicates.

Therefore, subtracting `last[c]` removes exactly the duplicated subsequences.

The algorithm stores the correct DP value before each character occurrence, so duplicate removal is exact.

By induction, after processing the entire string, `dp` equals the number of distinct subsequences including the empty subsequence.

Subtracting one removes the empty subsequence, leaving exactly the number of distinct non-empty subsequences.

## Complexity

Let:

```python
n = len(s)
```

| Metric | Value | Why |
|---|---|---|
| Time | `O(n)` | Each character is processed once |
| Space | `O(1)` or `O(k)` | `k` distinct characters |

For lowercase English letters, space is effectively constant.

## Implementation

```python
class Solution:
    def distinctSubseqII(self, s: str) -> int:
        MOD = 10**9 + 7

        dp = 1
        last = {}

        for c in s:
            old_dp = dp

            dp = (dp * 2) % MOD

            if c in last:
                dp = (dp - last[c]) % MOD

            last[c] = old_dp

        return (dp - 1) % MOD
```

## Code Explanation

`dp` stores the number of distinct subsequences including the empty subsequence:

```python
dp = 1
```

`last[c]` stores the DP value before the previous occurrence of `c`:

```python
last = {}
```

Before changing `dp`, we save its old value:

```python
old_dp = dp
```

Then every subsequence either ignores or appends the current character:

```python
dp = (dp * 2) % MOD
```

If the character already appeared, duplicates exist:

```python
if c in last:
    dp = (dp - last[c]) % MOD
```

Finally, update the stored value for this character:

```python
last[c] = old_dp
```

The final subtraction removes the empty subsequence:

```python
return (dp - 1) % MOD
```

## Testing

```python
def run_tests():
    s = Solution()

    assert s.distinctSubseqII("abc") == 7
    assert s.distinctSubseqII("aba") == 6
    assert s.distinctSubseqII("aaa") == 3

    assert s.distinctSubseqII("a") == 1
    assert s.distinctSubseqII("ab") == 3
    assert s.distinctSubseqII("abab") == 11

    print("all tests passed")

run_tests()
```

| Test | Why |
|---|---|
| `"abc"` | All subsequences unique |
| `"aba"` | Duplicate handling |
| `"aaa"` | Many repeated duplicates |
| `"a"` | Smallest non-empty case |
| `"ab"` | Simple doubling behavior |
| `"abab"` | Multiple repeated characters |