LeetCode 395: Longest Substring with At Least K Repeating Characters

Problem Restatement

We are given a string s and an integer k.

We need to return the length of the longest substring where every character in that substring appears at least k times.

A substring must be contiguous.

If no valid substring exists, return 0.

The string contains only lowercase English letters. The length of s is at most 10^4, and k can be larger than the string length.

Input and Output

Item	Meaning
Input	A string `s` and an integer `k`
Output	Length of the longest valid substring
Valid substring	Every character in it appears at least `k` times
Constraint	`1 <= s.length <= 10^4`
Constraint	`s` contains only lowercase English letters
Constraint	`1 <= k <= 10^5`

Example function shape:

def longestSubstring(s: str, k: int) -> int:
    ...

Examples

Example 1:

s = "aaabb"
k = 3

The substring:

"aaa"

is valid because a appears 3 times.

The full string "aaabb" is not valid because b appears only 2 times.

So the answer is:

Example 2:

s = "ababbc"
k = 2

The substring:

"ababb"

has:

Character	Count
`a`	`2`
`b`	`3`

Both counts are at least 2.

So the answer is:

First Thought: Check Every Substring

A direct solution is to try every substring.

For each start index and end index, count the characters inside that substring.

If every character count is at least k, update the answer.

This is correct, but too slow.

There are O(n^2) substrings, and checking each substring may cost up to O(n).

That gives O(n^3) time in the simplest form.

We need to use the frequency condition more directly.

Key Insight

If a character appears fewer than k times in a substring, then any larger valid substring cannot include that character.

For example:

s = "aaabb"
k = 3

In the whole string, b appears only 2 times.

So no valid substring that contains b can exist, because even the whole string does not have enough b.

Therefore, b acts as a separator.

We can split around bad characters and solve the smaller pieces.

For "aaabb":

aaa | bb

Only "aaa" can be valid.

This is the divide-and-conquer idea.

Algorithm

Define a helper function:

solve(left, right)

It returns the answer for the substring:

s[left:right]

For each call:

Count character frequencies inside s[left:right].
Find any character whose frequency is greater than 0 but less than k.
Such a character cannot be part of any valid substring in this range.
Split the range around that character.
Recursively solve each segment.
Return the maximum answer among those segments.
If no bad character exists, the whole range is valid, so return right - left.

Correctness

Consider one recursive range s[left:right].

If every character appearing in this range appears at least k times, then the whole substring is valid. Returning its length is correct.

Otherwise, suppose character c appears in this range fewer than k times.

Any substring inside s[left:right] that contains c also contains at most that many copies of c. Since that count is less than k, such a substring cannot be valid.

Therefore every valid substring in this range must lie completely between occurrences of c.

The algorithm splits around c and recursively checks exactly those candidate segments.

By applying the same reasoning to each segment, the recursion never discards a possible valid answer and never accepts an invalid substring.

So the maximum returned by the recursive calls is exactly the length of the longest valid substring.

Complexity

Let n = len(s).

Metric	Value	Why
Time	`O(26 * n)` average in practice	Each level counts lowercase letters over active segments
Worst case	`O(n^2)`	Repeated unbalanced splits can rescan large ranges
Space	`O(n)`	Recursion stack in the worst case

Since the alphabet has only 26 lowercase letters, this approach is usually fast enough for the given constraints.

Implementation

class Solution:
    def longestSubstring(self, s: str, k: int) -> int:
        def solve(left: int, right: int) -> int:
            if right - left < k:
                return 0

            count = [0] * 26

            for i in range(left, right):
                idx = ord(s[i]) - ord("a")
                count[idx] += 1

            for i in range(left, right):
                idx = ord(s[i]) - ord("a")

                if 0 < count[idx] < k:
                    best = 0
                    start = left

                    while start < right:
                        while start < right and s[start] == s[i]:
                            start += 1

                        end = start

                        while end < right and s[end] != s[i]:
                            end += 1

                        best = max(best, solve(start, end))
                        start = end

                    return best

            return right - left

        return solve(0, len(s))

Code Explanation

The helper solves one half-open range:

def solve(left: int, right: int) -> int:

If the range length is smaller than k, it cannot contain any valid non-empty substring:

if right - left < k:
    return 0

Then we count the characters in this range:

count = [0] * 26

for i in range(left, right):
    idx = ord(s[i]) - ord("a")
    count[idx] += 1

Next, we search for a bad character:

if 0 < count[idx] < k:

A bad character appears, but not enough times.

When we find one, we split the current range around that character.

The loop skips separators:

while start < right and s[start] == s[i]:
    start += 1

Then it finds the next segment without that separator:

while end < right and s[end] != s[i]:
    end += 1

Each segment is solved recursively:

best = max(best, solve(start, end))

If no bad character exists, the whole range is valid:

return right - left

Alternative: Sliding Window Over Unique Counts

There is also an iterative O(26 * n) solution.

The idea is to try each possible number of distinct characters from 1 to 26.

For each target number of distinct characters, use a sliding window and track:

Variable	Meaning
`unique`	Number of distinct characters in the window
`at_least_k`	Number of distinct characters whose count is at least `k`

When:

unique == target_unique and unique == at_least_k

the current window is valid.

class Solution:
    def longestSubstring(self, s: str, k: int) -> int:
        best = 0
        n = len(s)

        for target_unique in range(1, 27):
            count = [0] * 26
            left = 0
            unique = 0
            at_least_k = 0

            for right in range(n):
                idx = ord(s[right]) - ord("a")

                if count[idx] == 0:
                    unique += 1

                count[idx] += 1

                if count[idx] == k:
                    at_least_k += 1

                while unique > target_unique:
                    left_idx = ord(s[left]) - ord("a")

                    if count[left_idx] == k:
                        at_least_k -= 1

                    count[left_idx] -= 1

                    if count[left_idx] == 0:
                        unique -= 1

                    left += 1

                if unique == target_unique and unique == at_least_k:
                    best = max(best, right - left + 1)

        return best

Testing

def test_solution():
    s = Solution()

    assert s.longestSubstring("aaabb", 3) == 3
    assert s.longestSubstring("ababbc", 2) == 5
    assert s.longestSubstring("ababacb", 3) == 0
    assert s.longestSubstring("aaaaa", 1) == 5
    assert s.longestSubstring("aaaaa", 6) == 0
    assert s.longestSubstring("bbaaacbd", 3) == 3
    assert s.longestSubstring("weitong", 2) == 0

    print("all tests passed")

test_solution()

Test meaning:

Test	Why
`"aaabb"`, `k = 3`	Basic split around under-repeated character
`"ababbc"`, `k = 2`	Main example
`"ababacb"`, `k = 3`	No valid substring
`"aaaaa"`, `k = 1`	Whole string is valid
`"aaaaa"`, `k = 6`	`k` larger than length
`"bbaaacbd"`, `k = 3`	Valid substring in the middle
`"weitong"`, `k = 2`	All characters appear once