A clear explanation of finding the longest substring where every character appears at least k times using divide and conquer.
Problem Restatement
We are given a string s and an integer k.
We need to return the length of the longest substring where every character in that substring appears at least k times.
A substring must be contiguous.
If no valid substring exists, return 0.
The string contains only lowercase English letters. The length of s is at most 10^4, and k can be larger than the string length.
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s and an integer k |
| Output | Length of the longest valid substring |
| Valid substring | Every character in it appears at least k times |
| Constraint | 1 <= s.length <= 10^4 |
| Constraint | s contains only lowercase English letters |
| Constraint | 1 <= k <= 10^5 |
Example function shape:
def longestSubstring(s: str, k: int) -> int:
...Examples
Example 1:
s = "aaabb"
k = 3The substring:
"aaa"is valid because a appears 3 times.
The full string "aaabb" is not valid because b appears only 2 times.
So the answer is:
3Example 2:
s = "ababbc"
k = 2The substring:
"ababb"has:
| Character | Count |
|---|---|
a | 2 |
b | 3 |
Both counts are at least 2.
So the answer is:
5First Thought: Check Every Substring
A direct solution is to try every substring.
For each start index and end index, count the characters inside that substring.
If every character count is at least k, update the answer.
This is correct, but too slow.
There are O(n^2) substrings, and checking each substring may cost up to O(n).
That gives O(n^3) time in the simplest form.
We need to use the frequency condition more directly.
Key Insight
If a character appears fewer than k times in a substring, then any larger valid substring cannot include that character.
For example:
s = "aaabb"
k = 3In the whole string, b appears only 2 times.
So no valid substring that contains b can exist, because even the whole string does not have enough b.
Therefore, b acts as a separator.
We can split around bad characters and solve the smaller pieces.
For "aaabb":
aaa | bbOnly "aaa" can be valid.
This is the divide-and-conquer idea.
Algorithm
Define a helper function:
solve(left, right)It returns the answer for the substring:
s[left:right]For each call:
- Count character frequencies inside
s[left:right]. - Find any character whose frequency is greater than
0but less thank. - Such a character cannot be part of any valid substring in this range.
- Split the range around that character.
- Recursively solve each segment.
- Return the maximum answer among those segments.
- If no bad character exists, the whole range is valid, so return
right - left.
Correctness
Consider one recursive range s[left:right].
If every character appearing in this range appears at least k times, then the whole substring is valid. Returning its length is correct.
Otherwise, suppose character c appears in this range fewer than k times.
Any substring inside s[left:right] that contains c also contains at most that many copies of c. Since that count is less than k, such a substring cannot be valid.
Therefore every valid substring in this range must lie completely between occurrences of c.
The algorithm splits around c and recursively checks exactly those candidate segments.
By applying the same reasoning to each segment, the recursion never discards a possible valid answer and never accepts an invalid substring.
So the maximum returned by the recursive calls is exactly the length of the longest valid substring.
Complexity
Let n = len(s).
| Metric | Value | Why |
|---|---|---|
| Time | O(26 * n) average in practice | Each level counts lowercase letters over active segments |
| Worst case | O(n^2) | Repeated unbalanced splits can rescan large ranges |
| Space | O(n) | Recursion stack in the worst case |
Since the alphabet has only 26 lowercase letters, this approach is usually fast enough for the given constraints.
Implementation
class Solution:
def longestSubstring(self, s: str, k: int) -> int:
def solve(left: int, right: int) -> int:
if right - left < k:
return 0
count = [0] * 26
for i in range(left, right):
idx = ord(s[i]) - ord("a")
count[idx] += 1
for i in range(left, right):
idx = ord(s[i]) - ord("a")
if 0 < count[idx] < k:
best = 0
start = left
while start < right:
while start < right and s[start] == s[i]:
start += 1
end = start
while end < right and s[end] != s[i]:
end += 1
best = max(best, solve(start, end))
start = end
return best
return right - left
return solve(0, len(s))Code Explanation
The helper solves one half-open range:
def solve(left: int, right: int) -> int:If the range length is smaller than k, it cannot contain any valid non-empty substring:
if right - left < k:
return 0Then we count the characters in this range:
count = [0] * 26
for i in range(left, right):
idx = ord(s[i]) - ord("a")
count[idx] += 1Next, we search for a bad character:
if 0 < count[idx] < k:A bad character appears, but not enough times.
When we find one, we split the current range around that character.
The loop skips separators:
while start < right and s[start] == s[i]:
start += 1Then it finds the next segment without that separator:
while end < right and s[end] != s[i]:
end += 1Each segment is solved recursively:
best = max(best, solve(start, end))If no bad character exists, the whole range is valid:
return right - leftAlternative: Sliding Window Over Unique Counts
There is also an iterative O(26 * n) solution.
The idea is to try each possible number of distinct characters from 1 to 26.
For each target number of distinct characters, use a sliding window and track:
| Variable | Meaning |
|---|---|
unique | Number of distinct characters in the window |
at_least_k | Number of distinct characters whose count is at least k |
When:
unique == target_unique and unique == at_least_kthe current window is valid.
class Solution:
def longestSubstring(self, s: str, k: int) -> int:
best = 0
n = len(s)
for target_unique in range(1, 27):
count = [0] * 26
left = 0
unique = 0
at_least_k = 0
for right in range(n):
idx = ord(s[right]) - ord("a")
if count[idx] == 0:
unique += 1
count[idx] += 1
if count[idx] == k:
at_least_k += 1
while unique > target_unique:
left_idx = ord(s[left]) - ord("a")
if count[left_idx] == k:
at_least_k -= 1
count[left_idx] -= 1
if count[left_idx] == 0:
unique -= 1
left += 1
if unique == target_unique and unique == at_least_k:
best = max(best, right - left + 1)
return bestTesting
def test_solution():
s = Solution()
assert s.longestSubstring("aaabb", 3) == 3
assert s.longestSubstring("ababbc", 2) == 5
assert s.longestSubstring("ababacb", 3) == 0
assert s.longestSubstring("aaaaa", 1) == 5
assert s.longestSubstring("aaaaa", 6) == 0
assert s.longestSubstring("bbaaacbd", 3) == 3
assert s.longestSubstring("weitong", 2) == 0
print("all tests passed")
test_solution()Test meaning:
| Test | Why |
|---|---|
"aaabb", k = 3 | Basic split around under-repeated character |
"ababbc", k = 2 | Main example |
"ababacb", k = 3 | No valid substring |
"aaaaa", k = 1 | Whole string is valid |
"aaaaa", k = 6 | k larger than length |
"bbaaacbd", k = 3 | Valid substring in the middle |
"weitong", k = 2 | All characters appear once |