A clear explanation of Longest Substring with At Most K Distinct Characters using a sliding window and character counts.
Problem Restatement
We are given a string s and an integer k.
Return the length of the longest substring of s that contains at most k distinct characters.
A substring must be contiguous.
For example:
s = "eceba"
k = 2The substring "ece" contains only two distinct characters:
e, cIts length is 3, so the answer is 3.
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s and an integer k |
| Output | Length of the longest valid substring |
| Valid substring | Contains at most k distinct characters |
| Substring rule | Characters must be contiguous |
Function shape:
def lengthOfLongestSubstringKDistinct(s: str, k: int) -> int:
...Examples
Example 1:
Input: s = "eceba", k = 2
Output: 3The longest valid substring is:
"ece"It contains two distinct characters and has length 3.
Example 2:
Input: s = "aa", k = 1
Output: 2The whole string contains only one distinct character, so the answer is 2.
Example 3:
Input: s = "a", k = 0
Output: 0No non-empty substring can contain at most zero distinct characters.
So the answer is 0.
First Thought: Try Every Substring
A direct method is to enumerate every substring.
For each starting index i, extend the ending index j, track distinct characters, and update the answer when the substring has at most k distinct characters.
class Solution:
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
ans = 0
for i in range(len(s)):
seen = set()
for j in range(i, len(s)):
seen.add(s[j])
if len(seen) <= k:
ans = max(ans, j - i + 1)
else:
break
return ansThis works, but in the worst case it still checks many substrings.
The time complexity is:
O(n^2)We need a linear scan.
Key Insight
Use a sliding window.
Maintain a window:
s[left : right + 1]that contains at most k distinct characters.
Move right forward to include new characters.
If the window becomes invalid, meaning it has more than k distinct characters, move left forward until the window is valid again.
A hash map stores character counts inside the current window.
The number of keys in the hash map is the number of distinct characters.
Algorithm
Use:
| Variable | Meaning |
|---|---|
left | Left boundary of the window |
right | Right boundary of the window |
count | Character frequency map inside the window |
best | Longest valid window length seen so far |
Steps:
- If
k == 0, return0. - Set
left = 0. - Scan
rightfrom0tolen(s) - 1. - Add
s[right]to the frequency map. - While the map has more than
kdistinct characters:- Remove
s[left]from the map. - If its count becomes zero, delete it.
- Move
leftright by one.
- Remove
- Update the answer with the current window length.
Walkthrough
Use:
s = "eceba"
k = 2Start:
left = 0
count = {}
best = 0Read e:
window = "e"
count = {e: 1}
best = 1Read c:
window = "ec"
count = {e: 1, c: 1}
best = 2Read e:
window = "ece"
count = {e: 2, c: 1}
best = 3Read b:
window = "eceb"
count = {e: 2, c: 1, b: 1}Now there are three distinct characters, which is more than k = 2.
Shrink from the left.
Remove e:
window = "ceb"
count = {e: 1, c: 1, b: 1}Still three distinct characters.
Remove c:
window = "eb"
count = {e: 1, b: 1}Now valid again.
Read a:
window = "eba"
count = {e: 1, b: 1, a: 1}Shrink again until valid:
window = "ba"
count = {b: 1, a: 1}The best length found is 3.
Correctness
The algorithm maintains this invariant after the inner while loop:
s[left : right + 1] contains at most k distinct characters.When a new character is added at right, the window may become invalid.
The only way to make it valid again is to remove characters from the left, because right is fixed for this step and the substring must remain contiguous.
The algorithm keeps moving left until the number of distinct characters is at most k.
At that point, the current window is the longest valid substring ending at right, because moving left any farther right would only make the window shorter.
The algorithm checks every possible right endpoint and records the maximum valid window length. Therefore, the final answer is the length of the longest substring with at most k distinct characters.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window at most once |
| Space | O(k) | The map stores at most k + 1 distinct characters during shrinking |
Here, n is the length of s.
Implementation
from collections import defaultdict
class Solution:
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
if k == 0:
return 0
count = defaultdict(int)
left = 0
best = 0
for right, ch in enumerate(s):
count[ch] += 1
while len(count) > k:
left_ch = s[left]
count[left_ch] -= 1
if count[left_ch] == 0:
del count[left_ch]
left += 1
best = max(best, right - left + 1)
return bestCode Explanation
Handle the zero case first:
if k == 0:
return 0No non-empty substring can have at most zero distinct characters.
The frequency map stores characters in the current window:
count = defaultdict(int)The left boundary starts at 0:
left = 0For every right boundary:
for right, ch in enumerate(s):add the new character:
count[ch] += 1If the window has too many distinct characters, shrink it:
while len(count) > k:Remove the leftmost character from the window:
left_ch = s[left]
count[left_ch] -= 1If its count reaches zero, delete it from the map:
if count[left_ch] == 0:
del count[left_ch]Then move the left boundary:
left += 1After the window is valid, update the best length:
best = max(best, right - left + 1)Testing
def run_tests():
s = Solution()
assert s.lengthOfLongestSubstringKDistinct("eceba", 2) == 3
assert s.lengthOfLongestSubstringKDistinct("aa", 1) == 2
assert s.lengthOfLongestSubstringKDistinct("a", 0) == 0
assert s.lengthOfLongestSubstringKDistinct("abc", 2) == 2
assert s.lengthOfLongestSubstringKDistinct("abaccc", 2) == 4
assert s.lengthOfLongestSubstringKDistinct("aaaa", 1) == 4
assert s.lengthOfLongestSubstringKDistinct("abcadcacacaca", 3) == 11
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"eceba", 2 | Standard sample |
"aa", 1 | Entire string is valid |
"a", 0 | Zero distinct characters allowed |
"abc", 2 | Needs shrinking |
"abaccc", 2 | Longest window appears after shrink |
"aaaa", 1 | Repeated single character |
"abcadcacacaca", 3 | Larger mixed window |