A clear explanation of finding the longest substring with at most two distinct characters using a sliding window.
Problem Restatement
We are given a string s.
We need to return the length of the longest substring that contains at most two distinct characters.
A substring must be contiguous.
For example:
s = "eceba"The substring:
"ece"contains only two distinct characters:
"e", "c"Its length is:
3So the answer is:
3LeetCode gives examples such as s = "eceba" with output 3, and s = "ccaabbb" with output 5. The constraints state that 1 <= s.length <= 10^5 and s consists of English letters.
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s |
| Output | Length of the longest valid substring |
| Valid substring | Contains at most two distinct characters |
| Substring rule | Characters must be contiguous |
Example function shape:
def lengthOfLongestSubstringTwoDistinct(s: str) -> int:
...Examples
Example 1:
s = "eceba"Valid substrings with at most two distinct characters include:
"e"
"ec"
"ece"The longest one is:
"ece"So the answer is:
3Example 2:
s = "ccaabbb"The longest valid substring is:
"aabbb"It contains only:
"a", "b"So the answer is:
5Example 3:
s = "aaaa"The whole string contains only one distinct character.
So the answer is:
4First Thought: Brute Force
The simplest solution is to check every substring.
For each starting index i, expand the ending index j.
Track the set of distinct characters. If the set size becomes greater than 2, stop expanding.
class Solution:
def lengthOfLongestSubstringTwoDistinct(self, s: str) -> int:
n = len(s)
best = 0
for i in range(n):
seen = set()
for j in range(i, n):
seen.add(s[j])
if len(seen) > 2:
break
best = max(best, j - i + 1)
return bestThis works, but it can be too slow.
Problem With Brute Force
For each starting position, we may scan many characters to the right.
In the worst case, this gives O(n^2) time.
Since s.length can be as large as 10^5, quadratic time is too large.
We need to reuse work between overlapping substrings.
Key Insight
This is a sliding window problem.
We keep a window:
s[left : right + 1]The window should always contain at most two distinct characters.
We expand the window by moving right.
If the window becomes invalid, meaning it has more than two distinct characters, we shrink it by moving left.
To know how many distinct characters are in the window, we store character counts in a hash map.
Algorithm
Initialize:
left = 0
count = {}
answer = 0Then scan the string with right.
For each character s[right]:
- Add it to the frequency map.
- While the map has more than two keys, shrink from the left.
- Update the best length using the current valid window.
The current window length is:
right - left + 1Walkthrough
Use:
s = "eceba"Start:
left = 0
count = {}
answer = 0Read e:
window = "e"
count = {"e": 1}
answer = 1Read c:
window = "ec"
count = {"e": 1, "c": 1}
answer = 2Read e:
window = "ece"
count = {"e": 2, "c": 1}
answer = 3Read b:
window = "eceb"
count = {"e": 2, "c": 1, "b": 1}Now the window has three distinct characters, so shrink from the left.
Remove the first e:
window = "ceb"
count = {"e": 1, "c": 1, "b": 1}Still three distinct characters.
Remove c:
window = "eb"
count = {"e": 1, "b": 1}Now the window is valid again.
Read a:
window = "eba"
count = {"e": 1, "b": 1, "a": 1}Shrink until only two distinct characters remain.
The best length remains:
3Correctness
The algorithm maintains this invariant:
At the end of each loop iteration, the window s[left : right + 1] contains at most two distinct characters.
When a new character is added, the window may become invalid. The inner loop moves left to the right and decreases character counts until only two distinct characters remain. Therefore, after shrinking, the invariant is restored.
For each fixed right, after the shrinking step, left is the smallest valid left boundary for a window ending at right. That means right - left + 1 is the longest valid substring ending at right.
The algorithm checks this best valid window for every possible right.
Since every substring ends at some index right, taking the maximum over all iterations gives the length of the longest substring with at most two distinct characters.
Complexity
Let n be the length of s.
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window at most once |
| Space | O(1) | At most three character counts are stored before shrinking |
Although the hash map is used, the problem limits the window to at most two distinct characters after shrinking.
Implementation
class Solution:
def lengthOfLongestSubstringTwoDistinct(self, s: str) -> int:
left = 0
count = {}
answer = 0
for right, ch in enumerate(s):
count[ch] = count.get(ch, 0) + 1
while len(count) > 2:
left_ch = s[left]
count[left_ch] -= 1
if count[left_ch] == 0:
del count[left_ch]
left += 1
answer = max(answer, right - left + 1)
return answerCode Explanation
We start the window at index 0:
left = 0The dictionary stores how many times each character appears in the current window:
count = {}We expand the window one character at a time:
for right, ch in enumerate(s):Add the new character:
count[ch] = count.get(ch, 0) + 1If there are more than two distinct characters:
while len(count) > 2:remove characters from the left side of the window:
left_ch = s[left]
count[left_ch] -= 1If a character count becomes zero, remove it from the map:
if count[left_ch] == 0:
del count[left_ch]Then move the left pointer:
left += 1Now the window is valid again, so update the answer:
answer = max(answer, right - left + 1)Testing
def run_tests():
sol = Solution()
assert sol.lengthOfLongestSubstringTwoDistinct("eceba") == 3
assert sol.lengthOfLongestSubstringTwoDistinct("ccaabbb") == 5
assert sol.lengthOfLongestSubstringTwoDistinct("aaaa") == 4
assert sol.lengthOfLongestSubstringTwoDistinct("ab") == 2
assert sol.lengthOfLongestSubstringTwoDistinct("abc") == 2
assert sol.lengthOfLongestSubstringTwoDistinct("abaccc") == 4
assert sol.lengthOfLongestSubstringTwoDistinct("aabbcc") == 4
print("all tests passed")
run_tests()| Test | Why |
|---|---|
"eceba" | Standard example |
"ccaabbb" | Longest answer appears near the end |
"aaaa" | Only one distinct character |
"ab" | Entire string is valid |
"abc" | Must shrink after third character |
"abaccc" | Best window after shrinking |
"aabbcc" | Multiple possible windows |