A clear explanation of finding the longest substring that can become all one letter using a sliding window.
Problem Restatement
We are given a string s and an integer k.
The string contains only uppercase English letters.
In one operation, we may choose any character and change it to any other uppercase English letter.
We may perform at most k operations.
Return the length of the longest substring that can be made of one repeated letter after at most k replacements.
The official examples include s = "ABAB", k = 2, whose answer is 4, and s = "AABABBA", k = 1, whose answer is 4. The constraints are 1 <= s.length <= 10^5, s contains only uppercase English letters, and 0 <= k <= s.length.
Input and Output
| Item | Meaning |
|---|---|
| Input | String s and integer k |
| Output | Maximum valid substring length |
| Operation | Replace one character with another uppercase English letter |
| Limit | At most k replacements |
| Substring rule | Must be contiguous |
Example function shape:
def characterReplacement(s: str, k: int) -> int:
...Examples
Example 1:
s = "ABAB"
k = 2The answer is:
4We can replace both A characters with B:
BBBBor both B characters with A:
AAAASo the whole string can become one repeated letter.
Example 2:
s = "AABABBA"
k = 1The answer is:
4One valid substring is:
"ABBA"Replace the A with B:
"BBBB"The length is 4.
First Thought: Check Every Substring
A direct approach is to check every substring.
For each substring, count the most frequent character.
If the substring length is L, and its most frequent character appears max_count times, then the number of replacements needed is:
L - max_countIf this value is at most k, the substring can become one repeated letter.
This works, but checking all substrings takes too long for s.length up to 10^5.
Key Insight
For any window, the best target letter is the most frequent letter in that window.
If a window has length:
window_sizeand its most common character appears:
max_freqthen the number of characters we must replace is:
window_size - max_freqThe window is valid when:
window_size - max_freq <= kSo we can use a sliding window.
Expand the right side one character at a time.
If the window becomes invalid, move the left side forward until it is valid again.
Algorithm
Use a frequency table for characters inside the current window.
Initialize:
left = 0
max_freq = 0
answer = 0For each right from 0 to len(s) - 1:
- Add
s[right]to the window. - Update
max_freq. - If the window needs more than
kreplacements, moveleftright by one. - Update the answer with the current window length.
The key condition is:
(right - left + 1) - max_freq > kIf true, the window is too expensive to fix.
Correctness
For any fixed window, changing all characters to the most frequent character gives the minimum possible number of replacements.
So the window is valid exactly when:
window_size - max_freq <= kThe algorithm maintains a window while scanning from left to right.
Whenever the current window requires more than k replacements, the left boundary moves right to remove characters and reduce the window size.
Every candidate window considered by the algorithm corresponds to a contiguous substring. The answer is updated only from windows whose size can be maintained under the replacement rule.
Since the right pointer visits every possible ending position, and the left pointer only moves forward when needed, the algorithm finds the largest substring that can be converted into one repeated letter.
About the Non-Decreasing max_freq
In the implementation below, max_freq is not decreased when the left pointer moves.
This is a standard optimization.
A stale max_freq may make the window look slightly better than it currently is, but it never causes the final answer to be too large. It only delays shrinking.
The value of answer represents the largest window length that was achievable when that max_freq was created. Since max_freq only increases when a real character count reaches that value, any recorded large window length has a valid basis from some earlier window.
This lets us keep the algorithm linear without recomputing the maximum frequency from all 26 letters after every shrink.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each pointer moves through the string once |
| Space | O(1) | At most 26 uppercase letters are counted |
Here n is the length of s.
Implementation
from collections import defaultdict
class Solution:
def characterReplacement(self, s: str, k: int) -> int:
count = defaultdict(int)
left = 0
max_freq = 0
answer = 0
for right, ch in enumerate(s):
count[ch] += 1
max_freq = max(max_freq, count[ch])
window_size = right - left + 1
if window_size - max_freq > k:
count[s[left]] -= 1
left += 1
answer = max(answer, right - left + 1)
return answerCode Explanation
We count characters in the current window:
count = defaultdict(int)The left side of the window starts at:
left = 0The variable:
max_freqstores the largest frequency of any single character seen in the current expanding process.
When we add the right character:
count[ch] += 1we update:
max_freq = max(max_freq, count[ch])The current window size is:
window_size = right - left + 1If the number of characters outside the majority character exceeds k:
if window_size - max_freq > k:then the window is too large, so we remove the leftmost character:
count[s[left]] -= 1
left += 1Finally, we update the best length:
answer = max(answer, right - left + 1)Testing
def test_character_replacement():
s = Solution()
assert s.characterReplacement("ABAB", 2) == 4
assert s.characterReplacement("AABABBA", 1) == 4
assert s.characterReplacement("AAAA", 0) == 4
assert s.characterReplacement("ABCDE", 1) == 2
assert s.characterReplacement("BAAAB", 2) == 5
assert s.characterReplacement("A", 0) == 1
assert s.characterReplacement("ABBB", 2) == 4
print("all tests passed")Test Notes
| Test | Why |
|---|---|
"ABAB", k = 2 | Standard full-string replacement |
"AABABBA", k = 1 | Standard partial-window example |
"AAAA", k = 0 | Already repeated |
"ABCDE", k = 1 | Only two-character window can become equal |
"BAAAB", k = 2 | Whole string can become AAAAA |
"A", k = 0 | Minimum input |
"ABBB", k = 2 | Majority character dominates |