A clear explanation of the longest substring problem using sliding window and a hash set.
Problem Restatement
We are given a string s.
We need to find the length of the longest substring that contains no duplicate characters.
A substring means a contiguous part of the string.
For example, in:
s = "pwwkew""wke" is a substring because its characters appear next to each other.
"pwke" is not a substring because it skips one character. It is a subsequence, not a substring.
The problem asks for the length, not the substring itself.
Input and Output
| Item | Meaning |
|---|---|
| Input | A string s |
| Output | The length of the longest substring without duplicate characters |
| Constraint | 0 <= s.length <= 5 * 10^4 |
| Characters | English letters, digits, symbols, and spaces |
Source checked against the LeetCode problem statement and examples.
Example function shape:
def lengthOfLongestSubstring(s: str) -> int:
...Examples
Input:
s = "abcabcbb"The substring "abc" has no repeated characters.
Its length is 3.
Other valid answers with the same length include "bca" and "cab".
Output:
3Another example:
s = "bbbbb"Every character is b.
The longest substring without repeated characters is just:
"b"Output:
1Another example:
s = "pwwkew"The answer is:
"wke"Its length is:
3Output:
3First Thought: Check Every Substring
The direct solution is to generate every substring.
For each starting index i, we try every ending index j.
Then we check whether s[i:j+1] has duplicate characters.
Code idea:
class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
best = 0
n = len(s)
for i in range(n):
seen = set()
for j in range(i, n):
if s[j] in seen:
break
seen.add(s[j])
best = max(best, j - i + 1)
return bestThis version improves slightly by stopping once a duplicate appears.
For each fixed i, once we see a duplicate at j, any longer substring starting at i will also contain that duplicate. So we can break.
Problem With Brute Force
The brute force solution may still check many substrings.
For a string with no repeated characters, the inner loop runs almost to the end for many starting positions.
That gives:
| Metric | Value |
|---|---|
| Time | O(n^2) |
| Space | O(min(n, m)) |
Here, n is the length of the string, and m is the number of possible distinct characters.
Because s.length can be up to 5 * 10^4, quadratic time is too slow.
Key Insight
We need to keep a window of characters with no duplicates.
A window is a substring between two indices:
s[left:right + 1]The window is valid when every character inside it appears only once.
We move right forward to include new characters.
When the new character creates a duplicate, we move left forward until the duplicate disappears.
This works because we only care about contiguous substrings. Moving the left boundary preserves the substring shape.
Visual Walkthrough
Use:
s = "abcabcbb"Start with an empty window.
left = 0
seen = {}
best = 0Read a:
window = "a"
best = 1Read b:
window = "ab"
best = 2Read c:
window = "abc"
best = 3Read the next a.
Now the window would contain two a characters:
"abca"So we move left until the old a is removed.
The window becomes:
"bca"The length is still 3.
Continue the same process.
The best length remains:
3Algorithm
Maintain:
| Variable | Meaning |
|---|---|
left | Left boundary of the current window |
right | Right boundary of the current window |
seen | Characters currently inside the window |
best | Longest valid window length found so far |
For each character s[right]:
- While
s[right]is already insideseen, removes[left]and moveleftforward. - Add
s[right]toseen. - Update
bestusing the current window length.
The current window length is:
right - left + 1Correctness
The algorithm keeps this invariant:
At the end of each iteration, the window s[left:right+1] contains no duplicate characters.
When s[right] has not appeared in the current window, adding it keeps the window valid.
When s[right] already appears in the current window, the algorithm moves left forward and removes characters until the old copy of s[right] is gone. After that, adding s[right] gives a valid window again.
So every time we update best, the current window is a valid substring without duplicate characters.
Now consider any longest valid substring ending at index right.
The algorithm keeps left as far left as possible while preserving validity after processing right. Therefore the current window has the maximum valid length among substrings ending at right.
Since the algorithm checks every possible right boundary, it eventually considers the right boundary of the optimal answer.
So best is the length of the longest substring without duplicate characters.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window at most once |
| Space | O(min(n, m)) | The set stores characters in the current window |
Here:
nis the length ofsmis the size of the character set
Implementation
class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
seen = set()
left = 0
best = 0
for right, char in enumerate(s):
while char in seen:
seen.remove(s[left])
left += 1
seen.add(char)
best = max(best, right - left + 1)
return bestCode Explanation
We use a set to store the characters currently inside the sliding window:
seen = set()The left boundary starts at index 0:
left = 0best stores the longest valid window seen so far:
best = 0Then we scan the string:
for right, char in enumerate(s):If char already exists in the current window, the window would become invalid.
So we shrink from the left:
while char in seen:
seen.remove(s[left])
left += 1After the duplicate has been removed, we can safely add the current character:
seen.add(char)Then we update the answer:
best = max(best, right - left + 1)Finally:
return bestTesting
def run_tests():
s = Solution()
assert s.lengthOfLongestSubstring("abcabcbb") == 3
assert s.lengthOfLongestSubstring("bbbbb") == 1
assert s.lengthOfLongestSubstring("pwwkew") == 3
assert s.lengthOfLongestSubstring("") == 0
assert s.lengthOfLongestSubstring(" ") == 1
assert s.lengthOfLongestSubstring("au") == 2
assert s.lengthOfLongestSubstring("dvdf") == 3
assert s.lengthOfLongestSubstring("abba") == 2
print("all tests passed")
run_tests()| Test | Why |
|---|---|
"abcabcbb" | Standard example |
"bbbbb" | All characters repeated |
"pwwkew" | Substring versus subsequence distinction |
"" | Empty string |
" " | Space is a valid character |
"au" | Whole string is valid |
"dvdf" | Duplicate appears after a gap |
"abba" | Checks correct left-boundary movement |