Find all starting indices where an anagram of p appears in s using a fixed-size sliding window.
Problem Restatement
We are given two strings:
| Name | Meaning |
|---|---|
s | The string we search inside |
p | The pattern string |
We need to return all starting indices in s where an anagram of p begins.
An anagram uses the same characters with the same frequencies, but the order can be different.
For example:
"abc", "bac", "cba"are all anagrams of each other.
The official problem asks us to return all start indices of p’s anagrams in s, in any order. Both strings contain lowercase English letters, and their lengths are at most 3 * 10^4.
Input and Output
| Item | Meaning |
|---|---|
| Input | Strings s and p |
| Output | List of starting indices |
| Anagram rule | Same character frequencies as p |
| Window length | Must be exactly len(p) |
| Character set | Lowercase English letters |
Example function shape:
def findAnagrams(s: str, p: str) -> list[int]:
...Examples
Example 1:
s = "cbaebabacd"
p = "abc"Substrings of length 3 that are anagrams of "abc":
index 0: "cba"
index 6: "bac"Answer:
[0, 6]Example 2:
s = "abab"
p = "ab"Substrings of length 2:
index 0: "ab"
index 1: "ba"
index 2: "ab"All three are anagrams of "ab".
Answer:
[0, 1, 2]Example 3:
s = "aaaaa"
p = "aa"Every length-2 substring is "aa".
Answer:
[0, 1, 2, 3]First Thought: Sort Every Substring
A direct approach is:
- Sort
p. - For every substring of
swith lengthlen(p), sort that substring. - If the sorted strings match, record the starting index.
For example:
p = "abc"
sorted_p = "abc"At index 0:
substring = "cba"
sorted_substring = "abc"So index 0 is valid.
This works, but sorting every window is expensive.
If m = len(p) and n = len(s), this costs roughly:
O(n * m log m)We can do better by reusing work between adjacent windows.
Key Insight
Two neighboring windows of length len(p) are almost the same.
For example:
s = "cbaebabacd"
p = "abc"
window 0: c b a
window 1: b a eWhen the window moves one step:
| Change | Character |
|---|---|
| Leaves window | c |
| Enters window | e |
So instead of recounting the whole window, we update only two character counts.
Because s and p contain lowercase English letters, we can use arrays of length 26.
One array stores frequencies in p.
Another array stores frequencies in the current window of s.
If the arrays are equal, the current window is an anagram.
Algorithm
Let:
n = len(s)
m = len(p)If m > n, no substring of s can have length m, so return [].
Create two arrays:
p_count = [0] * 26
window_count = [0] * 26Fill both arrays for the first m characters:
- Count all characters in
p. - Count the first window
s[0:m].
If the counts match, append 0.
Then slide the window from left to right.
For each new right index right from m to n - 1:
- Add
s[right]to the window. - Remove
s[right - m]from the window. - If counts match, append the new start index:
right - m + 1Return the answer.
Correctness
A substring is an anagram of p exactly when it has the same character frequency for every lowercase English letter.
The algorithm maintains window_count for exactly the current substring of s with length len(p).
Initially, window_count represents s[0:len(p)].
When the window slides one position to the right, the algorithm removes the character that leaves the window and adds the character that enters the window. Therefore, after each slide, window_count again represents exactly the current length-len(p) substring.
At each position, the algorithm compares window_count with p_count.
If they are equal, the current substring has the same character frequencies as p, so its start index is valid.
If they are different, at least one character frequency differs, so the current substring cannot be an anagram.
Thus, the algorithm adds exactly all valid starting indices and no invalid ones.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window at most once |
| Space | O(1) | Two arrays of size 26 |
The array comparison costs O(26), which is constant.
Implementation
class Solution:
def findAnagrams(self, s: str, p: str) -> list[int]:
n = len(s)
m = len(p)
if m > n:
return []
p_count = [0] * 26
window_count = [0] * 26
for i in range(m):
p_index = ord(p[i]) - ord('a')
s_index = ord(s[i]) - ord('a')
p_count[p_index] += 1
window_count[s_index] += 1
answer = []
if window_count == p_count:
answer.append(0)
for right in range(m, n):
enter_index = ord(s[right]) - ord('a')
leave_index = ord(s[right - m]) - ord('a')
window_count[enter_index] += 1
window_count[leave_index] -= 1
if window_count == p_count:
answer.append(right - m + 1)
return answerCode Explanation
We first compute lengths:
n = len(s)
m = len(p)If p is longer than s, no valid window exists:
if m > n:
return []We use frequency arrays for lowercase English letters:
p_count = [0] * 26
window_count = [0] * 26The character index is computed by:
ord(ch) - ord('a')We fill the count array for p and for the first window in s:
for i in range(m):Then we check the first window:
if window_count == p_count:
answer.append(0)After that, we slide the window.
The character at right enters:
window_count[enter_index] += 1The character at right - m leaves:
window_count[leave_index] -= 1After the update, the current window starts at:
right - m + 1If the count arrays match, that start index is added to the answer.
Testing
def run_tests():
s = Solution()
assert s.findAnagrams("cbaebabacd", "abc") == [0, 6]
assert s.findAnagrams("abab", "ab") == [0, 1, 2]
assert s.findAnagrams("aaaaa", "aa") == [0, 1, 2, 3]
assert s.findAnagrams("abc", "abcd") == []
assert s.findAnagrams("baa", "aa") == [1]
assert s.findAnagrams("af", "be") == []
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"cbaebabacd", "abc" | Standard example |
"abab", "ab" | Overlapping anagrams |
| Repeated same character | Checks frequency counts |
| Pattern longer than string | Must return empty list |
| Match at the end | Checks final window |
| No shared counts | Checks no-result case |