A clear explanation of finding the first occurrence of one string inside another using direct string matching.
Problem Restatement
We are given two strings: haystack and needle.
We need to return the index where needle first appears inside haystack.
If needle does not appear inside haystack, return -1.
For this current version of the problem, both strings have length at least 1, and both contain only lowercase English letters.
Input and Output
| Item | Meaning |
|---|---|
| Input | Two strings: haystack and needle |
| Output | The first index where needle appears in haystack |
| Not found | Return -1 |
| Constraints | 1 <= haystack.length, needle.length <= 10^4 |
Function shape:
def strStr(haystack: str, needle: str) -> int:
...Examples
Example 1:
haystack = "sadbutsad"
needle = "sad"The string "sad" appears at index 0 and index 6.
The first occurrence is index 0.
Return:
0Example 2:
haystack = "leetcode"
needle = "leeto"The string "leeto" does not appear inside "leetcode".
Return:
-1Example 3:
haystack = "hello"
needle = "ll"The substring "ll" starts at index 2.
Return:
2First Thought: Brute Force
The direct idea is to try every possible starting index in haystack.
At each index i, compare the next len(needle) characters with needle.
For example:
haystack = "hello"
needle = "ll"The length of needle is 2.
We check:
| Start index | Substring | Match |
|---|---|---|
0 | "he" | No |
1 | "el" | No |
2 | "ll" | Yes |
So we return 2.
A simple Python version can use slicing:
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
n = len(haystack)
m = len(needle)
for i in range(n - m + 1):
if haystack[i:i + m] == needle:
return i
return -1This is accepted and easy to read.
Problem With Brute Force
The slicing version creates a substring at each possible starting position.
That is fine for this problem size, but it hides the actual character comparison.
To understand string matching better, we can write the comparison manually.
This keeps the same algorithmic idea but makes the mechanics explicit.
Key Insight
needle can only start at positions where enough characters remain in haystack.
If:
n = len(haystack)
m = len(needle)then the last possible starting index is:
n - mSo the valid starting positions are:
0, 1, 2, ..., n - mAt each start position, compare characters one by one.
If all m characters match, we found the first occurrence.
Since we scan from left to right, the first match we find is the answer.
Algorithm
Let n be the length of haystack.
Let m be the length of needle.
If m > n, return -1.
Then for every possible start index i from 0 to n - m:
- Set
j = 0. - While
j < mandhaystack[i + j] == needle[j], increasej. - If
j == m, all characters matched, so returni. - Continue to the next start index.
If no start index matches, return -1.
Correctness
The algorithm checks every position where needle could fully fit inside haystack.
For a fixed start index i, the inner loop compares:
haystack[i + 0] with needle[0]
haystack[i + 1] with needle[1]
haystack[i + 2] with needle[2]
...If all m comparisons match, then the substring of haystack starting at i is exactly equal to needle. Returning i is correct.
If a mismatch occurs, then needle cannot start at that index, so the algorithm moves to the next possible index.
Because the outer loop scans indices from left to right, the first successful match is the smallest valid index. That is exactly what the problem asks for.
If the loop finishes without finding a match, then no possible starting position works, so returning -1 is correct.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n * m) | For each start index, we may compare up to m characters |
| Space | O(1) | We only use integer variables |
Here, n = len(haystack) and m = len(needle).
More precisely, there are n - m + 1 possible starting positions, and each can require up to m comparisons.
Implementation
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
n = len(haystack)
m = len(needle)
if m > n:
return -1
for i in range(n - m + 1):
j = 0
while j < m and haystack[i + j] == needle[j]:
j += 1
if j == m:
return i
return -1Code Explanation
We first store both lengths:
n = len(haystack)
m = len(needle)If needle is longer than haystack, it cannot appear as a substring:
if m > n:
return -1Then we try every valid start position:
for i in range(n - m + 1):The + 1 matters because range stops before its ending value.
For each start position, j counts how many characters have matched:
j = 0Then we compare characters while they match:
while j < m and haystack[i + j] == needle[j]:
j += 1If j == m, then every character in needle matched:
if j == m:
return iIf no start position works, return -1.
Testing
def check(haystack: str, needle: str, expected: int) -> None:
actual = Solution().strStr(haystack, needle)
assert actual == expected, (haystack, needle, actual, expected)
def run_tests():
check("sadbutsad", "sad", 0)
check("leetcode", "leeto", -1)
check("hello", "ll", 2)
check("aaaaa", "bba", -1)
check("a", "a", 0)
check("abc", "c", 2)
check("mississippi", "issip", 4)
check("aaa", "aaaa", -1)
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
"sadbutsad", "sad" | Match appears more than once, return first |
"leetcode", "leeto" | No match |
"hello", "ll" | Match in the middle |
"aaaaa", "bba" | Repeated characters but no match |
"a", "a" | Minimum equal strings |
"abc", "c" | Match at the last valid start |
"mississippi", "issip" | Partial repeated matches |
"aaa", "aaaa" | Needle longer than haystack |