LeetCode 28: Find the Index of the First Occurrence in a String

Problem Restatement

We are given two strings: haystack and needle.

We need to return the index where needle first appears inside haystack.

If needle does not appear inside haystack, return -1.

For this current version of the problem, both strings have length at least 1, and both contain only lowercase English letters.

Input and Output

Item	Meaning
Input	Two strings: `haystack` and `needle`
Output	The first index where `needle` appears in `haystack`
Not found	Return `-1`
Constraints	`1 <= haystack.length, needle.length <= 10^4`

Function shape:

def strStr(haystack: str, needle: str) -> int:
    ...

Examples

Example 1:

haystack = "sadbutsad"
needle = "sad"

The string "sad" appears at index 0 and index 6.

The first occurrence is index 0.

Return:

Example 2:

haystack = "leetcode"
needle = "leeto"

The string "leeto" does not appear inside "leetcode".

Return:

-1

Example 3:

haystack = "hello"
needle = "ll"

The substring "ll" starts at index 2.

Return:

First Thought: Brute Force

The direct idea is to try every possible starting index in haystack.

At each index i, compare the next len(needle) characters with needle.

For example:

haystack = "hello"
needle = "ll"

The length of needle is 2.

We check:

Start index	Substring	Match
`0`	`"he"`	No
`1`	`"el"`	No
`2`	`"ll"`	Yes

So we return 2.

A simple Python version can use slicing:

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n = len(haystack)
        m = len(needle)

        for i in range(n - m + 1):
            if haystack[i:i + m] == needle:
                return i

        return -1

This is accepted and easy to read.

Problem With Brute Force

The slicing version creates a substring at each possible starting position.

That is fine for this problem size, but it hides the actual character comparison.

To understand string matching better, we can write the comparison manually.

This keeps the same algorithmic idea but makes the mechanics explicit.

Key Insight

needle can only start at positions where enough characters remain in haystack.

If:

n = len(haystack)
m = len(needle)

then the last possible starting index is:

n - m

So the valid starting positions are:

0, 1, 2, ..., n - m

At each start position, compare characters one by one.

If all m characters match, we found the first occurrence.

Since we scan from left to right, the first match we find is the answer.

Algorithm

Let n be the length of haystack.

Let m be the length of needle.

If m > n, return -1.

Then for every possible start index i from 0 to n - m:

Set j = 0.
While j < m and haystack[i + j] == needle[j], increase j.
If j == m, all characters matched, so return i.
Continue to the next start index.

If no start index matches, return -1.

Correctness

The algorithm checks every position where needle could fully fit inside haystack.

For a fixed start index i, the inner loop compares:

haystack[i + 0] with needle[0]
haystack[i + 1] with needle[1]
haystack[i + 2] with needle[2]
...

If all m comparisons match, then the substring of haystack starting at i is exactly equal to needle. Returning i is correct.

If a mismatch occurs, then needle cannot start at that index, so the algorithm moves to the next possible index.

Because the outer loop scans indices from left to right, the first successful match is the smallest valid index. That is exactly what the problem asks for.

If the loop finishes without finding a match, then no possible starting position works, so returning -1 is correct.

Complexity

Metric	Value	Why
Time	`O(n * m)`	For each start index, we may compare up to `m` characters
Space	`O(1)`	We only use integer variables

Here, n = len(haystack) and m = len(needle).

More precisely, there are n - m + 1 possible starting positions, and each can require up to m comparisons.

Implementation

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n = len(haystack)
        m = len(needle)

        if m > n:
            return -1

        for i in range(n - m + 1):
            j = 0

            while j < m and haystack[i + j] == needle[j]:
                j += 1

            if j == m:
                return i

        return -1

Code Explanation

We first store both lengths:

n = len(haystack)
m = len(needle)

If needle is longer than haystack, it cannot appear as a substring:

if m > n:
    return -1

Then we try every valid start position:

for i in range(n - m + 1):

The + 1 matters because range stops before its ending value.

For each start position, j counts how many characters have matched:

j = 0

Then we compare characters while they match:

while j < m and haystack[i + j] == needle[j]:
    j += 1

If j == m, then every character in needle matched:

if j == m:
    return i

If no start position works, return -1.

Testing

def check(haystack: str, needle: str, expected: int) -> None:
    actual = Solution().strStr(haystack, needle)
    assert actual == expected, (haystack, needle, actual, expected)

def run_tests():
    check("sadbutsad", "sad", 0)
    check("leetcode", "leeto", -1)
    check("hello", "ll", 2)
    check("aaaaa", "bba", -1)
    check("a", "a", 0)
    check("abc", "c", 2)
    check("mississippi", "issip", 4)
    check("aaa", "aaaa", -1)

    print("all tests passed")

run_tests()

Test meaning:

Test	Why
`"sadbutsad"`, `"sad"`	Match appears more than once, return first
`"leetcode"`, `"leeto"`	No match
`"hello"`, `"ll"`	Match in the middle
`"aaaaa"`, `"bba"`	Repeated characters but no match
`"a"`, `"a"`	Minimum equal strings
`"abc"`, `"c"`	Match at the last valid start
`"mississippi"`, `"issip"`	Partial repeated matches
`"aaa"`, `"aaaa"`	Needle longer than haystack