# LeetCode 28: Find the Index of the First Occurrence in a String

## Problem Restatement

We are given two strings: `haystack` and `needle`.

We need to return the index where `needle` first appears inside `haystack`.

If `needle` does not appear inside `haystack`, return `-1`.

For this current version of the problem, both strings have length at least `1`, and both contain only lowercase English letters.

## Input and Output

| Item | Meaning |
|---|---|
| Input | Two strings: `haystack` and `needle` |
| Output | The first index where `needle` appears in `haystack` |
| Not found | Return `-1` |
| Constraints | `1 <= haystack.length, needle.length <= 10^4` |

Function shape:

```python
def strStr(haystack: str, needle: str) -> int:
    ...
```

## Examples

Example 1:

```python
haystack = "sadbutsad"
needle = "sad"
```

The string `"sad"` appears at index `0` and index `6`.

The first occurrence is index `0`.

Return:

```python
0
```

Example 2:

```python
haystack = "leetcode"
needle = "leeto"
```

The string `"leeto"` does not appear inside `"leetcode"`.

Return:

```python
-1
```

Example 3:

```python
haystack = "hello"
needle = "ll"
```

The substring `"ll"` starts at index `2`.

Return:

```python
2
```

## First Thought: Brute Force

The direct idea is to try every possible starting index in `haystack`.

At each index `i`, compare the next `len(needle)` characters with `needle`.

For example:

```python
haystack = "hello"
needle = "ll"
```

The length of `needle` is `2`.

We check:

| Start index | Substring | Match |
|---|---|---|
| `0` | `"he"` | No |
| `1` | `"el"` | No |
| `2` | `"ll"` | Yes |

So we return `2`.

A simple Python version can use slicing:

```python
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n = len(haystack)
        m = len(needle)

        for i in range(n - m + 1):
            if haystack[i:i + m] == needle:
                return i

        return -1
```

This is accepted and easy to read.

## Problem With Brute Force

The slicing version creates a substring at each possible starting position.

That is fine for this problem size, but it hides the actual character comparison.

To understand string matching better, we can write the comparison manually.

This keeps the same algorithmic idea but makes the mechanics explicit.

## Key Insight

`needle` can only start at positions where enough characters remain in `haystack`.

If:

```python
n = len(haystack)
m = len(needle)
```

then the last possible starting index is:

```python
n - m
```

So the valid starting positions are:

```python
0, 1, 2, ..., n - m
```

At each start position, compare characters one by one.

If all `m` characters match, we found the first occurrence.

Since we scan from left to right, the first match we find is the answer.

## Algorithm

Let `n` be the length of `haystack`.

Let `m` be the length of `needle`.

If `m > n`, return `-1`.

Then for every possible start index `i` from `0` to `n - m`:

1. Set `j = 0`.
2. While `j < m` and `haystack[i + j] == needle[j]`, increase `j`.
3. If `j == m`, all characters matched, so return `i`.
4. Continue to the next start index.

If no start index matches, return `-1`.

## Correctness

The algorithm checks every position where `needle` could fully fit inside `haystack`.

For a fixed start index `i`, the inner loop compares:

```python
haystack[i + 0] with needle[0]
haystack[i + 1] with needle[1]
haystack[i + 2] with needle[2]
...
```

If all `m` comparisons match, then the substring of `haystack` starting at `i` is exactly equal to `needle`. Returning `i` is correct.

If a mismatch occurs, then `needle` cannot start at that index, so the algorithm moves to the next possible index.

Because the outer loop scans indices from left to right, the first successful match is the smallest valid index. That is exactly what the problem asks for.

If the loop finishes without finding a match, then no possible starting position works, so returning `-1` is correct.

## Complexity

| Metric | Value | Why |
|---|---|---|
| Time | `O(n * m)` | For each start index, we may compare up to `m` characters |
| Space | `O(1)` | We only use integer variables |

Here, `n = len(haystack)` and `m = len(needle)`.

More precisely, there are `n - m + 1` possible starting positions, and each can require up to `m` comparisons.

## Implementation

```python
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n = len(haystack)
        m = len(needle)

        if m > n:
            return -1

        for i in range(n - m + 1):
            j = 0

            while j < m and haystack[i + j] == needle[j]:
                j += 1

            if j == m:
                return i

        return -1
```

## Code Explanation

We first store both lengths:

```python
n = len(haystack)
m = len(needle)
```

If `needle` is longer than `haystack`, it cannot appear as a substring:

```python
if m > n:
    return -1
```

Then we try every valid start position:

```python
for i in range(n - m + 1):
```

The `+ 1` matters because `range` stops before its ending value.

For each start position, `j` counts how many characters have matched:

```python
j = 0
```

Then we compare characters while they match:

```python
while j < m and haystack[i + j] == needle[j]:
    j += 1
```

If `j == m`, then every character in `needle` matched:

```python
if j == m:
    return i
```

If no start position works, return `-1`.

## Testing

```python
def check(haystack: str, needle: str, expected: int) -> None:
    actual = Solution().strStr(haystack, needle)
    assert actual == expected, (haystack, needle, actual, expected)

def run_tests():
    check("sadbutsad", "sad", 0)
    check("leetcode", "leeto", -1)
    check("hello", "ll", 2)
    check("aaaaa", "bba", -1)
    check("a", "a", 0)
    check("abc", "c", 2)
    check("mississippi", "issip", 4)
    check("aaa", "aaaa", -1)

    print("all tests passed")

run_tests()
```

Test meaning:

| Test | Why |
|---|---|
| `"sadbutsad"`, `"sad"` | Match appears more than once, return first |
| `"leetcode"`, `"leeto"` | No match |
| `"hello"`, `"ll"` | Match in the middle |
| `"aaaaa"`, `"bba"` | Repeated characters but no match |
| `"a"`, `"a"` | Minimum equal strings |
| `"abc"`, `"c"` | Match at the last valid start |
| `"mississippi"`, `"issip"` | Partial repeated matches |
| `"aaa"`, `"aaaa"` | Needle longer than haystack |