# LeetCode 705: Design HashSet

## Problem Restatement

We need to design a class called `MyHashSet`.

The class should support three operations:

| Operation | Meaning |
|---|---|
| `add(key)` | Insert `key` into the hash set |
| `remove(key)` | Remove `key` from the hash set if it exists |
| `contains(key)` | Return whether `key` exists in the hash set |

We cannot use built-in hash table libraries.

The value range is bounded:

```python
0 <= key <= 1_000_000
```

At most `10,000` calls will be made to `add`, `remove`, and `contains`.

## Input and Output

| Item | Meaning |
|---|---|
| Constructor | Creates an empty hash set |
| `add(key)` | Returns nothing |
| `remove(key)` | Returns nothing |
| `contains(key)` | Returns `True` or `False` |
| Duplicate add | Does not create multiple copies |
| Missing remove | Does nothing |

The class shape is:

```python
class MyHashSet:

    def __init__(self):
        ...

    def add(self, key: int) -> None:
        ...

    def remove(self, key: int) -> None:
        ...

    def contains(self, key: int) -> bool:
        ...
```

## Examples

Example:

```python
myHashSet = MyHashSet()
myHashSet.add(1)
myHashSet.add(2)
myHashSet.contains(1)
myHashSet.contains(3)
myHashSet.add(2)
myHashSet.contains(2)
myHashSet.remove(2)
myHashSet.contains(2)
```

The returned values are:

```python
[None, None, True, False, None, True, None, False]
```

Step by step:

| Operation | Set After Operation | Return |
|---|---|---|
| `add(1)` | `{1}` | `None` |
| `add(2)` | `{1, 2}` | `None` |
| `contains(1)` | `{1, 2}` | `True` |
| `contains(3)` | `{1, 2}` | `False` |
| `add(2)` | `{1, 2}` | `None` |
| `contains(2)` | `{1, 2}` | `True` |
| `remove(2)` | `{1}` | `None` |
| `contains(2)` | `{1}` | `False` |

## First Thought: Direct Addressing

Because every key is between `0` and `1,000,000`, we can create a boolean array of size `1,000,001`.

Each index represents one possible key.

If key `x` exists in the set, store:

```python
data[x] = True
```

If key `x` does not exist, store:

```python
data[x] = False
```

Then all three operations are simple.

```python
add(key):      data[key] = True
remove(key):   data[key] = False
contains(key): return data[key]
```

This is the simplest valid solution for the given constraints.

## Problem With Direct Addressing

Direct addressing uses memory for every possible key, even if we only store a few keys.

The key range has `1,000,001` possible values.

So the space cost is:

```python
O(1_000_001)
```

For this LeetCode problem, that is acceptable.

For a real hash set, this is usually wasteful. A real hash set should use a smaller array of buckets and a hash function.

We will implement the bucket-based version because it better matches the idea of a hash set.

## Key Insight

A hash set maps a key to a bucket.

For an integer key, a simple hash function is:

```python
bucket_index = key % bucket_count
```

Each bucket stores all keys that map to that index.

For example, if:

```python
bucket_count = 1009
```

then these keys go to the same bucket:

```python
1
1010
2019
```

because they all have the same remainder modulo `1009`.

This is called a collision.

To handle collisions, each bucket can be a list.

When we add, remove, or check a key, we only scan the bucket where that key belongs.

## Algorithm

Use an array of buckets.

Each bucket is a list of keys.

During initialization:

1. Choose a bucket count.
2. Create that many empty lists.

For `add(key)`:

1. Compute the bucket index.
2. Check whether the key is already in that bucket.
3. If not, append it.

For `remove(key)`:

1. Compute the bucket index.
2. Search the bucket.
3. If the key exists, remove it.

For `contains(key)`:

1. Compute the bucket index.
2. Return whether the key exists in that bucket.

## Correctness

Each key is always stored in exactly one possible bucket: the bucket given by `key % bucket_count`.

The `add` method computes this bucket and only inserts the key if it is not already present. Therefore the structure stores unique keys.

The `remove` method computes the same bucket for the key. If the key exists, it must be in that bucket, so removing it from that bucket removes it from the whole set. If it is not in that bucket, then it is not in the set.

The `contains` method also computes the same bucket. Since every inserted key is placed in its hash bucket, checking that bucket is sufficient. If the key is found there, it is in the set. If it is not found there, it has never been inserted or has already been removed.

Therefore all three operations behave exactly like a hash set.

## Complexity

Let `b` be the number of buckets and `m` be the number of keys stored in one bucket.

| Operation | Average Time | Worst-case Time | Why |
|---|---|---|---|
| `add` | `O(1)` | `O(m)` | Scan one bucket before inserting |
| `remove` | `O(1)` | `O(m)` | Scan one bucket before removing |
| `contains` | `O(1)` | `O(m)` | Scan one bucket |
| Space | `O(b + n)` | `O(b + n)` | Buckets plus stored keys |

With a reasonable bucket count and at most `10,000` operations, the buckets stay small in normal cases.

## Implementation

```python
class MyHashSet:

    def __init__(self):
        self.size = 1009
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key: int) -> int:
        return key % self.size

    def add(self, key: int) -> None:
        index = self._hash(key)
        bucket = self.buckets[index]

        if key not in bucket:
            bucket.append(key)

    def remove(self, key: int) -> None:
        index = self._hash(key)
        bucket = self.buckets[index]

        if key in bucket:
            bucket.remove(key)

    def contains(self, key: int) -> bool:
        index = self._hash(key)
        bucket = self.buckets[index]

        return key in bucket
```

## Code Explanation

The constructor creates the bucket array:

```python
self.size = 1009
self.buckets = [[] for _ in range(self.size)]
```

`1009` is a prime number. A prime bucket count is often used with modulo hashing because it can reduce some regular collision patterns.

The helper method maps a key to a bucket index:

```python
def _hash(self, key: int) -> int:
    return key % self.size
```

The `add` method gets the correct bucket:

```python
index = self._hash(key)
bucket = self.buckets[index]
```

Then it inserts only if the key is not already present:

```python
if key not in bucket:
    bucket.append(key)
```

The `remove` method also finds the correct bucket:

```python
index = self._hash(key)
bucket = self.buckets[index]
```

Then it removes the key only if it exists:

```python
if key in bucket:
    bucket.remove(key)
```

The `contains` method checks membership in the correct bucket:

```python
return key in bucket
```

## Alternative: Direct Addressing

Because the key range is small enough, we can also solve the problem with a boolean array.

```python
class MyHashSet:

    def __init__(self):
        self.data = [False] * 1_000_001

    def add(self, key: int) -> None:
        self.data[key] = True

    def remove(self, key: int) -> None:
        self.data[key] = False

    def contains(self, key: int) -> bool:
        return self.data[key]
```

This version is simpler and gives guaranteed `O(1)` time for every operation.

Its tradeoff is memory. It allocates space for all possible keys, even if only a few keys are used.

## Direct Addressing Complexity

| Operation | Time | Space |
|---|---|---|
| `add` | `O(1)` | `O(1_000_001)` |
| `remove` | `O(1)` | `O(1_000_001)` |
| `contains` | `O(1)` | `O(1_000_001)` |

## Testing

```python
def test_my_hash_set():
    hs = MyHashSet()

    hs.add(1)
    hs.add(2)

    assert hs.contains(1) is True
    assert hs.contains(3) is False

    hs.add(2)
    assert hs.contains(2) is True

    hs.remove(2)
    assert hs.contains(2) is False

    hs.remove(42)
    assert hs.contains(42) is False

    hs.add(1009)
    hs.add(2018)

    assert hs.contains(1009) is True
    assert hs.contains(2018) is True

    hs.remove(1009)

    assert hs.contains(1009) is False
    assert hs.contains(2018) is True

    print("all tests passed")

test_my_hash_set()
```

Test coverage:

| Test | Why |
|---|---|
| Add and contains | Confirms insertion |
| Missing key | Confirms false lookup |
| Duplicate add | Confirms set behavior |
| Remove existing key | Confirms deletion |
| Remove missing key | Confirms no error |
| Colliding keys | Confirms bucket collision handling |