A clear explanation of designing a hash map without using built-in hash table libraries.
Problem Restatement
We need to design a class called MyHashMap.
The class should support three operations:
| Operation | Meaning |
|---|---|
put(key, value) | Insert or update the value for key |
get(key) | Return the value for key, or -1 if missing |
remove(key) | Remove key and its value if it exists |
We cannot use built-in hash table libraries.
A hash map stores key-value pairs. Each key maps to one value.
If put is called with a key that already exists, we update the old value.
Input and Output
| Item | Meaning |
|---|---|
| Constructor | Creates an empty hash map |
put(key, value) | Returns nothing |
get(key) | Returns the mapped value, or -1 |
remove(key) | Returns nothing |
Existing key in put | Update its value |
Missing key in remove | Do nothing |
The class shape is:
class MyHashMap:
def __init__(self):
...
def put(self, key: int, value: int) -> None:
...
def get(self, key: int) -> int:
...
def remove(self, key: int) -> None:
...Examples
Example:
myHashMap = MyHashMap()
myHashMap.put(1, 1)
myHashMap.put(2, 2)
myHashMap.get(1)
myHashMap.get(3)
myHashMap.put(2, 1)
myHashMap.get(2)
myHashMap.remove(2)
myHashMap.get(2)The returned values are:
[None, None, None, 1, -1, None, 1, None, -1]Step by step:
| Operation | Map After Operation | Return |
|---|---|---|
put(1, 1) | {1: 1} | None |
put(2, 2) | {1: 1, 2: 2} | None |
get(1) | {1: 1, 2: 2} | 1 |
get(3) | {1: 1, 2: 2} | -1 |
put(2, 1) | {1: 1, 2: 1} | None |
get(2) | {1: 1, 2: 1} | 1 |
remove(2) | {1: 1} | None |
get(2) | {1: 1} | -1 |
First Thought: Direct Addressing
If the key range is small enough, we can use the key directly as an array index.
For example:
data[key] = valueFor missing keys, store:
-1Then:
put(key, value): data[key] = value
get(key): return data[key]
remove(key): data[key] = -1This is simple and fast.
Problem With Direct Addressing
Direct addressing allocates memory for every possible key, even if only a few keys are used.
If keys are in the range:
0 <= key <= 1_000_000then we need an array of size:
1_000_001For this problem, that is acceptable.
But a real hash map should avoid allocating space proportional to the maximum possible key. It should allocate buckets and store only keys that are actually inserted.
Key Insight
Use a bucket array.
A hash function maps each key to one bucket:
bucket_index = key % bucket_countEach bucket stores pairs:
[key, value]Different keys can map to the same bucket. This is called a collision.
For example, if:
bucket_count = 1009then keys 1, 1010, and 2019 all map to bucket 1.
To handle this, each bucket can be a list of key-value pairs.
When we operate on a key, we only scan its bucket.
Algorithm
Use an array of buckets.
Each bucket is a list of pairs.
During initialization:
- Choose a bucket count.
- Create that many empty lists.
For put(key, value):
- Compute the bucket index.
- Search the bucket for
key. - If found, update its value.
- If not found, append
[key, value].
For get(key):
- Compute the bucket index.
- Search the bucket for
key. - If found, return its value.
- Otherwise, return
-1.
For remove(key):
- Compute the bucket index.
- Search the bucket for
key. - If found, remove that pair.
- Otherwise, do nothing.
Correctness
Each key always maps to one deterministic bucket, computed by key % bucket_count.
When put inserts a new key, it places the key-value pair in that bucket. When put sees an existing key in the bucket, it updates the stored value instead of adding a duplicate pair. Therefore, each key has at most one active mapping.
When get searches for a key, it computes the same bucket. Since every inserted key is stored in its computed bucket, searching that bucket is sufficient. If the key is found, the method returns its latest value. If the key is missing from that bucket, then the key has no mapping in the map.
When remove searches for a key, it also computes the same bucket. If the key is found, removing its pair deletes the mapping. If the key is not found, no mapping exists, so doing nothing is correct.
Therefore, the data structure satisfies the required hash map behavior.
Complexity
Let b be the number of buckets and m be the number of pairs in one bucket.
| Operation | Average Time | Worst-case Time | Why |
|---|---|---|---|
put | O(1) | O(m) | Search one bucket, then update or append |
get | O(1) | O(m) | Search one bucket |
remove | O(1) | O(m) | Search one bucket, then remove |
| Space | O(b + n) | O(b + n) | Bucket array plus inserted pairs |
With a reasonable bucket count, most buckets stay small.
Implementation
class MyHashMap:
def __init__(self):
self.size = 1009
self.buckets = [[] for _ in range(self.size)]
def _hash(self, key: int) -> int:
return key % self.size
def put(self, key: int, value: int) -> None:
index = self._hash(key)
bucket = self.buckets[index]
for pair in bucket:
if pair[0] == key:
pair[1] = value
return
bucket.append([key, value])
def get(self, key: int) -> int:
index = self._hash(key)
bucket = self.buckets[index]
for stored_key, stored_value in bucket:
if stored_key == key:
return stored_value
return -1
def remove(self, key: int) -> None:
index = self._hash(key)
bucket = self.buckets[index]
for i, pair in enumerate(bucket):
if pair[0] == key:
bucket.pop(i)
returnCode Explanation
The constructor creates a fixed number of buckets:
self.size = 1009
self.buckets = [[] for _ in range(self.size)]The hash helper maps a key to a bucket index:
def _hash(self, key: int) -> int:
return key % self.sizeThe put method first finds the correct bucket:
index = self._hash(key)
bucket = self.buckets[index]Then it scans that bucket.
If the key already exists, update the value:
for pair in bucket:
if pair[0] == key:
pair[1] = value
returnIf the key does not exist, append a new pair:
bucket.append([key, value])The get method searches the same bucket:
for stored_key, stored_value in bucket:
if stored_key == key:
return stored_valueIf no pair matches, return:
-1The remove method finds the matching pair and deletes it:
for i, pair in enumerate(bucket):
if pair[0] == key:
bucket.pop(i)
returnReturning immediately after removal avoids modifying the list while continuing to iterate.
Alternative: Direct Addressing
Because the key range is bounded, direct addressing is also valid.
class MyHashMap:
def __init__(self):
self.data = [-1] * 1_000_001
def put(self, key: int, value: int) -> None:
self.data[key] = value
def get(self, key: int) -> int:
return self.data[key]
def remove(self, key: int) -> None:
self.data[key] = -1This version is shorter and gives guaranteed constant-time operations.
Its tradeoff is memory. It always allocates space for all possible keys.
Direct Addressing Complexity
| Operation | Time | Space |
|---|---|---|
put | O(1) | O(1_000_001) |
get | O(1) | O(1_000_001) |
remove | O(1) | O(1_000_001) |
Testing
def test_my_hash_map():
hm = MyHashMap()
hm.put(1, 1)
hm.put(2, 2)
assert hm.get(1) == 1
assert hm.get(3) == -1
hm.put(2, 1)
assert hm.get(2) == 1
hm.remove(2)
assert hm.get(2) == -1
hm.remove(42)
assert hm.get(42) == -1
hm.put(1009, 10)
hm.put(2018, 20)
assert hm.get(1009) == 10
assert hm.get(2018) == 20
hm.put(1009, 99)
assert hm.get(1009) == 99
hm.remove(1009)
assert hm.get(1009) == -1
assert hm.get(2018) == 20
print("all tests passed")
test_my_hash_map()Test coverage:
| Test | Why |
|---|---|
| Insert and get | Confirms basic mapping |
| Missing key | Confirms -1 result |
| Update existing key | Confirms overwrite behavior |
| Remove existing key | Confirms deletion |
| Remove missing key | Confirms no error |
| Colliding keys | Confirms bucket collision handling |
| Update colliding key | Confirms correct pair is updated |