A clear explanation of designing a time-based key-value store using a hash map and binary search.
Problem Restatement
We need to design a data structure called TimeMap.
It supports two operations:
| Operation | Meaning |
|---|---|
set(key, value, timestamp) | Store value for key at timestamp |
get(key, timestamp) | Return the value for key with the largest stored timestamp <= timestamp |
If no stored timestamp is less than or equal to the query timestamp, return an empty string.
The important guarantee is that, for each key, calls to set happen with strictly increasing timestamps. This means each key’s history is already sorted by timestamp. The official operations are TimeMap(), set, and get; get must return the value associated with the largest previous timestamp not greater than the query timestamp.
Input and Output
| Method | Input | Output |
|---|---|---|
TimeMap() | Nothing | Initializes the object |
set | key, value, timestamp | Nothing |
get | key, timestamp | A string value or "" |
Class shape:
class TimeMap:
def __init__(self):
...
def set(self, key: str, value: str, timestamp: int) -> None:
...
def get(self, key: str, timestamp: int) -> str:
...Examples
Example:
timeMap = TimeMap()
timeMap.set("foo", "bar", 1)
timeMap.get("foo", 1)
timeMap.get("foo", 3)
timeMap.set("foo", "bar2", 4)
timeMap.get("foo", 4)
timeMap.get("foo", 5)The outputs are:
"bar"
"bar"
"bar2"
"bar2"Why?
After this call:
timeMap.set("foo", "bar", 1)the history for "foo" is:
[(1, "bar")]So:
timeMap.get("foo", 1)returns "bar" because timestamp 1 exists exactly.
And:
timeMap.get("foo", 3)also returns "bar" because timestamp 1 is the largest stored timestamp less than or equal to 3.
After:
timeMap.set("foo", "bar2", 4)the history becomes:
[(1, "bar"), (4, "bar2")]So:
timeMap.get("foo", 4)
timeMap.get("foo", 5)both return "bar2".
First Thought: Store and Scan
For each key, we can store all (timestamp, value) pairs in a list.
Then for get, scan the list and keep the latest timestamp that is less than or equal to the query timestamp.
class TimeMap:
def __init__(self):
self.store = {}
def set(self, key: str, value: str, timestamp: int) -> None:
if key not in self.store:
self.store[key] = []
self.store[key].append((timestamp, value))
def get(self, key: str, timestamp: int) -> str:
if key not in self.store:
return ""
answer = ""
for saved_timestamp, saved_value in self.store[key]:
if saved_timestamp <= timestamp:
answer = saved_value
else:
break
return answerThis works because the timestamps for each key are stored in increasing order.
Problem With Linear Scan
The set operation is efficient.
But get may scan every stored value for a key.
If a key has many historical values, a single get can cost:
O(k)where k is the number of values stored for that key.
Since the timestamps are sorted, we should use binary search.
Key Insight
For each key, its stored timestamps form a sorted list.
For a query timestamp t, we need the rightmost stored timestamp that is:
saved_timestamp <= tThis is a classic binary search condition.
We can store:
key -> list of (timestamp, value)Then for get(key, timestamp), binary search that key’s list.
Algorithm
For set:
- If the key is new, create an empty list.
- Append
(timestamp, value)to the list.
For get:
- If the key does not exist, return
"". - Binary search the list for the largest timestamp less than or equal to the query timestamp.
- If such timestamp exists, return its value.
- Otherwise, return
"".
Correctness
For each key, set appends values in strictly increasing timestamp order. Therefore, the list for each key is sorted by timestamp.
For a query get(key, timestamp), the required result is the value whose stored timestamp is the largest timestamp less than or equal to the query timestamp.
The binary search maintains the best valid index found so far. Whenever it sees a stored timestamp less than or equal to the query timestamp, that entry is valid, so it records the index and searches to the right for a larger valid timestamp. Whenever it sees a stored timestamp greater than the query timestamp, that entry is invalid, so it searches to the left.
When the search ends, the recorded index is exactly the rightmost valid timestamp. If no valid index was recorded, no stored timestamp is less than or equal to the query timestamp, so returning "" is correct.
Complexity
Let k be the number of values stored for the queried key.
| Operation | Time | Space |
|---|---|---|
TimeMap() | O(1) | O(1) |
set | O(1) | O(1) per call |
get | O(log k) | O(1) |
Total space after all set calls is O(n), where n is the total number of stored entries.
Implementation
class TimeMap:
def __init__(self):
self.store = {}
def set(self, key: str, value: str, timestamp: int) -> None:
if key not in self.store:
self.store[key] = []
self.store[key].append((timestamp, value))
def get(self, key: str, timestamp: int) -> str:
if key not in self.store:
return ""
values = self.store[key]
left = 0
right = len(values) - 1
best_index = -1
while left <= right:
mid = (left + right) // 2
saved_timestamp, saved_value = values[mid]
if saved_timestamp <= timestamp:
best_index = mid
left = mid + 1
else:
right = mid - 1
if best_index == -1:
return ""
return values[best_index][1]Code Explanation
The dictionary stores one history list per key:
self.store = {}When setting a value, we append a timestamped record:
self.store[key].append((timestamp, value))This append is safe because timestamps for each key arrive in increasing order.
For get, we first handle unknown keys:
if key not in self.store:
return ""Then we binary search the list:
left = 0
right = len(values) - 1
best_index = -1The variable best_index stores the best valid timestamp found so far.
If the middle timestamp is valid:
if saved_timestamp <= timestamp:we record it and search farther right:
best_index = mid
left = mid + 1If it is too large, we search left:
right = mid - 1After the loop, best_index == -1 means no valid timestamp exists.
Otherwise, return the value at the best index:
return values[best_index][1]Testing
def run_tests():
timeMap = TimeMap()
timeMap.set("foo", "bar", 1)
assert timeMap.get("foo", 1) == "bar"
assert timeMap.get("foo", 3) == "bar"
timeMap.set("foo", "bar2", 4)
assert timeMap.get("foo", 4) == "bar2"
assert timeMap.get("foo", 5) == "bar2"
other = TimeMap()
other.set("love", "high", 10)
other.set("love", "low", 20)
assert other.get("love", 5) == ""
assert other.get("love", 10) == "high"
assert other.get("love", 15) == "high"
assert other.get("love", 20) == "low"
assert other.get("love", 25) == "low"
assert other.get("unknown", 100) == ""
print("all tests passed")
run_tests()| Test | Expected | Why |
|---|---|---|
get("foo", 1) | "bar" | Exact timestamp match |
get("foo", 3) | "bar" | Uses closest earlier timestamp |
get("foo", 5) | "bar2" | Uses latest timestamp before query |
get("love", 5) | "" | Query is earlier than first stored timestamp |
get("unknown", 100) | "" | Key does not exist |