A design solution for storing timestamped logs and retrieving IDs by inclusive time range at a chosen granularity.
Problem Restatement
We need to design a log storage system.
Each log has:
| Field | Meaning |
|---|---|
| id | Unique log ID |
| timestamp | Time string in Year:Month:Day:Hour:Minute:Second format |
The timestamp format looks like this:
"2017:01:01:23:59:59"All parts are zero-padded.
The system needs two operations:
| Operation | Meaning |
|---|---|
| put(id, timestamp) | Store a log |
| retrieve(start, end, granularity) | Return IDs whose timestamps are inside the inclusive range |
The granularity tells us how much of the timestamp to compare.
For example, if the granularity is "Day", then we only compare "YYYY:MM:DD" and ignore hour, minute, and second.
Input and Output
Class shape:
```python
class LogSystem:
    def __init__(self):
        ...

    def put(self, id: int, timestamp: str) -> None:
        ...

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        ...
```

Granularity values:
| Granularity | Compared Prefix |
|---|---|
"Year" | YYYY |
"Month" | YYYY:MM |
"Day" | YYYY:MM:DD |
"Hour" | YYYY:MM:DD:HH |
"Minute" | YYYY:MM:DD:HH:MM |
"Second" | Full timestamp |
Example
Operations:
```python
log = LogSystem()
log.put(1, "2017:01:01:23:59:59")
log.put(2, "2017:01:01:22:59:59")
log.put(3, "2016:01:01:00:00:00")
log.retrieve(
    "2016:01:01:01:01:01",
    "2017:01:01:23:00:00",
    "Year",
)
```

With granularity "Year", we compare only the year.
| Log ID | Timestamp | Compared Value |
|---|---|---|
| 1 | 2017:01:01:23:59:59 | 2017 |
| 2 | 2017:01:01:22:59:59 | 2017 |
| 3 | 2016:01:01:00:00:00 | 2016 |
The range is from 2016 to 2017, inclusive.
So the result contains all three IDs:

```python
[1, 2, 3]
```

Another query:
```python
log.retrieve(
    "2016:01:01:01:01:01",
    "2017:01:01:23:00:00",
    "Hour",
)
```

Now we compare up to the hour:
| Log ID | Compared Value |
|---|---|
| 1 | 2017:01:01:23 |
| 2 | 2017:01:01:22 |
| 3 | 2016:01:01:00 |
The end compared value is:

```python
"2017:01:01:23"
```

So IDs 1 and 2 are included. ID 3's compared value, "2016:01:01:00", falls before the start compared value "2016:01:01:01", so it is excluded.
First Thought: Parse Timestamps Into Dates
One possible approach is to parse every timestamp into numeric fields:
```python
year, month, day, hour, minute, second
```

Then for each query, compare only the fields needed by the granularity.
This works, but it is more code than needed.
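A rough sketch of what that parse-based variant could look like (illustrative only; `parse_fields` and `in_range` are hypothetical helpers, not part of the final design):

```python
# Hypothetical parse-based variant: split each timestamp into integers
# and compare only the fields the granularity requires.
GRANULARITY_FIELDS = {"Year": 1, "Month": 2, "Day": 3,
                      "Hour": 4, "Minute": 5, "Second": 6}

def parse_fields(timestamp: str) -> tuple[int, ...]:
    # "2017:01:01:23:59:59" -> (2017, 1, 1, 23, 59, 59)
    return tuple(int(part) for part in timestamp.split(":"))

def in_range(timestamp: str, start: str, end: str, granularity: str) -> bool:
    n = GRANULARITY_FIELDS[granularity]
    # Python compares tuples element by element, so slicing to the first
    # n fields compares exactly the fields the query cares about.
    return parse_fields(start)[:n] <= parse_fields(timestamp)[:n] <= parse_fields(end)[:n]
```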
The timestamp strings are already formatted in a useful way.
Because every field is zero-padded and arranged from largest unit to smallest unit, lexicographic string comparison matches chronological order.
For example:
"2017:01" < "2017:02"
"2016:12" < "2017:01"
"2017:01:01:22" < "2017:01:01:23"So we can compare timestamp prefixes as strings.
Key Insight
Each granularity corresponds to a prefix length.
| Granularity | Prefix Length | Example Prefix |
|---|---|---|
"Year" | 4 | 2017 |
"Month" | 7 | 2017:01 |
"Day" | 10 | 2017:01:01 |
"Hour" | 13 | 2017:01:01:23 |
"Minute" | 16 | 2017:01:01:23:59 |
"Second" | 19 | 2017:01:01:23:59:59 |
For a query, slice the start timestamp, end timestamp, and each stored timestamp to the same prefix length.
Then check:

```python
start_prefix <= timestamp_prefix <= end_prefix
```

The range is inclusive.
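As a quick one-off check of that rule at "Day" granularity (prefix length 10):

```python
start, end = "2016:01:01:01:01:01", "2017:01:01:23:00:00"
timestamp = "2017:01:01:22:59:59"
size = 10  # "Day"
assert start[:size] <= timestamp[:size] <= end[:size]  # "2017:01:01" is in range
```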
Algorithm
Use a list to store all logs as `(id, timestamp)` pairs.

For put:

- Append `(id, timestamp)` to the list.

For retrieve:

- Find the prefix length for the requested granularity.
- Slice `start` and `end` to that length.
- Scan all stored logs.
- Slice each log timestamp to the same length.
- If it lies between `start` and `end`, include its ID.
- Return all matching IDs.
Implementation
```python
class LogSystem:
    def __init__(self):
        self.logs = []  # list of (id, timestamp) pairs
        # Prefix length of the timestamp string for each granularity.
        self.length = {
            "Year": 4,
            "Month": 7,
            "Day": 10,
            "Hour": 13,
            "Minute": 16,
            "Second": 19,
        }

    def put(self, id: int, timestamp: str) -> None:
        self.logs.append((id, timestamp))

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        size = self.length[granularity]
        # Truncate the query range to the granularity's prefix length.
        start_key = start[:size]
        end_key = end[:size]
        answer = []
        for log_id, timestamp in self.logs:
            key = timestamp[:size]
            if start_key <= key <= end_key:  # inclusive on both ends
                answer.append(log_id)
        return answer
```

Code Explanation
The constructor stores all logs in a list:

```python
self.logs = []
```

It also stores the prefix length for each granularity:

```python
self.length = {
    "Year": 4,
    "Month": 7,
    "Day": 10,
    "Hour": 13,
    "Minute": 16,
    "Second": 19,
}
```

The put operation is simple:

```python
self.logs.append((id, timestamp))
```

For retrieval, we first decide how much of the timestamp matters:

```python
size = self.length[granularity]
```

Then we truncate the query range:

```python
start_key = start[:size]
end_key = end[:size]
```

For each stored log, we compare the same prefix:

```python
key = timestamp[:size]
```

If it is inside the inclusive range, we keep the ID:

```python
if start_key <= key <= end_key:
    answer.append(log_id)
```

Correctness
The timestamp format orders fields from largest to smallest: year, month, day, hour, minute, second. Each field is zero-padded, so lexicographic order is the same as chronological order for equal-length timestamp prefixes.
For a given granularity, the algorithm slices all timestamps to the exact prefix that represents that granularity. Therefore, two timestamps are compared only by the fields required by the query.
A log is returned exactly when its truncated timestamp is greater than or equal to the truncated start timestamp and less than or equal to the truncated end timestamp. This matches the problem’s inclusive range definition.
The put method stores every log without modification, so every inserted log is considered by future retrievals.
Therefore, retrieve returns exactly the IDs whose timestamps fall inside the requested range at the requested granularity.
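As an extra sanity check of the zero-padding argument (illustrative only, not part of the solution), a small randomized test can confirm that string-prefix order agrees with numeric field order:

```python
import random

def random_ts(rng: random.Random) -> str:
    # Build a zero-padded timestamp with valid field ranges.
    return "%04d:%02d:%02d:%02d:%02d:%02d" % (
        rng.randint(2000, 2030), rng.randint(1, 12), rng.randint(1, 28),
        rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59))

rng = random.Random(0)
for _ in range(1000):
    a, b = random_ts(rng), random_ts(rng)
    for size in (4, 7, 10, 13, 16, 19):  # all granularity prefix lengths
        lex = a[:size] <= b[:size]
        num = [int(x) for x in a[:size].split(":")] <= [int(x) for x in b[:size].split(":")]
        assert lex == num
```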
Complexity
Let n be the number of stored logs.
| Operation | Time | Space |
|---|---|---|
| LogSystem() | O(1) | O(1) |
| put | O(1) | O(1) additional |
| retrieve | O(n) | O(m) |
Here, m is the number of matching log IDs returned.
The timestamp prefix length is at most 19, so slicing and comparing are constant-time for this problem.
Alternative Solution: Store Sorted Timestamps
We can also store logs in sorted order by timestamp and use binary search.
That can make retrieval faster when there are many logs.
However, for this problem the linear scan is usually enough: each comparison touches at most 19 characters, and the design is focused on correct granularity handling.
A sorted approach would need to normalize the start and end timestamps depending on the granularity, then search by timestamp range.
The list scan solution is shorter and less error-prone.
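Still, for completeness, here is a minimal sketch of the sorted approach, assuming we pad the truncated start with the smallest possible suffix and the truncated end with the largest, so that full-string binary search works. The names (`SortedLogSystem`, `MIN_SUFFIX`, `MAX_SUFFIX`) are illustrative:

```python
import bisect

MIN_SUFFIX = "0000:01:01:00:00:00"  # smallest value for every field
MAX_SUFFIX = "9999:12:31:23:59:59"  # largest value for every field

class SortedLogSystem:
    LENGTH = {"Year": 4, "Month": 7, "Day": 10,
              "Hour": 13, "Minute": 16, "Second": 19}

    def __init__(self):
        self.timestamps = []  # kept sorted
        self.ids = []         # aligned with self.timestamps

    def put(self, id: int, timestamp: str) -> None:
        # O(n) per put because list insertion shifts elements.
        pos = bisect.bisect_left(self.timestamps, timestamp)
        self.timestamps.insert(pos, timestamp)
        self.ids.insert(pos, id)

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        size = self.LENGTH[granularity]
        # Normalize: smallest full timestamp with the start prefix,
        # largest full timestamp with the end prefix.
        lo_key = start[:size] + MIN_SUFFIX[size:]
        hi_key = end[:size] + MAX_SUFFIX[size:]
        lo = bisect.bisect_left(self.timestamps, lo_key)
        hi = bisect.bisect_right(self.timestamps, hi_key)
        return self.ids[lo:hi]
```

Each put here is O(n) because of list insertion; with a balanced tree or a sorted-containers structure, put becomes O(log n) and retrieve drops to O(log n + m).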
Testing
```python
def run_tests():
    log = LogSystem()
    log.put(1, "2017:01:01:23:59:59")
    log.put(2, "2017:01:01:22:59:59")
    log.put(3, "2016:01:01:00:00:00")

    assert sorted(log.retrieve(
        "2016:01:01:01:01:01",
        "2017:01:01:23:00:00",
        "Year",
    )) == [1, 2, 3]

    assert sorted(log.retrieve(
        "2016:01:01:01:01:01",
        "2017:01:01:23:00:00",
        "Hour",
    )) == [1, 2]

    assert log.retrieve(
        "2017:01:01:23:59:59",
        "2017:01:01:23:59:59",
        "Second",
    ) == [1]

    assert sorted(log.retrieve(
        "2017:01:01:00:00:00",
        "2017:01:01:23:59:59",
        "Day",
    )) == [1, 2]

    print("all tests passed")

run_tests()
```

Test meaning:
| Test | Why |
|---|---|
"Year" query | Coarse granularity includes 2016 and 2017 |
"Hour" query | More precise comparison excludes earlier 2016 log |
"Second" query | Exact timestamp match |
"Day" query | Ignores hour, minute, and second |