LeetCode 635: Design Log Storage System

A design solution for storing timestamped logs and retrieving IDs by inclusive time range at a chosen granularity.

Problem Restatement

We need to design a log storage system.

Each log has:

Field        Meaning
id           Unique log ID
timestamp    Time string in Year:Month:Day:Hour:Minute:Second format

The timestamp format looks like this:

"2017:01:01:23:59:59"

All parts are zero-padded.

The system needs two operations:

Operation                            Meaning
put(id, timestamp)                   Store a log
retrieve(start, end, granularity)    Return IDs whose timestamps are inside the inclusive range

The granularity tells us how much of the timestamp to compare.

For example, if the granularity is "Day", then we only compare:

"YYYY:MM:DD"

and ignore hour, minute, and second.

Input and Output

Class shape:

class LogSystem:

    def __init__(self):
        ...

    def put(self, id: int, timestamp: str) -> None:
        ...

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        ...

Granularity values:

Granularity    Compared Prefix
"Year"         YYYY
"Month"        YYYY:MM
"Day"          YYYY:MM:DD
"Hour"         YYYY:MM:DD:HH
"Minute"       YYYY:MM:DD:HH:MM
"Second"       Full timestamp

Example

Operations:

log = LogSystem()
log.put(1, "2017:01:01:23:59:59")
log.put(2, "2017:01:01:22:59:59")
log.put(3, "2016:01:01:00:00:00")

log.retrieve(
    "2016:01:01:01:01:01",
    "2017:01:01:23:00:00",
    "Year",
)

With granularity "Year", we compare only the year.

Log ID    Timestamp              Compared Value
1         2017:01:01:23:59:59    2017
2         2017:01:01:22:59:59    2017
3         2016:01:01:00:00:00    2016

The range is from 2016 to 2017, inclusive.

So the result, in any order, is:

[1, 2, 3]

Another query:

log.retrieve(
    "2016:01:01:01:01:01",
    "2017:01:01:23:00:00",
    "Hour",
)

Now we compare up to the hour:

Log ID    Compared Value
1         2017:01:01:23
2         2017:01:01:22
3         2016:01:01:00

The end compared value is:

"2017:01:01:23"

So IDs 1 and 2 are included.

ID 3 is excluded: its compared value "2016:01:01:00" comes before the truncated start "2016:01:01:01".

First Thought: Parse Timestamps Into Dates

One possible approach is to parse every timestamp into numeric fields:

year, month, day, hour, minute, second

Then for each query, compare only the fields needed by the granularity.
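That parsing approach could be sketched like this (the names here are illustrative, not part of the final solution). Python compares tuples element by element, so truncating the parsed tuple to the fields the granularity requires gives the right comparison:

```python
# Number of leading fields each granularity requires.
FIELDS = {"Year": 1, "Month": 2, "Day": 3, "Hour": 4, "Minute": 5, "Second": 6}

def parse(ts: str) -> tuple[int, ...]:
    # "2017:01:01:23:59:59" -> (2017, 1, 1, 23, 59, 59)
    return tuple(int(part) for part in ts.split(":"))

def in_range(ts: str, start: str, end: str, granularity: str) -> bool:
    # Compare only the leading fields; tuple comparison is element-wise.
    n = FIELDS[granularity]
    return parse(start)[:n] <= parse(ts)[:n] <= parse(end)[:n]
```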

This works, but it is more code than needed.

The timestamp strings are already formatted in a useful way.

Because every field is zero-padded and arranged from largest unit to smallest unit, lexicographic string comparison matches chronological order.

For example:

"2017:01" < "2017:02"
"2016:12" < "2017:01"
"2017:01:01:22" < "2017:01:01:23"

So we can compare timestamp prefixes as strings.
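These orderings can be checked directly with Python's built-in string comparison:

```python
# Zero-padded fields arranged from largest to smallest unit mean that
# plain lexicographic comparison agrees with chronological order.
assert "2017:01" < "2017:02"              # same year, earlier month
assert "2016:12" < "2017:01"              # earlier year wins
assert "2017:01:01:22" < "2017:01:01:23"  # same day, earlier hour
```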

Key Insight

Each granularity corresponds to a prefix length.

Granularity    Prefix Length    Example Prefix
"Year"         4                2017
"Month"        7                2017:01
"Day"          10               2017:01:01
"Hour"         13               2017:01:01:23
"Minute"       16               2017:01:01:23:59
"Second"       19               2017:01:01:23:59:59

For a query, slice the start timestamp, end timestamp, and each stored timestamp to the same prefix length.

Then check:

start_prefix <= timestamp_prefix <= end_prefix

The range is inclusive.
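As a small illustration of the prefix check, here is the "Hour" query from the example above, done with plain slicing:

```python
size = 13  # prefix length for "Hour": "YYYY:MM:DD:HH"

start_key = "2016:01:01:01:01:01"[:size]  # "2016:01:01:01"
end_key = "2017:01:01:23:00:00"[:size]    # "2017:01:01:23"
key = "2017:01:01:22:59:59"[:size]        # "2017:01:01:22" (log 2)

# Log 2 falls inside the inclusive range at hour granularity.
assert start_key <= key <= end_key
```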

Algorithm

Use a list to store all logs:

(id, timestamp)

For put:

  1. Append (id, timestamp) to the list.

For retrieve:

  1. Find the prefix length for the requested granularity.
  2. Slice start and end to that length.
  3. Scan all stored logs.
  4. Slice each log timestamp to the same length.
  5. If it lies between start and end, include its ID.
  6. Return all matching IDs.

Implementation

class LogSystem:

    def __init__(self):
        self.logs = []
        self.length = {
            "Year": 4,
            "Month": 7,
            "Day": 10,
            "Hour": 13,
            "Minute": 16,
            "Second": 19,
        }

    def put(self, id: int, timestamp: str) -> None:
        self.logs.append((id, timestamp))

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        size = self.length[granularity]

        start_key = start[:size]
        end_key = end[:size]

        answer = []

        for log_id, timestamp in self.logs:
            key = timestamp[:size]

            if start_key <= key <= end_key:
                answer.append(log_id)

        return answer

Code Explanation

The constructor stores all logs in a list:

self.logs = []

It also stores the prefix length for each granularity:

self.length = {
    "Year": 4,
    "Month": 7,
    "Day": 10,
    "Hour": 13,
    "Minute": 16,
    "Second": 19,
}

The put operation is simple:

self.logs.append((id, timestamp))

For retrieval, we first decide how much of the timestamp matters:

size = self.length[granularity]

Then we truncate the query range:

start_key = start[:size]
end_key = end[:size]

For each stored log, we compare the same prefix:

key = timestamp[:size]

If it is inside the inclusive range, we keep the ID:

if start_key <= key <= end_key:
    answer.append(log_id)

Correctness

The timestamp format orders fields from largest to smallest: year, month, day, hour, minute, second. Each field is zero-padded, so lexicographic order is the same as chronological order for equal-length timestamp prefixes.

For a given granularity, the algorithm slices all timestamps to the exact prefix that represents that granularity. Therefore, two timestamps are compared only by the fields required by the query.

A log is returned exactly when its truncated timestamp is greater than or equal to the truncated start timestamp and less than or equal to the truncated end timestamp. This matches the problem’s inclusive range definition.

The put method stores every log without modification, so every inserted log is considered by future retrievals.

Therefore, retrieve returns exactly the IDs whose timestamps fall inside the requested range at the requested granularity.
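As an illustrative cross-check of this argument (not part of the solution), the prefix-slice test can be compared against an explicit field-by-field comparison on random timestamps; the two always agree:

```python
import random

SIZES = {"Year": 4, "Month": 7, "Day": 10, "Hour": 13, "Minute": 16, "Second": 19}
FIELDS = {"Year": 1, "Month": 2, "Day": 3, "Hour": 4, "Minute": 5, "Second": 6}

def random_ts(rng: random.Random) -> str:
    # Zero-padded timestamp with a valid value in every field.
    return "%04d:%02d:%02d:%02d:%02d:%02d" % (
        rng.randint(2000, 2017), rng.randint(1, 12), rng.randint(1, 28),
        rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59))

rng = random.Random(0)
for _ in range(1000):
    ts, start, end = random_ts(rng), random_ts(rng), random_ts(rng)
    for granularity, size in SIZES.items():
        by_prefix = start[:size] <= ts[:size] <= end[:size]
        n = FIELDS[granularity]
        fields = lambda t: tuple(int(p) for p in t.split(":"))[:n]
        by_fields = fields(start) <= fields(ts) <= fields(end)
        assert by_prefix == by_fields
```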

Complexity

Let n be the number of stored logs.

Operation      Time    Space
LogSystem()    O(1)    O(1)
put            O(1)    O(1) additional
retrieve       O(n)    O(m)

Here, m is the number of matching log IDs returned.

The timestamp prefix length is at most 19, so slicing and comparing are constant-time for this problem.

Alternative Solution: Store Sorted Timestamps

We can also store logs in sorted order by timestamp and use binary search.

That can make retrieval faster when there are many logs.

However, for this problem, the simple scan is usually enough because the timestamp length is tiny and the design is focused on correct granularity handling.

A sorted approach would need to normalize the start and end timestamps depending on the granularity, then search by timestamp range.

The list scan solution is shorter and less error-prone.
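A minimal sketch of that sorted alternative, assuming the standard library's bisect module and a list kept ordered on insert (the class and variable names here are illustrative). Timestamps contain only digits and ":", so appending "~" (which sorts after both) to the truncated end key gives an inclusive upper bound:

```python
from bisect import bisect_left, bisect_right, insort

PREFIX = {"Year": 4, "Month": 7, "Day": 10, "Hour": 13, "Minute": 16, "Second": 19}

class SortedLogSystem:
    def __init__(self):
        self.entries = []  # (timestamp, id), kept sorted by timestamp

    def put(self, id: int, timestamp: str) -> None:
        insort(self.entries, (timestamp, id))  # O(n) insert into sorted list

    def retrieve(self, start: str, end: str, granularity: str) -> list[int]:
        size = PREFIX[granularity]
        # First entry whose timestamp is >= the truncated start.
        lo = bisect_left(self.entries, (start[:size],))
        # First entry past every timestamp sharing a prefix <= the end key.
        hi = bisect_right(self.entries, (end[:size] + "~",))
        return [log_id for _, log_id in self.entries[lo:hi]]
```

This trades an O(n) put for an O(log n + m) retrieve, which only pays off when retrievals greatly outnumber insertions.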

Testing

def run_tests():
    log = LogSystem()

    log.put(1, "2017:01:01:23:59:59")
    log.put(2, "2017:01:01:22:59:59")
    log.put(3, "2016:01:01:00:00:00")

    assert sorted(log.retrieve(
        "2016:01:01:01:01:01",
        "2017:01:01:23:00:00",
        "Year",
    )) == [1, 2, 3]

    assert sorted(log.retrieve(
        "2016:01:01:01:01:01",
        "2017:01:01:23:00:00",
        "Hour",
    )) == [1, 2]

    assert log.retrieve(
        "2017:01:01:23:59:59",
        "2017:01:01:23:59:59",
        "Second",
    ) == [1]

    assert sorted(log.retrieve(
        "2017:01:01:00:00:00",
        "2017:01:01:23:59:59",
        "Day",
    )) == [1, 2]

    print("all tests passed")

run_tests()

Test meaning:

Test              Why
"Year" query      Coarse granularity includes 2016 and 2017
"Hour" query      More precise comparison excludes the earlier 2016 log
"Second" query    Exact timestamp match
"Day" query       Ignores hour, minute, and second