635. Design Log Storage System 🔒

Medium · Design · Hash Table · String · Ordered Set

Problem Description

You need to design a log storage system that can store logs with unique IDs and timestamps, and retrieve logs within a specific time range based on different granularity levels.

The system should support the following operations:

  1. Initialization: Create an empty log system.

  2. Put Operation: Store a log entry with a unique ID and timestamp. The timestamp follows the format Year:Month:Day:Hour:Minute:Second (e.g., 2017:01:01:23:59:59). All fields are zero-padded decimal numbers.

  3. Retrieve Operation: Find all log IDs whose timestamps fall within a given range [start, end] (inclusive) based on a specified granularity level. The granularity determines the precision of the comparison:

    • If granularity is "Year", only compare the year part
    • If granularity is "Month", compare up to the month part
    • If granularity is "Day", compare up to the day part
    • If granularity is "Hour", compare up to the hour part
    • If granularity is "Minute", compare up to the minute part
    • If granularity is "Second", compare the complete timestamp

For example, if you retrieve logs with:

  • start = "2017:01:01:23:59:59"
  • end = "2017:01:02:23:59:59"
  • granularity = "Day"

The system will return all logs from January 1st, 2017 to January 2nd, 2017, ignoring the hour, minute, and second components.

The solution uses string slicing to truncate timestamps according to the granularity level. A dictionary maps each granularity level to the appropriate string index position (Year→4, Month→7, Day→10, Hour→13, Minute→16, Second→19). When retrieving, it compares only the relevant portion of the timestamps by slicing the strings up to the index corresponding to the specified granularity.

Intuition

The key insight is recognizing that the timestamp format Year:Month:Day:Hour:Minute:Second has a predictable structure where each component appears at a fixed position in the string. This makes string comparison naturally suitable for our needs.

When we need to compare timestamps at different granularity levels, we're essentially asking: "Should we consider the full timestamp or just a portion of it?" For instance, when comparing at the "Day" granularity, we only care about Year:Month:Day and can ignore everything after that.

Since the timestamp format uses zero-padding and follows a hierarchical structure (year > month > day > hour > minute > second), string comparison works correctly for determining chronological order. For example, "2017:01:01" comes before "2017:01:02" in both string comparison and actual time.
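
As a quick sanity check (a standalone snippet, not part of the solution code), comparing two zero-padded timestamps as strings agrees with comparing them as parsed datetime objects:

from datetime import datetime

a, b = "2017:01:01:23:59:59", "2017:01:02:00:00:00"

string_order = a < b  # lexicographic comparison
time_order = (datetime.strptime(a, "%Y:%m:%d:%H:%M:%S")
              < datetime.strptime(b, "%Y:%m:%d:%H:%M:%S"))  # chronological comparison

assert string_order == time_order  # both agree: a comes before b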

This leads us to the observation that we can simply truncate the timestamp strings at specific positions based on the granularity:

  • For "Year": take the first 4 characters ("2017")
  • For "Month": take the first 7 characters ("2017:01")
  • For "Day": take the first 10 characters ("2017:01:01")
  • And so on...

By storing these position indices in a dictionary ({"Year": 4, "Month": 7, ...}), we can easily slice the timestamp strings to the appropriate length during retrieval. Then we just need to check if the truncated log timestamp falls within the truncated range [start[:i], end[:i]].

This approach avoids the complexity of parsing timestamps into date objects or handling individual time components separately. The string slicing solution is elegant because it leverages the well-structured format of the timestamps to perform range queries with minimal code.
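
To make the idea concrete, here is a minimal standalone sketch of the truncate-and-compare check, using the same index mapping described above (the helper name in_range is just for illustration):

granularity_index = {"Year": 4, "Month": 7, "Day": 10, "Hour": 13, "Minute": 16, "Second": 19}

def in_range(ts: str, start: str, end: str, granularity: str) -> bool:
    # Truncate all three timestamps to the same length, then compare as strings
    i = granularity_index[granularity]
    return start[:i] <= ts[:i] <= end[:i]

print(in_range("2017:01:01:23:59:59", "2017:01:01:00:00:00", "2017:01:01:12:00:00", "Day"))   # True
print(in_range("2017:01:01:23:59:59", "2017:01:01:00:00:00", "2017:01:01:12:00:00", "Hour"))  # False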

Solution Approach

The implementation uses a simple list to store logs and a dictionary to map granularity levels to string slice positions.

Data Structure:

  • self.logs: A list that stores tuples of (id, timestamp) for each log entry
  • self.granularity_index: A dictionary mapping granularity strings to their corresponding slice indices:
    • "Year": 4 (slice up to position 4 to get "YYYY")
    • "Month": 7 (slice up to position 7 to get "YYYY:MM")
    • "Day": 10 (slice up to position 10 to get "YYYY:MM:DD")
    • "Hour": 13 (slice up to position 13 to get "YYYY:MM:DD:HH")
    • "Minute": 16 (slice up to position 16 to get "YYYY:MM:DD:HH:MM")
    • "Second": 19 (slice up to position 19 to get the full timestamp)

Algorithm Implementation:

  1. __init__(): Initializes an empty list for logs and sets up the granularity-to-index mapping dictionary.

  2. put(id, timestamp): Simply appends the (id, timestamp) tuple to the logs list. This operation is O(1).

  3. retrieve(start, end, granularity):

    • First, get the slice index i from the dictionary using self.granularity_index[granularity]
    • Truncate the start and end strings by slicing them up to position i: start[:i] and end[:i]
    • Iterate through all logs and check if the truncated timestamp ts[:i] falls within the range [start[:i], end[:i]]
    • Use list comprehension to collect all matching log IDs: [id for id, ts in self.logs if start[:i] <= ts[:i] <= end[:i]]
    • Return the list of matching IDs

Time Complexity:

  • put(): O(1) - just appending to a list
  • retrieve(): O(n) where n is the number of logs, as we need to check each log against the range

Space Complexity:

  • O(n) where n is the number of logs stored

The beauty of this approach lies in its simplicity - by leveraging string slicing and the structured format of timestamps, we avoid complex date parsing or comparison logic while maintaining correct chronological ordering.

Example Walkthrough

Let's walk through a concrete example to understand how the solution works.

Initial Setup:

logSystem = LogSystem()
# self.logs = []
# self.granularity_index = {"Year": 4, "Month": 7, "Day": 10, "Hour": 13, "Minute": 16, "Second": 19}

Step 1: Adding logs

logSystem.put(1, "2017:01:01:23:59:59")
logSystem.put(2, "2017:01:01:22:59:59")  
logSystem.put(3, "2017:01:02:08:30:00")
# self.logs = [(1, "2017:01:01:23:59:59"), 
#              (2, "2017:01:01:22:59:59"),
#              (3, "2017:01:02:08:30:00")]

Step 2: Retrieve with "Day" granularity

result = logSystem.retrieve("2017:01:01:20:00:00", "2017:01:01:23:59:59", "Day")

Let's trace through this retrieval:

  • Granularity is "Day", so i = self.granularity_index["Day"] = 10
  • Truncate start: "2017:01:01:20:00:00"[:10] = "2017:01:01"
  • Truncate end: "2017:01:01:23:59:59"[:10] = "2017:01:01"
  • Check each log:
    • Log 1: "2017:01:01:23:59:59"[:10] = "2017:01:01" ✓ (within range ["2017:01:01", "2017:01:01"])
    • Log 2: "2017:01:01:22:59:59"[:10] = "2017:01:01" ✓ (within range)
    • Log 3: "2017:01:02:08:30:00"[:10] = "2017:01:02" ✗ (outside range)
  • Result: [1, 2]

Step 3: Retrieve with "Hour" granularity

result = logSystem.retrieve("2017:01:01:22:00:00", "2017:01:01:23:00:00", "Hour")

  • Granularity is "Hour", so i = self.granularity_index["Hour"] = 13
  • Truncate start: "2017:01:01:22:00:00"[:13] = "2017:01:01:22"
  • Truncate end: "2017:01:01:23:00:00"[:13] = "2017:01:01:23"
  • Check each log:
    • Log 1: "2017:01:01:23:59:59"[:13] = "2017:01:01:23" ✓ (equals end)
    • Log 2: "2017:01:01:22:59:59"[:13] = "2017:01:01:22" ✓ (equals start)
    • Log 3: "2017:01:02:08:30:00"[:13] = "2017:01:02:08" ✗ (outside range)
  • Result: [1, 2]

Step 4: Retrieve with "Second" granularity

result = logSystem.retrieve("2017:01:01:22:59:59", "2017:01:01:23:00:00", "Second")

  • Granularity is "Second", so i = self.granularity_index["Second"] = 19
  • The full timestamps are compared:
  • Check each log:
    • Log 1: "2017:01:01:23:59:59" > "2017:01:01:23:00:00" ✗ (after end)
    • Log 2: "2017:01:01:22:59:59" = start ✓ (exactly at start)
    • Log 3: "2017:01:02:08:30:00" > end ✗ (after end)
  • Result: [2]

The key insight is that by truncating all timestamps (start, end, and each log's timestamp) to the same length based on granularity, we ensure fair comparison at the desired precision level.
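
The whole trace can be reproduced as a short script against the Python implementation in the next section (assuming the LogSystem class defined there):

log_system = LogSystem()
log_system.put(1, "2017:01:01:23:59:59")
log_system.put(2, "2017:01:01:22:59:59")
log_system.put(3, "2017:01:02:08:30:00")

assert log_system.retrieve("2017:01:01:20:00:00", "2017:01:01:23:59:59", "Day") == [1, 2]
assert log_system.retrieve("2017:01:01:22:00:00", "2017:01:01:23:00:00", "Hour") == [1, 2]
assert log_system.retrieve("2017:01:01:22:59:59", "2017:01:01:23:00:00", "Second") == [2]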

Solution Implementation

Python:

from typing import List

class LogSystem:
    def __init__(self):
        """Initialize the log system with empty logs and granularity mapping."""
        # Store logs as list of tuples (id, timestamp)
        self.logs = []

        # Map granularity levels to their corresponding string slice positions
        # Each value represents where to slice the timestamp string
        self.granularity_index = {
            "Year": 4,      # "2017" - slice at position 4
            "Month": 7,     # "2017:01" - slice at position 7
            "Day": 10,      # "2017:01:01" - slice at position 10
            "Hour": 13,     # "2017:01:01:00" - slice at position 13
            "Minute": 16,   # "2017:01:01:00:00" - slice at position 16
            "Second": 19,   # "2017:01:01:00:00:00" - slice at position 19
        }

    def put(self, id: int, timestamp: str) -> None:
        """
        Store a log entry with given id and timestamp.

        Args:
            id: The unique identifier for the log entry
            timestamp: The timestamp in format "Year:Month:Day:Hour:Minute:Second"
        """
        self.logs.append((id, timestamp))

    def retrieve(self, start: str, end: str, granularity: str) -> List[int]:
        """
        Retrieve all log IDs within the time range at specified granularity.

        Args:
            start: Start timestamp in format "Year:Month:Day:Hour:Minute:Second"
            end: End timestamp in format "Year:Month:Day:Hour:Minute:Second"
            granularity: Level of time precision ("Year", "Month", "Day", "Hour", "Minute", or "Second")

        Returns:
            List of log IDs that fall within the specified time range
        """
        # Get the slice index based on granularity level
        slice_index = self.granularity_index[granularity]

        # Filter logs by comparing truncated timestamps
        # Only compare up to the specified granularity level
        result_ids = [
            log_id
            for log_id, timestamp in self.logs
            if start[:slice_index] <= timestamp[:slice_index] <= end[:slice_index]
        ]

        return result_ids


# Your LogSystem object will be instantiated and called as such:
# obj = LogSystem()
# obj.put(id, timestamp)
# param_2 = obj.retrieve(start, end, granularity)

Java:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * A log system that stores logs with timestamps and supports retrieval
 * of logs within a time range at different granularity levels.
 */
class LogSystem {
    // List to store all log entries
    private List<Log> logEntries;

    // Map to store the substring length for each granularity level
    private Map<String, Integer> granularityIndexMap;

    /**
     * Initialize the log system with granularity mappings.
     * Timestamp format: "Year:Month:Day:Hour:Minute:Second"
     */
    public LogSystem() {
        logEntries = new ArrayList<>();
        granularityIndexMap = new HashMap<>();

        // Define substring lengths for each granularity level
        granularityIndexMap.put("Year", 4);      // "YYYY"
        granularityIndexMap.put("Month", 7);     // "YYYY:MM"
        granularityIndexMap.put("Day", 10);      // "YYYY:MM:DD"
        granularityIndexMap.put("Hour", 13);     // "YYYY:MM:DD:HH"
        granularityIndexMap.put("Minute", 16);   // "YYYY:MM:DD:HH:MM"
        granularityIndexMap.put("Second", 19);   // "YYYY:MM:DD:HH:MM:SS"
    }

    /**
     * Store a new log entry with the given ID and timestamp.
     *
     * @param id        The unique identifier for the log entry
     * @param timestamp The timestamp in format "Year:Month:Day:Hour:Minute:Second"
     */
    public void put(int id, String timestamp) {
        logEntries.add(new Log(id, timestamp));
    }

    /**
     * Retrieve all log IDs within the specified time range at the given granularity.
     *
     * @param start       The start timestamp of the range
     * @param end         The end timestamp of the range
     * @param granularity The level of time precision ("Year", "Month", "Day", "Hour", "Minute", or "Second")
     * @return            A list of log IDs that fall within the specified range
     */
    public List<Integer> retrieve(String start, String end, String granularity) {
        List<Integer> resultIds = new ArrayList<>();

        // Get the substring length for the specified granularity
        int substringLength = granularityIndexMap.get(granularity);

        // Truncate start and end timestamps to match the granularity
        String startTruncated = start.substring(0, substringLength);
        String endTruncated = end.substring(0, substringLength);

        // Iterate through all logs and check if they fall within the range
        for (Log logEntry : logEntries) {
            // Truncate the log timestamp to match the granularity
            String logTimestampTruncated = logEntry.timestamp.substring(0, substringLength);

            // Check if the truncated timestamp is within the range (inclusive)
            if (startTruncated.compareTo(logTimestampTruncated) <= 0 &&
                logTimestampTruncated.compareTo(endTruncated) <= 0) {
                resultIds.add(logEntry.id);
            }
        }

        return resultIds;
    }
}

/**
 * Represents a single log entry with an ID and timestamp.
 */
class Log {
    int id;           // The unique identifier for this log entry
    String timestamp; // The timestamp when this log was created

    /**
     * Constructor to create a new log entry.
     *
     * @param id        The unique identifier for the log
     * @param timestamp The timestamp in format "Year:Month:Day:Hour:Minute:Second"
     */
    Log(int id, String timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

/**
 * Your LogSystem object will be instantiated and called as such:
 * LogSystem obj = new LogSystem();
 * obj.put(id,timestamp);
 * List<Integer> param_2 = obj.retrieve(start,end,granularity);
 */

C++:

#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class LogSystem {
public:
    LogSystem() {
        // Initialize granularity to substring length mapping
        // Each granularity level corresponds to a specific position in the timestamp string
        // Format: "Year:Month:Day:Hour:Minute:Second" -> "YYYY:MM:DD:HH:MM:SS"
        granularityToLength["Year"] = 4;      // "YYYY"
        granularityToLength["Month"] = 7;     // "YYYY:MM"
        granularityToLength["Day"] = 10;      // "YYYY:MM:DD"
        granularityToLength["Hour"] = 13;     // "YYYY:MM:DD:HH"
        granularityToLength["Minute"] = 16;   // "YYYY:MM:DD:HH:MM"
        granularityToLength["Second"] = 19;   // "YYYY:MM:DD:HH:MM:SS"
    }

    void put(int id, string timestamp) {
        // Store the log entry as a pair of id and timestamp
        logEntries.push_back({id, timestamp});
    }

    vector<int> retrieve(string start, string end, string granularity) {
        vector<int> result;

        // Get the substring length based on the granularity level
        int substringLength = granularityToLength[granularity];

        // Extract relevant portions of timestamps based on granularity
        string startPrefix = start.substr(0, substringLength);
        string endPrefix = end.substr(0, substringLength);

        // Iterate through all log entries and check if they fall within the range
        for (const auto& [logId, logTimestamp] : logEntries) {
            string timestampPrefix = logTimestamp.substr(0, substringLength);

            // Check if the timestamp falls within the specified range
            if (startPrefix <= timestampPrefix && timestampPrefix <= endPrefix) {
                result.emplace_back(logId);
            }
        }

        return result;
    }

private:
    // Store all log entries as pairs of (id, timestamp)
    vector<pair<int, string>> logEntries;

    // Map granularity level to the corresponding substring length
    unordered_map<string, int> granularityToLength;
};

/**
 * Your LogSystem object will be instantiated and called as such:
 * LogSystem* obj = new LogSystem();
 * obj->put(id,timestamp);
 * vector<int> param_2 = obj->retrieve(start,end,granularity);
 */

TypeScript:

// Map to store granularity level to corresponding substring length
const granularityToLength: Map<string, number> = new Map([
    ["Year", 4],      // "YYYY"
    ["Month", 7],     // "YYYY:MM"
    ["Day", 10],      // "YYYY:MM:DD"
    ["Hour", 13],     // "YYYY:MM:DD:HH"
    ["Minute", 16],   // "YYYY:MM:DD:HH:MM"
    ["Second", 19]    // "YYYY:MM:DD:HH:MM:SS"
]);

// Store all log entries as array of objects containing id and timestamp
const logEntries: Array<{id: number, timestamp: string}> = [];

/**
 * Stores a log entry with the given id and timestamp
 * @param id - The unique identifier for the log entry
 * @param timestamp - The timestamp in format "YYYY:MM:DD:HH:MM:SS"
 */
function put(id: number, timestamp: string): void {
    // Add the log entry to the collection
    logEntries.push({id, timestamp});
}

/**
 * Retrieves all log IDs within the specified time range at the given granularity
 * @param start - The start timestamp in format "YYYY:MM:DD:HH:MM:SS"
 * @param end - The end timestamp in format "YYYY:MM:DD:HH:MM:SS"
 * @param granularity - The level of precision for comparison ("Year", "Month", "Day", "Hour", "Minute", or "Second")
 * @returns Array of log IDs that fall within the specified range
 */
function retrieve(start: string, end: string, granularity: string): number[] {
    const result: number[] = [];

    // Get the substring length based on the granularity level
    const substringLength = granularityToLength.get(granularity)!;

    // Extract relevant portions of timestamps based on granularity
    const startPrefix = start.substring(0, substringLength);
    const endPrefix = end.substring(0, substringLength);

    // Iterate through all log entries and check if they fall within the range
    for (const logEntry of logEntries) {
        const timestampPrefix = logEntry.timestamp.substring(0, substringLength);

        // Check if the timestamp falls within the specified range
        if (startPrefix <= timestampPrefix && timestampPrefix <= endPrefix) {
            result.push(logEntry.id);
        }
    }

    return result;
}

Time and Space Complexity

Time Complexity:

  • put(id, timestamp): O(1) - This method simply appends a tuple (id, timestamp) to the logs list, which is a constant time operation.

  • retrieve(start, end, granularity): O(n) where n is the number of logs stored in the system. The method iterates through all logs in the list comprehension [id for id, ts in self.logs if start[:i] <= ts[:i] <= end[:i]], checking each timestamp against the given range. The string slicing operation ts[:i] takes O(1) time since the maximum value of i is 19 (constant), and string comparison also takes O(1) time for fixed-length strings.

Space Complexity:

  • Overall space complexity: O(n) where n is the number of logs stored. The logs list stores all the log entries as tuples of (id, timestamp).

  • put(id, timestamp): O(1) - No additional space is used beyond storing the single log entry.

  • retrieve(start, end, granularity): O(k) where k is the number of matching log IDs returned. In the worst case where all logs match the criteria, k = n, making it O(n).

  • The granularity_index dictionary uses O(1) space as it has a fixed number of entries (6 granularity levels).

Common Pitfalls

1. Incorrect String Comparison Assumption

A critical pitfall is assuming that string comparison always works correctly for date/time comparisons. While this solution works because the timestamp format is fixed and zero-padded, developers might incorrectly apply this approach to other date formats.

Problem Example:

# This would fail with non-zero-padded dates:
"2017:1:1:9:30:45" < "2017:1:1:10:30:45"  # Returns False (incorrect!)
# Because strings compare character by character and "9" > "1",
# so "9:30:45" sorts after "10:30:45" even though 9 < 10 numerically

Solution: Always ensure timestamps are zero-padded and maintain consistent format. Add validation:

def put(self, id: int, timestamp: str) -> None:
    # Validate timestamp format
    parts = timestamp.split(':')
    if len(parts) != 6:
        raise ValueError("Invalid timestamp format")
  
    # Verify each part has correct length
    expected_lengths = [4, 2, 2, 2, 2, 2]  # Year, Month, Day, Hour, Minute, Second
    for part, expected_len in zip(parts, expected_lengths):
        if len(part) != expected_len or not part.isdigit():
            raise ValueError(f"Invalid timestamp component: {part}")
  
    self.logs.append((id, timestamp))
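
With this validation in place, a non-zero-padded timestamp is rejected up front (a hypothetical usage example, assuming the validating put above replaces the original one):

system = LogSystem()
system.put(1, "2017:01:01:23:59:59")   # accepted: every field is zero-padded
try:
    system.put(2, "2017:1:1:9:30:45")  # rejected: fields are not zero-padded
except ValueError as error:
    print(error)                       # Invalid timestamp component: 1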

2. Performance Degradation with Large Datasets

The retrieve operation has O(n) time complexity, scanning all logs every time. This becomes inefficient with millions of logs.

Problem Example:

# With 1 million logs, every retrieve call checks all 1 million entries
# Even if only 10 logs match the criteria

Solution: Use a sorted data structure or index:

import bisect

class OptimizedLogSystem:
    def __init__(self):
        self.logs = []  # Keep sorted by timestamp
        self.granularity_index = {...}  # same as before
  
    def put(self, id: int, timestamp: str) -> None:
        # Insert in sorted order; note that bisect.insort is O(n) per call,
        # since inserting into a Python list shifts the trailing elements
        bisect.insort(self.logs, (timestamp, id))
  
    def retrieve(self, start: str, end: str, granularity: str) -> List[int]:
        slice_index = self.granularity_index[granularity]
      
        # Binary search for range boundaries
        left = bisect.bisect_left(self.logs, (start[:slice_index], 0))
        right = bisect.bisect_right(self.logs, (end[:slice_index] + '\xff', float('inf')))
      
        return [id for ts, id in self.logs[left:right] 
                if start[:slice_index] <= ts[:slice_index] <= end[:slice_index]]

3. Memory Inefficiency with Duplicate Timestamps

Storing full timestamps for every log wastes memory when many logs share the same timestamp or when only certain granularities are frequently queried.

Problem Example:

# If 10,000 logs all occurred on "2017:01:01:XX:XX:XX"
# We're storing the "2017:01:01:" prefix 10,000 times

Solution: Use timestamp compression or indexing:

class CompressedLogSystem:
    def __init__(self):
        self.logs = []
        self.granularity_index = {...}
        # Create indexes for frequently queried granularities
        self.year_index = {}  # year -> list of log indices
        self.month_index = {}  # year:month -> list of log indices
  
    def put(self, id: int, timestamp: str) -> None:
        log_idx = len(self.logs)
        self.logs.append((id, timestamp))
      
        # Update indexes
        year = timestamp[:4]
        month = timestamp[:7]
      
        self.year_index.setdefault(year, []).append(log_idx)
        self.month_index.setdefault(month, []).append(log_idx)
  
    def retrieve(self, start: str, end: str, granularity: str) -> List[int]:
        # Use indexes for Year/Month granularities for faster retrieval
        if granularity == "Year":
            # Use year_index to narrow down search space
            pass
        # ... implement index-based retrieval
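
One possible way to finish the index-based retrieval for the "Year" case is a bucket lookup along these lines (a sketch under the assumptions above: logs holds (id, timestamp) tuples and year_index maps "YYYY" to lists of log indices; the function name is illustrative):

from typing import Dict, List, Tuple

def retrieve_by_year(logs: List[Tuple[int, str]],
                     year_index: Dict[str, List[int]],
                     start: str, end: str) -> List[int]:
    # Only visit the year buckets that fall inside the range,
    # instead of scanning every stored log
    start_year, end_year = start[:4], end[:4]
    result: List[int] = []
    for year, indices in year_index.items():
        if start_year <= year <= end_year:
            result.extend(logs[i][0] for i in indices)  # logs[i] is (id, timestamp)
    return result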