981. Time Based Key-Value Store

MediumDesignHash TableStringBinary Search
Leetcode Link

Problem Description

The problem is about creating a data structure that allows storing multiple values for the same key but at different timestamps, similar to a versioning system. The TimeMap should support two operations:

  1. set: This operation stores the key with the value at the given timestamp.
  2. get: This operation retrieves the value associated with a key at a specific timestamp. The catch here is that if the exact timestamp doesn't exist for that key, we need to provide the value with the closest previous timestamp. If there are no earlier timestamps, it should return an empty string.

To understand it better, imagine you are building a history tracking system for document edits. Each time someone edits a document, you record the new version of the document with a timestamp. Later, someone might want to see the version at a particular time. If a version is not available at the exact time requested, you show the latest version before the requested time.

Intuition

The key to solving this problem lies in efficiently managing the history of each key's values along with their timestamps. The intuition is to use a hash map to store keys but with a twist: each key has a list of pairs, each containing a timestamp and the corresponding value at that timestamp.

Given this structure, implementing the set function becomes straightforward: simply append the (timestamp, value) pair to the list associated with the key.

However, the challenge is in the get function, where you're asked to retrieve a value that is the closest to, but not greater than, the given timestamp. This is where binary search comes into play. The list of (timestamp, value) pairs for each key can be considered as sorted by timestamp. Using binary search, we that the closest previous timestamp to the given timestamp without going over it.

The binary search is implemented using the bisect_right function from Python's bisect module, which returns the index where an element should be inserted to maintain the sorted order. We search for a tuple where the first element is the given timestamp, and the second element is a dummy character chosen to be larger than any possible value (in this case, chr(127) which is the last ASCII character). If bisect_right provides a non-zero index, we step one index back and return the associated value. If the list is empty or the index is 0, it means there are no valid timestamps before the given timestamp, and as per problem description, we return an empty string.

Solution Approach

The solution is implemented with the following key ideas:

  1. Use of Defaultdict: The TimeMap class leverages a defaultdict with lists as its default value. This is a choice of data structure from the collections module in Python which helps in automatically initializing a list for each new key without checking for the key's existence. The structure self.ktv is a dictionary where each key has a list of (timestamp, value) tuples.

  2. Storing Values with TimeStamps:

    • The set method appends a tuple consisting of the timestamp and the value to the list associated with the key. Since set is always called with increasing timestamp order, the list of values for each key naturally remains sorted by timestamp.
  3. Retrieving Values:

    • The get method uses a binary search algorithm to quickly find the correct value for a given timestamp.
    • It checks whether the key exists in self.ktv. If not, it returns an empty string since there are no entries for that key.
    • If the key exists, a binary search is performed using the bisect_right function, which finds the insertion point in the list of values to maintain the sorted order.
    • The target of the binary search is a tuple where the first element is the timestamp and the second is a very high-value character. chr(127) works as the dummy character because chr(127) will always be greater than any string value, ensuring bisect_right returns the position for the latest timestamp that is not greater than the given timestamp.
    • If bisect_right returns zero, it means there were no timestamps less than or equal to the timestamp that we passed, so we return an empty string.
    • Otherwise, we subtract one from the index that bisect_right returned to find the largest timestamp less than or equal to the timestamp requested. Then we return the value associated with this timestamp.

Here's the encapsulated code structure:

  • Initialization: def __init__(self): self.ktv = defaultdict(list)
  • set: def set(self, key: str, value: str, timestamp: int) -> None: self.ktv[key].append((timestamp, value))
  • get:
    1def get(self, key: str, timestamp: int) -> str:
    2    if key not in self.ktv:
    3        return ''
    4    tv = self.ktv[key]
    5    i = bisect_right(tv, (timestamp, chr(127)))
    6    return tv[i - 1][1] if i else ''

Overall, the use of binary search ensures that the retrieval of values is efficient, even when the number of timestamps grows large. It allows the get operation to have a time complexity of O(log n), where n is the number of timestamps associated with a key, making the solution scalable.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's walk through the process with a simple example to illustrate the solution approach.

Suppose we are using the TimeMap to record the status of a project at different times.

  • At time 1, the status is "Started"
  • At time 3, the status updates to "In Progress"
  • At time 5, the status finally changes to "Completed"

We would perform the following operations on our TimeMap:

  1. set("projectStatus", "Started", 1)
  2. set("projectStatus", "In Progress", 3)
  3. set("projectStatus", "Completed", 5)

Now, our TimeMap data structure, self.ktv, has an entry for "projectStatus", which is a list of timestamp and value tuples like the following:

1"projectStatus": [(1, "Started"), (3, "In Progress"), (5, "Completed")]

Next, let's retrieve the status of the project at different times using the get operation:

  • get("projectStatus", 2): This should return "Started" since it's the latest status before time 2.
  • get("projectStatus", 4): This should return "In Progress" for the same reason.
  • get("projectStatus", 5): This should return "Completed" as it matches the timestamp exactly.
  • get("projectStatus", 0): This should return an empty string since there are no statuses recorded before time 1.

Let's walk through the first get operation step-by-step:

  1. Look up "projectStatus" in self.ktv and find the list [(1, "Started"), (3, "In Progress"), (5, "Completed")].
  2. Using bisect_right, find the index where we would place (2, chr(127)) to maintain sorted order.
    • The sorted position would be index 1 (between (1, "Started") and (3, "In Progress")).
  3. Subtract one from the index: 1 - 1 = 0.
  4. We return the value associated with this index, so ("projectStatus", 2) returns "Started".

Now, imagine checking for get("projectStatus", 0):

  1. Look up "projectStatus" and find the same list as above.
  2. With bisect_right, we find the index where (0, chr(127)) would fit, which gives us index 0 because 0 is less than any existing timestamps in the list.
  3. Since the index is 0, it would be incorrect to subtract 1 to find the previous timestamp, as this would lead to a negative index. Instead, we return an empty string, as there's no status before time 1.

Therefore, ("projectStatus", 0) returns an empty string, which aligns with the functionality described in the problem. This example confirms that the binary search method provides an efficient and effective way to retrieve the correct value, complying with the problem's requirements.

Python Solution

1from collections import defaultdict
2from bisect import bisect_right
3
4class TimeMap:
5    def __init__(self):
6        # Initialize a dictionary to store the lists of (timestamp, value) tuples for each key.
7        self.key_time_value = defaultdict(list)
8
9    def set(self, key: str, value: str, timestamp: int) -> None:
10        # Append the (timestamp, value) tuple to the list corresponding to the key.
11        self.key_time_value[key].append((timestamp, value))
12
13    def get(self, key: str, timestamp: int) -> str:
14        # If the key does not exist in the dictionary, return an empty string.
15        if key not in self.key_time_value:
16            return ''
17      
18        # Retrieve the list of (timestamp, value) tuples for the given key.
19        time_value_pairs = self.key_time_value[key]
20      
21        # Use bisect_right to find the index where the timestamp would be inserted
22        # to maintain the list in sorted order. Since the list is sorted by timestamp,
23        # and we want to find the largest timestamp less than or equal to the given timestamp,
24        # we use (timestamp, chr(127)) as a "high" value to ensure we can locate
25        # timestamps that are equal to the given one.
26        index = bisect_right(time_value_pairs, (timestamp, chr(127)))
27      
28        # If index is 0, there is no timestamp less than or equal to the given timestamp,
29        # so we return an empty string. If not, we return the value corresponding to 
30        # the timestamp immediately before the insertion point, which would be index - 1.
31        return time_value_pairs[index - 1][1] if index else ''
32
33# Your TimeMap object will be instantiated and called as such:
34# obj = TimeMap()
35# obj.set(key,value,timestamp)
36# param_2 = obj.get(key,timestamp)
37

Java Solution

1import java.util.HashMap;
2import java.util.Map;
3import java.util.TreeMap;
4
5class TimeMap {
6    // Using a Map where each key is a string and its value is a TreeMap that associates timestamps with values
7    private Map<String, TreeMap<Integer, String>> keyTimeValueMap = new HashMap<>();
8
9    /**
10     * Initializes the TimeMap object.
11     */
12    public TimeMap() {
13        // Constructor does not need to initialize anything since the HashMap is already initialized.
14    }
15
16    /**
17     * Stores the key with the value along with the given timestamp.
18     *
19     * @param key       the key to store
20     * @param value     the value associated with the key
21     * @param timestamp the timestamp at which the value is set for the key
22     */
23    public void set(String key, String value, int timestamp) {
24        // Compute if absent will insert a new TreeMap if the key is not already present
25        // It then puts the given timestamp and value into the TreeMap for the given key
26        keyTimeValueMap.computeIfAbsent(key, k -> new TreeMap<>()).put(timestamp, value);
27    }
28
29    /**
30     * Retrieves the value for the key at the given timestamp or the last value set before the timestamp.
31     *
32     * @param key       the key to retrieve the value for
33     * @param timestamp the timestamp to retrieve the value at
34     * @return the value set for the key at the given timestamp or the closest previous timestamp
35     */
36    public String get(String key, int timestamp) {
37        // If the key does not exist, return an empty string
38        if (!keyTimeValueMap.containsKey(key)) {
39            return "";
40        }
41        // Retrieve the TreeMap for the given key
42        TreeMap<Integer, String> timeValueMap = keyTimeValueMap.get(key);
43        // Find the greatest timestamp less than or equal to the given timestamp
44        Integer closestTimestamp = timeValueMap.floorKey(timestamp);
45        // If there is no timestamp that is less than or equal to the given timestamp, return an empty string
46        // Otherwise, return the associated value
47        return closestTimestamp == null ? "" : timeValueMap.get(closestTimestamp);
48    }
49}
50

C++ Solution

1#include <string>
2#include <vector>
3#include <unordered_map>
4#include <algorithm>
5using namespace std;
6
7class TimeMap {
8public:
9    /** Initialize your data structure here. */
10    TimeMap() {
11        // Constructor remains empty as there's no initialization needed beyond that of member variables.
12    }
13
14    /**
15    * Stores the key with its value along with the timestamp in the map.
16    * The values associated with each key are stored in a vector of pairs, sorted by their timestamps.
17    * @param key The string key of the value.
18    * @param value The value to associate with the key.
19    * @param timestamp The time at which the value was set.
20    */
21    void set(string key, string value, int timestamp) {
22        // Appending the pair <timestamp, value> to the vector associated with the key.
23        keyTimeValueMap[key].emplace_back(timestamp, value);
24    }
25
26    /**
27    * Retrieves a value associated with the key at a particular timestamp.
28    * If the specific timestamp doesn't exist, the value at the most recent timestamp before that is returned.
29    * @param key The key whose value needs to be retrieved.
30    * @param timestamp The timestamp at which the value was needed.
31    * @return The value at the given timestamp or the last known value before the given timestamp.
32    */
33    string get(string key, int timestamp) {
34        // Reference to the vector of pairs for the given key.
35        auto& pairs = keyTimeValueMap[key];
36      
37        // Constructing a dummy pair up to the current timestamp to use in searching.
38        pair<int, string> dummyPair = {timestamp, string({127})};  // 127 is used as the end of the range character.
39
40        // Finding the upper bound, which will point at the first element that is greater than the dummy pair.
41        auto it = upper_bound(pairs.begin(), pairs.end(), dummyPair);
42
43        // If the iterator is at the beginning, there are no elements less than or equal to the timestamp,
44        // hence return an empty string. Otherwise, decrement iterator to get to the element which has
45        // the latest time less than or equal to the given timestamp.
46        return it == pairs.begin() ? "" : (prev(it))->second;
47    }
48
49private:
50    // Map that associates each key with a vector of pairs, where each pair consists of a timestamp 
51    // and the value associated with that timestamp.
52    unordered_map<string, vector<pair<int, string>>> keyTimeValueMap;
53};
54
55/**
56 * Your TimeMap object will be instantiated and called as such:
57 * TimeMap* obj = new TimeMap();
58 * obj->set(key, value, timestamp);
59 * string val = obj->get(key, timestamp);
60 */
61

Typescript Solution

1type KeyValuePair = { timestamp: number; value: string };
2const keyTimeValueMap: Record<string, KeyValuePair[]> = {};
3
4function set(key: string, value: string, timestamp: number): void {
5    if (!keyTimeValueMap[key]) {
6        keyTimeValueMap[key] = [];
7    }
8    // Append the pair {timestamp, value} to the array associated with the key.
9    keyTimeValueMap[key].push({ timestamp, value });
10}
11
12function get(key: string, timestamp: number): string {
13    const pairs = keyTimeValueMap[key];
14    if (!pairs) {
15        return '';
16    }
17    // Use binary search to find the smallest index i such that pairs[i].timestamp is greater than the timestamp.
18    let low = 0, high = pairs.length;
19    while (low < high) {
20        const mid = Math.floor((low + high) / 2);
21        if (pairs[mid].timestamp <= timestamp) {
22            low = mid + 1;
23        } else {
24            high = mid;
25        }
26    }
27    // If we found an element whose timestamp is greater than our target or there are no
28    // elements with a timestamp <= target, 'low' will be the length of the array or 0, respectively.
29    if (low === 0) {
30        return '';
31    }
32    // Return the last known value before the given timestamp.
33    return pairs[low - 1].value;
34}
35

Time and Space Complexity

Time Complexity

The set method has a time complexity of O(1) for each operation, as it appends the (timestamp, value) tuple to the list corresponding to the key in the defaultdict.

The get method's time complexity is O(log N) for each operation, where N is the number of entries associated with the specific key. This is because it performs a binary search (bisect_right) to find the position where the given timestamp would fit in the sorted order of timestamps stored.

Space Complexity

The space complexity is O(K + T), where K is the number of unique keys and T is the total number of set calls or the total number of timestamp and value pairs. This is because all values with their corresponding timestamps for all keys are stored in the defaultdict.

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫