Facebook Pixel

1348. Tweet Counts Per Frequency

MediumDesignHash TableBinary SearchOrdered SetSorting
Leetcode Link

Problem Description

This problem asks you to design a system that tracks tweets and analyzes their frequency over different time periods.

You need to implement a TweetCounts class with the following functionality:

  1. Store tweets: Record tweets with their name and timestamp (in seconds)
  2. Analyze tweet frequency: Count how many tweets occurred in specific time intervals

The key challenge is partitioning time periods into chunks based on different frequencies:

  • Minute: 60-second chunks
  • Hour: 3600-second chunks
  • Day: 86400-second chunks

For example, if you want to analyze tweets from time 10 to 10000:

  • With "minute" frequency: You get chunks [10,69], [70,129], [130,189], etc.
  • With "hour" frequency: You get chunks [10,3609], [3610,7209], [7210,10000]
  • With "day" frequency: You get a single chunk [10,10000]

The last chunk might be shorter than the standard chunk size - it always ends at the specified endTime.

The class methods are:

  • TweetCounts(): Initialize the data structure
  • recordTweet(tweetName, time): Store a tweet with its name and timestamp
  • getTweetCountsPerFrequency(freq, tweetName, startTime, endTime): Return a list where each element represents the count of tweets in each time chunk within [startTime, endTime]

The solution uses a defaultdict of SortedList to store tweets efficiently. Each tweet name maps to a sorted list of timestamps. When querying, it uses binary search (bisect_left) to quickly find how many tweets fall within each time chunk, iterating through chunks of size determined by the frequency parameter.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The core insight is that we need to efficiently count tweets within specific time ranges. Since we'll be querying different time intervals repeatedly, we need a data structure that allows fast range queries.

The first observation is that tweets for different names are independent - we can store them separately. This leads us to use a dictionary where keys are tweet names.

For each tweet name, we need to store timestamps in a way that allows quick counting within any given range [start, end]. If we store timestamps in sorted order, we can use binary search to find:

  • The position where our range starts
  • The position where our range ends
  • The count is simply the difference between these positions

This is why SortedList is perfect - it maintains sorted order automatically when we insert new tweets, and provides bisect_left for efficient binary search.

When getting tweet counts per frequency, we need to divide the time period [startTime, endTime] into chunks. The chunk size depends on the frequency:

  • "minute" → 60 seconds
  • "hour" → 3600 seconds
  • "day" → 86400 seconds

We iterate through the time period, creating chunks of the appropriate size. For each chunk [t, t+f) where f is the frequency size, we:

  1. Find the index of the first tweet at or after time t using bisect_left(t)
  2. Find the index of the first tweet at or after time t+f (or endTime+1 if that's smaller) using bisect_left(min(t+f, endTime+1))
  3. The difference gives us the count of tweets in this chunk

The key optimization is using binary search instead of linear scanning, reducing the time complexity from O(n) to O(log n) for each chunk query, where n is the number of tweets for that name.

Learn more about Binary Search and Sorting patterns.

Solution Approach

The solution uses a dictionary of sorted lists to efficiently store and query tweet data.

Data Structure Design:

  • self.d: A dictionary mapping frequency strings to their corresponding seconds ({"minute": 60, "hour": 3600, "day": 86400})
  • self.data: A defaultdict(SortedList) where each key is a tweet name and the value is a sorted list of timestamps

Method Implementation:

  1. __init__(): Initializes the two data structures mentioned above. Using defaultdict ensures we don't need to check if a tweet name exists before adding timestamps.

  2. recordTweet(tweetName, time): Simply adds the timestamp to the sorted list for the given tweet name. The SortedList automatically maintains sorted order with O(log n) insertion time.

    self.data[tweetName].add(time)
  3. getTweetCountsPerFrequency(freq, tweetName, startTime, endTime): This is the core method that partitions the time range and counts tweets.

    • First, get the chunk size f from the frequency dictionary
    • Retrieve the sorted list of timestamps for the tweet name
    • Initialize time pointer t at startTime
    • For each chunk:
      l = tweets.bisect_left(t)  # Find first tweet >= t
      r = tweets.bisect_left(min(t + f, endTime + 1))  # Find first tweet >= end of chunk
      ans.append(r - l)  # Count = difference in indices
      t += f  # Move to next chunk

    The key insight is using min(t + f, endTime + 1) for the right boundary. This ensures:

    • Regular chunks use t + f as the boundary
    • The last chunk is capped at endTime + 1 (since we want tweets up to and including endTime)

    The bisect_left operations find the leftmost position where we could insert the given value while maintaining sorted order, effectively giving us the index of the first element >= the search value.

Time Complexity:

  • recordTweet: O(log n) where n is the number of tweets for that name
  • getTweetCountsPerFrequency: O(k * log n) where k is the number of chunks and n is the number of tweets for that name

Space Complexity:

  • O(m * n) where m is the number of unique tweet names and n is the average number of tweets per name

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to understand how the solution works.

Setup:

tc = TweetCounts()

Recording tweets:

tc.recordTweet("tweet1", 10)
tc.recordTweet("tweet1", 80)
tc.recordTweet("tweet1", 150)
tc.recordTweet("tweet1", 3650)

After these operations, our data structure looks like:

self.data = {
    "tweet1": SortedList([10, 80, 150, 3650])
}

Query: Get tweet counts per minute from time 0 to 3700

result = tc.getTweetCountsPerFrequency("minute", "tweet1", 0, 3700)

Let's trace through the execution:

  1. Initialize variables:

    • f = 60 (chunk size for "minute")
    • tweets = [10, 80, 150, 3650]
    • t = 0 (starting time)
    • ans = [] (result array)
  2. Process chunks:

    Chunk 1: [0, 60)

    • l = bisect_left(0) = 0 (index where 0 would be inserted)
    • r = bisect_left(min(0+60, 3701)) = bisect_left(60) = 1 (index where 60 would be inserted)
    • Count = r - l = 1 - 0 = 1 (only tweet at time 10 is in this range)
    • ans = [1], t = 60

    Chunk 2: [60, 120)

    • l = bisect_left(60) = 1
    • r = bisect_left(120) = 2
    • Count = 2 - 1 = 1 (only tweet at time 80)
    • ans = [1, 1], t = 120

    Chunk 3: [120, 180)

    • l = bisect_left(120) = 2
    • r = bisect_left(180) = 3
    • Count = 3 - 2 = 1 (only tweet at time 150)
    • ans = [1, 1, 1], t = 180

    Chunks 4-61: [180, 240), [240, 300), ..., [3600, 3660)

    • All these chunks have no tweets, so we append 0s
    • ans = [1, 1, 1, 0, 0, ..., 0] (58 zeros)
    • t = 3660

    Final Chunk: [3660, 3700]

    • l = bisect_left(3660) = 4
    • r = bisect_left(min(3660+60, 3701)) = bisect_left(3701) = 4
    • Count = 4 - 4 = 0 (tweet at 3650 is before this chunk)
    • Final ans = [1, 1, 1, 0, 0, ..., 0] (total 62 elements)

Key Observations:

  1. Binary search efficiency: Instead of checking every tweet for each chunk, bisect_left finds the boundaries in O(log n) time.

  2. Handling the last chunk: Notice how we use min(t + f, endTime + 1). In the final chunk, instead of going to 3720 (3660+60), we cap it at 3701 (3700+1), ensuring we only count tweets up to and including time 3700.

  3. Empty chunks: The solution correctly handles chunks with no tweets (returning 0).

  4. Sorted order maintenance: The SortedList automatically keeps timestamps in order, making our binary searches valid even as we add new tweets.

Solution Implementation

1from collections import defaultdict
2from sortedcontainers import SortedList
3from typing import List
4
5
6class TweetCounts:
7    def __init__(self):
8        # Dictionary mapping frequency types to their duration in seconds
9        self.frequency_duration = {
10            "minute": 60,
11            "hour": 3600,
12            "day": 86400
13        }
14        # Store tweets for each user as a sorted list of timestamps
15        self.tweet_data = defaultdict(SortedList)
16
17    def recordTweet(self, tweetName: str, time: int) -> None:
18        """
19        Records a tweet with the given name at the specified time.
20      
21        Args:
22            tweetName: The name/ID of the tweet
23            time: The timestamp when the tweet was posted
24        """
25        self.tweet_data[tweetName].add(time)
26
27    def getTweetCountsPerFrequency(
28        self, freq: str, tweetName: str, startTime: int, endTime: int
29    ) -> List[int]:
30        """
31        Returns the count of tweets per time interval based on the given frequency.
32      
33        Args:
34            freq: The frequency type ("minute", "hour", or "day")
35            tweetName: The name/ID of the tweet to count
36            startTime: The start timestamp of the range (inclusive)
37            endTime: The end timestamp of the range (inclusive)
38          
39        Returns:
40            List of tweet counts for each time interval
41        """
42        # Get the duration for the specified frequency
43        interval_duration = self.frequency_duration[freq]
44      
45        # Get all tweets for the specified tweet name
46        tweet_timestamps = self.tweet_data[tweetName]
47      
48        # Initialize variables for iteration
49        current_time = startTime
50        result = []
51      
52        # Iterate through each time interval
53        while current_time <= endTime:
54            # Find the start index for tweets in this interval
55            left_index = tweet_timestamps.bisect_left(current_time)
56          
57            # Find the end index for tweets in this interval
58            # The interval ends at either (current_time + duration) or (endTime + 1), whichever is smaller
59            interval_end = min(current_time + interval_duration, endTime + 1)
60            right_index = tweet_timestamps.bisect_left(interval_end)
61          
62            # Count tweets in this interval (difference between indices)
63            tweet_count = right_index - left_index
64            result.append(tweet_count)
65          
66            # Move to the next interval
67            current_time += interval_duration
68          
69        return result
70
71
72# Your TweetCounts object will be instantiated and called as such:
73# obj = TweetCounts()
74# obj.recordTweet(tweetName,time)
75# param_2 = obj.getTweetCountsPerFrequency(freq,tweetName,startTime,endTime)
76
1class TweetCounts {
2    // Map to store tweet name -> TreeMap of (timestamp -> count)
3    // TreeMap maintains timestamps in sorted order for efficient range queries
4    private Map<String, TreeMap<Integer, Integer>> tweetData;
5
6    /**
7     * Initialize the TweetCounts data structure
8     */
9    public TweetCounts() {
10        tweetData = new HashMap<>();
11    }
12
13    /**
14     * Records a tweet at the given timestamp
15     * @param tweetName The name of the tweet
16     * @param time The timestamp when the tweet occurred
17     */
18    public void recordTweet(String tweetName, int time) {
19        // Create a new TreeMap for this tweet name if it doesn't exist
20        tweetData.putIfAbsent(tweetName, new TreeMap<>());
21      
22        // Get the TreeMap for this tweet name
23        TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
24      
25        // Increment the count for this timestamp (or set to 1 if first occurrence)
26        timeMap.put(time, timeMap.getOrDefault(time, 0) + 1);
27    }
28
29    /**
30     * Gets tweet counts grouped by the specified frequency interval
31     * @param freq The frequency interval: "minute", "hour", or "day"
32     * @param tweetName The name of the tweet to query
33     * @param startTime The start timestamp (inclusive)
34     * @param endTime The end timestamp (inclusive)
35     * @return List of tweet counts per frequency interval
36     */
37    public List<Integer> getTweetCountsPerFrequency(
38            String freq, String tweetName, int startTime, int endTime) {
39      
40        // Determine the interval size in seconds based on frequency
41        int intervalSeconds = 60;  // Default to minute (60 seconds)
42        if ("hour".equals(freq)) {
43            intervalSeconds = 3600;  // 60 * 60 seconds
44        } else if ("day".equals(freq)) {
45            intervalSeconds = 86400;  // 60 * 60 * 24 seconds
46        }
47      
48        // Get the TreeMap for the specified tweet name
49        TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
50      
51        // Initialize result list
52        List<Integer> result = new ArrayList<>();
53      
54        // If tweet doesn't exist, return empty list with appropriate size
55        if (timeMap == null) {
56            for (int currentStart = startTime; currentStart <= endTime; currentStart += intervalSeconds) {
57                result.add(0);
58            }
59            return result;
60        }
61      
62        // Iterate through each time interval
63        for (int currentStart = startTime; currentStart <= endTime; currentStart += intervalSeconds) {
64            int tweetCount = 0;
65          
66            // Calculate the end of current interval (exclusive)
67            // Ensure it doesn't exceed endTime + 1
68            int currentEnd = Math.min(currentStart + intervalSeconds, endTime + 1);
69          
70            // Get all tweets in the current interval [currentStart, currentEnd)
71            // subMap returns a view of the portion of this map whose keys range from
72            // currentStart (inclusive) to currentEnd (exclusive)
73            for (int count : timeMap.subMap(currentStart, currentEnd).values()) {
74                tweetCount += count;
75            }
76          
77            result.add(tweetCount);
78        }
79      
80        return result;
81    }
82}
83
84/**
85 * Your TweetCounts object will be instantiated and called as such:
86 * TweetCounts obj = new TweetCounts();
87 * obj.recordTweet(tweetName,time);
88 * List<Integer> param_2 = obj.getTweetCountsPerFrequency(freq,tweetName,startTime,endTime);
89 */
90
1class TweetCounts {
2private:
3    // Store tweets for each tweet name, using multiset to keep timestamps sorted and allow duplicates
4    unordered_map<string, multiset<int>> tweetData;
5
6public:
7    TweetCounts() {
8        // Constructor - no initialization needed as containers are default initialized
9    }
10
11    void recordTweet(string tweetName, int time) {
12        // Insert the timestamp for the given tweet name
13        // multiset maintains sorted order automatically
14        tweetData[tweetName].insert(time);
15    }
16
17    vector<int> getTweetCountsPerFrequency(string freq, string tweetName, int startTime, int endTime) {
18        // Determine the interval size based on frequency type
19        int intervalSeconds = 60;  // Default to "minute" (60 seconds)
20        if (freq == "hour") {
21            intervalSeconds = 3600;  // 60 * 60 seconds
22        } else if (freq == "day") {
23            intervalSeconds = 86400;  // 60 * 60 * 24 seconds
24        }
25      
26        // Calculate the number of intervals needed
27        // Add 1 because intervals are inclusive
28        int numIntervals = (endTime - startTime) / intervalSeconds + 1;
29        vector<int> result(numIntervals, 0);
30      
31        // Find the range of tweets within [startTime, endTime]
32        // lower_bound returns iterator to first element >= startTime
33        auto startIter = tweetData[tweetName].lower_bound(startTime);
34        // upper_bound returns iterator to first element > endTime
35        auto endIter = tweetData[tweetName].upper_bound(endTime);
36      
37        // Count tweets in each interval
38        for (auto iter = startIter; iter != endIter; ++iter) {
39            // Calculate which interval this tweet belongs to
40            int intervalIndex = (*iter - startTime) / intervalSeconds;
41            // Increment the count for this interval
42            ++result[intervalIndex];
43        }
44      
45        return result;
46    }
47};
48
49/**
50 * Your TweetCounts object will be instantiated and called as such:
51 * TweetCounts* obj = new TweetCounts();
52 * obj->recordTweet(tweetName,time);
53 * vector<int> param_2 = obj->getTweetCountsPerFrequency(freq,tweetName,startTime,endTime);
54 */
55
1// Store tweets for each tweet name, using an array to keep timestamps
2// We'll need to sort when querying since TypeScript doesn't have a built-in sorted set
3const tweetData: Map<string, number[]> = new Map();
4
5/**
6 * Records a tweet with the given name at the specified time
7 * @param tweetName - The name of the tweet
8 * @param time - The timestamp when the tweet was recorded
9 */
10function recordTweet(tweetName: string, time: number): void {
11    // Get or create the array for this tweet name
12    if (!tweetData.has(tweetName)) {
13        tweetData.set(tweetName, []);
14    }
15  
16    // Add the timestamp to the array
17    const timestamps = tweetData.get(tweetName)!;
18    timestamps.push(time);
19  
20    // Keep the array sorted for efficient range queries
21    // Insert in sorted position to maintain order
22    let insertIndex = timestamps.length - 1;
23    while (insertIndex > 0 && timestamps[insertIndex] < timestamps[insertIndex - 1]) {
24        // Swap elements to maintain sorted order
25        const temp = timestamps[insertIndex];
26        timestamps[insertIndex] = timestamps[insertIndex - 1];
27        timestamps[insertIndex - 1] = temp;
28        insertIndex--;
29    }
30}
31
32/**
33 * Gets the count of tweets per time interval based on the specified frequency
34 * @param freq - The frequency type: "minute", "hour", or "day"
35 * @param tweetName - The name of the tweet to query
36 * @param startTime - The start time of the query range (inclusive)
37 * @param endTime - The end time of the query range (inclusive)
38 * @returns An array of tweet counts for each interval
39 */
40function getTweetCountsPerFrequency(
41    freq: string, 
42    tweetName: string, 
43    startTime: number, 
44    endTime: number
45): number[] {
46    // Determine the interval size in seconds based on frequency type
47    let intervalSeconds: number;
48    if (freq === "minute") {
49        intervalSeconds = 60;  // 60 seconds
50    } else if (freq === "hour") {
51        intervalSeconds = 3600;  // 60 * 60 seconds
52    } else {  // freq === "day"
53        intervalSeconds = 86400;  // 60 * 60 * 24 seconds
54    }
55  
56    // Calculate the number of intervals needed
57    // Add 1 because intervals are inclusive
58    const numIntervals = Math.floor((endTime - startTime) / intervalSeconds) + 1;
59    const result: number[] = new Array(numIntervals).fill(0);
60  
61    // Get the timestamps for this tweet name, return empty result if not found
62    const timestamps = tweetData.get(tweetName);
63    if (!timestamps) {
64        return result;
65    }
66  
67    // Binary search to find the start index (first element >= startTime)
68    let startIndex = lowerBound(timestamps, startTime);
69  
70    // Binary search to find the end index (first element > endTime)
71    let endIndex = upperBound(timestamps, endTime);
72  
73    // Count tweets in each interval
74    for (let i = startIndex; i < endIndex; i++) {
75        // Calculate which interval this tweet belongs to
76        const intervalIndex = Math.floor((timestamps[i] - startTime) / intervalSeconds);
77        // Increment the count for this interval
78        result[intervalIndex]++;
79    }
80  
81    return result;
82}
83
84/**
85 * Binary search helper to find the first element >= target
86 * @param arr - Sorted array to search
87 * @param target - Target value to find
88 * @returns Index of first element >= target
89 */
90function lowerBound(arr: number[], target: number): number {
91    let left = 0;
92    let right = arr.length;
93  
94    while (left < right) {
95        const mid = Math.floor((left + right) / 2);
96        if (arr[mid] < target) {
97            left = mid + 1;
98        } else {
99            right = mid;
100        }
101    }
102  
103    return left;
104}
105
106/**
107 * Binary search helper to find the first element > target
108 * @param arr - Sorted array to search
109 * @param target - Target value to find
110 * @returns Index of first element > target
111 */
112function upperBound(arr: number[], target: number): number {
113    let left = 0;
114    let right = arr.length;
115  
116    while (left < right) {
117        const mid = Math.floor((left + right) / 2);
118        if (arr[mid] <= target) {
119            left = mid + 1;
120        } else {
121            right = mid;
122        }
123    }
124  
125    return left;
126}
127

Time and Space Complexity

Time Complexity:

  • __init__(): O(1) - Simply initializes a dictionary and a defaultdict
  • recordTweet(): O(log n) where n is the number of tweets for the given tweetName - The SortedList add operation maintains sorted order using binary search insertion
  • getTweetCountsPerFrequency(): O(k * log n) where k is the number of intervals (endTime - startTime) / freq and n is the number of tweets for the given tweetName
    • The while loop runs k times where k = ceiling((endTime - startTime + 1) / freq)
    • Each iteration performs two bisect_left operations, each taking O(log n) time
    • The subtraction r - l takes O(1) time
    • Therefore, total time is O(k * log n)

Space Complexity:

  • Overall space: O(m * n) where m is the number of unique tweet names and n is the average number of tweets per name
  • self.d: O(1) - Stores only 3 key-value pairs
  • self.data: O(m * n) - Stores all tweets where m is the number of unique tweet names and n is the average number of tweets per name
  • recordTweet(): O(1) additional space (not counting the space to store the tweet itself)
  • getTweetCountsPerFrequency(): O(k) additional space where k is the number of intervals, used for the ans list

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Off-by-One Error with End Time Boundary

One of the most common mistakes is incorrectly handling the inclusive nature of the endTime parameter. The problem specifies that the time range is [startTime, endTime] (both inclusive), but when using bisect_left, you need to search for endTime + 1 to include tweets at exactly endTime.

Incorrect:

right_index = tweet_timestamps.bisect_left(min(current_time + interval_duration, endTime))

Correct:

right_index = tweet_timestamps.bisect_left(min(current_time + interval_duration, endTime + 1))

2. Importing SortedList Incorrectly

The SortedList is not a built-in Python data structure. It requires the sortedcontainers library, which may not be available in all coding environments.

Alternative Solution Using Built-in bisect:

from collections import defaultdict
import bisect

class TweetCounts:
    def __init__(self):
        self.frequency_duration = {"minute": 60, "hour": 3600, "day": 86400}
        self.tweet_data = defaultdict(list)
  
    def recordTweet(self, tweetName: str, time: int) -> None:
        # Use bisect.insort to maintain sorted order
        bisect.insort(self.tweet_data[tweetName], time)

3. Empty Tweet List Handling

While the current implementation handles empty tweet lists correctly (returning zeros), explicitly checking can make the code more readable and potentially more efficient:

def getTweetCountsPerFrequency(self, freq: str, tweetName: str, startTime: int, endTime: int) -> List[int]:
    interval_duration = self.frequency_duration[freq]
  
    # Early return for non-existent tweet names
    if tweetName not in self.tweet_data:
        num_intervals = (endTime - startTime) // interval_duration + 1
        return [0] * num_intervals
  
    tweet_timestamps = self.tweet_data[tweetName]
    # ... rest of the logic

4. Misunderstanding Chunk Boundaries

A common misconception is that chunks should be aligned to absolute time boundaries (e.g., minute 0-59, 60-119). However, chunks actually start from startTime, not from time 0.

Example: If startTime=10 and freq="minute":

  • First chunk: [10, 69] (NOT [10, 59])
  • Second chunk: [70, 129] (NOT [60, 119])

5. Integer Overflow in Large Time Ranges

For extremely large time ranges with "minute" frequency, the number of chunks could be very large, potentially causing memory issues:

def getTweetCountsPerFrequency(self, freq: str, tweetName: str, startTime: int, endTime: int) -> List[int]:
    interval_duration = self.frequency_duration[freq]
  
    # Add validation to prevent excessive memory usage
    max_chunks = (endTime - startTime) // interval_duration + 1
    if max_chunks > 10**6:  # Arbitrary large limit
        raise ValueError("Time range too large for given frequency")
  
    # ... rest of the implementation

6. Not Handling Duplicate Timestamps

The current solution correctly handles duplicate timestamps (multiple tweets at the same time), but it's worth noting that using a regular list with bisect would also handle this correctly. The key is understanding that both bisect_left and bisect_right would give different results when duplicates exist, and choosing the appropriate one based on your needs.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

How many times is a tree node visited in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More