1348. Tweet Counts Per Frequency
Problem Description
This problem asks you to design a system that tracks tweets and analyzes their frequency over different time periods.
You need to implement a TweetCounts
class with the following functionality:
- Store tweets: Record tweets with their name and timestamp (in seconds)
- Analyze tweet frequency: Count how many tweets occurred in specific time intervals
The key challenge is partitioning time periods into chunks based on different frequencies:
- Minute: 60-second chunks
- Hour: 3600-second chunks
- Day: 86400-second chunks
For example, if you want to analyze tweets from time 10 to 10000:
- With "minute" frequency: You get chunks
[10,69]
,[70,129]
,[130,189]
, etc. - With "hour" frequency: You get chunks
[10,3609]
,[3610,7209]
,[7210,10000]
- With "day" frequency: You get a single chunk
[10,10000]
The last chunk might be shorter than the standard chunk size - it always ends at the specified endTime
.
The class methods are:
TweetCounts()
: Initialize the data structurerecordTweet(tweetName, time)
: Store a tweet with its name and timestampgetTweetCountsPerFrequency(freq, tweetName, startTime, endTime)
: Return a list where each element represents the count of tweets in each time chunk within[startTime, endTime]
The solution uses a defaultdict
of SortedList
to store tweets efficiently. Each tweet name maps to a sorted list of timestamps. When querying, it uses binary search (bisect_left
) to quickly find how many tweets fall within each time chunk, iterating through chunks of size determined by the frequency parameter.
Intuition
The core insight is that we need to efficiently count tweets within specific time ranges. Since we'll be querying different time intervals repeatedly, we need a data structure that allows fast range queries.
The first observation is that tweets for different names are independent - we can store them separately. This leads us to use a dictionary where keys are tweet names.
For each tweet name, we need to store timestamps in a way that allows quick counting within any given range [start, end]
. If we store timestamps in sorted order, we can use binary search to find:
- The position where our range starts
- The position where our range ends
- The count is simply the difference between these positions
This is why SortedList
is perfect - it maintains sorted order automatically when we insert new tweets, and provides bisect_left
for efficient binary search.
When getting tweet counts per frequency, we need to divide the time period [startTime, endTime]
into chunks. The chunk size depends on the frequency:
- "minute" → 60 seconds
- "hour" → 3600 seconds
- "day" → 86400 seconds
We iterate through the time period, creating chunks of the appropriate size. For each chunk [t, t+f)
where f
is the frequency size, we:
- Find the index of the first tweet at or after time
t
usingbisect_left(t)
- Find the index of the first tweet at or after time
t+f
(orendTime+1
if that's smaller) usingbisect_left(min(t+f, endTime+1))
- The difference gives us the count of tweets in this chunk
The key optimization is using binary search instead of linear scanning, reducing the time complexity from O(n) to O(log n) for each chunk query, where n is the number of tweets for that name.
Learn more about Binary Search and Sorting patterns.
Solution Approach
The solution uses a dictionary of sorted lists to efficiently store and query tweet data.
Data Structure Design:
self.d
: A dictionary mapping frequency strings to their corresponding seconds ({"minute": 60, "hour": 3600, "day": 86400}
)self.data
: Adefaultdict(SortedList)
where each key is a tweet name and the value is a sorted list of timestamps
Method Implementation:
-
__init__()
: Initializes the two data structures mentioned above. Usingdefaultdict
ensures we don't need to check if a tweet name exists before adding timestamps. -
recordTweet(tweetName, time)
: Simply adds the timestamp to the sorted list for the given tweet name. TheSortedList
automatically maintains sorted order with O(log n) insertion time.self.data[tweetName].add(time)
-
getTweetCountsPerFrequency(freq, tweetName, startTime, endTime)
: This is the core method that partitions the time range and counts tweets.- First, get the chunk size
f
from the frequency dictionary - Retrieve the sorted list of timestamps for the tweet name
- Initialize time pointer
t
atstartTime
- For each chunk:
l = tweets.bisect_left(t) # Find first tweet >= t r = tweets.bisect_left(min(t + f, endTime + 1)) # Find first tweet >= end of chunk ans.append(r - l) # Count = difference in indices t += f # Move to next chunk
The key insight is using
min(t + f, endTime + 1)
for the right boundary. This ensures:- Regular chunks use
t + f
as the boundary - The last chunk is capped at
endTime + 1
(since we want tweets up to and includingendTime
)
The
bisect_left
operations find the leftmost position where we could insert the given value while maintaining sorted order, effectively giving us the index of the first element >= the search value. - First, get the chunk size
Time Complexity:
recordTweet
: O(log n) where n is the number of tweets for that namegetTweetCountsPerFrequency
: O(k * log n) where k is the number of chunks and n is the number of tweets for that name
Space Complexity:
- O(m * n) where m is the number of unique tweet names and n is the average number of tweets per name
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to understand how the solution works.
Setup:
tc = TweetCounts()
Recording tweets:
tc.recordTweet("tweet1", 10) tc.recordTweet("tweet1", 80) tc.recordTweet("tweet1", 150) tc.recordTweet("tweet1", 3650)
After these operations, our data structure looks like:
self.data = { "tweet1": SortedList([10, 80, 150, 3650]) }
Query: Get tweet counts per minute from time 0 to 3700
result = tc.getTweetCountsPerFrequency("minute", "tweet1", 0, 3700)
Let's trace through the execution:
-
Initialize variables:
f = 60
(chunk size for "minute")tweets = [10, 80, 150, 3650]
t = 0
(starting time)ans = []
(result array)
-
Process chunks:
Chunk 1: [0, 60)
l = bisect_left(0) = 0
(index where 0 would be inserted)r = bisect_left(min(0+60, 3701)) = bisect_left(60) = 1
(index where 60 would be inserted)- Count =
r - l = 1 - 0 = 1
(only tweet at time 10 is in this range) ans = [1]
,t = 60
Chunk 2: [60, 120)
l = bisect_left(60) = 1
r = bisect_left(120) = 2
- Count =
2 - 1 = 1
(only tweet at time 80) ans = [1, 1]
,t = 120
Chunk 3: [120, 180)
l = bisect_left(120) = 2
r = bisect_left(180) = 3
- Count =
3 - 2 = 1
(only tweet at time 150) ans = [1, 1, 1]
,t = 180
Chunks 4-61: [180, 240), [240, 300), ..., [3600, 3660)
- All these chunks have no tweets, so we append 0s
ans = [1, 1, 1, 0, 0, ..., 0]
(58 zeros)t = 3660
Final Chunk: [3660, 3700]
l = bisect_left(3660) = 4
r = bisect_left(min(3660+60, 3701)) = bisect_left(3701) = 4
- Count =
4 - 4 = 0
(tweet at 3650 is before this chunk) - Final
ans = [1, 1, 1, 0, 0, ..., 0]
(total 62 elements)
Key Observations:
-
Binary search efficiency: Instead of checking every tweet for each chunk,
bisect_left
finds the boundaries in O(log n) time. -
Handling the last chunk: Notice how we use
min(t + f, endTime + 1)
. In the final chunk, instead of going to 3720 (3660+60), we cap it at 3701 (3700+1), ensuring we only count tweets up to and including time 3700. -
Empty chunks: The solution correctly handles chunks with no tweets (returning 0).
-
Sorted order maintenance: The
SortedList
automatically keeps timestamps in order, making our binary searches valid even as we add new tweets.
Solution Implementation
1from collections import defaultdict
2from sortedcontainers import SortedList
3from typing import List
4
5
6class TweetCounts:
7 def __init__(self):
8 # Dictionary mapping frequency types to their duration in seconds
9 self.frequency_duration = {
10 "minute": 60,
11 "hour": 3600,
12 "day": 86400
13 }
14 # Store tweets for each user as a sorted list of timestamps
15 self.tweet_data = defaultdict(SortedList)
16
17 def recordTweet(self, tweetName: str, time: int) -> None:
18 """
19 Records a tweet with the given name at the specified time.
20
21 Args:
22 tweetName: The name/ID of the tweet
23 time: The timestamp when the tweet was posted
24 """
25 self.tweet_data[tweetName].add(time)
26
27 def getTweetCountsPerFrequency(
28 self, freq: str, tweetName: str, startTime: int, endTime: int
29 ) -> List[int]:
30 """
31 Returns the count of tweets per time interval based on the given frequency.
32
33 Args:
34 freq: The frequency type ("minute", "hour", or "day")
35 tweetName: The name/ID of the tweet to count
36 startTime: The start timestamp of the range (inclusive)
37 endTime: The end timestamp of the range (inclusive)
38
39 Returns:
40 List of tweet counts for each time interval
41 """
42 # Get the duration for the specified frequency
43 interval_duration = self.frequency_duration[freq]
44
45 # Get all tweets for the specified tweet name
46 tweet_timestamps = self.tweet_data[tweetName]
47
48 # Initialize variables for iteration
49 current_time = startTime
50 result = []
51
52 # Iterate through each time interval
53 while current_time <= endTime:
54 # Find the start index for tweets in this interval
55 left_index = tweet_timestamps.bisect_left(current_time)
56
57 # Find the end index for tweets in this interval
58 # The interval ends at either (current_time + duration) or (endTime + 1), whichever is smaller
59 interval_end = min(current_time + interval_duration, endTime + 1)
60 right_index = tweet_timestamps.bisect_left(interval_end)
61
62 # Count tweets in this interval (difference between indices)
63 tweet_count = right_index - left_index
64 result.append(tweet_count)
65
66 # Move to the next interval
67 current_time += interval_duration
68
69 return result
70
71
72# Your TweetCounts object will be instantiated and called as such:
73# obj = TweetCounts()
74# obj.recordTweet(tweetName,time)
75# param_2 = obj.getTweetCountsPerFrequency(freq,tweetName,startTime,endTime)
76
1class TweetCounts {
2 // Map to store tweet name -> TreeMap of (timestamp -> count)
3 // TreeMap maintains timestamps in sorted order for efficient range queries
4 private Map<String, TreeMap<Integer, Integer>> tweetData;
5
6 /**
7 * Initialize the TweetCounts data structure
8 */
9 public TweetCounts() {
10 tweetData = new HashMap<>();
11 }
12
13 /**
14 * Records a tweet at the given timestamp
15 * @param tweetName The name of the tweet
16 * @param time The timestamp when the tweet occurred
17 */
18 public void recordTweet(String tweetName, int time) {
19 // Create a new TreeMap for this tweet name if it doesn't exist
20 tweetData.putIfAbsent(tweetName, new TreeMap<>());
21
22 // Get the TreeMap for this tweet name
23 TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
24
25 // Increment the count for this timestamp (or set to 1 if first occurrence)
26 timeMap.put(time, timeMap.getOrDefault(time, 0) + 1);
27 }
28
29 /**
30 * Gets tweet counts grouped by the specified frequency interval
31 * @param freq The frequency interval: "minute", "hour", or "day"
32 * @param tweetName The name of the tweet to query
33 * @param startTime The start timestamp (inclusive)
34 * @param endTime The end timestamp (inclusive)
35 * @return List of tweet counts per frequency interval
36 */
37 public List<Integer> getTweetCountsPerFrequency(
38 String freq, String tweetName, int startTime, int endTime) {
39
40 // Determine the interval size in seconds based on frequency
41 int intervalSeconds = 60; // Default to minute (60 seconds)
42 if ("hour".equals(freq)) {
43 intervalSeconds = 3600; // 60 * 60 seconds
44 } else if ("day".equals(freq)) {
45 intervalSeconds = 86400; // 60 * 60 * 24 seconds
46 }
47
48 // Get the TreeMap for the specified tweet name
49 TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
50
51 // Initialize result list
52 List<Integer> result = new ArrayList<>();
53
54 // If tweet doesn't exist, return empty list with appropriate size
55 if (timeMap == null) {
56 for (int currentStart = startTime; currentStart <= endTime; currentStart += intervalSeconds) {
57 result.add(0);
58 }
59 return result;
60 }
61
62 // Iterate through each time interval
63 for (int currentStart = startTime; currentStart <= endTime; currentStart += intervalSeconds) {
64 int tweetCount = 0;
65
66 // Calculate the end of current interval (exclusive)
67 // Ensure it doesn't exceed endTime + 1
68 int currentEnd = Math.min(currentStart + intervalSeconds, endTime + 1);
69
70 // Get all tweets in the current interval [currentStart, currentEnd)
71 // subMap returns a view of the portion of this map whose keys range from
72 // currentStart (inclusive) to currentEnd (exclusive)
73 for (int count : timeMap.subMap(currentStart, currentEnd).values()) {
74 tweetCount += count;
75 }
76
77 result.add(tweetCount);
78 }
79
80 return result;
81 }
82}
83
84/**
85 * Your TweetCounts object will be instantiated and called as such:
86 * TweetCounts obj = new TweetCounts();
87 * obj.recordTweet(tweetName,time);
88 * List<Integer> param_2 = obj.getTweetCountsPerFrequency(freq,tweetName,startTime,endTime);
89 */
90
1class TweetCounts {
2private:
3 // Store tweets for each tweet name, using multiset to keep timestamps sorted and allow duplicates
4 unordered_map<string, multiset<int>> tweetData;
5
6public:
7 TweetCounts() {
8 // Constructor - no initialization needed as containers are default initialized
9 }
10
11 void recordTweet(string tweetName, int time) {
12 // Insert the timestamp for the given tweet name
13 // multiset maintains sorted order automatically
14 tweetData[tweetName].insert(time);
15 }
16
17 vector<int> getTweetCountsPerFrequency(string freq, string tweetName, int startTime, int endTime) {
18 // Determine the interval size based on frequency type
19 int intervalSeconds = 60; // Default to "minute" (60 seconds)
20 if (freq == "hour") {
21 intervalSeconds = 3600; // 60 * 60 seconds
22 } else if (freq == "day") {
23 intervalSeconds = 86400; // 60 * 60 * 24 seconds
24 }
25
26 // Calculate the number of intervals needed
27 // Add 1 because intervals are inclusive
28 int numIntervals = (endTime - startTime) / intervalSeconds + 1;
29 vector<int> result(numIntervals, 0);
30
31 // Find the range of tweets within [startTime, endTime]
32 // lower_bound returns iterator to first element >= startTime
33 auto startIter = tweetData[tweetName].lower_bound(startTime);
34 // upper_bound returns iterator to first element > endTime
35 auto endIter = tweetData[tweetName].upper_bound(endTime);
36
37 // Count tweets in each interval
38 for (auto iter = startIter; iter != endIter; ++iter) {
39 // Calculate which interval this tweet belongs to
40 int intervalIndex = (*iter - startTime) / intervalSeconds;
41 // Increment the count for this interval
42 ++result[intervalIndex];
43 }
44
45 return result;
46 }
47};
48
49/**
50 * Your TweetCounts object will be instantiated and called as such:
51 * TweetCounts* obj = new TweetCounts();
52 * obj->recordTweet(tweetName,time);
53 * vector<int> param_2 = obj->getTweetCountsPerFrequency(freq,tweetName,startTime,endTime);
54 */
55
1// Store tweets for each tweet name, using an array to keep timestamps
2// We'll need to sort when querying since TypeScript doesn't have a built-in sorted set
3const tweetData: Map<string, number[]> = new Map();
4
5/**
6 * Records a tweet with the given name at the specified time
7 * @param tweetName - The name of the tweet
8 * @param time - The timestamp when the tweet was recorded
9 */
10function recordTweet(tweetName: string, time: number): void {
11 // Get or create the array for this tweet name
12 if (!tweetData.has(tweetName)) {
13 tweetData.set(tweetName, []);
14 }
15
16 // Add the timestamp to the array
17 const timestamps = tweetData.get(tweetName)!;
18 timestamps.push(time);
19
20 // Keep the array sorted for efficient range queries
21 // Insert in sorted position to maintain order
22 let insertIndex = timestamps.length - 1;
23 while (insertIndex > 0 && timestamps[insertIndex] < timestamps[insertIndex - 1]) {
24 // Swap elements to maintain sorted order
25 const temp = timestamps[insertIndex];
26 timestamps[insertIndex] = timestamps[insertIndex - 1];
27 timestamps[insertIndex - 1] = temp;
28 insertIndex--;
29 }
30}
31
32/**
33 * Gets the count of tweets per time interval based on the specified frequency
34 * @param freq - The frequency type: "minute", "hour", or "day"
35 * @param tweetName - The name of the tweet to query
36 * @param startTime - The start time of the query range (inclusive)
37 * @param endTime - The end time of the query range (inclusive)
38 * @returns An array of tweet counts for each interval
39 */
40function getTweetCountsPerFrequency(
41 freq: string,
42 tweetName: string,
43 startTime: number,
44 endTime: number
45): number[] {
46 // Determine the interval size in seconds based on frequency type
47 let intervalSeconds: number;
48 if (freq === "minute") {
49 intervalSeconds = 60; // 60 seconds
50 } else if (freq === "hour") {
51 intervalSeconds = 3600; // 60 * 60 seconds
52 } else { // freq === "day"
53 intervalSeconds = 86400; // 60 * 60 * 24 seconds
54 }
55
56 // Calculate the number of intervals needed
57 // Add 1 because intervals are inclusive
58 const numIntervals = Math.floor((endTime - startTime) / intervalSeconds) + 1;
59 const result: number[] = new Array(numIntervals).fill(0);
60
61 // Get the timestamps for this tweet name, return empty result if not found
62 const timestamps = tweetData.get(tweetName);
63 if (!timestamps) {
64 return result;
65 }
66
67 // Binary search to find the start index (first element >= startTime)
68 let startIndex = lowerBound(timestamps, startTime);
69
70 // Binary search to find the end index (first element > endTime)
71 let endIndex = upperBound(timestamps, endTime);
72
73 // Count tweets in each interval
74 for (let i = startIndex; i < endIndex; i++) {
75 // Calculate which interval this tweet belongs to
76 const intervalIndex = Math.floor((timestamps[i] - startTime) / intervalSeconds);
77 // Increment the count for this interval
78 result[intervalIndex]++;
79 }
80
81 return result;
82}
83
84/**
85 * Binary search helper to find the first element >= target
86 * @param arr - Sorted array to search
87 * @param target - Target value to find
88 * @returns Index of first element >= target
89 */
90function lowerBound(arr: number[], target: number): number {
91 let left = 0;
92 let right = arr.length;
93
94 while (left < right) {
95 const mid = Math.floor((left + right) / 2);
96 if (arr[mid] < target) {
97 left = mid + 1;
98 } else {
99 right = mid;
100 }
101 }
102
103 return left;
104}
105
106/**
107 * Binary search helper to find the first element > target
108 * @param arr - Sorted array to search
109 * @param target - Target value to find
110 * @returns Index of first element > target
111 */
112function upperBound(arr: number[], target: number): number {
113 let left = 0;
114 let right = arr.length;
115
116 while (left < right) {
117 const mid = Math.floor((left + right) / 2);
118 if (arr[mid] <= target) {
119 left = mid + 1;
120 } else {
121 right = mid;
122 }
123 }
124
125 return left;
126}
127
Time and Space Complexity
Time Complexity:
__init__()
:O(1)
- Simply initializes a dictionary and a defaultdictrecordTweet()
:O(log n)
wheren
is the number of tweets for the given tweetName - The SortedListadd
operation maintains sorted order using binary search insertiongetTweetCountsPerFrequency()
:O(k * log n)
wherek
is the number of intervals(endTime - startTime) / freq
andn
is the number of tweets for the given tweetName- The while loop runs
k
times wherek = ceiling((endTime - startTime + 1) / freq)
- Each iteration performs two
bisect_left
operations, each takingO(log n)
time - The subtraction
r - l
takesO(1)
time - Therefore, total time is
O(k * log n)
- The while loop runs
Space Complexity:
- Overall space:
O(m * n)
wherem
is the number of unique tweet names andn
is the average number of tweets per name self.d
:O(1)
- Stores only 3 key-value pairsself.data
:O(m * n)
- Stores all tweets wherem
is the number of unique tweet names andn
is the average number of tweets per namerecordTweet()
:O(1)
additional space (not counting the space to store the tweet itself)getTweetCountsPerFrequency()
:O(k)
additional space wherek
is the number of intervals, used for theans
list
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Off-by-One Error with End Time Boundary
One of the most common mistakes is incorrectly handling the inclusive nature of the endTime
parameter. The problem specifies that the time range is [startTime, endTime]
(both inclusive), but when using bisect_left
, you need to search for endTime + 1
to include tweets at exactly endTime
.
Incorrect:
right_index = tweet_timestamps.bisect_left(min(current_time + interval_duration, endTime))
Correct:
right_index = tweet_timestamps.bisect_left(min(current_time + interval_duration, endTime + 1))
2. Importing SortedList Incorrectly
The SortedList
is not a built-in Python data structure. It requires the sortedcontainers
library, which may not be available in all coding environments.
Alternative Solution Using Built-in bisect:
from collections import defaultdict
import bisect
class TweetCounts:
def __init__(self):
self.frequency_duration = {"minute": 60, "hour": 3600, "day": 86400}
self.tweet_data = defaultdict(list)
def recordTweet(self, tweetName: str, time: int) -> None:
# Use bisect.insort to maintain sorted order
bisect.insort(self.tweet_data[tweetName], time)
3. Empty Tweet List Handling
While the current implementation handles empty tweet lists correctly (returning zeros), explicitly checking can make the code more readable and potentially more efficient:
def getTweetCountsPerFrequency(self, freq: str, tweetName: str, startTime: int, endTime: int) -> List[int]:
interval_duration = self.frequency_duration[freq]
# Early return for non-existent tweet names
if tweetName not in self.tweet_data:
num_intervals = (endTime - startTime) // interval_duration + 1
return [0] * num_intervals
tweet_timestamps = self.tweet_data[tweetName]
# ... rest of the logic
4. Misunderstanding Chunk Boundaries
A common misconception is that chunks should be aligned to absolute time boundaries (e.g., minute 0-59, 60-119). However, chunks actually start from startTime
, not from time 0.
Example: If startTime=10
and freq="minute"
:
- First chunk:
[10, 69]
(NOT[10, 59]
) - Second chunk:
[70, 129]
(NOT[60, 119]
)
5. Integer Overflow in Large Time Ranges
For extremely large time ranges with "minute" frequency, the number of chunks could be very large, potentially causing memory issues:
def getTweetCountsPerFrequency(self, freq: str, tweetName: str, startTime: int, endTime: int) -> List[int]:
interval_duration = self.frequency_duration[freq]
# Add validation to prevent excessive memory usage
max_chunks = (endTime - startTime) // interval_duration + 1
if max_chunks > 10**6: # Arbitrary large limit
raise ValueError("Time range too large for given frequency")
# ... rest of the implementation
6. Not Handling Duplicate Timestamps
The current solution correctly handles duplicate timestamps (multiple tweets at the same time), but it's worth noting that using a regular list with bisect
would also handle this correctly. The key is understanding that both bisect_left
and bisect_right
would give different results when duplicates exist, and choosing the appropriate one based on your needs.
How many times is a tree node visited in a depth first search?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!