1348. Tweet Counts Per Frequency
Problem Description
The problem requires us to design a class TweetCounts
that can track the number of tweets posted on a social media platform and allow querying the number of tweets that occur in specified periods of time. These periods are partitioned into smaller time chunks that are either every minute, every hour, or every day.
The main operations that we need to support are:
-
recordTweet(tweetName, time)
: This method records a tweet at a given time for a particular tweet name. The assumed time unit is seconds. -
getTweetCountsPerFrequency(freq, tweetName, startTime, endTime)
: This method retrieves the number of tweets for a giventweetName
within a specified time period[startTime, endTime]
. The result will be a list where each element corresponds to the number of tweets in a time chunk of the size specified byfreq
. Thefreq
parameter can be "minute", "hour", or "day", which corresponds to time chunks of 60 seconds, 3600 seconds, and 86400 seconds, respectively.
The problem also specifies that if the last chunk of time does not fit perfectly into the frequency's chunk size, it should end with the end time of the period, and it may be shorter.
The goal is to implement the functionality of this class such that the queries can be answered efficiently.
Intuition
For the recordTweet
method, we require an efficient means to store and access timestamps of tweets given their names. A defaultdict
of SortedList
from the sortedcontainers
module is utilized for this purpose because it allows us to maintain a sorted list of tweet timestamps for each tweet name, which is optimal for the range queries we need to perform later.
The getTweetCountsPerFrequency
method is where the core of our task lies. Since we have the timestamps sorted, we can quickly find the number of tweets in a specific time range using binary search. The SortedList
class has a bisect_left
method which can find the index of the first timestamp greater than or equal to a given start time of the chunk quickly, and similarly for the end time of the chunk. By using these indices, we easily compute the number of tweets within this time chunk.
For each time chunk in the period [startTime, endTime]
, determined by the frequency freq
, we find the number of tweets in that chunk by doing a binary search for the start and end of the chunk. The difference between the indices gives us the count of tweets within that chunk. This step is continued until we cover the entire period from startTime
to endTime
.
As chunks are usually fixed except for the last one, we must handle the case where the last chunk may not align perfectly with the time frequency. This is accomplished by taking the minimum of the start time plus the frequency and endTime + 1
for the binary search.
This approach leverages the fact that binary search on a sorted list takes O(log n) time, making the overall process efficient for the range queries we need to perform.
Learn more about Binary Search and Sorting patterns.
Solution Approach
The implementation uses two key data structures: a dictionary (defaultdict
from collections
) paired with a sorted list (SortedList
from sortedcontainers
). Here's how each component plays a role in the solution:
Data Structures Used
-
defaultdict(SortedList)
: Thedefaultdict
is initialized withSortedList
, which means that every key in this dictionary will have aSortedList
as its value by default. This is a sophisticated choice because it allows us to automatically handle the insertion of timestamps for eachtweetName
without having to check if the key already exists. ASortedList
maintains the elements in sorted order, enabling binary search operations out of the box. -
self.d
: This is a simple dictionary that translates the frequency from its string representation ("minute", "hour", "day") into the number of seconds (60, 3600, 86400), easing the calculations for our chunks.
Algorithms and Patterns
recordTweet
Method
- Insertion in Sorted Order: The
recordTweet
method simply inserts the timestamp into the appropriateSortedList
for the giventweetName
. TheSortedList
takes care of maintaining the order upon insertion. The insertion time is logarithmic relative to the number of elements already in the list because the underlying structure ofSortedList
allows for efficient insertion.
getTweetCountsPerFrequency
Method
-
Binary Search: To count tweets in each time chunk, binary search is utilized. We use
self.data[tweetName].bisect_left(t)
to find the left boundary (inclusive) of the current time chunk andself.data[tweetName].bisect_left(min(t + f, endTime + 1))
to find the right boundary (exclusive) of the chunk. Thebisect_left
function ofSortedList
is a binary search that returns the index where the given element (time in this case) would be inserted in the list. Since the list is in sorted order, the indices between these two boundaries contain all timestamps within our time chunk. -
Iteration over Time Chunks: The method uses a
while
loop to traverse the entire time range[startTime, endTime]
in increments of the frequency's chunk sizef
. During each iteration, it calculates the counts using binary search as described and appends it to the answer listans
. -
Handling End of Time Range: For the binary search of the right boundary, we must ensure that we don't go beyond
endTime
. Hencemin(t + f, endTime + 1)
is used to get the exclusive end of the time chunk. The+1
ensures that if a tweet occurred precisely atendTime
, it will be included in the last chunk.
The final output is the list of tweet counts per frequency chunk which is obtained by executing the above algorithm for each frequency chunk in our specified time range.
This approach is efficient provided that the amount of tweets for a given name does not grow too large, as it relies on binary search which has a logarithmic time complexity. Since the requirements of the problem guarantee that there will be at most 10^4
calls in total to both recordTweet
and getTweetCountsPerFrequency
, the solution is well suited for the problem's constraints.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach by walking through a small example with a hypothetical TweetCounts
object.
-
Initialization
We first create an instance of the
TweetCounts
class:1tweet_counts = TweetCounts()
At this point, our data structure is empty, waiting for tweets to be recorded.
-
Recording Tweets
Let's record some tweets:
1tweet_counts.recordTweet("tweet1", 15) # A tweet named "tweet1" at time=15 seconds 2tweet_counts.recordTweet("tweet1", 60) # Another "tweet1" at time=60 seconds 3tweet_counts.recordTweet("tweet1", 300) # Another "tweet1" at time=300 seconds (5 minutes)
The
defaultdict(SortedList)
data structure in ourTweetCounts
object now looks like this:1{ 2 "tweet1": SortedList([15, 60, 300]) 3}
Each time a tweet is recorded, it's added to the
SortedList
corresponding to its name. The list always remains sorted. -
Querying Tweet Counts
Now, let's say we want to get the count of "tweet1" every minute from time=0 to time=10 minutes (600 seconds):
1counts = tweet_counts.getTweetCountsPerFrequency("minute", "tweet1", 0, 600)
To process this, the program will:
- Convert "minute" into its seconds representation using
self.d
. Here, 1 minute equals 60 seconds. - It starts at
startTime=0
and will run whilestartTime <= endTime
.
Processing the first chunk (0-60 seconds):
- Perform a binary search to find the number of tweets between time=0 and time=60.
- The
bisect_left
method finds the index for time=0 and time=60 in theSortedList
, yielding0
and2
because timestamps15
and60
fall within this range. Thus, there are2 - 0 = 2
tweets in this time chunk.
Processing the second chunk (60-120 seconds):
- Since there's no new tweet until time=300, the binary searches for these middle chunks will reveal
0
tweets each.
Processing subsequent chunks until (300-360 seconds):
- When processing the chunk with time=300, the binary search will find indices for
300
and360
, yielding indices2
and3
and telling us there's1
tweet within this time chunk.
Continuing this process for each minute up to 10 minutes will give us our final counts list.
The final result is:
1counts = [2, 0, 0, 0, 0, 1, 0, 0, 0, 0] # Counts of tweets for every minute chunk up to 10 minutes
- Convert "minute" into its seconds representation using
-
Handling the End Time Range
If the
endTime
were at 250 seconds and we still query per minute, we would only need the counts up until the time chunk that includes 250 seconds:1counts = tweet_counts.getTweetCountsPerFrequency("minute", "tweet1", 0, 250)
This would give us a shorter list, stopping at the chunks that include 250 seconds:
1counts = [2, 0, 0, 0, 1] # Counts of tweets for every minute chunk until 4 minutes and 10 seconds
The count for the last chunk (240-250 seconds) is again found by binary searching the indices for
240
and250
in theSortedList
.
By following the prescribed steps, the TweetCounts
class effectively tracks and counts tweets over various time frequencies. The implementation assures that queries regarding tweet counts over different time chunks are answered efficiently.
Solution Implementation
1from sortedcontainers import SortedList
2from collections import defaultdict
3from typing import List
4
5class TweetCounts:
6 def __init__(self):
7 # A dictionary to map the frequency names to their respective seconds
8 self.frequency_to_seconds = {"minute": 60, "hour": 3600, "day": 86400}
9 # Defaultdict to store the sorted list of tweet timestamps for each tweet name
10 self.tweets_data = defaultdict(SortedList)
11
12 def record_tweet(self, tweet_name: str, time: int) -> None:
13 """
14 Record a tweet at a given time.
15
16 :param tweet_name: Name of the tweet
17 :param time: The timestamp at which the tweet was posted
18 """
19 # Add the time to the sorted list for the specified tweet name
20 self.tweets_data[tweet_name].add(time)
21
22 def get_tweet_counts_per_frequency(
23 self, freq: str, tweet_name: str, start_time: int, end_time: int
24 ) -> List[int]:
25 """
26 Retrieve the count of tweets in each frequency interval.
27
28 :param freq: Frequency interval ('minute', 'hour', 'day')
29 :param tweet_name: The name of the tweet for which to retrieve counts
30 :param start_time: Start timestamp for the range
31 :param end_time: End timestamp for the range
32 :return: A list containing the number of tweets in each interval
33 """
34 # Convert frequency to seconds using the pre-defined dictionary
35 frequency_in_seconds = self.frequency_to_seconds[freq]
36 # Retrieve the list of tweet timestamps for the given tweet name
37 tweets = self.tweets_data[tweet_name]
38 current_time = start_time
39 counts = []
40
41 # Loop over each interval
42 while current_time <= end_time:
43 # Find the starting index for this interval
44 left_index = tweets.bisect_left(current_time)
45 # Find the ending index for this interval, ensuring it does not exceed `end_time`
46 right_index = tweets.bisect_left(min(current_time + frequency_in_seconds, end_time + 1))
47 # Append the count of tweets in the current interval to the results list
48 counts.append(right_index - left_index)
49 # Move to the next interval
50 current_time += frequency_in_seconds
51
52 return counts
53
54# Demonstrate how the TweetCounts class could be instantiated and used
55# obj = TweetCounts()
56# obj.record_tweet(tweet_name, time)
57# tweet_counts = obj.get_tweet_counts_per_frequency(freq, tweet_name, start_time, end_time)
58
1import java.util.ArrayList;
2import java.util.HashMap;
3import java.util.List;
4import java.util.Map;
5import java.util.TreeMap;
6
7/**
8 * The TweetCounts class tracks the number of tweets that occur and supports
9 * querying the number of tweets in a specified time range and frequency.
10 */
11class TweetCounts {
12 // Map storing tweet names as keys and another TreeMap as values,
13 // which maps a timestamp to the number of tweets at that time.
14 private Map<String, TreeMap<Integer, Integer>> tweetData = new HashMap<>();
15
16 /**
17 * Constructor for TweetCounts.
18 */
19 public TweetCounts() {
20 // No initialization needed since tweetData is already initialized.
21 }
22
23 /**
24 * Record a tweet at a given time.
25 *
26 * @param tweetName the name of the tweet
27 * @param time the timestamp the tweet occurred
28 */
29 public void recordTweet(String tweetName, int time) {
30 tweetData.putIfAbsent(tweetName, new TreeMap<>());
31 TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
32 timeMap.put(time, timeMap.getOrDefault(time, 0) + 1);
33 }
34
35 /**
36 * Retrieves the count of tweets per frequency in a time range.
37 *
38 * @param frequency the frequency ("minute", "hour", or "day")
39 * @param tweetName the name of the tweet
40 * @param startTime the start of the time range (inclusive)
41 * @param endTime the end of the time range (inclusive)
42 * @return a list of tweet counts per frequency
43 */
44 public List<Integer> getTweetCountsPerFrequency(
45 String frequency, String tweetName, int startTime, int endTime) {
46 // Convert frequency to seconds
47 int intervalDurationInSeconds = 60; // Default to minute
48 if ("hour".equals(frequency)) {
49 intervalDurationInSeconds = 3600; // 60 minutes * 60 seconds
50 } else if ("day".equals(frequency)) {
51 intervalDurationInSeconds = 86400; // 24 hours * 60 minutes * 60 seconds
52 }
53
54 TreeMap<Integer, Integer> timeMap = tweetData.get(tweetName);
55 List<Integer> counts = new ArrayList<>(); // Holds the final counts
56
57 // Iterate through time ranges and count tweets
58 for (int i = startTime; i <= endTime; i += intervalDurationInSeconds) {
59 int sum = 0;
60 // Calculate end time for this interval, ensuring we don't exceed endTime
61 int intervalEnd = Math.min(i + intervalDurationInSeconds, endTime + 1);
62 // Sum tweet counts within the current interval
63 for (int count : timeMap.subMap(i, intervalEnd).values()) {
64 sum += count;
65 }
66 counts.add(sum);
67 }
68 return counts;
69 }
70}
71
72/**
73 * The following code showcases how the TweetCounts class may be used.
74 * It is not part of the TweetCounts class itself.
75 */
76/*
77TweetCounts tweetCounter = new TweetCounts();
78tweetCounter.recordTweet("tweet1", 10);
79List<Integer> counts = tweetCounter.getTweetCountsPerFrequency("minute", "tweet1", 0, 59);
80*/
81
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <set>
5using namespace std;
6
7// Define the TweetCounts class to store and fetch tweet counts per frequency
8class TweetCounts {
9public:
10 // Constructor
11 TweetCounts() {
12 // No initialization needed for the unordered_map here
13 }
14
15 // Record the time at which the tweet with given name occurs
16 void recordTweet(const string& tweetName, int time) {
17 tweets[tweetName].insert(time);
18 }
19
20 // Fetch the tweet count per specified frequency for a tweet with given name in a time range
21 vector<int> getTweetCountsPerFrequency(const string& freq, const string& tweetName, int startTime, int endTime) {
22 // Frequency in seconds: Defaults to a minute
23 int freqInSeconds = 60;
24 if (freq == "hour")
25 freqInSeconds = 3600; // An hour in seconds
26 else if (freq == "day")
27 freqInSeconds = 86400; // A day in seconds
28
29 // Calculate the number of intervals in the specified time range
30 int intervalCount = (endTime - startTime) / freqInSeconds + 1;
31
32 // Prepare the answer vector with the interval count, pre-filled with zeroes
33 vector<int> ans(intervalCount, 0);
34
35 // Fetch all tweet times for the specified tweetName and within the given time range
36 auto itLow = tweets[tweetName].lower_bound(startTime); // First not less than startTime
37 auto itUp = tweets[tweetName].upper_bound(endTime); // First greater than endTime
38
39 // Count the tweets in each interval
40 for (auto it = itLow; it != itUp; ++it) {
41 // Determine the interval and increment the corresponding counter
42 int intervalIndex = (*it - startTime) / freqInSeconds;
43 ans[intervalIndex]++;
44 }
45
46 return ans;
47 }
48
49private:
50 // Data structure to hold tweet times for each tweet name
51 unordered_map<string, multiset<int>> tweets;
52};
53
54/**
55 * Your TweetCounts object will be instantiated and called as such:
56 * TweetCounts* obj = new TweetCounts();
57 * obj->recordTweet(tweetName, time);
58 * vector<int> param_2 = obj->getTweetCountsPerFrequency(freq, tweetName, startTime, endTime);
59 */
60
1type TweetName = string;
2type Time = number;
3type Frequency = "minute" | "hour" | "day";
4
5// Store tweet times for each tweet name
6let tweets = new Map<TweetName, Set<Time>>();
7
8// Record the time at which the tweet with given name occurs
9function recordTweet(tweetName: TweetName, time: Time): void {
10 if (!tweets.has(tweetName)) {
11 tweets.set(tweetName, new Set<Time>());
12 }
13 tweets.get(tweetName)!.add(time);
14}
15
16// Fetch the tweet count per specified frequency for a tweet with given name in a time range
17function getTweetCountsPerFrequency(freq: Frequency, tweetName: TweetName, startTime: Time, endTime: Time): number[] {
18 // Convert frequency to seconds
19 const freqInSeconds = convertFrequencyToSeconds(freq);
20
21 // Calculate the number of intervals
22 const intervalCount = Math.floor((endTime - startTime) / freqInSeconds) + 1;
23
24 // Prepare the answer array with the interval count, pre-filled with zeroes
25 const ans: number[] = new Array(intervalCount).fill(0);
26
27 // Handle the case where there are no tweets recorded with the given name
28 if (!tweets.has(tweetName)) {
29 return ans;
30 }
31
32 // Fetch all tweet times for the specified tweetName and within the given time range
33 const allTweetTimes = Array.from(tweets.get(tweetName)!);
34 const relevantTweetTimes = allTweetTimes.filter(time => time >= startTime && time <= endTime);
35
36 // Count the tweets in each interval
37 relevantTweetTimes.forEach(time => {
38 const intervalIndex = Math.floor((time - startTime) / freqInSeconds);
39 ans[intervalIndex]++;
40 });
41
42 return ans;
43}
44
45// Helper function to convert frequency string to seconds
46function convertFrequencyToSeconds(freq: Frequency): number {
47 switch (freq) {
48 case "minute": return 60;
49 case "hour": return 3600;
50 case "day": return 86400;
51 default: return 60;
52 }
53}
54
55// Example of usage:
56// recordTweet("tweet1", 10);
57// recordTweet("tweet1", 20);
58// getTweetCountsPerFrequency("minute", "tweet1", 0, 59); // Should return [2]
59
Time and Space Complexity
Time Complexity:
__init__
Method:
- Time Complexity is
O(1)
for initializing the dictionary and other variables.
recordTweet
Method:
- Inserting an element into a SortedList costs
O(log n)
time, wheren
is the number of elements currently in the list for the respectivetweetName
. This is due to the binary search used to find the correct position and the insertion operation which on average takesO(log n)
time in a balanced binary search tree.
getTweetCountsPerFrequency
Method:
- The time spent in this method is dominated by the while loop and the bisect operations.
- The while loop runs once for each interval of size
f = self.d[freq]
in the range fromstartTime
toendTime
. There are(endTime - startTime) / f + 1
such intervals. - Within the loop, two bisect operations (actually
bisect_left
) are performed. Eachbisect_left
takesO(log m)
time, wherem
is the number of recorded times for thetweetName
. - Hence, the total time complexity for the
getTweetCountsPerFrequency
method isO((endTime - startTime)/f * log m)
.
Space Complexity:
__init__
Method:
- Space Complexity is
O(1)
for initializing the dictionary and other variables that do not depend on the input size.
recordTweet
Method:
- The space complexity is dominated by the space used to store the tweet times in the
SortedList
. In the worst case, if alln
recordTweet operations are for the sametweetName
, the space complexity would beO(n)
.
getTweetCountsPerFrequency
Method:
- The space complexity is
O(k)
, wherek
is the number of intervals. This is because we need to store a count for each interval.
Combining the space complexities from the different parts of the class, the overall space complexity of the TweetCounts
class is O(n + k)
, where n
is the total number of tweet times recorded, and k
is the number of intervals calculated in a single call to getTweetCountsPerFrequency
.
Learn more about how to find time and space complexity quickly using problem constraints.
What data structure does Breadth-first search typically uses to store intermediate states?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Got a question? Ask the Monster Assistant anything you don't understand.
Still not clear? Submit the part you don't understand to our editors. Or join our Discord and ask the community.