3369. Design an Array Statistics Tracker π
Problem Description
Design a data structure called StatisticsTracker
that efficiently handles a stream of integer inputs and supports the following operations:
- Initialization: Create an empty
StatisticsTracker
. - addNumber(int number): Insert a number into the data structure.
- removeFirstAddedNumber(): Remove the first number added to the structure.
- getMean(): Return the floored mean of the numbers currently in the structure.
- getMedian(): Return the median value of the numbers.
- getMode(): Return the mode of the numbers. If there are multiple modes, return the smallest one.
Key points related to the calculations:
- The mean is the sum of all numbers divided by the count of numbers.
- The median is the middle element in a sorted array, or the larger of two middle elements if there is an even number of total elements.
- The mode is the value that appears most frequently. In case of a tie, the smallest value is chosen.
Intuition
To solve this problem efficiently, a combination of data structures is used to manage the numbers and perform operations in an optimal manner:
-
Queue (
q
): Used to track the order of input numbers, allowing easy removal of the first number added throughremoveFirstAddedNumber()
. -
Sum (
s
): Maintains the sum of all current numbers to quickly calculate the mean withgetMean()
. Updating this sum when numbers are added or removed allows the mean to be quickly recalculated. -
Frequency Count (
cnt
): A hash table tracks the frequency of each number, making it possible to determine the mode withgetMode()
. This frequency information is essential for both adding numbers and removing them. -
Ordered Sets (
sl
andsl2
):sl
: Keeps numbers in a sorted order to enable quick retrieval of the median withgetMedian()
. This structure supports efficient insertion, removal, and access operations needed for median calculations.sl2
: Maintains numbers and their frequencies, sorted by frequency in descending order and by value in ascending order. This ordered structure ensures that the mode can be quickly accessed.
By integrating these structures, the solution achieves efficient management of the data stream and ensures that each query can be handled in optimal time complexity, mostly through logarithmic operations due to the underlying data structures' efficient management of sorted order and frequency.
Learn more about Queue, Binary Search, Data Stream and Heap (Priority Queue) patterns.
Solution Approach
The solution leverages several data structures to efficiently manage the operations on the StatisticsTracker
:
-
[Queue](/problems/queue_intro) (q)
:- Utilized to keep track of the order of numbers as they are added. This allows easy access to the first added number when using
removeFirstAddedNumber()
. - Operations such as
append
andpopleft
are fundamental for adding and removing elements in constant time.
- Utilized to keep track of the order of numbers as they are added. This allows easy access to the first added number when using
-
Sum (s)
:- A running total of all numbers in the structure to facilitate the quick calculation of the mean.
- On each addition (
addNumber
) and removal (removeFirstAddedNumber
), the sum is updated accordingly.
-
Frequency Count (cnt)
:- A
defaultdict(int)
is used to keep a count of occurrences for each number, aiding in determining the mode. - Counts are incremented or decremented based on the operations being performed.
- A
-
Ordered Sets (sl and sl2)
:sl
: ASortedList
that maintains all numbers in sorted order. Accessing elements by index from this list allows for efficient median retrieval, usingself.sl[len(self.q) // 2]
to get the median position.sl2
: AnotherSortedList
, but sorted by frequency (in descending order) and then by value (in ascending order) using a custom key. This facilitates constant time access to the mode by simply looking at the first element.
Detailed Operation Breakdown
-
addNumber(int number)
:- Append the number to
q
. - Add the number to
sl
. - Update frequency in
cnt
and adjust entries insl2
to reflect the frequency change. - Increment the sum
s
by the number.
- Append the number to
-
removeFirstAddedNumber()
:- Remove the front entry from
q
. - Remove this number from
sl
. - Update frequency in
cnt
and adjustsl2
accordingly. - Subtract this number from the sum
s
.
- Remove the front entry from
-
getMean()
:- Return the floored mean by dividing
s
by the size ofq
using integer division.
- Return the floored mean by dividing
-
getMedian()
:- Return the middle element from
sl
.
- Return the middle element from
-
getMode()
:- Return the first element of
sl2
, which is structured to provide the mode efficiently.
- Return the first element of
This approach smartly uses SortedList
for maintaining sorted order and quick retrieval, combined with defaultdict
for managing frequencies, achieving a comprehensive and efficient solution for mean, median, and mode calculations.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach using a small example. Assume we initialize a StatisticsTracker
and process a sequence of numbers: 5, 3, 8, and 5.
-
Initialization:
- Create an empty
StatisticsTracker
. - Structures:
q = []
,s = 0
,cnt = {}
,sl = []
,sl2 = []
- Create an empty
-
addNumber(5):
- Add 5 to the queue
q
:q = [5]
- Add 5 to the sorted list
sl
:sl = [5]
- Update frequency:
cnt = {5: 1}
- Update sorted frequencies:
sl2 = [(1, 5)]
- Update sum:
s = 5
- Add 5 to the queue
-
addNumber(3):
- Add 3 to the queue
q
:q = [5, 3]
- Add 3 into
sl
to maintain sorted order:sl = [3, 5]
- Update frequency:
cnt = {5: 1, 3: 1}
- Update sorted frequencies:
sl2 = [(1, 3), (1, 5)]
- Update sum:
s = 8
- Add 3 to the queue
-
addNumber(8):
- Add 8 to the queue
q
:q = [5, 3, 8]
- Add 8 into
sl
maintaining sorted order:sl = [3, 5, 8]
- Update frequency:
cnt = {5: 1, 3: 1, 8: 1}
- Update sorted frequencies:
sl2 = [(1, 3), (1, 5), (1, 8)]
- Update sum:
s = 16
- Add 8 to the queue
-
addNumber(5):
- Add another 5 to the queue
q
:q = [5, 3, 8, 5]
- Add 5 to
sl
:sl = [3, 5, 5, 8]
- Update frequency:
cnt = {5: 2, 3: 1, 8: 1}
- Update sorted frequencies:
sl2 = [(1, 3), (1, 8), (2, 5)]
(sorted by frequency) - Update sum:
s = 21
- Add another 5 to the queue
-
getMean():
- Calculate mean:
mean = 21 // 4 = 5
- Calculate mean:
-
getMedian():
- Retrieve median from
sl
:median = sl[4 // 2] = sl[2] = 5
- Retrieve median from
-
getMode():
- Retrieve mode from
sl2
:mode = sl2[0][1] = 5
- Retrieve mode from
-
removeFirstAddedNumber():
- Remove the first added number 5 from
q
:q = [3, 8, 5]
- Remove 5 from
sl
:sl = [3, 5, 8]
- Update frequency and sorted list:
cnt = {5: 1, 3: 1, 8: 1}
,sl2 = [(1, 3), (1, 5), (1, 8)]
- Update sum:
s = 16
- Remove the first added number 5 from
The above walkthrough demonstrates the organized use of multiple data structures to efficiently manage a data stream while supporting quick calculations of statistical measures.
Solution Implementation
1from collections import deque, defaultdict
2from sortedcontainers import SortedList
3
4class StatisticsTracker:
5 def __init__(self):
6 # Queue to maintain the order of added numbers
7 self.queue = deque()
8 # Variable to maintain the sum of added numbers
9 self.total_sum = 0
10 # Dictionary to keep count of each number
11 self.count = defaultdict(int)
12 # Sorted list to efficiently find median
13 self.sorted_list = SortedList()
14 # Sorted list with custom sorting for efficient mode calculation
15 self.sorted_mode_list = SortedList(key=lambda x: (-x[1], x[0]))
16
17 def addNumber(self, number: int) -> None:
18 # Add number to queue
19 self.queue.append(number)
20 # Add number to sorted list to maintain order
21 self.sorted_list.add(number)
22
23 # Update mode-related data
24 # Remove old count if it exists
25 self.sorted_mode_list.discard((number, self.count[number]))
26 # Increment count of the current number
27 self.count[number] += 1
28 # Add new count
29 self.sorted_mode_list.add((number, self.count[number]))
30
31 # Add number to total sum for mean calculation
32 self.total_sum += number
33
34 def removeFirstAddedNumber(self) -> None:
35 # Remove number from the front of the queue
36 number = self.queue.popleft()
37 # Remove number from sorted list
38 self.sorted_list.remove(number)
39
40 # Update mode-related data
41 # Remove old count
42 self.sorted_mode_list.discard((number, self.count[number]))
43 # Decrement count of the current number
44 self.count[number] -= 1
45 # Add new count
46 self.sorted_mode_list.add((number, self.count[number]))
47
48 # Subtract number from total sum
49 self.total_sum -= number
50
51 def getMean(self) -> int:
52 # Calculate mean by dividing total sum by the length of the queue
53 return self.total_sum // len(self.queue)
54
55 def getMedian(self) -> int:
56 # Retrieve median from the middle of the sorted list
57 return self.sorted_list[len(self.queue) // 2]
58
59 def getMode(self) -> int:
60 # Retrieve the number with the highest frequency
61 return self.sorted_mode_list[0][0]
62
63# Example usage:
64# tracker = StatisticsTracker()
65# tracker.addNumber(number)
66# tracker.removeFirstAddedNumber()
67# mean = tracker.getMean()
68# median = tracker.getMedian()
69# mode = tracker.getMode()
70
1import java.util.*;
2
3// Class to continuously find the median using two heaps
4class MedianFinder {
5
6 // Max heap to keep the smaller half of the numbers
7 private final PriorityQueue<Integer> smallHeap = new PriorityQueue<>(Comparator.reverseOrder());
8
9 // Min heap to keep the larger half of the numbers
10 private final PriorityQueue<Integer> largeHeap = new PriorityQueue<>();
11
12 // Map to track numbers that need to be removed
13 private final Map<Integer, Integer> delayed = new HashMap<>();
14
15 // Size variables for both heaps
16 private int sizeSmallHeap;
17 private int sizeLargeHeap;
18
19 // Adds a number into data structure
20 public void addNum(int num) {
21 // Add the number to the appropriate heap and adjust sizes
22 if (smallHeap.isEmpty() || num <= smallHeap.peek()) {
23 smallHeap.offer(num);
24 ++sizeSmallHeap;
25 } else {
26 largeHeap.offer(num);
27 ++sizeLargeHeap;
28 }
29 rebalanceHeaps();
30 }
31
32 // Returns the median of current data stream
33 public Integer findMedian() {
34 return sizeSmallHeap == sizeLargeHeap ? largeHeap.peek() : smallHeap.peek();
35 }
36
37 // Removes a number from data structure
38 public void removeNum(int num) {
39 delayed.merge(num, 1, Integer::sum);
40 // Adjust size and prune if needed
41 if (num <= smallHeap.peek()) {
42 --sizeSmallHeap;
43 if (num == smallHeap.peek()) {
44 pruneHeap(smallHeap);
45 }
46 } else {
47 --sizeLargeHeap;
48 if (num == largeHeap.peek()) {
49 pruneHeap(largeHeap);
50 }
51 }
52 rebalanceHeaps();
53 }
54
55 // Removes all delayed elements from the heap
56 private void pruneHeap(PriorityQueue<Integer> heap) {
57 while (!heap.isEmpty() && delayed.containsKey(heap.peek())) {
58 if (delayed.merge(heap.peek(), -1, Integer::sum) == 0) {
59 delayed.remove(heap.peek());
60 }
61 heap.poll();
62 }
63 }
64
65 // Ensures heaps are balanced such that |smallHeap-size - largeHeap-size| <= 1
66 private void rebalanceHeaps() {
67 if (sizeSmallHeap > sizeLargeHeap + 1) {
68 largeHeap.offer(smallHeap.poll());
69 --sizeSmallHeap;
70 ++sizeLargeHeap;
71 pruneHeap(smallHeap);
72 } else if (sizeSmallHeap < sizeLargeHeap) {
73 smallHeap.offer(largeHeap.poll());
74 --sizeLargeHeap;
75 ++sizeSmallHeap;
76 pruneHeap(largeHeap);
77 }
78 }
79}
80
81// Class to track statistics: mean, median, and mode
82class StatisticsTracker {
83
84 private final Deque<Integer> numbersQueue = new ArrayDeque<>();
85 private long sum;
86 private final Map<Integer, Integer> countMap = new HashMap<>();
87 private final MedianFinder medianFinder = new MedianFinder();
88
89 // TreeSet to store the frequency of each number for finding mode
90 private final TreeSet<int[]> modeSet = new TreeSet<>(
91 (a, b) -> a[1] == b[1] ? a[0] - b[0] : b[1] - a[1]
92 );
93
94 // Constructor
95 public StatisticsTracker() {
96 }
97
98 // Adds a number to the statistics tracker
99 public void addNumber(int number) {
100 numbersQueue.offerLast(number);
101 sum += number;
102 modeSet.remove(new int[] {number, countMap.getOrDefault(number, 0)});
103 countMap.merge(number, 1, Integer::sum);
104 medianFinder.addNum(number);
105 modeSet.add(new int[] {number, countMap.get(number)});
106 }
107
108 // Removes the first added number
109 public void removeFirstAddedNumber() {
110 int number = numbersQueue.pollFirst();
111 sum -= number;
112 modeSet.remove(new int[] {number, countMap.get(number)});
113 countMap.merge(number, -1, Integer::sum);
114 medianFinder.removeNum(number);
115 modeSet.add(new int[] {number, countMap.get(number)});
116 }
117
118 // Returns the mean of the numbers
119 public int getMean() {
120 return (int) (sum / numbersQueue.size());
121 }
122
123 // Returns the median of the numbers
124 public int getMedian() {
125 return medianFinder.findMedian();
126 }
127
128 // Returns the mode of the numbers
129 public int getMode() {
130 return modeSet.first()[0];
131 }
132}
133
1#include <queue>
2#include <unordered_map>
3#include <set>
4#include <vector>
5using namespace std;
6
7class MedianFinder {
8public:
9 // Add a number to the data structure
10 void addNum(int num) {
11 if (minHeap.empty() || num <= minHeap.top()) {
12 minHeap.push(num);
13 ++minHeapSize;
14 } else {
15 maxHeap.push(num);
16 ++maxHeapSize;
17 }
18 rebalance();
19 }
20
21 // Remove a number from the data structure
22 void removeNum(int num) {
23 ++delayed[num];
24 if (num <= minHeap.top()) {
25 --minHeapSize;
26 if (num == minHeap.top()) {
27 prune(minHeap);
28 }
29 } else {
30 --maxHeapSize;
31 if (num == maxHeap.top()) {
32 prune(maxHeap);
33 }
34 }
35 rebalance();
36 }
37
38 // Find the median of the current numbers
39 int findMedian() {
40 return minHeapSize == maxHeapSize ? maxHeap.top() : minHeap.top();
41 }
42
43private:
44 // Use a max-heap to keep track of the smaller half of numbers
45 priority_queue<int> minHeap;
46 // Use a min-heap to keep track of the larger half of numbers
47 priority_queue<int, vector<int>, greater<int>> maxHeap;
48 // Map to keep track of delayed removals
49 unordered_map<int, int> delayed;
50 // Track sizes of the heaps
51 int minHeapSize = 0;
52 int maxHeapSize = 0;
53
54 // Remove numbers that are delayed from the heap
55 template <typename T>
56 void prune(T& heap) {
57 while (!heap.empty() && delayed[heap.top()]) {
58 if (--delayed[heap.top()] == 0) {
59 delayed.erase(heap.top());
60 }
61 heap.pop();
62 }
63 }
64
65 // Rebalance the heaps if necessary
66 void rebalance() {
67 if (minHeapSize > maxHeapSize + 1) {
68 maxHeap.push(minHeap.top());
69 minHeap.pop();
70 --minHeapSize;
71 ++maxHeapSize;
72 prune(minHeap);
73 } else if (minHeapSize < maxHeapSize) {
74 minHeap.push(maxHeap.top());
75 maxHeap.pop();
76 ++minHeapSize;
77 --maxHeapSize;
78 prune(maxHeap);
79 }
80 }
81};
82
83class StatisticsTracker {
84private:
85 queue<int> numberQueue;
86 long long sum = 0;
87 unordered_map<int, int> count;
88 MedianFinder medianFinder;
89 // Ordered set to track numbers by frequency (mode calculations)
90 set<pair<int, int>> frequencySet;
91
92public:
93 StatisticsTracker() {}
94
95 // Add a number to the tracker
96 void addNumber(int number) {
97 numberQueue.push(number);
98 sum += number;
99
100 frequencySet.erase({-count[number], number});
101 ++count[number];
102 frequencySet.insert({-count[number], number});
103
104 medianFinder.addNum(number);
105 }
106
107 // Remove the oldest added number from the tracker
108 void removeFirstAddedNumber() {
109 int number = numberQueue.front();
110 numberQueue.pop();
111 sum -= number;
112
113 frequencySet.erase({-count[number], number});
114 --count[number];
115 if (count[number] > 0) {
116 frequencySet.insert({-count[number], number});
117 }
118
119 medianFinder.removeNum(number);
120 }
121
122 // Calculate and return the mean of the current numbers
123 int getMean() {
124 return static_cast<int>(sum / numberQueue.size());
125 }
126
127 // Return the median of the current numbers
128 int getMedian() {
129 return medianFinder.findMedian();
130 }
131
132 // Return the mode of the current numbers
133 int getMode() {
134 return frequencySet.begin()->second;
135 }
136};
137
1class PriorityQueue<T> {
2 private data: T[];
3 private comparator: (a: T, b: T) => number;
4
5 constructor(comparator: (a: T, b: T) => number) {
6 this.data = [];
7 this.comparator = comparator;
8 }
9
10 push(element: T) {
11 this.data.push(element);
12 this.data.sort(this.comparator);
13 }
14
15 pop(): T | undefined {
16 return this.data.shift();
17 }
18
19 top(): T | undefined {
20 return this.data[0];
21 }
22
23 empty(): boolean {
24 return this.data.length === 0;
25 }
26}
27
28let minHeap: PriorityQueue<number> = new PriorityQueue<number>((a, b) => b - a);
29let maxHeap: PriorityQueue<number> = new PriorityQueue<number>((a, b) => a - b);
30let delayed: Map<number, number> = new Map();
31let minHeapSize = 0;
32let maxHeapSize = 0;
33
34// Add a number to the data structure
35function addNum(num: number): void {
36 if (minHeap.empty() || num <= (minHeap.top() ?? Infinity)) {
37 minHeap.push(num);
38 minHeapSize++;
39 } else {
40 maxHeap.push(num);
41 maxHeapSize++;
42 }
43 rebalance();
44}
45
46// Remove a number from the data structure
47function removeNum(num: number): void {
48 delayed.set(num, (delayed.get(num) || 0) + 1);
49 if (num <= (minHeap.top() ?? Infinity)) {
50 minHeapSize--;
51 if (num === minHeap.top()) {
52 prune(minHeap);
53 }
54 } else {
55 maxHeapSize--;
56 if (num === maxHeap.top()) {
57 prune(maxHeap);
58 }
59 }
60 rebalance();
61}
62
63// Find the median of the current numbers
64function findMedian(): number {
65 return minHeapSize === maxHeapSize ? (maxHeap.top() ?? 0) : (minHeap.top() ?? 0);
66}
67
68// Remove numbers that are delayed from the heap
69function prune(heap: PriorityQueue<number>): void {
70 while (!heap.empty() && delayed.get(heap.top()!) !== undefined) {
71 let top = heap.pop();
72 let count = delayed.get(top!)!;
73 if (count === 1) {
74 delayed.delete(top!);
75 } else {
76 delayed.set(top!, count - 1);
77 }
78 }
79}
80
81// Rebalance the heaps if necessary
82function rebalance(): void {
83 if (minHeapSize > maxHeapSize + 1) {
84 maxHeap.push(minHeap.pop()!);
85 minHeapSize--;
86 maxHeapSize++;
87 prune(minHeap);
88 } else if (minHeapSize < maxHeapSize) {
89 minHeap.push(maxHeap.pop()!);
90 maxHeapSize--;
91 minHeapSize++;
92 prune(maxHeap);
93 }
94}
95
96let numberQueue: number[] = [];
97let sum: bigint = BigInt(0);
98let count: Map<number, number> = new Map();
99let frequencySet: Set<[number, number]> = new Set();
100
101// Add a number to the tracker
102function addNumber(number: number): void {
103 numberQueue.push(number);
104 sum += BigInt(number);
105
106 let currentCount = count.get(number) || 0;
107 count.set(number, currentCount + 1);
108 frequencySet.delete([-currentCount, number]);
109 frequencySet.add([-currentCount - 1, number]);
110
111 addNum(number);
112}
113
114// Remove the oldest added number from the tracker
115function removeFirstAddedNumber(): void {
116 let number = numberQueue.shift()!;
117 sum -= BigInt(number);
118
119 let currentCount = count.get(number) || 0;
120 frequencySet.delete([-currentCount, number]);
121 if (currentCount > 1) {
122 count.set(number, currentCount - 1);
123 frequencySet.add([-currentCount + 1, number]);
124 } else {
125 count.delete(number);
126 }
127
128 removeNum(number);
129}
130
131// Calculate and return the mean of the current numbers
132function getMean(): number {
133 return Number(sum / BigInt(numberQueue.length));
134}
135
136// Return the median of the current numbers
137function getMedian(): number {
138 return findMedian();
139}
140
141// Return the mode of the current numbers
142function getMode(): number {
143 let iterator = frequencySet.entries();
144 return iterator.next().value[0];
145}
146
Time and Space Complexity
-
Time Complexity:
-
addNumber
:- Appending to
deque
isO(1)
. - Adding to
SortedList
isO(log n)
. - Discard and add operations on another
SortedList
depend on its size, causingO(log n)
each. - Updating
defaultdict
isO(1)
. - Overall time complexity for
addNumber
isO(log n)
.
- Appending to
-
removeFirstAddedNumber
:- Popping from
deque
isO(1)
. - Removing from
SortedList
isO(log n)
. - Discard and add operations on the second
SortedList
areO(log n)
each. - Updating
defaultdict
isO(1)
. - Overall time complexity for
removeFirstAddedNumber
isO(log n)
.
- Popping from
-
getMean
:- Calculating mean involves summation and division and is
O(1)
.
- Calculating mean involves summation and division and is
-
getMedian
:- Accessing median from
SortedList
isO(1)
since it's based on index.
- Accessing median from
-
getMode
:- Accessing the first element in
SortedList
for mode isO(1)
.
- Accessing the first element in
-
Given all operations, the average time complexity across operations is
O(log n)
, with some likegetMean
,getMedian
, andgetMode
beingO(1)
.
-
-
Space Complexity:
deque
stores all numbers, contributingO(n)
.defaultdict
for count of numbers also maintainsO(n)
space.SortedList
manages up ton
elements, takingO(n)
space.- The second
SortedList
holds tuples (number, count), again contributingO(n)
. - The overall space complexity is
O(n)
.
Learn more about how to find time and space complexity quickly.
What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.
Recommended Readings
Queue Intro Think of the last time you stood in line to buy movie tickets The first person in line gets their ticket first and any newcomers join the end of the line This system where the first person to arrive is the first to be served is a queue in real
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Median of Data Stream Given a stream of numbers find the median number at any given time accurate to 1 decimal place Example add_number 1 add_number 2 add_number 3 get_median 2 0 add_number 4 get_median 2 5 Try it yourself Explanation Intuition Brute force way is to sort the entire list every time we get a new number This would be O N log N for each add_number operation One step up is to notice that the list is sorted before we add any new number to it So every
Want a Structured Path to Master System Design Too? Donβt Miss This!