Facebook Pixel

3369. Design an Array Statistics Tracker πŸ”’


Problem Description

Design a data structure called StatisticsTracker that efficiently handles a stream of integer inputs and supports the following operations:

  • Initialization: Create an empty StatisticsTracker.
  • addNumber(int number): Insert a number into the data structure.
  • removeFirstAddedNumber(): Remove the first number added to the structure.
  • getMean(): Return the floored mean of the numbers currently in the structure.
  • getMedian(): Return the median value of the numbers.
  • getMode(): Return the mode of the numbers. If there are multiple modes, return the smallest one.

Key points related to the calculations:

  • The mean is the sum of all numbers divided by the count of numbers.
  • The median is the middle element in a sorted array, or the larger of two middle elements if there is an even number of total elements.
  • The mode is the value that appears most frequently. In case of a tie, the smallest value is chosen.

Intuition

To solve this problem efficiently, a combination of data structures is used to manage the numbers and perform operations in an optimal manner:

  1. Queue (q): Used to track the order of input numbers, allowing easy removal of the first number added through removeFirstAddedNumber().

  2. Sum (s): Maintains the sum of all current numbers to quickly calculate the mean with getMean(). Updating this sum when numbers are added or removed allows the mean to be quickly recalculated.

  3. Frequency Count (cnt): A hash table tracks the frequency of each number, making it possible to determine the mode with getMode(). This frequency information is essential for both adding numbers and removing them.

  4. Ordered Sets (sl and sl2):

    • sl: Keeps numbers in a sorted order to enable quick retrieval of the median with getMedian(). This structure supports efficient insertion, removal, and access operations needed for median calculations.
    • sl2: Maintains numbers and their frequencies, sorted by frequency in descending order and by value in ascending order. This ordered structure ensures that the mode can be quickly accessed.

By integrating these structures, the solution achieves efficient management of the data stream and ensures that each query can be handled in optimal time complexity, mostly through logarithmic operations due to the underlying data structures' efficient management of sorted order and frequency.

Learn more about Queue, Binary Search, Data Stream and Heap (Priority Queue) patterns.

Solution Approach

The solution leverages several data structures to efficiently manage the operations on the StatisticsTracker:

  1. [Queue](/problems/queue_intro) (q):

    • Utilized to keep track of the order of numbers as they are added. This allows easy access to the first added number when using removeFirstAddedNumber().
    • Operations such as append and popleft are fundamental for adding and removing elements in constant time.
  2. Sum (s):

    • A running total of all numbers in the structure to facilitate the quick calculation of the mean.
    • On each addition (addNumber) and removal (removeFirstAddedNumber), the sum is updated accordingly.
  3. Frequency Count (cnt):

    • A defaultdict(int) is used to keep a count of occurrences for each number, aiding in determining the mode.
    • Counts are incremented or decremented based on the operations being performed.
  4. Ordered Sets (sl and sl2):

    • sl: A SortedList that maintains all numbers in sorted order. Accessing elements by index from this list allows for efficient median retrieval, using self.sl[len(self.q) // 2] to get the median position.
    • sl2: Another SortedList, but sorted by frequency (in descending order) and then by value (in ascending order) using a custom key. This facilitates constant time access to the mode by simply looking at the first element.

Detailed Operation Breakdown

  • addNumber(int number):

    • Append the number to q.
    • Add the number to sl.
    • Update frequency in cnt and adjust entries in sl2 to reflect the frequency change.
    • Increment the sum s by the number.
  • removeFirstAddedNumber():

    • Remove the front entry from q.
    • Remove this number from sl.
    • Update frequency in cnt and adjust sl2 accordingly.
    • Subtract this number from the sum s.
  • getMean():

    • Return the floored mean by dividing s by the size of q using integer division.
  • getMedian():

    • Return the middle element from sl.
  • getMode():

    • Return the first element of sl2, which is structured to provide the mode efficiently.

This approach smartly uses SortedList for maintaining sorted order and quick retrieval, combined with defaultdict for managing frequencies, achieving a comprehensive and efficient solution for mean, median, and mode calculations.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach using a small example. Assume we initialize a StatisticsTracker and process a sequence of numbers: 5, 3, 8, and 5.

  1. Initialization:

    • Create an empty StatisticsTracker.
    • Structures: q = [], s = 0, cnt = {}, sl = [], sl2 = []
  2. addNumber(5):

    • Add 5 to the queue q: q = [5]
    • Add 5 to the sorted list sl: sl = [5]
    • Update frequency: cnt = {5: 1}
    • Update sorted frequencies: sl2 = [(1, 5)]
    • Update sum: s = 5
  3. addNumber(3):

    • Add 3 to the queue q: q = [5, 3]
    • Add 3 into sl to maintain sorted order: sl = [3, 5]
    • Update frequency: cnt = {5: 1, 3: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 5)]
    • Update sum: s = 8
  4. addNumber(8):

    • Add 8 to the queue q: q = [5, 3, 8]
    • Add 8 into sl maintaining sorted order: sl = [3, 5, 8]
    • Update frequency: cnt = {5: 1, 3: 1, 8: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 5), (1, 8)]
    • Update sum: s = 16
  5. addNumber(5):

    • Add another 5 to the queue q: q = [5, 3, 8, 5]
    • Add 5 to sl: sl = [3, 5, 5, 8]
    • Update frequency: cnt = {5: 2, 3: 1, 8: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 8), (2, 5)] (sorted by frequency)
    • Update sum: s = 21
  6. getMean():

    • Calculate mean: mean = 21 // 4 = 5
  7. getMedian():

    • Retrieve median from sl: median = sl[4 // 2] = sl[2] = 5
  8. getMode():

    • Retrieve mode from sl2: mode = sl2[0][1] = 5
  9. removeFirstAddedNumber():

    • Remove the first added number 5 from q: q = [3, 8, 5]
    • Remove 5 from sl: sl = [3, 5, 8]
    • Update frequency and sorted list: cnt = {5: 1, 3: 1, 8: 1}, sl2 = [(1, 3), (1, 5), (1, 8)]
    • Update sum: s = 16

The above walkthrough demonstrates the organized use of multiple data structures to efficiently manage a data stream while supporting quick calculations of statistical measures.

Solution Implementation

1from collections import deque, defaultdict
2from sortedcontainers import SortedList
3
4class StatisticsTracker:
5    def __init__(self):
6        # Queue to maintain the order of added numbers
7        self.queue = deque()
8        # Variable to maintain the sum of added numbers
9        self.total_sum = 0
10        # Dictionary to keep count of each number
11        self.count = defaultdict(int)
12        # Sorted list to efficiently find median
13        self.sorted_list = SortedList()
14        # Sorted list with custom sorting for efficient mode calculation
15        self.sorted_mode_list = SortedList(key=lambda x: (-x[1], x[0]))
16
17    def addNumber(self, number: int) -> None:
18        # Add number to queue
19        self.queue.append(number)
20        # Add number to sorted list to maintain order
21        self.sorted_list.add(number)
22
23        # Update mode-related data
24        # Remove old count if it exists
25        self.sorted_mode_list.discard((number, self.count[number]))
26        # Increment count of the current number
27        self.count[number] += 1
28        # Add new count
29        self.sorted_mode_list.add((number, self.count[number]))
30
31        # Add number to total sum for mean calculation
32        self.total_sum += number
33
34    def removeFirstAddedNumber(self) -> None:
35        # Remove number from the front of the queue
36        number = self.queue.popleft()
37        # Remove number from sorted list
38        self.sorted_list.remove(number)
39
40        # Update mode-related data
41        # Remove old count
42        self.sorted_mode_list.discard((number, self.count[number]))
43        # Decrement count of the current number
44        self.count[number] -= 1
45        # Add new count
46        self.sorted_mode_list.add((number, self.count[number]))
47
48        # Subtract number from total sum
49        self.total_sum -= number
50
51    def getMean(self) -> int:
52        # Calculate mean by dividing total sum by the length of the queue
53        return self.total_sum // len(self.queue)
54
55    def getMedian(self) -> int:
56        # Retrieve median from the middle of the sorted list
57        return self.sorted_list[len(self.queue) // 2]
58
59    def getMode(self) -> int:
60        # Retrieve the number with the highest frequency
61        return self.sorted_mode_list[0][0]
62
63# Example usage:
64# tracker = StatisticsTracker()
65# tracker.addNumber(number)
66# tracker.removeFirstAddedNumber()
67# mean = tracker.getMean()
68# median = tracker.getMedian()
69# mode = tracker.getMode()
70
1import java.util.*;
2
3// Class to continuously find the median using two heaps
4class MedianFinder {
5  
6    // Max heap to keep the smaller half of the numbers
7    private final PriorityQueue<Integer> smallHeap = new PriorityQueue<>(Comparator.reverseOrder());
8  
9    // Min heap to keep the larger half of the numbers
10    private final PriorityQueue<Integer> largeHeap = new PriorityQueue<>();
11  
12    // Map to track numbers that need to be removed
13    private final Map<Integer, Integer> delayed = new HashMap<>();
14  
15    // Size variables for both heaps
16    private int sizeSmallHeap;
17    private int sizeLargeHeap;
18
19    // Adds a number into data structure
20    public void addNum(int num) {
21        // Add the number to the appropriate heap and adjust sizes
22        if (smallHeap.isEmpty() || num <= smallHeap.peek()) {
23            smallHeap.offer(num);
24            ++sizeSmallHeap;
25        } else {
26            largeHeap.offer(num);
27            ++sizeLargeHeap;
28        }
29        rebalanceHeaps();
30    }
31
32    // Returns the median of current data stream
33    public Integer findMedian() {
34        return sizeSmallHeap == sizeLargeHeap ? largeHeap.peek() : smallHeap.peek();
35    }
36
37    // Removes a number from data structure
38    public void removeNum(int num) {
39        delayed.merge(num, 1, Integer::sum);
40        // Adjust size and prune if needed
41        if (num <= smallHeap.peek()) {
42            --sizeSmallHeap;
43            if (num == smallHeap.peek()) {
44                pruneHeap(smallHeap);
45            }
46        } else {
47            --sizeLargeHeap;
48            if (num == largeHeap.peek()) {
49                pruneHeap(largeHeap);
50            }
51        }
52        rebalanceHeaps();
53    }
54
55    // Removes all delayed elements from the heap
56    private void pruneHeap(PriorityQueue<Integer> heap) {
57        while (!heap.isEmpty() && delayed.containsKey(heap.peek())) {
58            if (delayed.merge(heap.peek(), -1, Integer::sum) == 0) {
59                delayed.remove(heap.peek());
60            }
61            heap.poll();
62        }
63    }
64
65    // Ensures heaps are balanced such that |smallHeap-size - largeHeap-size| <= 1
66    private void rebalanceHeaps() {
67        if (sizeSmallHeap > sizeLargeHeap + 1) {
68            largeHeap.offer(smallHeap.poll());
69            --sizeSmallHeap;
70            ++sizeLargeHeap;
71            pruneHeap(smallHeap);
72        } else if (sizeSmallHeap < sizeLargeHeap) {
73            smallHeap.offer(largeHeap.poll());
74            --sizeLargeHeap;
75            ++sizeSmallHeap;
76            pruneHeap(largeHeap);
77        }
78    }
79}
80
81// Class to track statistics: mean, median, and mode
82class StatisticsTracker {
83
84    private final Deque<Integer> numbersQueue = new ArrayDeque<>();
85    private long sum;
86    private final Map<Integer, Integer> countMap = new HashMap<>();
87    private final MedianFinder medianFinder = new MedianFinder();
88  
89    // TreeSet to store the frequency of each number for finding mode
90    private final TreeSet<int[]> modeSet = new TreeSet<>(
91        (a, b) -> a[1] == b[1] ? a[0] - b[0] : b[1] - a[1]
92    );
93
94    // Constructor
95    public StatisticsTracker() {
96    }
97
98    // Adds a number to the statistics tracker
99    public void addNumber(int number) {
100        numbersQueue.offerLast(number);
101        sum += number;
102        modeSet.remove(new int[] {number, countMap.getOrDefault(number, 0)});
103        countMap.merge(number, 1, Integer::sum);
104        medianFinder.addNum(number);
105        modeSet.add(new int[] {number, countMap.get(number)});
106    }
107
108    // Removes the first added number
109    public void removeFirstAddedNumber() {
110        int number = numbersQueue.pollFirst();
111        sum -= number;
112        modeSet.remove(new int[] {number, countMap.get(number)});
113        countMap.merge(number, -1, Integer::sum);
114        medianFinder.removeNum(number);
115        modeSet.add(new int[] {number, countMap.get(number)});
116    }
117
118    // Returns the mean of the numbers
119    public int getMean() {
120        return (int) (sum / numbersQueue.size());
121    }
122
123    // Returns the median of the numbers
124    public int getMedian() {
125        return medianFinder.findMedian();
126    }
127
128    // Returns the mode of the numbers
129    public int getMode() {
130        return modeSet.first()[0];
131    }
132}
133
1#include <queue>
2#include <unordered_map>
3#include <set>
4#include <vector>
5using namespace std;
6
7class MedianFinder {
8public:
9    // Add a number to the data structure
10    void addNum(int num) {
11        if (minHeap.empty() || num <= minHeap.top()) {
12            minHeap.push(num);
13            ++minHeapSize;
14        } else {
15            maxHeap.push(num);
16            ++maxHeapSize;
17        }
18        rebalance();
19    }
20
21    // Remove a number from the data structure
22    void removeNum(int num) {
23        ++delayed[num];
24        if (num <= minHeap.top()) {
25            --minHeapSize;
26            if (num == minHeap.top()) {
27                prune(minHeap);
28            }
29        } else {
30            --maxHeapSize;
31            if (num == maxHeap.top()) {
32                prune(maxHeap);
33            }
34        }
35        rebalance();
36    }
37
38    // Find the median of the current numbers
39    int findMedian() {
40        return minHeapSize == maxHeapSize ? maxHeap.top() : minHeap.top();
41    }
42
43private:
44    // Use a max-heap to keep track of the smaller half of numbers
45    priority_queue<int> minHeap;
46    // Use a min-heap to keep track of the larger half of numbers
47    priority_queue<int, vector<int>, greater<int>> maxHeap;
48    // Map to keep track of delayed removals
49    unordered_map<int, int> delayed;
50    // Track sizes of the heaps
51    int minHeapSize = 0;
52    int maxHeapSize = 0;
53
54    // Remove numbers that are delayed from the heap
55    template <typename T>
56    void prune(T& heap) {
57        while (!heap.empty() && delayed[heap.top()]) {
58            if (--delayed[heap.top()] == 0) {
59                delayed.erase(heap.top());
60            }
61            heap.pop();
62        }
63    }
64
65    // Rebalance the heaps if necessary
66    void rebalance() {
67        if (minHeapSize > maxHeapSize + 1) {
68            maxHeap.push(minHeap.top());
69            minHeap.pop();
70            --minHeapSize;
71            ++maxHeapSize;
72            prune(minHeap);
73        } else if (minHeapSize < maxHeapSize) {
74            minHeap.push(maxHeap.top());
75            maxHeap.pop();
76            ++minHeapSize;
77            --maxHeapSize;
78            prune(maxHeap);
79        }
80    }
81};
82
83class StatisticsTracker {
84private:
85    queue<int> numberQueue;
86    long long sum = 0;
87    unordered_map<int, int> count;
88    MedianFinder medianFinder;
89    // Ordered set to track numbers by frequency (mode calculations)
90    set<pair<int, int>> frequencySet;
91
92public:
93    StatisticsTracker() {}
94
95    // Add a number to the tracker
96    void addNumber(int number) {
97        numberQueue.push(number);
98        sum += number;
99
100        frequencySet.erase({-count[number], number});
101        ++count[number];
102        frequencySet.insert({-count[number], number});
103
104        medianFinder.addNum(number);
105    }
106
107    // Remove the oldest added number from the tracker
108    void removeFirstAddedNumber() {
109        int number = numberQueue.front();
110        numberQueue.pop();
111        sum -= number;
112
113        frequencySet.erase({-count[number], number});
114        --count[number];
115        if (count[number] > 0) {
116            frequencySet.insert({-count[number], number});
117        }
118
119        medianFinder.removeNum(number);
120    }
121
122    // Calculate and return the mean of the current numbers
123    int getMean() {
124        return static_cast<int>(sum / numberQueue.size());
125    }
126
127    // Return the median of the current numbers
128    int getMedian() {
129        return medianFinder.findMedian();
130    }
131
132    // Return the mode of the current numbers
133    int getMode() {
134        return frequencySet.begin()->second;
135    }
136};
137
1class PriorityQueue<T> {
2    private data: T[];
3    private comparator: (a: T, b: T) => number;
4
5    constructor(comparator: (a: T, b: T) => number) {
6        this.data = [];
7        this.comparator = comparator;
8    }
9
10    push(element: T) {
11        this.data.push(element);
12        this.data.sort(this.comparator);
13    }
14
15    pop(): T | undefined {
16        return this.data.shift();
17    }
18
19    top(): T | undefined {
20        return this.data[0];
21    }
22
23    empty(): boolean {
24        return this.data.length === 0;
25    }
26}
27
28let minHeap: PriorityQueue<number> = new PriorityQueue<number>((a, b) => b - a);
29let maxHeap: PriorityQueue<number> = new PriorityQueue<number>((a, b) => a - b);
30let delayed: Map<number, number> = new Map();
31let minHeapSize = 0;
32let maxHeapSize = 0;
33
34// Add a number to the data structure
35function addNum(num: number): void {
36    if (minHeap.empty() || num <= (minHeap.top() ?? Infinity)) {
37        minHeap.push(num);
38        minHeapSize++;
39    } else {
40        maxHeap.push(num);
41        maxHeapSize++;
42    }
43    rebalance();
44}
45
46// Remove a number from the data structure
47function removeNum(num: number): void {
48    delayed.set(num, (delayed.get(num) || 0) + 1);
49    if (num <= (minHeap.top() ?? Infinity)) {
50        minHeapSize--;
51        if (num === minHeap.top()) {
52            prune(minHeap);
53        }
54    } else {
55        maxHeapSize--;
56        if (num === maxHeap.top()) {
57            prune(maxHeap);
58        }
59    }
60    rebalance();
61}
62
63// Find the median of the current numbers
64function findMedian(): number {
65    return minHeapSize === maxHeapSize ? (maxHeap.top() ?? 0) : (minHeap.top() ?? 0);
66}
67
68// Remove numbers that are delayed from the heap
69function prune(heap: PriorityQueue<number>): void {
70    while (!heap.empty() && delayed.get(heap.top()!) !== undefined) {
71        let top = heap.pop();
72        let count = delayed.get(top!)!;
73        if (count === 1) {
74            delayed.delete(top!);
75        } else {
76            delayed.set(top!, count - 1);
77        }
78    }
79}
80
81// Rebalance the heaps if necessary
82function rebalance(): void {
83    if (minHeapSize > maxHeapSize + 1) {
84        maxHeap.push(minHeap.pop()!);
85        minHeapSize--;
86        maxHeapSize++;
87        prune(minHeap);
88    } else if (minHeapSize < maxHeapSize) {
89        minHeap.push(maxHeap.pop()!);
90        maxHeapSize--;
91        minHeapSize++;
92        prune(maxHeap);
93    }
94}
95
96let numberQueue: number[] = [];
97let sum: bigint = BigInt(0);
98let count: Map<number, number> = new Map();
99let frequencySet: Set<[number, number]> = new Set();
100
101// Add a number to the tracker
102function addNumber(number: number): void {
103    numberQueue.push(number);
104    sum += BigInt(number);
105
106    let currentCount = count.get(number) || 0;
107    count.set(number, currentCount + 1);
108    frequencySet.delete([-currentCount, number]);
109    frequencySet.add([-currentCount - 1, number]);
110
111    addNum(number);
112}
113
114// Remove the oldest added number from the tracker
115function removeFirstAddedNumber(): void {
116    let number = numberQueue.shift()!;
117    sum -= BigInt(number);
118
119    let currentCount = count.get(number) || 0;
120    frequencySet.delete([-currentCount, number]);
121    if (currentCount > 1) {
122        count.set(number, currentCount - 1);
123        frequencySet.add([-currentCount + 1, number]);
124    } else {
125        count.delete(number);
126    }
127
128    removeNum(number);
129}
130
131// Calculate and return the mean of the current numbers
132function getMean(): number {
133    return Number(sum / BigInt(numberQueue.length));
134}
135
136// Return the median of the current numbers
137function getMedian(): number {
138    return findMedian();
139}
140
141// Return the mode of the current numbers
142function getMode(): number {
143    let iterator = frequencySet.entries();
144    return iterator.next().value[0];
145}
146

Time and Space Complexity

  • Time Complexity:

    • addNumber:

      • Appending to deque is O(1).
      • Adding to SortedList is O(log n).
      • Discard and add operations on another SortedList depend on its size, causing O(log n) each.
      • Updating defaultdict is O(1).
      • Overall time complexity for addNumber is O(log n).
    • removeFirstAddedNumber:

      • Popping from deque is O(1).
      • Removing from SortedList is O(log n).
      • Discard and add operations on the second SortedList are O(log n) each.
      • Updating defaultdict is O(1).
      • Overall time complexity for removeFirstAddedNumber is O(log n).
    • getMean:

      • Calculating mean involves summation and division and is O(1).
    • getMedian:

      • Accessing median from SortedList is O(1) since it's based on index.
    • getMode:

      • Accessing the first element in SortedList for mode is O(1).
    • Given all operations, the average time complexity across operations is O(log n), with some like getMean, getMedian, and getMode being O(1).

  • Space Complexity:

    • deque stores all numbers, contributing O(n).
    • defaultdict for count of numbers also maintains O(n) space.
    • SortedList manages up to n elements, taking O(n) space.
    • The second SortedList holds tuples (number, count), again contributing O(n).
    • The overall space complexity is O(n).

Learn more about how to find time and space complexity quickly.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!


Load More