Facebook Pixel

3369. Design an Array Statistics Tracker 🔒


Problem Description

Design a data structure called StatisticsTracker that efficiently handles a stream of integer inputs and supports the following operations:

  • Initialization: Create an empty StatisticsTracker.
  • addNumber(int number): Insert a number into the data structure.
  • removeFirstAddedNumber(): Remove the first number added to the structure.
  • getMean(): Return the floored mean of the numbers currently in the structure.
  • getMedian(): Return the median value of the numbers.
  • getMode(): Return the mode of the numbers. If there are multiple modes, return the smallest one.

Key points related to the calculations:

  • The mean is the sum of all numbers divided by the count of numbers.
  • The median is the middle element in a sorted array, or the larger of two middle elements if there is an even number of total elements.
  • The mode is the value that appears most frequently. In case of a tie, the smallest value is chosen.

Intuition

To solve this problem efficiently, a combination of data structures is used to manage the numbers and perform operations in an optimal manner:

  1. Queue (q): Used to track the order of input numbers, allowing easy removal of the first number added through removeFirstAddedNumber().

  2. Sum (s): Maintains the sum of all current numbers to quickly calculate the mean with getMean(). Updating this sum when numbers are added or removed allows the mean to be quickly recalculated.

  3. Frequency Count (cnt): A hash table tracks the frequency of each number, making it possible to determine the mode with getMode(). This frequency information is essential for both adding numbers and removing them.

  4. Ordered Sets (sl and sl2):

    • sl: Keeps numbers in a sorted order to enable quick retrieval of the median with getMedian(). This structure supports efficient insertion, removal, and access operations needed for median calculations.
    • sl2: Maintains numbers and their frequencies, sorted by frequency in descending order and by value in ascending order. This ordered structure ensures that the mode can be quickly accessed.

By integrating these structures, the solution achieves efficient management of the data stream and ensures that each query can be handled in optimal time complexity, mostly through logarithmic operations due to the underlying data structures' efficient management of sorted order and frequency.

Learn more about Queue, Binary Search, Data Stream and Heap (Priority Queue) patterns.

Solution Approach

The solution leverages several data structures to efficiently manage the operations on the StatisticsTracker:

  1. [Queue](/problems/queue_intro) (q):

    • Utilized to keep track of the order of numbers as they are added. This allows easy access to the first added number when using removeFirstAddedNumber().
    • Operations such as append and popleft are fundamental for adding and removing elements in constant time.
  2. Sum (s):

    • A running total of all numbers in the structure to facilitate the quick calculation of the mean.
    • On each addition (addNumber) and removal (removeFirstAddedNumber), the sum is updated accordingly.
  3. Frequency Count (cnt):

    • A defaultdict(int) is used to keep a count of occurrences for each number, aiding in determining the mode.
    • Counts are incremented or decremented based on the operations being performed.
  4. Ordered Sets (sl and sl2):

    • sl: A SortedList that maintains all numbers in sorted order. Accessing elements by index from this list allows for efficient median retrieval, using self.sl[len(self.q) // 2] to get the median position.
    • sl2: Another SortedList, but sorted by frequency (in descending order) and then by value (in ascending order) using a custom key. This facilitates constant time access to the mode by simply looking at the first element.

Detailed Operation Breakdown

  • addNumber(int number):

    • Append the number to q.
    • Add the number to sl.
    • Update frequency in cnt and adjust entries in sl2 to reflect the frequency change.
    • Increment the sum s by the number.
  • removeFirstAddedNumber():

    • Remove the front entry from q.
    • Remove this number from sl.
    • Update frequency in cnt and adjust sl2 accordingly.
    • Subtract this number from the sum s.
  • getMean():

    • Return the floored mean by dividing s by the size of q using integer division.
  • getMedian():

    • Return the middle element from sl.
  • getMode():

    • Return the first element of sl2, which is structured to provide the mode efficiently.

This approach smartly uses SortedList for maintaining sorted order and quick retrieval, combined with defaultdict for managing frequencies, achieving a comprehensive and efficient solution for mean, median, and mode calculations.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach using a small example. Assume we initialize a StatisticsTracker and process a sequence of numbers: 5, 3, 8, and 5.

  1. Initialization:

    • Create an empty StatisticsTracker.
    • Structures: q = [], s = 0, cnt = {}, sl = [], sl2 = []
  2. addNumber(5):

    • Add 5 to the queue q: q = [5]
    • Add 5 to the sorted list sl: sl = [5]
    • Update frequency: cnt = {5: 1}
    • Update sorted frequencies: sl2 = [(1, 5)]
    • Update sum: s = 5
  3. addNumber(3):

    • Add 3 to the queue q: q = [5, 3]
    • Add 3 into sl to maintain sorted order: sl = [3, 5]
    • Update frequency: cnt = {5: 1, 3: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 5)]
    • Update sum: s = 8
  4. addNumber(8):

    • Add 8 to the queue q: q = [5, 3, 8]
    • Add 8 into sl maintaining sorted order: sl = [3, 5, 8]
    • Update frequency: cnt = {5: 1, 3: 1, 8: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 5), (1, 8)]
    • Update sum: s = 16
  5. addNumber(5):

    • Add another 5 to the queue q: q = [5, 3, 8, 5]
    • Add 5 to sl: sl = [3, 5, 5, 8]
    • Update frequency: cnt = {5: 2, 3: 1, 8: 1}
    • Update sorted frequencies: sl2 = [(1, 3), (1, 8), (2, 5)] (sorted by frequency)
    • Update sum: s = 21
  6. getMean():

    • Calculate mean: mean = 21 // 4 = 5
  7. getMedian():

    • Retrieve median from sl: median = sl[4 // 2] = sl[2] = 5
  8. getMode():

    • Retrieve mode from sl2: mode = sl2[0][1] = 5
  9. removeFirstAddedNumber():

    • Remove the first added number 5 from q: q = [3, 8, 5]
    • Remove 5 from sl: sl = [3, 5, 8]
    • Update frequency and sorted list: cnt = {5: 1, 3: 1, 8: 1}, sl2 = [(1, 3), (1, 5), (1, 8)]
    • Update sum: s = 16

The above walkthrough demonstrates the organized use of multiple data structures to efficiently manage a data stream while supporting quick calculations of statistical measures.

Solution Implementation

1from collections import deque, defaultdict
2from sortedcontainers import SortedList
3
4class StatisticsTracker:
5    def __init__(self):
6        # Queue to maintain the order of added numbers
7        self.queue = deque()
8        # Variable to maintain the sum of added numbers
9        self.total_sum = 0
10        # Dictionary to keep count of each number
11        self.count = defaultdict(int)
12        # Sorted list to efficiently find median
13        self.sorted_list = SortedList()
14        # Sorted list with custom sorting for efficient mode calculation
15        self.sorted_mode_list = SortedList(key=lambda x: (-x[1], x[0]))
16
17    def addNumber(self, number: int) -> None:
18        # Add number to queue
19        self.queue.append(number)
20        # Add number to sorted list to maintain order
21        self.sorted_list.add(number)
22
23        # Update mode-related data
24        # Remove old count if it exists
25        self.sorted_mode_list.discard((number, self.count[number]))
26        # Increment count of the current number
27        self.count[number] += 1
28        # Add new count
29        self.sorted_mode_list.add((number, self.count[number]))
30
31        # Add number to total sum for mean calculation
32        self.total_sum += number
33
34    def removeFirstAddedNumber(self) -> None:
35        # Remove number from the front of the queue
36        number = self.queue.popleft()
37        # Remove number from sorted list
38        self.sorted_list.remove(number)
39
40        # Update mode-related data
41        # Remove old count
42        self.sorted_mode_list.discard((number, self.count[number]))
43        # Decrement count of the current number
44        self.count[number] -= 1
45        # Add new count
46        self.sorted_mode_list.add((number, self.count[number]))
47
48        # Subtract number from total sum
49        self.total_sum -= number
50
51    def getMean(self) -> int:
52        # Calculate mean by dividing total sum by the length of the queue
53        return self.total_sum // len(self.queue)
54
55    def getMedian(self) -> int:
56        # Retrieve median from the middle of the sorted list
57        return self.sorted_list[len(self.queue) // 2]
58
59    def getMode(self) -> int:
60        # Retrieve the number with the highest frequency
61        return self.sorted_mode_list[0][0]
62
63# Example usage:
64# tracker = StatisticsTracker()
65# tracker.addNumber(number)
66# tracker.removeFirstAddedNumber()
67# mean = tracker.getMean()
68# median = tracker.getMedian()
69# mode = tracker.getMode()
70
1import java.util.*;
2
3// Class to continuously find the median using two heaps
4class MedianFinder {
5  
6    // Max heap to keep the smaller half of the numbers
7    private final PriorityQueue<Integer> smallHeap = new PriorityQueue<>(Comparator.reverseOrder());
8  
9    // Min heap to keep the larger half of the numbers
10    private final PriorityQueue<Integer> largeHeap = new PriorityQueue<>();
11  
12    // Map to track numbers that need to be removed
13    private final Map<Integer, Integer> delayed = new HashMap<>();
14  
15    // Size variables for both heaps
16    private int sizeSmallHeap;
17    private int sizeLargeHeap;
18
19    // Adds a number into data structure
20    public void addNum(int num) {
21        // Add the number to the appropriate heap and adjust sizes
22        if (smallHeap.isEmpty() || num <= smallHeap.peek()) {
23            smallHeap.offer(num);
24            ++sizeSmallHeap;
25        } else {
26            largeHeap.offer(num);
27            ++sizeLargeHeap;
28        }
29        rebalanceHeaps();
30    }
31
32    // Returns the median of current data stream
33    public Integer findMedian() {
34        return sizeSmallHeap == sizeLargeHeap ? largeHeap.peek() : smallHeap.peek();
35    }
36
37    // Removes a number from data structure
38    public void removeNum(int num) {
39        delayed.merge(num, 1, Integer::sum);
40        // Adjust size and prune if needed
41        if (num <= smallHeap.peek()) {
42            --sizeSmallHeap;
43            if (num == smallHeap.peek()) {
44                pruneHeap(smallHeap);
45            }
46        } else {
47            --sizeLargeHeap;
48            if (num == largeHeap.peek()) {
49                pruneHeap(largeHeap);
50            }
51        }
52        rebalanceHeaps();
53    }
54
55    // Removes all delayed elements from the heap
56    private void pruneHeap(PriorityQueue<Integer> heap) {
57        while (!heap.isEmpty() && delayed.containsKey(heap.peek())) {
58            if (delayed.merge(heap.peek(), -1, Integer::sum) == 0) {
59                delayed.remove(heap.peek());
60            }
61            heap.poll();
62        }
63    }
64
65    // Ensures heaps are balanced such that |smallHeap-size - largeHeap-size| <= 1
66    private void rebalanceHeaps() {
67        if (sizeSmallHeap > sizeLargeHeap + 1) {
68            largeHeap.offer(smallHeap.poll());
69            --sizeSmallHeap;
70            ++sizeLargeHeap;
71            pruneHeap(smallHeap);
72        } else if (sizeSmallHeap < sizeLargeHeap) {
73            smallHeap.offer(largeHeap.poll());
74            --sizeLargeHeap;
75            ++sizeSmallHeap;
76            pruneHeap(largeHeap);
77        }
78    }
79}
80
81// Class to track statistics: mean, median, and mode
82class StatisticsTracker {
83
84    private final Deque<Integer> numbersQueue = new ArrayDeque<>();
85    private long sum;
86    private final Map<Integer, Integer> countMap = new HashMap<>();
87    private final MedianFinder medianFinder = new MedianFinder();
88  
89    // TreeSet to store the frequency of each number for finding mode
90    private final TreeSet<int[]> modeSet = new TreeSet<>(
91        (a, b) -> a[1] == b[1] ? a[0] - b[0] : b[1] - a[1]
92    );
93
94    // Constructor
95    public StatisticsTracker() {
96    }
97
98    // Adds a number to the statistics tracker
99    public void addNumber(int number) {
100        numbersQueue.offerLast(number);
101        sum += number;
102        modeSet.remove(new int[] {number, countMap.getOrDefault(number, 0)});
103        countMap.merge(number, 1, Integer::sum);
104        medianFinder.addNum(number);
105        modeSet.add(new int[] {number, countMap.get(number)});
106    }
107
108    // Removes the first added number
109    public void removeFirstAddedNumber() {
110        int number = numbersQueue.pollFirst();
111        sum -= number;
112        modeSet.remove(new int[] {number, countMap.get(number)});
113        countMap.merge(number, -1, Integer::sum);
114        medianFinder.removeNum(number);
115        modeSet.add(new int[] {number, countMap.get(number)});
116    }
117
118    // Returns the mean of the numbers
119    public int getMean() {
120        return (int) (sum / numbersQueue.size());
121    }
122
123    // Returns the median of the numbers
124    public int getMedian() {
125        return medianFinder.findMedian();
126    }
127
128    // Returns the mode of the numbers
129    public int getMode() {
130        return modeSet.first()[0];
131    }
132}
133
1#include <queue>
2#include <unordered_map>
3#include <set>
4#include <vector>
5using namespace std;
6
7class MedianFinder {
8public:
9    // Add a number to the data structure
10    void addNum(int num) {
11        if (minHeap.empty() || num <= minHeap.top()) {
12            minHeap.push(num);
13            ++minHeapSize;
14        } else {
15            maxHeap.push(num);
16            ++maxHeapSize;
17        }
18        rebalance();
19    }
20
21    // Remove a number from the data structure
22    void removeNum(int num) {
23        ++delayed[num];
24        if (num <= minHeap.top()) {
25            --minHeapSize;
26            if (num == minHeap.top()) {
27                prune(minHeap);
28            }
29        } else {
30            --maxHeapSize;
31            if (num == maxHeap.top()) {
32                prune(maxHeap);
33            }
34        }
35        rebalance();
36    }
37
38    // Find the median of the current numbers