295. Find Median from Data Stream
Problem Description
The problem requires designing a class, MedianFinder
, that can handle a stream of integers and provide a way to find the median at any given time. Median, by definition, is the middle value in an ordered list of numbers. If the list has an odd number of elements, the median is simply the middle element. If the list has an even number of elements, there is no single middle element, so the median is calculated by taking the average of the two middle numbers.
Intuition
To find the median efficiently, we need a data structure that allows quick access to the middle elements. Utilizing two heaps is an elegant solution: a max heap to store the smaller half of the numbers and a min heap to store the larger half. This way, the largest number in the smaller half or the smallest number in the larger half can easily give us the median.
In Python, the default heapq
module provides a min heap implementation. To get a max heap behavior, we insert negatives of the numbers into a heap.
By balancing the heaps so that their size differs by at most one, we ensure that we either have a single middle element when the combined size is odd (this will be at the top of the larger heap), or two middle elements when the combined size is even (the top of each heap).
The addNum
method works by adding a new number to the max heap (smaller half) first, by pushing its negative value. We then pop the top value from the max heap and push it to the min heap (larger half) to maintain the order and balance. If the larger half has more than one extra element compared to the smaller half, we move the top element from the larger half to the smaller half.
findMedian
checks the current size of the heaps. If the heaps are of the same size, the median is the average of the top values of both heaps. If the sizes differ, the median is the top element of the larger heap.
Learn more about Two Pointers, Data Stream, Sorting and Heap (Priority Queue) patterns.
Solution Approach
The solution is based on maintaining two heaps (h1
and h2
), which are used to simulate a max heap and a min heap respectively.
h1
(max heap) stores the smaller half of the numbers, and we simulate it by pushing negatives of the numbers into a min heap.h2
(min heap) stores the larger half of the numbers as they are, since theheapq
library already implements a min heap.
Here's how the methods of the MedianFinder
class work:
-
The
__init__
method simply initializes two empty lists,h1
andh2
, that we'll use as heaps. -
addNum
adds a new number to the data structure:- We first add the number to
h1
(which is simulating a max heap) by pushing its negative. - Then, we balance the max heap by popping an element from
h1
and pushing it ontoh2
(min heap). - We check if
h2
(larger half) has more than one element thanh1
(smaller half). If it does, we balance the heaps by moving the top element fromh2
toh1
.
- We first add the number to
-
findMedian
computes the median based on the current elements:- If
h2
has more elements thanh1
, the median is just the top element ofh2
(the smallest number in the larger half). - If
h1
andh2
have the same number of elements, the median is the average of the top element ofh1
(the largest number in the smaller half, rememberh1
is storing negatives) and the top element ofh2
(the actual smallest number in the larger half).
- If
The efficiency of adding numbers is O(log n)
due to the heap operations, and finding the median is O(1)
since it involves only accessing the tops of the heaps. This solution is quite efficient and handles the streaming data well.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the implementation of the MedianFinder
class using the given solution approach:
-
Initialize the
MedianFinder
class, so we start with two empty heaps,h1
(max heap) andh2
(min heap). -
We execute
addNum(3)
. Here's how it works:- We insert -3 into
h1
to keep track of the max heap. - We pop -3 from
h1
(which is 3 in its original form) and push it toh2
. - Since
h1
is now empty andh2
has only one element, no balancing is needed.
- We insert -3 into
-
Next, we execute
addNum(1)
:- We insert -1 into
h1
. - We pop -1 from
h1
(which is 1 in its original form) and push it toh2
. - Now
h2
has two elements (1 and 3), we pop 1 fromh2
and push its negative intoh1
to balance the heaps.
- We insert -1 into
-
At this point,
h1
has the single element -1, andh2
has the single element 3. -
We add another number by calling
addNum(5)
:- We insert -5 into
h1
. - We pop -5 from
h1
(which is 5 in its original form) and push it intoh2
. h2
now having 3 and 5, is larger thanh1
which only has -1. No additional balancing is needed.
- We insert -5 into
-
Now, when calling
findMedian()
:- Since we have a total of 3 elements and
h1
has 1 element andh2
has 2 elements, the median is the top ofh2
which is 3.
- Since we have a total of 3 elements and
-
Let's add another number by executing
addNum(2)
:- We insert -2 into
h1
. - We pop -2 from
h1
(which is 2 in its original form) and push it toh2
. - Since
h2
(with elements 2, 3, and 5) now has more elements, we pop 2 fromh2
and push its negative intoh1
to balance the heaps. Nowh1
has -2 and -1, andh2
has 3 and 5.
- We insert -2 into
-
Calling
findMedian()
now gives us:- Since
h1
andh2
have the same number of elements (2 each), the median is the average of the top element ofh1
(-1, which is 1 when we negate it) and the top element ofh2
(which is 3), giving us a median of (1 + 3) / 2 = 2.
- Since
Through these steps, we can see how the MedianFinder
class handles the addition of numbers and maintains a structure that allows us to easily find the median at any point. The use of two heaps is instrumental in efficiently managing the median calculation for a stream of data.
Solution Implementation
1from heapq import heappush, heappop
2
3class MedianFinder:
4 def __init__(self):
5 """
6 Initialize your data structure here.
7 """
8 self.small = [] # Max heap (simulated with negative values)
9 self.large = [] # Min heap
10
11 def addNum(self, num: int) -> None:
12 """
13 Adds a number into the data structure.
14 """
15 heappush(self.small, -num) # Add to max heap
16 # Move the largest element of small (max heap) to large (min heap)
17 heappush(self.large, -heappop(self.small))
18
19 # If large heap has more elements, move the smallest element of large to small
20 if len(self.large) > len(self.small):
21 heappush(self.small, -heappop(self.large))
22
23 def findMedian(self) -> float:
24 """
25 Returns the median of current data stream
26 """
27 # If the heaps have equal size, the median is the average of the two heap's top elements
28 if len(self.small) == len(self.large):
29 return (-self.small[0] + self.large[0]) / 2.0
30 # If the small heap has more elements, return the top element of the small heap (which is the max element)
31 elif len(self.small) > len(self.large):
32 return -self.small[0]
33 # Otherwise, return the top element of the large heap (which is the min element)
34 else:
35 return self.large[0]
36
1import java.util.PriorityQueue;
2import java.util.Collections;
3
4class MedianFinder {
5 private PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // Min-heap to store the larger half of the numbers
6 private PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Max-heap to store the smaller half of the numbers
7
8 /** Initialize MedianFinder. */
9 public MedianFinder() {
10 // The constructor is kept empty as there's nothing to initialize outside the declarations.
11 }
12
13 /** Adds a number into the data structure. */
14 public void addNum(int num) {
15 minHeap.offer(num); // Add the number to the min-heap
16 maxHeap.offer(minHeap.poll()); // Balance the heaps by moving the smallest number of min-heap to max-heap
17
18 // Ensure max-heap has equal or one more element than the min-heap
19 if (maxHeap.size() > minHeap.size() + 1) {
20 minHeap.offer(maxHeap.poll()); // Move the maximum number of max-heap to min-heap
21 }
22 }
23
24 /** Returns the median of current data stream. */
25 public double findMedian() {
26 if (maxHeap.size() > minHeap.size()) {
27 // If max-heap has more elements, the median is the top of the max-heap
28 return maxHeap.peek();
29 }
30 // Otherwise, the median is the average of the tops of both heaps
31 return (minHeap.peek() + maxHeap.peek()) / 2.0;
32 }
33}
34
35/**
36 * Example of usage:
37 * MedianFinder medianFinder = new MedianFinder();
38 * medianFinder.addNum(num); // Add a number
39 * double median = medianFinder.findMedian(); // Get the current median
40 */
41
1#include <queue>
2#include <vector>
3
4class MedianFinder {
5public:
6 MedianFinder() {
7 // Constructor doesn't need any code,
8 // as the priority queues are automatically initialized.
9 }
10
11 // Inserts a number into the data structure.
12 void addNum(int num) {
13 // Add the new number to the max heap.
14 maxHeap.push(num);
15
16 // Now balance the heaps by always having the top of the max heap
17 // ready to move to the min heap to maintain the ordering.
18 minHeap.push(maxHeap.top());
19 maxHeap.pop();
20
21 // If the min heap has more elements than the max heap,
22 // move the top element back to the max heap to maintain
23 // size property of heaps.
24 if (minHeap.size() > maxHeap.size()) {
25 maxHeap.push(minHeap.top());
26 minHeap.pop();
27 }
28 }
29
30 // Finds and returns the median of all numbers inserted.
31 double findMedian() {
32 // If the max heap is larger, the median is at the top of the max heap.
33 if (maxHeap.size() > minHeap.size()) {
34 return maxHeap.top();
35 }
36 // If both heaps have the same size, the median is the average of
37 // the tops of both heaps.
38 return (double) (minHeap.top() + maxHeap.top()) / 2;
39 }
40
41private:
42 // Max heap for the lower half of the numbers.
43 std::priority_queue<int> maxHeap;
44 // Min heap for the upper half of the numbers
45 // (using greater<> to make it a min heap).
46 std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;
47};
48
49/**
50 * The MedianFinder object will be instantiated and called as such:
51 * MedianFinder* obj = new MedianFinder();
52 * obj->addNum(num); // Method to add a number into the MedianFinder.
53 * double median = obj->findMedian(); // Method to find and return the median.
54 */
55
1class MedianFinder {
2 private small: number[]; // Simulated Max Heap (we'll use Min Heap logic but with negated values)
3 private large: number[]; // Min Heap
4
5 constructor() {
6 this.small = [];
7 this.large = [];
8 }
9
10 addNum(num: number): void {
11 // Add to max heap (simulated by adding negative number to min heap)
12 this.heappush(this.small, -num);
13 // Ensure the smallest of the max heap moves to the min heap
14 this.heappush(this.large, -this.heappop(this.small));
15
16 // Balance the heaps so that either both have the same size or small has one more element
17 if (this.large.length > this.small.length) {
18 this.heappush(this.small, -this.heappop(this.large));
19 }
20 }
21
22 findMedian(): number {
23 if (this.small.length > this.large.length) {
24 // If the small heap is larger, the median is its top element (remember to negate it back)
25 return -this.small[0];
26 } else if (this.small.length === this.large.length) {
27 // If the heaps are of equal size, the median is the average of the tops of both heaps
28 return (-this.small[0] + this.large[0]) / 2;
29 } else {
30 // If the large heap somehow ends up with more elements, this will handle that case
31 return this.large[0];
32 }
33 }
34
35 private heappush(heap: number[], val: number): void {
36 heap.push(val);
37 heap.sort((a, b) => a - b); // This simulates the heap push by sorting
38 }
39
40 private heappop(heap: number[]): number {
41 return heap.shift()!; // This simulates the heap pop by removing the first element
42 }
43}
44
45/**
46* Your MedianFinder object will be instantiated and called as such:
47* var obj = new MedianFinder()
48* obj.addNum(num)
49* var param_2 = obj.findMedian()
50*/
51
Time and Space Complexity
The provided class MedianFinder
has two methods: addNum
and findMedian
, used for adding numbers and finding the median, respectively. The class maintains two heaps: h1
is the min-heap that stores the larger half of the numbers, and h2
is the max-heap that stores the smaller half of the numbers (as negatives).
addNum
Method
Time Complexity:
-
Inserting an element into a heap (
heappush
) has a time complexity ofO(log n)
, wheren
is the number of elements in the heap. -
Removing the smallest element from a min heap (
heappop
onh1
) or the largest element from a max heap (heappop
onh2
) has a time complexity ofO(log n)
. -
The
addNum
method does at most two push and pop operations each time it is called:heappush(self.h1, num)
: Insertnum
into the min-heaph1
.heappush(self.h2, -heappop(self.h1))
: Pop the minimum value fromh1
, negate it (to simulate a max-heap), and push it intoh2
.heappush(self.h1, -heappop(self.h2))
: This happens only if the size ofh2
exceeds the size ofh1
by more than one, ensuring the difference in the number of elements in the two heaps never exceeds one.
Considering the above steps, the time complexity for addNum
is O(log n)
because of the heap operations.
Space Complexity:
- The space complexity of the
MedianFinder
class isO(n)
, wheren
is the number of elements inserted into theMedianFinder
. This is due to the storage of all elements across two heaps.
findMedian
Method
Time Complexity:
- Accessing the top element of a heap (minimum value in
h1
and maximum value inh2
) has a time complexity ofO(1)
. findMedian
performs at most one arithmetic operation, which is done in constant time.
Therefore, the time complexity for findMedian
is O(1)
.
Space Complexity:
- The
findMedian
method does not use any additional space that depends on the input size, so its space complexity isO(1)
.
Learn more about how to find time and space complexity quickly using problem constraints.
What data structure does Breadth-first search typically uses to store intermediate states?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Median of Data Stream Given a stream of numbers find the median number at any given time accurate to 1 decimal place Example add_number 1 add_number 2 add_number 3 get_median 2 0 add_number 4 get_median 2 5 Try it yourself Explanation Intuition Brute force way is to sort the entire list every time we get a new number This would be O N log N for each add_number operation One step up is to notice that the list is sorted before we add any new number to it So every
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.