295. Find Median from Data Stream
Problem Description
This problem asks you to design a data structure that efficiently finds the median value from a stream of integers.
The median is defined as:
- For an odd-sized list: the middle element when sorted
- For an even-sized list: the average of the two middle elements when sorted
For example:
[2,3,4]
has median3
(the middle element)[2,3]
has median2.5
(average of 2 and 3)
You need to implement a MedianFinder
class with three methods:
MedianFinder()
: Initializes an empty data structureaddNum(int num)
: Adds a new integer from the data stream to your data structurefindMedian()
: Returns the current median of all numbers added so far
The challenge is to maintain the median efficiently as new numbers are continuously added. The solution uses two heaps:
- A max heap (
maxq
) to store the smaller half of numbers (with the largest of the small numbers at the top) - A min heap (
minq
) to store the larger half of numbers (with the smallest of the large numbers at the top)
The key insight is that by maintaining these two heaps with balanced sizes (difference of at most 1), the median can always be found from the top elements of the heaps:
- When both heaps have equal size: median is the average of both tops
- When
minq
has one more element: median is the top ofminq
The addNum
operation maintains this balance by:
- First adding the new number to
maxq
(smaller half) - Moving the largest element from
maxq
tominq
to ensure proper ordering - Rebalancing if
minq
becomes too large by moving its smallest element back tomaxq
This design allows addNum
to run in O(log n)
time and findMedian
to run in O(1)
time.
Intuition
The naive approach would be to store all numbers in a list and sort it every time we need to find the median. However, this would be inefficient with O(n log n)
time for each median query.
The key observation is that to find the median, we don't need the entire sorted array - we only need to know the middle element(s). This leads us to think: what if we could partition the numbers into two halves and only track the "boundary" between them?
Imagine splitting the sorted array into two parts:
- Left half: smaller numbers
- Right half: larger numbers
The median would always be at the boundary between these two halves. If we could efficiently maintain this partition and quickly access the maximum of the left half and minimum of the right half, we could find the median instantly.
This naturally leads to using heaps:
- A max heap for the left half (smaller numbers) - gives us the largest of the small numbers
- A min heap for the right half (larger numbers) - gives us the smallest of the large numbers
Why this works beautifully:
- The top of the max heap is the largest element in the smaller half
- The top of the min heap is the smallest element in the larger half
- These two elements are exactly the middle elements of the entire dataset!
The balancing strategy ensures that:
- Both heaps have roughly equal size (differ by at most 1)
- Every element in the max heap ≤ every element in the min heap
When adding a new number, we maintain these invariants by:
- First adding to the max heap (arbitrarily chosen as the entry point)
- Moving the largest from max heap to min heap (ensures the ordering property)
- Rebalancing if needed (ensures the size property)
This elegant design transforms the problem from "finding median in a sorted array" to "maintaining two balanced heaps", which is much more efficient for a stream of data.
Solution Approach
The implementation uses two heaps to maintain the median efficiently:
Data Structures:
minq
: A min heap storing the larger half of numbersmaxq
: A max heap storing the smaller half of numbers
In Python, we use the heapq
module which only provides min heap functionality. To simulate a max heap, we negate the values when inserting into maxq
.
Initialization (__init__
):
self.minq = [] # Min heap for larger half self.maxq = [] # Max heap for smaller half
Adding Numbers (addNum
):
The algorithm ensures two invariants:
- Every element in
maxq
≤ every element inminq
- Size difference between heaps is at most 1
heappush(self.minq, -heappushpop(self.maxq, -num))
This line does two things:
- First, add
num
tomaxq
(negated to simulate max heap):heappushpop(self.maxq, -num)
- Then immediately pop the maximum from
maxq
and push it tominq
(negated back to positive)
This ensures that even if num
was supposed to go to minq
, it will end up there after this operation, maintaining the ordering invariant.
After this operation, minq
always gains one element. So we check if rebalancing is needed:
if len(self.minq) - len(self.maxq) > 1:
heappush(self.maxq, -heappop(self.minq))
If minq
has more than one extra element compared to maxq
, we move the smallest element from minq
back to maxq
.
Finding Median (findMedian
):
Two cases based on the total number of elements:
-
Even number of elements (both heaps have equal size):
return (self.minq[0] - self.maxq[0]) / 2
The median is the average of:
- Top of
minq
(smallest of larger half):self.minq[0]
- Top of
maxq
(largest of smaller half):-self.maxq[0]
(remember values are negated)
- Top of
-
Odd number of elements (
minq
has one extra element):return self.minq[0]
The median is simply the top of
minq
.
Time Complexity:
addNum
:O(log n)
- heap operationsfindMedian
:O(1)
- just accessing heap tops
Space Complexity: O(n)
- storing all n elements across two heaps
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through adding numbers [5, 3, 8, 2, 10]
one by one and finding the median after each addition.
Initial State:
maxq = []
(max heap for smaller half)minq = []
(min heap for larger half)
Step 1: Add 5
- Add 5 to maxq:
maxq = [-5]
(negated for max heap) - Pop max from maxq (-5) and push to minq:
maxq = [], minq = [5]
- Check balance:
len(minq) - len(maxq) = 1 - 0 = 1
(balanced) - Median = 5 (odd count, return top of minq)
Step 2: Add 3
- Add 3 to maxq:
maxq = [-3]
- Pop max from maxq (-3) and push to minq:
maxq = [], minq = [3, 5]
- Check balance:
len(minq) - len(maxq) = 2 - 0 = 2
(unbalanced!) - Rebalance: Move min from minq to maxq:
maxq = [-3], minq = [5]
- Median = (5 + 3) / 2 = 4 (even count, average of tops)
Step 3: Add 8
- Add 8 to maxq:
maxq = [-8, -3]
- Pop max from maxq (-8) and push to minq:
maxq = [-3], minq = [5, 8]
- Check balance:
len(minq) - len(maxq) = 2 - 1 = 1
(balanced) - Median = 5 (odd count, return top of minq)
- Current sorted order: [3, 5, 8]
Step 4: Add 2
- Add 2 to maxq:
maxq = [-3, -2]
- Pop max from maxq (-3) and push to minq:
maxq = [-2], minq = [3, 8, 5]
→ heap adjusts to[3, 5, 8]
- Check balance:
len(minq) - len(maxq) = 3 - 1 = 2
(unbalanced!) - Rebalance: Move min from minq to maxq:
maxq = [-3, -2], minq = [5, 8]
- Median = (3 + 5) / 2 = 4 (even count, average of tops)
- Current sorted order: [2, 3, 5, 8]
Step 5: Add 10
- Add 10 to maxq:
maxq = [-10, -2, -3]
- Pop max from maxq (-10) and push to minq:
maxq = [-3, -2], minq = [5, 8, 10]
- Check balance:
len(minq) - len(maxq) = 3 - 2 = 1
(balanced) - Median = 5 (odd count, return top of minq)
- Current sorted order: [2, 3, 5, 8, 10]
Notice how the median is always efficiently found from the heap tops without needing to sort all elements!
Solution Implementation
1from heapq import heappush, heappop, heappushpop
2
3class MedianFinder:
4 """
5 A data structure that supports adding numbers and finding the median efficiently.
6 Uses two heaps to maintain the median in O(log n) time for additions and O(1) for retrieval.
7 """
8
9 def __init__(self):
10 """
11 Initialize the MedianFinder with two heaps:
12 - min_heap: stores the larger half of numbers (min heap)
13 - max_heap: stores the smaller half of numbers (implemented as negative min heap)
14 """
15 self.min_heap = [] # Stores larger half of numbers
16 self.max_heap = [] # Stores smaller half of numbers (negated for max heap behavior)
17
18 def addNum(self, num: int) -> None:
19 """
20 Add a number to the data structure.
21
22 Algorithm:
23 1. Push num to max_heap (smaller half)
24 2. Move the largest from max_heap to min_heap (larger half)
25 3. Balance the heaps if min_heap has more than 1 extra element
26
27 Args:
28 num: Integer to add to the data structure
29 """
30 # Add to max_heap first, then move its maximum to min_heap
31 # This ensures all elements in max_heap are <= all elements in min_heap
32 heappush(self.min_heap, -heappushpop(self.max_heap, -num))
33
34 # Balance the heaps: min_heap can have at most 1 more element than max_heap
35 if len(self.min_heap) - len(self.max_heap) > 1:
36 # Move smallest element from min_heap to max_heap
37 heappush(self.max_heap, -heappop(self.min_heap))
38
39 def findMedian(self) -> float:
40 """
41 Find the median of all numbers added so far.
42
43 Returns:
44 The median as a float
45 """
46 # If both heaps have equal size, median is average of both tops
47 if len(self.min_heap) == len(self.max_heap):
48 # min_heap[0] is smallest of larger half
49 # -max_heap[0] is largest of smaller half
50 return (self.min_heap[0] - self.max_heap[0]) / 2.0
51
52 # If unequal size, min_heap has one extra element which is the median
53 return float(self.min_heap[0])
54
55
56# Your MedianFinder object will be instantiated and called as such:
57# obj = MedianFinder()
58# obj.addNum(num)
59# param_2 = obj.findMedian()
60
1class MedianFinder {
2 // Min heap to store the larger half of numbers
3 private PriorityQueue<Integer> minHeap = new PriorityQueue<>();
4
5 // Max heap to store the smaller half of numbers
6 private PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
7
8 /**
9 * Initialize the MedianFinder data structure.
10 * Uses two heaps to maintain the median efficiently:
11 * - maxHeap: stores the smaller half (max element at top)
12 * - minHeap: stores the larger half (min element at top)
13 */
14 public MedianFinder() {
15 }
16
17 /**
18 * Adds a number from the data stream to the data structure.
19 * Maintains the invariant that:
20 * 1. All elements in maxHeap <= all elements in minHeap
21 * 2. The size difference between heaps is at most 1
22 * 3. If odd total elements, minHeap has one extra element
23 *
24 * @param num the number to be added
25 */
26 public void addNum(int num) {
27 // First add to maxHeap (smaller half)
28 maxHeap.offer(num);
29
30 // Move the largest element from maxHeap to minHeap to maintain order
31 minHeap.offer(maxHeap.poll());
32
33 // Balance the heaps - ensure minHeap doesn't have more than 1 extra element
34 if (minHeap.size() - maxHeap.size() > 1) {
35 maxHeap.offer(minHeap.poll());
36 }
37 }
38
39 /**
40 * Returns the median of all elements added so far.
41 * If even number of elements: returns average of middle two elements
42 * If odd number of elements: returns the middle element (from minHeap)
43 *
44 * @return the median as a double value
45 */
46 public double findMedian() {
47 return minHeap.size() == maxHeap.size()
48 ? (minHeap.peek() + maxHeap.peek()) / 2.0
49 : minHeap.peek();
50 }
51}
52
53/**
54 * Your MedianFinder object will be instantiated and called as such:
55 * MedianFinder obj = new MedianFinder();
56 * obj.addNum(num);
57 * double param_2 = obj.findMedian();
58 */
59
1class MedianFinder {
2public:
3 MedianFinder() {
4 // Constructor - no initialization needed as containers are empty by default
5 }
6
7 void addNum(int num) {
8 // Step 1: Always add the new number to the max heap first
9 maxHeap.push(num);
10
11 // Step 2: Move the largest element from max heap to min heap
12 // This ensures all elements in max heap are <= all elements in min heap
13 minHeap.push(maxHeap.top());
14 maxHeap.pop();
15
16 // Step 3: Balance the heaps if necessary
17 // Min heap should have at most one more element than max heap
18 if (minHeap.size() > maxHeap.size() + 1) {
19 maxHeap.push(minHeap.top());
20 minHeap.pop();
21 }
22 }
23
24 double findMedian() {
25 // If both heaps have equal size, median is the average of both tops
26 // Otherwise, median is the top of the larger heap (min heap)
27 return minHeap.size() == maxHeap.size()
28 ? (minHeap.top() + maxHeap.top()) / 2.0
29 : minHeap.top();
30 }
31
32private:
33 // Max heap to store the smaller half of numbers
34 // Top element is the largest in the smaller half
35 priority_queue<int> maxHeap;
36
37 // Min heap to store the larger half of numbers
38 // Top element is the smallest in the larger half
39 priority_queue<int, vector<int>, greater<int>> minHeap;
40};
41
42/**
43 * Your MedianFinder object will be instantiated and called as such:
44 * MedianFinder* obj = new MedianFinder();
45 * obj->addNum(num);
46 * double param_2 = obj->findMedian();
47 */
48
1// Use two heaps to maintain the median of a dynamic stream of numbers
2// minHeap stores the larger half of numbers (min element at top)
3// maxHeap stores the smaller half of numbers (max element at top)
4let minHeap = new MinPriorityQueue();
5let maxHeap = new MaxPriorityQueue();
6
7/**
8 * Adds a number to the data structure
9 * Maintains the property that minHeap contains the larger half of numbers
10 * and maxHeap contains the smaller half, with their sizes differing by at most 1
11 * @param num - The number to be added
12 */
13function addNum(num: number): void {
14 // First, add the number to maxHeap (smaller half)
15 maxHeap.enqueue(num);
16
17 // Move the largest element from maxHeap to minHeap to maintain order
18 // This ensures all elements in maxHeap are <= all elements in minHeap
19 minHeap.enqueue(maxHeap.dequeue().element);
20
21 // Balance the heaps if minHeap has more than 1 extra element
22 // We maintain that minHeap.size() - maxHeap.size() is either 0 or 1
23 if (minHeap.size() - maxHeap.size() > 1) {
24 maxHeap.enqueue(minHeap.dequeue().element);
25 }
26}
27
28/**
29 * Finds the median of all numbers added so far
30 * @returns The median value
31 */
32function findMedian(): number {
33 // If both heaps have equal size, median is the average of their tops
34 if (minHeap.size() === maxHeap.size()) {
35 return (minHeap.front().element + maxHeap.front().element) / 2;
36 }
37
38 // If sizes are unequal, minHeap has one extra element which is the median
39 return minHeap.front().element;
40}
41
Time and Space Complexity
Time Complexity:
-
addNum(num)
:O(log n)
wheren
is the number of elements already in the data structure. This is because:heappushpop
onmaxq
takesO(log n)
timeheappush
onminq
takesO(log n)
time- The conditional rebalancing with
heappop
andheappush
takesO(log n)
time when executed - Overall:
O(log n)
for each insertion
-
findMedian()
:O(1)
since we're only accessing the top elements of the heaps and performing constant-time arithmetic operations
Space Complexity:
O(n)
wheren
is the total number of elements added to the MedianFinder- The space is used to store all
n
elements across two heaps (minq
andmaxq
) - The sum of elements in both heaps equals the total number of elements inserted
Common Pitfalls
1. Incorrect Heap Balancing Logic
Pitfall: A common mistake is to always alternate between heaps or use simple size comparison without considering the actual values. For example:
# WRONG: Simply alternating between heaps
def addNum(self, num):
if len(self.min_heap) <= len(self.max_heap):
heappush(self.min_heap, num)
else:
heappush(self.max_heap, -num)
This breaks the invariant that all elements in max_heap
should be ≤ all elements in min_heap
.
Solution: Always route through one heap first, then transfer to maintain ordering:
# CORRECT: Ensures proper ordering heappush(self.min_heap, -heappushpop(self.max_heap, -num))
2. Forgetting to Negate Values for Max Heap
Pitfall: Python's heapq
only provides min heap. Forgetting to negate values when using it as a max heap leads to incorrect results:
# WRONG: Treating max_heap as regular min heap heappush(self.max_heap, num) # Should be -num median = (self.min_heap[0] + self.max_heap[0]) / 2 # Should be min_heap[0] - max_heap[0]
Solution: Always negate when pushing to and popping from the max heap:
# CORRECT: Proper negation for max heap simulation heappush(self.max_heap, -num) median = (self.min_heap[0] - self.max_heap[0]) / 2
3. Integer Division Instead of Float Division
Pitfall: Using integer division when calculating the median of even-sized lists:
# WRONG in Python 2 or with // operator return (self.min_heap[0] - self.max_heap[0]) // 2 # Integer division
Solution: Use float division or explicitly convert to float:
# CORRECT: Float division
return (self.min_heap[0] - self.max_heap[0]) / 2.0
# OR
return float(self.min_heap[0] - self.max_heap[0]) / 2
4. Incorrect Heap Size Management
Pitfall: Allowing the size difference between heaps to exceed 1, or always keeping min_heap larger:
# WRONG: max_heap could end up with 2+ more elements
if len(self.max_heap) - len(self.min_heap) > 1:
heappush(self.min_heap, -heappop(self.max_heap))
Solution: Consistently maintain that min_heap has either equal or one more element:
# CORRECT: min_heap has at most 1 extra element
if len(self.min_heap) - len(self.max_heap) > 1:
heappush(self.max_heap, -heappop(self.min_heap))
5. Empty Heap Access
Pitfall: Not handling the edge case when finding median with no elements added:
def findMedian(self):
# WRONG: Crashes if called before any addNum
return self.min_heap[0] # IndexError if empty
Solution: Either add a check or ensure the problem constraints guarantee at least one element:
def findMedian(self):
if not self.min_heap and not self.max_heap:
return 0 # or raise an exception
# ... rest of the logic
Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.
Which of the following method should we use to solve this problem?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Median of Data Stream Given a stream of numbers find the median number at any given time accurate to 1 decimal place Example add_number 1 add_number 2 add_number 3 get_median 2 0 add_number 4 get_median 2 5 Try it yourself Explanation Intuition Brute force way is to sort the entire list every time we get a new number This would be O N log N for each add_number operation One step up is to notice that the list is sorted before we add any new number to it So every
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Want a Structured Path to Master System Design Too? Don’t Miss This!