480. Sliding Window Median


Problem Description

The problem involves finding the median of each window of size k in an array called nums. A window is a contiguous subset of the array, and it slides one position to the right on each step, so the subsequent window contains k elements as well, with one new element added on the right and one old element removed from the left.

The median is defined as the middle value of an ordered list. If the size of the list is odd, it's simply the middle value. If the size of the list is even, it's the average of the two middle values.

The task is to output an array containing the median of the array nums within each sliding window.

Intuition

Finding the median requires the elements to be in sorted order. Sorting the entire window for each position of the sliding window would result in a time-consuming algorithm, especially for large windows or arrays.

To optimize this, we can use two heaps: a max-heap to store the smaller half of the elements and a min-heap to store the larger half. By maintaining these two heaps, we can ensure that we have quick access to the median elements at all times. The max-heap allows us to access the largest element in the smaller half, and the min-heap allows us to access the smallest element in the larger half. When the window size k is odd, the median is the top element of the max-heap. When k is even, the median is the average of the top elements of both heaps.

Additional complications arise from the sliding nature of the window: We need to add a new element and remove an old one with each step. To prevent the heaps from growing with duplicates of shifted-out elements, a delayed deletion approach using a hash map (defaultdict) is employed. With this method, elements that should be removed are marked ('delayed'), and actual removal from the heaps occurs lazily, which means we only remove when the element is at the top of the respective heap. Proper rebalancing of heaps is done after each add or remove operation to maintain the half-size property of the two heaps.

The code defines a MedianFinder class to encapsulate this logic, and uses it within the medianSlidingWindow function to generate the array of medians as required by iterating through the entire array, sliding the window one step at a time and efficiently calculating the medians.

Learn more about Sliding Window and Heap (Priority Queue) patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the worst case running time for finding an element in a binary tree (not necessarily binary search tree) of size n?

Solution Approach

The solution uses a class MedianFinder to encapsulate the logic for finding medians in a sliding window. Here's a step-by-step breakdown of how the MedianFinder class and the solution function medianSlidingWindow works:

  1. Initialization: The MedianFinder object initializes two heaps: self.small (a max-heap) to store the smaller half of the numbers and self.large (a min-heap) to store the larger half. It also maintains a hash map self.delayed to keep track of the numbers that have been "logically" removed but not physically removed from the heaps. self.small_size and self.large_size help track the size of the two halves.

  2. Adding Numbers: When a new number is added (add_num), it is compared with the smallest number in self.large. If it is smaller or equal, it should belong to the self.small heap, and it's added there; otherwise, it's added to self.large. Once a number is added, rebalance is called to make sure that both heaps have roughly the same number of elements, allowing for a maximum size difference of one.

  3. Rebalancing the Heaps: The rebalance method ensures that the number of elements in self.small and self.large are either equal or that self.small has one more element than self.large if k is odd. This is achieved by transferring the top element from one heap to the other when the size difference condition is violated.

  4. Finding the Median: The find_median method calculates the median based on the sizes of the heaps. If k is odd, the median is the top element of self.small. If k is even, the median is the average of the top elements of self.small and self.large.

  5. Removing Numbers: When moving the window, numbers that exit the window need to be removed (remove_num). To avoid costly removal operations, a lazy deletion approach is used where the number is marked for deletion in the self.delayed hash map but not immediately removed from the heap. The actual removal happens in the prune method, which is triggered when the number to be removed is at the top of the respective heap.

  6. Calculating Medians in Sliding Window: The medianSlidingWindow function creates a MedianFinder instance and then slides the window across the nums array. Initially, the first k numbers are added to the MedianFinder, from where the first median is obtained. As the window slides, the function adds a new element and removes the leftmost element of the previous window, then calculates the new median and appends it to the ans list.

  7. Time Complexity: Each add and remove operation can be done in logarithmic time regarding the total number of elements in the heaps. Consequently, the entire sliding window median calculation takes O(n log k) time, where n is the number of elements in nums and k is the size of the sliding window.

  8. Space Complexity: The space used is O(k) for storing elements in the heaps and the hash map for delayed removals.

By following this approach, medians of sliding windows can be calculated efficiently without resorting to sorting the window each time it slides, thus saving computational time and making the algorithm suitable for large datasets.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Given an array of 1,000,000 integers that is almost sorted, except for 2 pairs of integers. Which algorithm is fastest for sorting the array?

Example Walkthrough

Let's take a simple example to illustrate the solution approach. Consider the array nums = [1,3,-1,-3,5,3,6,7] and a window size k = 3. The goal is to find the median for each window of size 3 as we slide through nums.

Here's how the algorithm process would work:

Initial Preparation

  • Instantiate a MedianFinder object.
  • Heaps are initialized but empty. self.small is the max-heap, and self.large is the min-heap.

First Window: [1,3,-1]

  • Add 1, 3, -1 to the MedianFinder. After adding and rebalancing:
    • self.small = [1], self.large = [-1, 3]
  • Calculate the median and append to results. The median is 1.

Slide to Second Window: [3,-1,-3]

  • Remove 1 (by marking it delayed).
  • Add 3 and -3:
    • 3 is added to self.large since it's not smaller than the smallest in self.large which is -1.
    • -3 is added to self.small after comparison with self.large.
  • self.small = [-3, 1], (1 is delayed) self.large = [-1, 3, 3] after rebalancing.
  • Calculate the median. The median is -1.

Slide to Third Window: [-1,-3,5]

  • Remove 3 (mark it delayed).
  • Add 5:
    • 5 is added to self.large since it's larger than the smallest in self.large.
  • Rebalance the heaps.
  • self.small = [-3, -1], self.large = [5, 3, 3] (two 3's are marked delayed) after rebalancing.
  • Calculate the median. The median is -1.

Slide to Fourth Window: [-3,5,3]

  • Continue this process until the end of the array.

Final Results

  • Once we apply the above operations for each window, we get the list of medians = [1, -1, -1, 3, 5, 6].

Efficiency and Complexity

  • This process avoids full sorting of each window:
    • Adding and removing elements and rebalancing takes O(log k) time due to heaps.
    • The process is repeated for each of the n - k + 1 windows.
    • Thus, the overall time complexity is O(n log k).
  • Space complexity remains O(k) due to storage in heaps and the delayed removal hash map.

By using the MedianFinder class and the medianSlidingWindow method, we can see how to efficiently calculate the median of each sliding window by leveraging min-heaps, max-heaps, and delayed removals, avoiding the inefficiency of sorting each window.

Solution Implementation

1import heapq
2from collections import defaultdict
3from typing import List
4
5class MedianFinder:
6    def __init__(self, k: int):
7        self.k = k  # size of the sliding window
8        self.small_max_heap = []  # max heap to store the smaller half of numbers
9        self.large_min_heap = []  # min heap to store the larger half of numbers
10        self.delayed = defaultdict(int)  # count of delayed elements for lazy deletion
11        self.small_size = 0  # size of small_max_heap (not necessarily len(small_max_heap))
12        self.large_size = 0  # size of large_min_heap (not necessarily len(large_min_heap))
13
14    def add_num(self, num: int):
15        # Add a number to one of the heaps
16        if not self.small_max_heap or num <= -self.small_max_heap[0]:
17            heapq.heappush(self.small_max_heap, -num)
18            self.small_size += 1
19        else:
20            heapq.heappush(self.large_min_heap, num)
21            self.large_size += 1
22        self.rebalance()
23
24    def find_median(self) -> float:
25        # Find the median of the current numbers
26        return -self.small_max_heap[0] if self.k % 2 == 1 else (-self.small_max_heap[0] + self.large_min_heap[0]) / 2
27
28    def remove_num(self, num: int):
29        # Lazily remove a number (the number will be removed later during pruning)
30        self.delayed[num] += 1
31        if num <= -self.small_max_heap[0]:
32            self.small_size -= 1
33            if num == -self.small_max_heap[0]:
34                self.prune(self.small_max_heap)
35        else:
36            self.large_size -= 1
37            if num == self.large_min_heap[0]:
38                self.prune(self.large_min_heap)
39        self.rebalance()
40
41    def prune(self, heap: List[int]):
42        # Remove elements that have been flagged for removal
43        while heap and self.delayed[heap[0]] > 0:
44            self.delayed[heap[0]] -= 1
45            if self.delayed[heap[0]] == 0:
46                del self.delayed[heap[0]]
47            heapq.heappop(heap)
48
49    def rebalance(self):
50        # Balance the two heaps to maintain the invariant small_max_heap.size() > large_min_heap.size()
51        while self.small_size > self.large_size + 1:
52            heapq.heappush(self.large_min_heap, -heapq.heappop(self.small_max_heap))
53            self.small_size -= 1
54            self.large_size += 1
55            self.prune(self.small_max_heap)
56        while self.small_size < self.large_size:
57            heapq.heappush(self.small_max_heap, -heapq.heappop(self.large_min_heap))
58            self.large_size -= 1
59            self.small_size += 1
60            self.prune(self.large_min_heap)
61
62class Solution:
63    def median_sliding_window(self, nums: List[int], k: int) -> List[float]:
64        finder = MedianFinder(k)
65        for x in nums[:k]:
66            finder.add_num(x)
67        medians = [finder.find_median()]
68        for i in range(k, len(nums)):
69            finder.add_num(nums[i])
70            finder.remove_num(nums[i - k])
71            medians.append(finder.find_median())
72        return medians
73
1class MedianFinder {
2    private PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // holds all the larger elements
3    private PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // holds the smaller elements, reversed to act as a max-heap
4    private Map<Integer, Integer> countMap = new HashMap<>(); // keeps track of the number of instances a number has been delayed for removal
5    private int minHeapSize = 0;
6    private int maxHeapSize = 0;
7    private final int windowSize; // the size of the sliding window
8
9    public MedianFinder(int k) {
10        this.windowSize = k;
11    }
12
13    public void addNum(int num) {
14        // Add the number to the appropriate heap
15        if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
16            maxHeap.offer(num);
17            maxHeapSize++;
18        } else {
19            minHeap.offer(num);
20            minHeapSize++;
21        }
22        rebalanceHeaps(); // Ensure both heaps remain in a valid state
23    }
24
25    public double findMedian() {
26        // If odd window size, the median is the top of the max heap, otherwise, the median is the average of the tops of both heaps
27        return (windowSize & 1) == 1 ? maxHeap.peek() : ((double) maxHeap.peek() + minHeap.peek()) / 2.0;
28    }
29
30    public void removeNum(int num) {
31        countMap.put(num, countMap.getOrDefault(num, 0) + 1); // Mark the number for delayed removal
32        if (num <= maxHeap.peek()) {
33            maxHeapSize--;
34            if (num == maxHeap.peek()) {
35                pruneHeap(maxHeap); // Remove elements that should be delayed from the maxHeap
36            }
37        } else {
38            minHeapSize--;
39            if (num == minHeap.peek()) {
40                pruneHeap(minHeap); // Remove elements that should be delayed from the minHeap
41            }
42        }
43        rebalanceHeaps(); // Ensure both heaps remain in a valid state
44    }
45
46    private void pruneHeap(PriorityQueue<Integer> heap) {
47        // Remove elements from the heap that were marked for delayed removal
48        while (!heap.isEmpty() && countMap.containsKey(heap.peek())) {
49            int heapTop = heap.peek();
50            countMap.put(heapTop, countMap.get(heapTop) - 1);
51
52            if (countMap.get(heapTop) == 0) {
53                countMap.remove(heapTop); // remove the entry from the map if the count goes to zero
54            }
55
56            heap.poll(); // remove the top element in the heap as it's already accounted for in the delayed removal
57        }
58    }
59
60    private void rebalanceHeaps() {
61        // Maintain the property that maxHeap has the same number of elements as minHeap, or 1 more
62        if (maxHeapSize > minHeapSize + 1) {
63            minHeap.offer(maxHeap.poll());
64            maxHeapSize--;
65            minHeapSize++;
66            pruneHeap(maxHeap); // Ensure maxHeap is pruned
67        } else if (maxHeapSize < minHeapSize) {
68            maxHeap.offer(minHeap.poll());
69            minHeapSize--;
70            maxHeapSize++;
71            pruneHeap(minHeap); // Ensure minHeap is pruned
72        }
73    }
74}
75
76class Solution {
77    public double[] medianSlidingWindow(int[] nums, int k) {
78        MedianFinder finder = new MedianFinder(k);
79        // Populate the MedianFinder with the initial sliding window
80        for (int i = 0; i < k; ++i) {
81            finder.addNum(nums[i]);
82        }
83
84        int n = nums.length;
85        double[] medians = new double[n - k + 1]; // Array to hold the medians
86        medians[0] = finder.findMedian(); // Find the median for the first window
87
88        // Move the sliding window and calculate the median for each window
89        for (int i = k; i < n; ++i) {
90            finder.addNum(nums[i]); // Add the new element to the sliding window
91            finder.removeNum(nums[i - k]); // Remove the oldest element from the sliding window
92            medians[i - k + 1] = finder.findMedian(); // Find the median for the current window
93        }
94        return medians; // Return the array of medians
95    }
96}
97
1#include <functional>
2#include <queue>
3#include <unordered_map>
4#include <vector>
5
6class MedianFinder {
7public:
8    // Constructor that initializes the finder with a window size 'k'.
9    MedianFinder(int k) : windowSize(k), smallSize(0), largeSize(0) {}
10
11    // Adds a number into the data structure.
12    void addNum(int num) {
13        if (small.empty() || num <= small.top()) {
14            small.push(num);
15            ++smallSize;
16        } else {
17            large.push(num);
18            ++largeSize;
19        }
20        // Ensure the heaps are balanced after the insertion.
21        rebalanceHeaps();
22    }
23
24    // Removes a number from the data structure.
25    void removeNum(int num) {
26        delayed[num]++;
27        if (num <= small.top()) {
28            --smallSize;
29            if (num == small.top()) {
30                prune(small);
31            }
32        } else {
33            --largeSize;
34            if (num == large.top()) {
35                prune(large);
36            }
37        }
38        // Ensure the heaps are balanced after the removal.
39        rebalanceHeaps();
40    }
41
42    // Returns the median of current numbers.
43    double findMedian() {
44        if (windowSize & 1) {
45            return small.top();
46        } else {
47            return (static_cast<double>(small.top()) + large.top()) * 0.5;
48        }
49    }
50
51private:
52    // Max heap to store the smaller half of the elements
53    std::priority_queue<int> small;
54    // Min heap to store the larger half of the elements
55    std::priority_queue<int, std::vector<int>, std::greater<int>> large;
56    // Hashmap to keep track of delayed removals when numbers are out of the current window
57    std::unordered_map<int, int> delayed;
58    // Sizes of the heaps
59    int smallSize, largeSize;
60    // Window size
61    int windowSize;
62
63    // Prunes/removes elements from the heap that should be delayed-removed.
64    template <typename T>
65    void prune(T& heap) {
66        while (!heap.empty() && delayed[heap.top()]) {
67            if (--delayed[heap.top()] == 0) {
68                delayed.erase(heap.top());
69            }
70            heap.pop();
71        }
72    }
73
74    // Balances the two heaps to maintain the median property.
75    void rebalanceHeaps() {
76        if (smallSize > largeSize + 1) {
77            large.push(small.top());
78            small.pop();
79            --smallSize;
80            ++largeSize;
81            // Re-prune in case the top element is meant to be delayed-removed.
82            prune(small);
83        } else if (smallSize < largeSize) {
84            small.push(large.top());
85            large.pop();
86            ++smallSize;
87            --largeSize;
88            // Re-prune in case the top element is meant to be delayed-removed.
89            prune(large);
90        }
91    }
92};
93
94class Solution {
95public:
96    // Calculates the median for each window of size 'k' in the array.
97    std::vector<double> medianSlidingWindow(std::vector<int>& nums, int k) {
98        MedianFinder finder(k);
99        // Initialize by adding the first 'k' elements.
100        for (int i = 0; i < k; ++i) {
101            finder.addNum(nums[i]);
102        }
103        // Store the first median.
104        std::vector<double> medians = {finder.findMedian()};
105        // Slide the window, one element at a time, maintaining the median.
106        for (int i = k; i < nums.size(); ++i) {
107            finder.addNum(nums[i]); // Add the new element to the window.
108            finder.removeNum(nums[i - k]); // Remove the element that is no longer in the window.
109            // Store the new median.
110            medians.push_back(finder.findMedian());
111        }
112        return medians;
113    }
114};
115
1// Create an alias for the comparison function type.
2type CompareFunction<T> = (a: T, b: T) => number;
3
4// Define PriorityQueue class template.
5class PriorityQueue<T> {
6    private data: T[] = [];
7    private compare: CompareFunction<T>;
8
9    constructor(compareFunction: CompareFunction<T>) {
10        this.compare = compareFunction;
11    }
12
13    private leftChildIndex(parentIndex: number): number {
14        return 2 * parentIndex + 1;
15    }
16
17    private rightChildIndex(parentIndex: number): number {
18        return 2 * parentIndex + 2;
19    }
20
21    private parentIndex(childIndex: number): number {
22        return Math.floor((childIndex - 1) / 2);
23    }
24
25    private hasLeftChild(index: number): boolean {
26        return this.leftChildIndex(index) < this.data.length;
27    }
28
29    private hasRightChild(index: number): boolean {
30        return this.rightChildIndex(index) < this.data.length;
31    }
32
33    private hasParent(index: number): boolean {
34        return this.parentIndex(index) >= 0;
35    }
36
37    private leftChild(index: number): T {
38        return this.data[this.leftChildIndex(index)];
39    }
40
41    private rightChild(index: number): T {
42        return this.data[this.rightChildIndex(index)];
43    }
44
45    private parent(index: number): T {
46        return this.data[this.parentIndex(index)];
47    }
48
49    private swap(indexOne: number, indexTwo: number): void {
50        let temp = this.data[indexOne];
51        this.data[indexOne] = this.data[indexTwo];
52        this.data[indexTwo] = temp;
53    }
54
55    private heapifyUp(): void {
56        let index = this.data.length - 1;
57        while (this.hasParent(index) && this.compare(this.parent(index), this.data[index]) > 0) {
58            this.swap(this.parentIndex(index), index);
59            index = this.parentIndex(index);
60        }
61    }
62
63    private heapifyDown(): void {
64        let index = 0;
65        while (this.hasLeftChild(index)) {
66            let smallerChildIndex = this.leftChildIndex(index);
67            if (this.hasRightChild(index) && this.compare(this.rightChild(index), this.leftChild(index)) < 0) {
68                smallerChildIndex = this.rightChildIndex(index);
69            }
70
71            if (this.compare(this.data[index], this.data[smallerChildIndex]) < 0) {
72                break;
73            } else {
74                this.swap(index, smallerChildIndex);
75            }
76            index = smallerChildIndex;
77        }
78    }
79
80    public isEmpty(): boolean {
81        return this.data.length === 0;
82    }
83
84    public peek(): T | undefined {
85        if (this.data.length === 0) return undefined;
86        return this.data[0];
87    }
88
89    public push(value: T): void {
90        this.data.push(value);
91        this.heapifyUp();
92    }
93
94    public pop(): T | undefined {
95        if (this.data.length === 0) return undefined;
96        if (this.data.length === 1) return this.data.pop();
97
98        const item = this.data[0];
99        this.data[0] = this.data.pop()!;
100        this.heapifyDown();
101        return item;
102    }
103
104    public size(): number {
105        return this.data.length;
106    }
107}
108
109// A max heap comparator for numbers.
110const maxHeapCompare: CompareFunction<number> = (a, b) => b - a;
111
112// A min heap comparator for numbers.
113const minHeapCompare: CompareFunction<number> = (a, b) => a - b;
114
115// Small values are stored in a max heap.
116const small: PriorityQueue<number> = new PriorityQueue(maxHeapCompare);
117// Large values are stored in a min heap.
118const large: PriorityQueue<number> = new PriorityQueue(minHeapCompare);
119
120// Map to keep track of values that need to be pruned from heaps.
121const delayed = new Map<number, number>();
122
123let windowSize = 0;
124let smallSize = 0;
125let largeSize = 0;
126
127function addNum(num: number): void {
128    if (small.isEmpty() || num <= small.peek()!) {
129        small.push(num);
130        smallSize++;
131    } else {
132        large.push(num);
133        largeSize++;
134    }
135    rebalanceHeaps();
136}
137
138function removeNum(num: number): void {
139    delayed.set(num, (delayed.get(num) || 0) + 1);
140    if (num <= small.peek()!) {
141        smallSize--;
142        if (num === small.peek()) {
143            prune(small);
144        }
145    } else {
146        largeSize--;
147        if (num === large.peek()) {
148            prune(large);
149        }
150    }
151    rebalanceHeaps();
152}
153
154function findMedian(): number {
155    if (windowSize % 2 === 1) {
156        return small.peek()!;
157    } else {
158        return (small.peek()! + large.peek()!) / 2.0;
159    }
160}
161
162function prune(heap: PriorityQueue<number>): void {
163    while (!heap.isEmpty() && delayed.get(heap.peek()!)! > 0) {
164        delayed.set(heap.peek()!, delayed.get(heap.peek()!)! - 1);
165        if (delayed.get(heap.peek()!) === 0) {
166            delayed.delete(heap.peek()!);
167        }
168        heap.pop();
169    }
170}
171
172function rebalanceHeaps(): void {
173    if (smallSize > largeSize + 1) {
174        large.push(small.pop()!);
175        smallSize--;
176        largeSize++;
177        prune(small);
178    } else if (smallSize < largeSize) {
179        small.push(large.pop()!);
180        smallSize++;
181        largeSize--;
182        prune(large);
183    }
184}
185
186function medianSlidingWindow(nums: number[], k: number): number[] {
187    windowSize = k;
188    const medians: number[] = [];
189
190    for (let i = 0; i < k; ++i) {
191        addNum(nums[i]);
192    }
193
194    medians.push(findMedian());
195
196    for (let i = k; i < nums.length; ++i) {
197        addNum(nums[i]);
198        removeNum(nums[i - k]);
199        medians.push(findMedian());
200    }
201
202    return medians;
203}
204
Not Sure What to Study? Take the 2-min Quiz:

What is the worst case running time for finding an element in a binary tree (not necessarily binary search tree) of size n?

Time and Space Complexity

Time Complexity: The primary operations involving computational complexity in the MedianFinder class are add_num, find_median, remove_num, prune, and rebalance. Each of these functions relies on heap operations which generally have a time complexity of O(log n) for insertion and removal, where n is the number of elements in the heap.

  • The add_num function inserts an element into one of the heaps. This operation is O(log k), as at most one element is added to a heap with no more than k elements.
  • The remove_num function involves decrementing the counter and possibly removing the top element from a heap which also has a complexity of O(log k).
  • The prune function uses a while loop which can, in the worst-case scenario, run as many times as there are elements in pq. However, each pop operation is O(log k). The actual frequency of pruning depends on the elements being removed and the state of the delayed dictionary.
  • The rebalance function ensures both heaps remain balanced after adding or removing an element. It may call heappush and heappop at most twice if the sizes of heaps differ by more than one. Hence, rebalancing operations are also O(log k) each.

The medianSlidingWindow function in the Solution class calls the add_num and remove_num methods of MedianFinder for each element in nums. Since nums has n elements, and each operation is O(log k), the overall time complexity for processing the sliding window across all the elements is O(n log k).

Space Complexity: The space complexity mainly comes from the data stored in MedianFinder, specifically the two heaps and the delayed elements dictionary.

  • Two heaps are maintained, one for the smaller half and one for the larger half of the numbers, and they can contain up to k elements combined. Therefore, the space complexity for the heaps is O(k).
  • The defaultdict, named delayed, holds the count of the elements delayed for removal. In the worst case, this could hold each unique number once, so the space complexity for delayed is O(k) (assuming the sliding window could contain up to k unique elements).

Combined, the space complexity of the MedianFinder class is O(k).

Considering both the heaps and the additional structures like delayed, the overall space complexity of the algorithm is O(k).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following is equvalent to O(3*2^n + n^3 + n!+ log n)?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫