703. Kth Largest Element in a Stream


Problem Description

In this problem, we are asked to design a KthLargest class that can determine the kth largest element in a stream of numbers. Here, the kth largest element means the element at position k when the stream is sorted in descending order, not the kth distinct value. The stream may contain repeated numbers, and each repeat counts as its own element.

The class should be able to handle two types of actions:

  1. Initialization (__init__): When an instance of the class is created, it should be initialized with an integer k and an array of integers nums. The k represents the position of the largest element we are interested in, and nums is the initial stream of numbers.

  2. Adding New Elements (add): The class should provide a method to add a new integer val to the stream. After adding the new integer, this method should return the current kth largest element from the stream.

Intuition

The straightforward approach to find the kth largest element would be to sort the stream of numbers each time we insert a new element and then pick the kth largest. However, re-sorting a stream of n numbers costs O(n log n) on every insertion, which is wasteful when only one value has changed. To solve the problem more efficiently, we can use a data structure that does the hard work for us: a Min Heap.
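For comparison, the sort-on-every-insert baseline described above might look like the following. This is only a sketch for illustration; the class name NaiveKthLargest and the attribute stream are hypothetical and do not appear in the solution.

```python
from typing import List


class NaiveKthLargest:
    """Baseline sketch: re-sort the whole stream on every add (hypothetical name)."""

    def __init__(self, k: int, nums: List[int]):
        self.k = k
        self.stream = list(nums)

    def add(self, val: int) -> int:
        self.stream.append(val)
        # Full sort on each call: O(n log n) for a stream of n values.
        self.stream.sort(reverse=True)
        # Position k - 1 in descending order is the kth largest.
        return self.stream[self.k - 1]


naive = NaiveKthLargest(3, [4, 5, 8, 2])
naive_results = [naive.add(v) for v in (3, 5, 10, 9, 4)]
print(naive_results)  # [4, 5, 5, 8, 8]
```

The baseline keeps every value ever seen, so both its memory use and its per-call cost grow with the stream.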

The reason behind choosing a Min Heap is its useful property: the smallest element is always at the root and can be accessed in constant time. If we maintain a Min Heap holding exactly the k largest elements of the current stream, the root of the heap (the smallest element in the heap) is our desired kth largest element.
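The property can be checked on a small example. The sketch below uses heapq.nlargest to pick the k largest values of a sample stream, arranges them as a min heap, and compares the root with the answer obtained by sorting. The variable names and the sample stream are chosen for illustration only.

```python
import heapq

stream = [4, 5, 8, 2, 3, 5, 10]
k = 3

# Keep only the k largest values and arrange them as a min heap.
k_largest = heapq.nlargest(k, stream)
heapq.heapify(k_largest)

root = k_largest[0]  # smallest of the k largest values
kth_by_sorting = sorted(stream, reverse=True)[k - 1]
print(root, kth_by_sorting)  # 5 5
```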

Here's how we construct our solution using this intuition:

  1. Initialize an empty Min Heap (self.min_heap) and store k in self.k.
  2. Insert the initial stream of numbers to the Min Heap using the add method.
    • During each insert, we push the new value into the heap.
    • If the heap size exceeds k, it means we have more than k elements, so we can remove the smallest one (the root).
    • The kth largest element will then be the new root of the heap.
  3. The add method maintains the Min Heap invariant and ensures it always contains the k largest elements after each insertion.
  4. Finally, we can return the kth largest element by looking at the root of the Min Heap.

Using this approach, we avoid the need to sort the entire stream every time, thereby improving the efficiency of finding the kth largest element in the stream.

Solution Approach

To implement the solution, we employ Python's heapq module, which provides an implementation of the Min Heap data structure.

Here's a step-by-step breakdown of the implementation:

  1. When an instance of KthLargest class is created, the __init__ method is called with two parameters: k and nums.

    • k is stored in self.k, which represents the position of the kth largest element we are interested in.
    • self.min_heap is initialized as an empty list, which will be used as the Min Heap.
  2. The nums list is iterated over and each number is added to the Min Heap using the add method.

  3. The add method handles inserting a new element into the Min Heap and ensures the Min Heap contains only the k largest elements.

    • heappush(self.min_heap, val) is used to add the new value val to the Min Heap.
    • We check if the size of the heap exceeds k by comparing len(self.min_heap) to self.k.
    • If the heap size is greater than k, we remove the smallest element (root of the Min Heap) using heappop(self.min_heap) to ensure that only the k largest elements remain.
    • This keeps the heap size at most k elements after every call.
    • The root element, which can be accessed with self.min_heap[0], is the kth largest element: the heap holds the k largest values seen so far, and the root is the smallest of those k.

By following this approach, we ensure that:

  • The add operation has a time complexity of O(log k) since it involves heap operations which work in logarithmic time relative to the number of elements in the heap.
  • The space complexity of the solution is O(k) because the Min Heap holds at most k elements at any given time.

This provides a balanced and efficient way to continuously find the kth largest element in the stream without having to sort the entire array each time a new element is added.
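As a possible variation on the steps above, the heapq module also offers heappushpop, which pushes a value and then pops the smallest one in a single combined operation. The sketch below uses it once the heap already holds k elements. The class name KthLargestPushPop is hypothetical, and this is an alternative formulation, not the solution presented in this article.

```python
from heapq import heappush, heappushpop
from typing import List


class KthLargestPushPop:
    """Variation sketch using heappushpop (hypothetical class name)."""

    def __init__(self, k: int, nums: List[int]):
        self.k = k
        self.min_heap: List[int] = []
        for num in nums:
            self.add(num)

    def add(self, val: int) -> int:
        if len(self.min_heap) < self.k:
            # Heap not full yet: just insert.
            heappush(self.min_heap, val)
        else:
            # Heap is full: push val, then discard the smallest element.
            heappushpop(self.min_heap, val)
        return self.min_heap[0]


tracker = KthLargestPushPop(3, [4, 5, 8, 2])
pushpop_results = [tracker.add(v) for v in (3, 5, 10, 9, 4)]
print(pushpop_results)  # [4, 5, 5, 8, 8]
```

The results match the push-then-pop version; the difference is that the heap never grows to k + 1 elements, even briefly.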

Example Walkthrough

Let us consider an example to illustrate the solution approach.

Suppose k is 3, and the initial array of numbers nums is [4, 5, 8, 2]. We want to design a KthLargest class where we can add elements and always be able to find the 3rd largest element in the current stream.

Here is a step-by-step walkthrough of how the KthLargest class processes the stream of numbers:

  1. We create an instance of KthLargest by passing in our values k=3 and nums=[4,5,8,2]. The __init__ method initializes self.k to 3 and self.min_heap as an empty Min Heap.

  2. We add each element of nums into the Min Heap using the add method, ensuring that after every addition the Min Heap holds at most the k largest elements. Heap contents are listed below in sorted order; the underlying list may arrange them differently. Initially, self.min_heap is empty, and we proceed as follows:

    • Add 4: self.min_heap becomes [4].
    • Add 5: self.min_heap becomes [4, 5].
    • Add 8: self.min_heap becomes [4, 5, 8].
    • Add 2: 2 is pushed, the size exceeds k, and 2 is popped straight away as the smallest element, so self.min_heap remains [4, 5, 8].

Now the Min Heap contains the k largest elements, and the kth largest (3rd largest) among them is 4 (the root of the Min Heap).

  3. If we call the add method to insert a new element, let's say 3, we again ensure that the Min Heap only contains the k largest elements.

    • Add 3: 3 is pushed and immediately popped, so self.min_heap remains [4, 5, 8]. The 3rd largest element is still 4.
  4. Let's insert a larger element, 9. The Min Heap needs to accommodate this change, and the smallest element in the heap should be removed if the heap's size exceeds k.

    • Add 9: self.min_heap becomes [5, 8, 9] after removing 4 because the size exceeded k. Now the 3rd largest element is 5.

So after adding the element 9, our Min Heap has elements [5, 8, 9], and the 3rd largest (kth largest) element is 5.

The class now efficiently maintains the 3rd largest element as we add new values into the stream. The add method updates the Min Heap in O(log k) time, which is much cheaper than the O(n log n) cost of re-sorting all n numbers on each insertion.
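The walkthrough can be reproduced with a few lines of Python. The sketch below replays the same sequence of values and records the heap contents after each step. The contents are recorded in sorted order because the raw layout of the heap list is an implementation detail; the variable names are illustrative.

```python
from heapq import heappush, heappop

k = 3
min_heap = []
trace = []

for val in [4, 5, 8, 2, 3, 9]:
    heappush(min_heap, val)
    if len(min_heap) > k:
        heappop(min_heap)  # drop the smallest so only the k largest remain
    # Record sorted contents plus the root (the current candidate answer).
    trace.append((val, sorted(min_heap), min_heap[0]))

for val, contents, root in trace:
    print(f"add {val}: heap contents {contents}, root {root}")
```

The last recorded step is the insertion of 9, after which the contents are [5, 8, 9] and the root is 5, as in the walkthrough.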

Solution Implementation

Python

from heapq import heappush, heappop
from typing import List


class KthLargest:
    def __init__(self, k: int, nums: List[int]):
        # Initialize a min heap to store the k largest elements
        self.min_heap = []
        # Store the size k to know the kth largest value
        self.k = k
        # Add the initial elements to the heap
        for num in nums:
            self.add(num)

    def add(self, val: int) -> int:
        # Add the new value to the min heap
        heappush(self.min_heap, val)
        # If the size of the heap exceeds k, remove the smallest element
        if len(self.min_heap) > self.k:
            heappop(self.min_heap)
        # The root of the min heap is the kth largest value
        return self.min_heap[0]

# Example usage:
# kthLargest = KthLargest(3, [4, 5, 8, 2])
# print(kthLargest.add(3))   # returns 4
# print(kthLargest.add(5))   # returns 5
# print(kthLargest.add(10))  # returns 5
# print(kthLargest.add(9))   # returns 8
# print(kthLargest.add(4))   # returns 8

Java

import java.util.PriorityQueue;

// Class to find the kth largest element in a stream of numbers
class KthLargest {

    // Min-heap that holds the 'k' largest elements seen so far
    private PriorityQueue<Integer> minHeap;

    // The kth position we are interested in for the largest element
    private int k;

    /**
     * Constructor to initialize the data structure and populate with initial elements.
     *
     * @param k    The kth position to track in the list of largest elements.
     * @param nums An array of initial numbers to be added to the kth largest tracker.
     */
    public KthLargest(int k, int[] nums) {
        // Initialize a min-heap with an initial capacity of 'k' elements
        this.minHeap = new PriorityQueue<>(k);
        this.k = k;

        // Add the initial elements to the kth largest tracker
        for (int num : nums) {
            add(num);
        }
    }

    /**
     * Adds a new number into the stream and returns the kth largest element.
     *
     * @param val The new number to be added to the stream.
     * @return The kth largest element after adding the new number.
     */
    public int add(int val) {
        // Always add the new value to the min-heap
        minHeap.offer(val);

        // If the size of the min-heap exceeds 'k', remove the smallest element
        if (minHeap.size() > k) {
            minHeap.poll();
        }

        // The root of the min-heap represents the kth largest element
        return minHeap.peek();
    }
}

/* Usage example:
 * KthLargest kthLargest = new KthLargest(3, new int[]{4, 5, 8, 2});
 * kthLargest.add(3);   // returns 4
 * kthLargest.add(5);   // returns 5
 * kthLargest.add(10);  // returns 5
 * kthLargest.add(9);   // returns 8
 * kthLargest.add(4);   // returns 8
 */

C++

#include <functional>
#include <queue>
#include <vector>

// Define a class to find kth largest element in a stream of numbers.
class KthLargest {
public:
    // Constructor for initializing the KthLargest class.
    // k - The kth position to track.
    // nums - Initial list of numbers from which we find the kth largest element.
    KthLargest(int k, std::vector<int>& nums) {
        kthSize = k;
        // Add initial numbers to the heap
        for (int num : nums) {
            add(num);
        }
    }

    // Function to add a number to the stream and return the kth largest element.
    int add(int val) {
        // Add the new number to the min heap.
        minHeap.push(val);
        // If the heap size is greater than k, remove the smallest element.
        if (static_cast<int>(minHeap.size()) > kthSize) {
            minHeap.pop();
        }
        // Return the kth largest element (top of the min heap).
        return minHeap.top();
    }

private:
    // Min heap to keep track of the k largest elements.
    std::priority_queue<int, std::vector<int>, std::greater<>> minHeap;
    // Store the value of 'k' to know which largest element to keep track of.
    int kthSize;
};

/* Usage example:
 * std::vector<int> nums{4, 5, 8, 2};
 * KthLargest kthLargest(3, nums);
 * kthLargest.add(3);   // returns 4
 * kthLargest.add(5);   // returns 5
 * kthLargest.add(10);  // returns 5
 * kthLargest.add(9);   // returns 8
 * kthLargest.add(4);   // returns 8
 */

TypeScript

type ComparatorFunction = (a: number, b: number) => number;

let kthLargestSize: number;
let minHeap: MinHeap;

interface MinHeap {
    data: number[];
    comparator: ComparatorFunction;
    heapify: () => void;
    peek: () => number | null;
    offer: (value: number) => void;
    poll: () => number | null;
    bubbleUp: (index: number) => void;
    bubbleDown: (index: number) => void;
    swap: (index1: number, index2: number) => void;
    size: () => number;
}

// Initializing the Kth largest element finder
function KthLargestInit(k: number, nums: number[]): void {
    kthLargestSize = k;
    minHeap = createMinHeap();
    nums.forEach(num => {
        kthLargestAdd(num);
    });
}

// Function to add a new value to the data collection
function kthLargestAdd(val: number): number {
    minHeap.offer(val);
    if (minHeap.size() > kthLargestSize) {
        minHeap.poll();
    }
    return minHeap.peek()!;
}

// Create a MinHeap with default values and a comparator
function createMinHeap(): MinHeap {
    const heap: MinHeap = {
        data: [],
        comparator: (a, b) => a - b,
        heapify() {
            if (this.size() < 2) return;
            for (let i = 1; i < this.size(); i++) {
                this.bubbleUp(i);
            }
        },
        peek() {
            return this.size() === 0 ? null : this.data[0];
        },
        // Adds a new value and bubbles it up to maintain heap property
        offer(value) {
            this.data.push(value);
            this.bubbleUp(this.size() - 1);
        },
        // Removes and returns the root value of the heap
        poll() {
            if (this.size() === 0) {
                return null;
            }
            const result = this.data[0];
            const last = this.data.pop();
            if (this.size() !== 0) {
                this.data[0] = last!;
                this.bubbleDown(0);
            }
            return result;
        },
        // Bubbles a value up the heap until the heap property is restored
        bubbleUp(index) {
            while (index > 0) {
                const parentIndex = (index - 1) >> 1;
                if (this.comparator(this.data[index], this.data[parentIndex]) < 0) {
                    this.swap(index, parentIndex);
                    index = parentIndex;
                } else {
                    break;
                }
            }
        },
        // Bubbles a value down the heap until the heap property is restored
        bubbleDown(index) {
            const lastIndex = this.size() - 1;
            while (true) {
                let findIndex = index;
                const leftIndex = index * 2 + 1;
                const rightIndex = index * 2 + 2;
                if (
                    leftIndex <= lastIndex &&
                    this.comparator(this.data[leftIndex], this.data[findIndex]) < 0
                ) {
                    findIndex = leftIndex;
                }
                if (
                    rightIndex <= lastIndex &&
                    this.comparator(this.data[rightIndex], this.data[findIndex]) < 0
                ) {
                    findIndex = rightIndex;
                }
                if (index !== findIndex) {
                    this.swap(index, findIndex);
                    index = findIndex;
                } else {
                    break;
                }
            }
        },
        // Swaps two values in the heap
        swap(index1, index2) {
            [this.data[index1], this.data[index2]] = [this.data[index2], this.data[index1]];
        },
        // Returns the number of elements in the heap
        size() {
            return this.data.length;
        }
    };

    heap.heapify();
    return heap;
}

// Example of usage
KthLargestInit(3, [4, 5, 8, 2]);
console.log(kthLargestAdd(3));  // prints 4
console.log(kthLargestAdd(5));  // prints 5
console.log(kthLargestAdd(10)); // prints 5
console.log(kthLargestAdd(9));  // prints 8
console.log(kthLargestAdd(4));  // prints 8


Time and Space Complexity

Time Complexity:

  • The __init__(self, k: int, nums: List[int]) method has a time complexity of O(n * log(k)) where n is the length of the nums list. This is because for every element in nums, the add operation is called which takes O(log(k)) time due to the heap operation and we perform this n times.

  • The add(self, val: int) -> int method has a time complexity of O(log(k)). This is because the heappush operation could take up to O(log(k)) time to maintain the heap properties when adding a new element, and similarly heappop operation, which is called only when the heap size exceeds k, takes O(log(k)).

Space Complexity:

  • The space complexity of the whole class is O(k) since the heap that is maintained by the class never contains more than k elements at any time.
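The O(k) space bound can be sanity-checked empirically. The sketch below feeds 10,000 random values through the same push-then-trim step and tracks the largest heap size observed after each step. The seed, value range, and variable names are arbitrary choices for illustration.

```python
import random
from heapq import heappush, heappop

k = 5
min_heap = []
max_size_seen = 0

random.seed(0)  # fixed seed so the run is repeatable
for _ in range(10_000):
    heappush(min_heap, random.randint(-10_000, 10_000))
    # The heap briefly holds k + 1 elements here, then is trimmed back to k.
    if len(min_heap) > k:
        heappop(min_heap)
    max_size_seen = max(max_size_seen, len(min_heap))

print(max_size_seen)  # 5
```

However long the stream runs, the heap settles at k elements, so memory use does not depend on the number of values added.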
