Facebook Pixel

703. Kth Largest Element in a Stream

Problem Description

You need to design a class that tracks the kth highest test score from a stream of scores in real-time. This is useful for a university admissions office to dynamically determine cut-off marks as new applicants submit their scores.

The task is to implement a KthLargest class with the following functionality:

  1. Constructor KthLargest(int k, int[] nums):

    • Initializes the object with an integer k (the position of the highest score you want to track)
    • Takes an initial array of test scores nums
  2. Method int add(int val):

    • Adds a new test score val to the existing collection of scores
    • Returns the kth largest element after adding this new score

Important Note: The kth largest element means the kth highest score when all scores are sorted in descending order. For example, if scores are [100, 90, 85, 80, 75] and k=3, the 3rd largest element is 85.

The solution uses a min heap (priority queue) of size k to efficiently maintain the k largest elements. The root of this min heap will always be the kth largest element:

  • During initialization, elements from nums are added to the heap
  • When adding a new score, it's pushed to the heap
  • If the heap size exceeds k, the smallest element (root of min heap) is removed
  • The root of the heap (accessible via self.min_q[0]) is always the kth largest element

This approach ensures that we only keep track of the k largest elements at any time, making it memory efficient and providing O(log k) time complexity for each add operation.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we don't need to keep track of all scores - we only care about the k largest ones. If we have more than k scores, any score smaller than the kth largest will never be relevant to our answer.

Think of it like maintaining a "top k leaderboard". If k=3 and we want to track the top 3 scores, once we have scores like [100, 90, 85], any new score below 85 won't affect who's in 3rd place unless it's higher than 85.

A min heap is perfect for this because:

  1. We can limit its size to exactly k elements
  2. The smallest element in our "top k" collection (which is the kth largest overall) sits at the root of the min heap
  3. When a new score comes in, we add it to the heap and if we now have more than k elements, we remove the smallest one

Why specifically a min heap and not a max heap? Because we want quick access to the kth largest element (the "cutoff" score). In a min heap of size k containing the k largest elements:

  • The root is the minimum among these k elements
  • This minimum is exactly the kth largest element overall
  • We can access it in O(1) time with min_q[0]

For example, if k=3 and we have scores [100, 95, 90, 85, 80], our min heap would contain [90, 95, 100]. The root (90) is the 3rd largest score. When we add a new score like 92, the heap becomes [90, 92, 95, 100], we pop 90, and now [92, 95, 100] gives us the new 3rd largest score of 92.

This approach elegantly handles the dynamic nature of the problem - as new scores arrive, we automatically maintain exactly what we need without sorting the entire list each time.

Learn more about Tree, Binary Search Tree, Binary Tree, Data Stream and Heap (Priority Queue) patterns.

Solution Approach

The solution uses a min heap (priority queue) to efficiently maintain the k largest elements from the stream of scores.

Data Structure Choice

We use Python's heapq module which implements a min heap. The key property we leverage is that in a min heap of size k containing the k largest elements, the root element is always the kth largest overall.

Implementation Details

1. Initialization (__init__ method):

  • Store the value of k as an instance variable
  • Initialize an empty min heap self.min_q = []
  • Process the initial array nums by calling self.add(x) for each element
  • This ensures the heap is properly initialized with at most k elements

2. Adding Elements (add method): The add method performs three key operations:

heappush(self.min_q, val)  # Step 1: Add new element
if len(self.min_q) > self.k:  # Step 2: Check size
    heappop(self.min_q)  # Step 3: Remove smallest if needed
return self.min_q[0]  # Step 4: Return kth largest
  • Step 1: Push the new value onto the heap using heappush. This maintains the heap property in O(log n) time
  • Step 2: Check if the heap size exceeds k
  • Step 3: If size > k, remove the smallest element (root) using heappop. This ensures we keep only the k largest elements
  • Step 4: Return the root element self.min_q[0], which is the kth largest element

Time Complexity Analysis

  • Initialization: O(n × log k) where n is the length of the initial nums array. Each element is added using add, which takes O(log k) time
  • Add operation: O(log k) for both heappush and potentially heappop operations
  • Getting kth largest: O(1) as we simply access the root element

Space Complexity

O(k) - we maintain at most k elements in the heap at any time, regardless of how many total elements are processed.

Example Walkthrough

If k=3 and initial nums=[4, 5, 8, 2]:

  1. After initialization, heap contains [4, 5, 8] (the 3 largest)
  2. Call add(3): heap becomes [3, 4, 5, 8], pop smallest → [4, 5, 8], return 4
  3. Call add(10): heap becomes [4, 5, 8, 10], pop smallest → [5, 8, 10], return 5

The beauty of this approach is that it automatically maintains exactly what we need - the k largest elements - with optimal time and space complexity.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example with k=3 (tracking the 3rd highest score).

Initial Setup:

  • k = 3
  • nums = [4, 5, 8, 2]

Step 1: Initialization We process each element from nums:

  • Add 4: heap = [4], size = 1 ≤ k, no removal needed
  • Add 5: heap = [4, 5], size = 2 ≤ k, no removal needed
  • Add 8: heap = [4, 5, 8], size = 3 = k, no removal needed
  • Add 2: heap = [2, 4, 5, 8], size = 4 > k, remove smallest (2) → heap = [4, 5, 8]

After initialization, our min heap contains [4, 5, 8] where 4 is at the root.

Step 2: Add new score 3

  • Push 3 to heap: [3, 4, 5, 8]
  • Size = 4 > k = 3, so pop smallest: remove 3 → heap = [4, 5, 8]
  • Return heap[0] = 4 (the 3rd largest among all scores: 8, 5, 4, 3, 2)

Step 3: Add new score 10

  • Push 10 to heap: [4, 5, 8, 10]
  • Size = 4 > k = 3, so pop smallest: remove 4 → heap = [5, 8, 10]
  • Return heap[0] = 5 (the 3rd largest among all scores: 10, 8, 5, 4, 3, 2)

Step 4: Add new score 9

  • Push 9 to heap: [5, 8, 9, 10]
  • Size = 4 > k = 3, so pop smallest: remove 5 → heap = [8, 9, 10]
  • Return heap[0] = 8 (the 3rd largest among all scores: 10, 9, 8, 5, 4, 3, 2)

Key Observation: The min heap always maintains exactly the k largest elements, and its root (smallest element in the heap) is precisely the kth largest element overall. This gives us O(1) access to the answer we need.

Solution Implementation

1import heapq
2from typing import List
3
4
5class KthLargest:
6    """
7    A class to find the kth largest element in a stream.
8    Uses a min-heap of size k to efficiently track the k largest elements.
9    """
10
11    def __init__(self, k: int, nums: List[int]):
12        """
13        Initialize the KthLargest object.
14      
15        Args:
16            k: The position of the largest element to track (1-indexed)
17            nums: Initial list of numbers in the stream
18        """
19        self.k = k
20        self.min_heap = []  # Min-heap to store the k largest elements
21      
22        # Add all initial numbers to the heap
23        for num in nums:
24            self.add(num)
25
26    def add(self, val: int) -> int:
27        """
28        Add a new value to the stream and return the kth largest element.
29      
30        The min-heap maintains exactly k elements (the k largest seen so far).
31        The root of the min-heap is always the kth largest element.
32      
33        Args:
34            val: The new value to add to the stream
35          
36        Returns:
37            The kth largest element after adding the new value
38        """
39        # Add the new value to the heap
40        heapq.heappush(self.min_heap, val)
41      
42        # If heap size exceeds k, remove the smallest element
43        # This ensures we only keep the k largest elements
44        if len(self.min_heap) > self.k:
45            heapq.heappop(self.min_heap)
46      
47        # The root of the min-heap is the kth largest element
48        return self.min_heap[0]
49
50
51# Your KthLargest object will be instantiated and called as such:
52# obj = KthLargest(k, nums)
53# param_1 = obj.add(val)
54
1class KthLargest {
2    // Min heap to maintain the k largest elements
3    // The root (smallest in heap) will be the kth largest overall
4    private PriorityQueue<Integer> minHeap;
5    private int k;
6
7    /**
8     * Constructor to initialize the KthLargest object
9     * @param k The position of the largest element to track (1-indexed)
10     * @param nums Initial array of numbers to process
11     */
12    public KthLargest(int k, int[] nums) {
13        this.k = k;
14        // Initialize min heap with initial capacity of k for efficiency
15        this.minHeap = new PriorityQueue<>(k);
16      
17        // Add all initial numbers to the heap
18        for (int num : nums) {
19            add(num);
20        }
21    }
22
23    /**
24     * Adds a value to the stream and returns the kth largest element
25     * @param val The value to add to the stream
26     * @return The kth largest element after adding the new value
27     */
28    public int add(int val) {
29        // Add the new value to the min heap
30        minHeap.offer(val);
31      
32        // If heap size exceeds k, remove the smallest element
33        // This ensures we only keep the k largest elements
34        if (minHeap.size() > k) {
35            minHeap.poll();
36        }
37      
38        // The root of the min heap is the kth largest element
39        return minHeap.peek();
40    }
41}
42
43/**
44 * Your KthLargest object will be instantiated and called as such:
45 * KthLargest obj = new KthLargest(k, nums);
46 * int param_1 = obj.add(val);
47 */
48
1class KthLargest {
2public:
3    /**
4     * Constructor to initialize the KthLargest finder
5     * @param k: The position of the largest element to track (1-indexed)
6     * @param nums: Initial array of numbers to process
7     */
8    KthLargest(int k, vector<int>& nums) {
9        this->k_ = k;
10      
11        // Add all initial numbers to the heap
12        for (int num : nums) {
13            add(num);
14        }
15    }
16  
17    /**
18     * Add a new value to the stream and return the kth largest element
19     * @param val: The new value to add to the stream
20     * @return: The kth largest element after adding the new value
21     */
22    int add(int val) {
23        // Add the new value to the min heap
24        min_heap_.push(val);
25      
26        // Maintain heap size at most k elements
27        // The smallest element in a heap of size k is the kth largest overall
28        if (min_heap_.size() > static_cast<size_t>(k_)) {
29            min_heap_.pop();  // Remove the smallest element
30        }
31      
32        // The top of the min heap is the kth largest element
33        return min_heap_.top();
34    }
35  
36private:
37    int k_;  // The k value for finding kth largest
38  
39    // Min heap to maintain the k largest elements
40    // Using greater<int> comparator creates a min heap
41    priority_queue<int, vector<int>, greater<int>> min_heap_;
42};
43
44/**
45 * Your KthLargest object will be instantiated and called as such:
46 * KthLargest* obj = new KthLargest(k, nums);
47 * int param_1 = obj->add(val);
48 */
49
1// Global variables to maintain the state of KthLargest functionality
2let k: number = 0;
3let minHeap = new MinPriorityQueue();
4
5/**
6 * Initializes the data structure with k and an initial array of numbers
7 * Maintains only the k largest elements in a min heap
8 * @param targetK - The kth position to track
9 * @param nums - Initial array of numbers
10 */
11function initializeKthLargest(targetK: number, nums: number[]): void {
12    // Set the target k value
13    k = targetK;
14  
15    // Clear any existing elements in the heap
16    minHeap = new MinPriorityQueue();
17  
18    // Add all initial numbers to the heap
19    for (const num of nums) {
20        addNumber(num);
21    }
22}
23
24/**
25 * Adds a new value to the data structure and returns the kth largest element
26 * Uses a min heap of size k to efficiently track the k largest elements
27 * @param val - The value to add
28 * @returns The kth largest element after adding the new value
29 */
30function addNumber(val: number): number {
31    // Add the new value to the min heap
32    minHeap.enqueue(val);
33  
34    // If heap size exceeds k, remove the smallest element
35    // This ensures we only keep the k largest elements
36    if (minHeap.size() > k) {
37        minHeap.dequeue();
38    }
39  
40    // The root of the min heap is the kth largest element
41    return minHeap.front().element;
42}
43

Time and Space Complexity

Time Complexity:

  • Initialization (__init__): O(n * log k) where n is the length of the initial nums array. Each element in nums calls the add method, which performs a heap push operation O(log k) and potentially a heap pop operation O(log k).
  • Add operation (add): O(log k) for each call. The heappush operation takes O(log k) time since the heap size is at most k+1, and the heappop operation (when needed) also takes O(log k) time.

Space Complexity: O(k)

  • The min heap self.min_q stores at most k elements at any time. After each add operation, if the heap size exceeds k, the smallest element is removed, maintaining the heap size at exactly k elements (assuming at least k elements have been added).
  • The additional space used for the integer k and other variables is O(1), which doesn't affect the overall space complexity.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Handling Edge Case When Initial Array Has Fewer Than k Elements

The Pitfall: The most common mistake is assuming the heap will always have exactly k elements. When the initial nums array has fewer than k elements, or even when it's empty, accessing self.min_heap[0] in the add method will raise an IndexError.

Example of the Problem:

# This will crash!
kthLargest = KthLargest(3, [4, 5])  # Only 2 elements, but k=3
result = kthLargest.add(3)  # IndexError when accessing self.min_heap[0]

The Solution: Check if the heap has at least k elements before returning the root:

def add(self, val: int) -> int:
    heapq.heappush(self.min_heap, val)
  
    if len(self.min_heap) > self.k:
        heapq.heappop(self.min_heap)
  
    # Only return if we have at least k elements
    if len(self.min_heap) == self.k:
        return self.min_heap[0]
    else:
        # Could return None or raise an exception
        # Based on problem requirements
        return None  # or return -1 as a sentinel value

2. Using a Max Heap Instead of Min Heap

The Pitfall: Developers might intuitively think "I need the k largest, so I should use a max heap." This leads to incorrect logic where they try to negate values or use complex workarounds.

Example of the Problem:

# Wrong approach - negating values for max heap
def add(self, val: int) -> int:
    heapq.heappush(self.min_heap, -val)  # Negating for max heap
    if len(self.min_heap) > self.k:
        heapq.heappop(self.min_heap)
    return -self.min_heap[0]  # Need to negate back

Why This is Wrong: While this might seem to work, it complicates the logic unnecessarily and can lead to bugs with edge cases (like integer overflow with very large negative numbers).

The Correct Understanding: A min heap of size k containing the k largest elements naturally has the kth largest at its root. This is because all other elements in the heap are larger than or equal to the root.

3. Forgetting to Handle Duplicate Values

The Pitfall: Some implementations try to avoid duplicates or handle them specially, which is unnecessary and incorrect for this problem.

Example of Incorrect Thinking:

def add(self, val: int) -> int:
    # Wrong: trying to avoid duplicates
    if val not in self.min_heap:
        heapq.heappush(self.min_heap, val)
    # This changes the problem semantics!

The Solution: Simply treat duplicates as separate values. If you have scores [90, 90, 85, 80] and k=2, the 2nd largest is 90 (not 85).

4. Inefficient Initialization

The Pitfall: Some developers try to optimize initialization by sorting first or by using heapify:

def __init__(self, k: int, nums: List[int]):
    self.k = k
    # Inefficient: sorting entire array
    nums.sort(reverse=True)
    self.min_heap = nums[:k]
    heapq.heapify(self.min_heap)

Why This is Problematic:

  • Sorting takes O(n log n) time, which is worse than O(n log k)
  • Taking the first k elements after sorting in descending order gives you a max heap structure, not a min heap
  • This breaks the invariant that the smallest of the k largest should be at the root

The Correct Approach: Use the add method for each element during initialization, which properly maintains the heap invariant and handles all edge cases consistently.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

What are the two properties the problem needs to have for dynamic programming to be applicable? (Select 2)


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More