992. Subarrays with K Different Integers

HardArrayHash TableCountingSliding Window

Problem Description

The problem requires us to find the total number of subarrays from a given integer array nums such that each subarray has exactly k distinct integers. A subarray is defined as a contiguous sequence of elements within an array. For instance, in the array [1,2,3,1,2], one subarray could be [1,2,3], which contains 3 distinct integers, and if k equals 3, this would be considered a "good subarray."

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

To solve this problem, we can use a two-pointer technique commonly used for sliding window problems. The core idea behind the solution is to create a sliding window that counts the occurrence of each number in the nums array and track when the number of distinct integers within the window reaches k. The function f(k) helps manage this by marking for each index i the furthest left position j such that the subarray nums[j:i+1] contains at most k distinct integers.

The counter cnt maintains the frequency of each number in the current window, and the array pos is used to record the starting position for the window that satisfies our condition for each index i.

The trick here is to call the helper function f(k) twice: once for k and once for k-1 to find the number of windows with exactly k distinct integers and subtract the number of windows with exactly k-1 distinct integers.

The reason behind this is that by finding the difference between these two counts, we effectively calculate the number of new subarrays that have exactly k distinct integers, as any window with k-1 distinct integers cannot contribute to the count of windows with exactly k.

Finally, the summation sum(a - b for a, b in zip(f(k - 1), f(k))) calculates the total number of "good subarrays," by taking the difference of starting positions of subarrays with k and k-1 distinct elements for each index, which corresponds to the count of "good subarrays" that end at that index.

Learn more about Sliding Window patterns.

Solution Approach

The solution uses a helper function f(k) to identify the last starting position of a subarray ending at each index i that contains at most k distinct integers. This function is called twice, once with k and another with k-1.

Here’s the detailed algorithm:

Initialize pos, which is an array that will hold for each index i the furthest left position j such that nums[j:i+1] has at most k distinct elements.
Create a Counter object cnt that will help us count the frequency of each element in the window [j, i].
Iterate through nums using i as the index for the current end of the window. For each i, increment the count of nums[i] in cnt.
Use another pointer j to keep track of the start of the window. If the count of distinct numbers in cnt exceeds k, remove elements from the start (increment j) until we're back within the limit of k distinct integers. Decrement the count of the element nums[j] in cnt, if the count drops to zero, remove that element from cnt.
Set the current pos[i] to j, which at this point is the left-most index for which the window [j, i] contains at most k distinct integers.
The function f(k) returns the pos array which we use to calculate the number of good subarrays.

For the final count, we subtract the positions found for k-1 from the positions found for k. The zip function is used to iterate over positions of both k and k-1 simultaneously, and for each pair of positions, we subtract the position for k-1 from the position for k to get the number of new subarrays that have exactly k distinct integers.

The comprehension sum(a - b for a, b in zip(f(k - 1), f(k))) iterates through pairs of positions (a, b) from pos arrays returned by f(k-1) and f(k) respectively and calculates the difference. This difference represents the number of subarrays that end at each index having exactly k distinct integers. The sum function adds up these differences to get the total count of good subarrays.

In summary, the algorithm effectively combines a sliding window technique with a difference array approach to efficiently calculate the number of good subarrays. The sliding window ensures we consider only valid subarrays, and the difference between f(k-1) and f(k) positions allows us to count exactly those subarrays with the required number of distinct integers.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's illustrate the solution approach with an example. Consider the array nums = [1, 2, 1, 3, 4] and k = 3. We're looking for the total number of subarrays with exactly k distinct integers.

Following our algorithm:

We initialize pos = [0, 0, 0, 0, 0] which will store the furthest left starting positions for subarrays with at most k distinct integers.
We create a frequency counter cnt which is initially empty.
Then, we iterate over nums. Let's walk through a few iterations:
- For i = 0 (element 1), pos[0] = 0 because [1] has 1 distinct integer.
- For i = 1 (element 2), pos[1] = 0 because [1, 2] has 2 distinct integers.
- For i = 2 (element 1), since 1 is already in cnt, pos[2] = 0 because [1, 2, 1] still has 2 distinct integers.
When i = 3 (element 3), cnt shows we have 3 distinct integers so far (1, 2, 3). We do not need to adjust j, so pos[3] = 0.
When i = 4 (element 4), cnt would now have 4 distinct integers. Since our limit is k, we increment j to ensure we don't exceed k distinct integers.
- Our count becomes k+1 when i = 4. We then increment j to 1 to discard the first element (1), updating the counter. Now we have 3 distinct numbers again (2, 3, 4), and pos[4] = 1.
Finally, after calling function f(k) using our nums array and k = 3, we get a pos array telling us the starting positions that form valid windows, which are [0, 0, 0, 0, 1].
For k-1 = 2, calling function f(k-1), we would similarly get [0, 0, 0, 1, 3].
We then zip these arrays and subtract the second from the first for each position: [0-0, 0-0, 0-0, 0-1, 1-3] = [0, 0, 0, -1, -2].
The sum of these differences is 0 + 0 + 0 - 1 - 2 = -3. However, we need to take its absolute value because we're interested in the count, not the signed difference: abs(-3) = 3. These differences represent how many new "good subarrays" we get at each index as we extend our window.

So, in the example nums = [1, 2, 1, 3, 4] with k = 3, there are 3 subarrays with exactly 3 distinct integers: [1, 2, 1, 3], [2, 1, 3, 4], and [1, 2, 3, 4]. And this matches our final count of 3. This demonstrates the use of the sliding window to isolate the regions of interest and the subtraction of counts to find a precise number of qualifying subarrays.

Solution Implementation

1from collections import Counter
2from typing import List
3
4class Solution:
5    def subarraysWithKDistinct(self, nums: List[int], k: int) -> int:
6        # Helper function to calculate the number of subarrays with at most k distinct elements
7        def at_most_k_distinct(k):
8            # Initialize the position list to store the starting index of subarrays
9            start_positions = [0] * len(nums)
10            # Counter to keep track of the frequencies of elements in the current window
11            count = Counter()
12            left = 0  # Initialize the left pointer of the window
13            # Iterate through the 'nums' list
14            for right, value in enumerate(nums):
15                count[value] += 1  # Increment the frequency of the current number
16                # Shrink the window from the left if there are more than 'k' distinct elements
17                while len(count) > k:
18                    count[nums[left]] -= 1
19                    # Remove the leftmost element from the counter if its frequency drops to zero
20                    if count[nums[left]] == 0:
21                        del count[nums[left]]
22                    left += 1  # Move the left pointer to the right
23                # Store the latest valid starting position where the window contains at most 'k' distinct elements
24                start_positions[right] = left
25            return start_positions
26
27        # Calculate number of subarrays with exactly 'k' distinct elements by subtracting
28        # the number of subarrays with at most 'k-1' distinct elements from those with at most 'k'
29        return sum(end_at_most_k - end_at_most_k_minus_one for end_at_most_k, end_at_most_k_minus_one in zip(at_most_k_distinct(k), at_most_k_distinct(k - 1)))
30
31# Example usage:
32# sol = Solution()
33# result = sol.subarraysWithKDistinct([1,2,1,2,3], 2)
34# print(result)  # Output: 7
35

1class Solution {
2    public int subarraysWithKDistinct(int[] nums, int k) {
3        // Find the positions with exactly k distinct elements and k-1 distinct elements
4        int[] positionsWithKDistinct = findPositionsWithDistinctElements(nums, k);
5        int[] positionsWithKMinusOneDistinct = findPositionsWithDistinctElements(nums, k - 1);
6      
7        // Initialize answer to hold the total number of subarrays with k distinct elements
8        int answer = 0;
9      
10        // Calculate the difference between positions to get the count of subarrays with exactly k distinct elements.
11        for (int i = 0; i < nums.length; i++) {
12            answer += positionsWithKDistinct[i] - positionsWithKMinusOneDistinct[i];
13        }
14      
15        return answer;
16    }
17
18    private int[] findPositionsWithDistinctElements(int[] nums, int k) {
19        int n = nums.length; // Total number of elements in the input array
20        int[] count = new int[n + 1]; // An array to keep track of counts of each distinct element
21        int[] positions = new int[n]; // An array to store positions
22      
23        int distinctCount = 0; // A variable to keep track of current number of distinct elements
24      
25        // Two pointers - 'start' and 'end' to keep track of current window
26        for (int end = 0, start = 0; end < n; end++) {
27            if (++count[nums[end]] == 1) { // If it's a new distinct element
28                distinctCount++;
29            }
30            // Shrink the window from the left if we have more than 'k' distinct elements
31            while (distinctCount > k) {
32                if (--count[nums[start]] == 0) { // If we've removed one distinct element
33                    distinctCount--;
34                }
35                start++;
36            }
37            positions[end] = start; // Update the start position for current window size
38        }
39        return positions; // Return the positions array that helps in calculating the result
40    }
41}
42

1#include <vector>
2#include <cstring>
3
4class Solution {
5public:
6    // Function to calculate the number of subarrays with exactly k distinct elements
7    int subarraysWithKDistinct(vector<int>& nums, int k) {
8        vector<int> subarrayStartsWithK = countSubarraysStartingPoint(nums, k);
9        vector<int> subarrayStartsWithKMinusOne = countSubarraysStartingPoint(nums, k - 1);
10        int totalSubarrays = 0;
11
12        // Calculate the difference between the number of subarrays starting with k and k-1 distinct numbers
13        for (int i = 0; i < nums.size(); ++i) {
14            totalSubarrays += subarrayStartsWithKMinusOne[i] - subarrayStartsWithK[i];
15        }
16
17        return totalSubarrays;
18    }
19
20    // Helper function to find the earliest starting point of subarrays with at most k distinct elements for each ending point
21    vector<int> countSubarraysStartingPoint(vector<int>& nums, int k) {
22        int n = nums.size(); // Size of the input array
23        vector<int> startPos(n); // Vector to store the starting positions
24        int count[n + 1]; // Array to store the count of each number
25        memset(count, 0, sizeof(count)); // Initialize the count array with zeros
26        int distinctNums = 0; // Number of distinct elements
27      
28        // Two pointers technique: 'i' is the end pointer, 'j' is the start pointer
29        for (int i = 0, j = 0; i < n; ++i) {
30            // If we encounter a new element (count is 0), increase the number of distinct elements
31            if (++count[nums[i]] == 1) {
32                ++distinctNums;
33            }
34          
35            // If distinct elements exceed k, move the start pointer to reduce the number of distinct elements
36            for (; distinctNums > k; ++j) {
37                // If after decrement count goes to zero, then one distinct element is removed
38                if (--count[nums[j]] == 0) {
39                    --distinctNums;
40                }
41            }
42          
43            // Record the starting position for the subarray ending at 'i' which has at most k distinct elements
44            startPos[i] = j;
45        }
46
47        return startPos;
48    }
49};
50

1// Define the function to calculate the number of subarrays with exactly k distinct elements
2function subarraysWithKDistinct(nums: number[], k: number): number {
3    let subarrayStartsWithK = countSubarraysStartingPoint(nums, k);
4    let subarrayStartsWithKMinusOne = countSubarraysStartingPoint(nums, k - 1);
5    let totalSubarrays = 0;
6
7    // Calculate the difference between the number of subarrays starting with k and k-1 distinct numbers
8    for (let i = 0; i < nums.length; ++i) {
9        totalSubarrays += subarrayStartsWithKMinusOne[i] - subarrayStartsWithK[i];
10    }
11
12    return totalSubarrays;
13}
14
15// Helper function to find the earliest starting point of subarrays with at most k distinct elements for each ending point
16function countSubarraysStartingPoint(nums: number[], k: number): number[] {
17    let n = nums.length; // Size of the input array
18    let startPos = Array(n); // Array to store the starting positions for subarrays
19    let count: number[] = Array(n + 1).fill(0); // Initialize an array to store the count of each number, filled with zeros
20    let distinctNums = 0; // Number of distinct elements
21
22    // Two pointers technique: 'end' is the end pointer, 'start' is the start pointer
23    for (let end = 0, start = 0; end < n; ++end) {
24        // If a new element is detected (count is 0), increase the number of distinct elements
25        if (++count[nums[end]] == 1) {
26            ++distinctNums;
27        }
28      
29        // If distinct elements exceed k, move the start pointer to reduce the number of distinct elements
30        for (; distinctNums > k; ++start) {
31            // If after decrementing count goes to zero, then one distinct element is removed
32            if (--count[nums[start]] == 0) {
33                --distinctNums;
34            }
35        }
36      
37        // Record the starting position for the subarray ending at 'end' which has at most k distinct elements
38        startPos[end] = start;
39    }
40
41    return startPos;
42}
43

Time and Space Complexity

Time Complexity

The given Python function computes the number of subarrays with exactly k distinct elements by calculating the position of pointers for k and k-1 distinct elements. The inner function f(k) goes through all elements of nums and uses a counter cnt to keep track of the number of distinct elements. Here's the breakdown of time complexity:

The f(k) function loops over every element in nums exactly once with two nested loops. However, the inner while-loop does not start over for each outer iteration but continues from the last position, effectively visiting each element of the array only once. Thus, the time complexity for the f(k) function is O(n), where n is the length of nums.
Since f(k) is called twice, once with k and once with k-1, the total time for these calls is 2 * O(n) = O(n). The contribution of these calls to the time complexity does not change the order O(n).
The final part of the code calculates the sum of differences between the positions for k and k-1 distinct elements, which takes O(n) time as well.

Combining all parts, the final time complexity of the subarraysWithKDistinct function is O(n) since all operations are linear with respect to the length of nums.

Space Complexity

The space complexity of the function depends on the size of the data structures used:

The counter cnt can hold at most min(k, n) different integers, where n is the length of nums and k is the number of distinct elements we are looking for. So, it uses O(min(k, n)) space.
The array pos is of size n, resulting in O(n) space.
Auxiliary space for indices and temporary variables is O(1).

So the total space complexity is the maximum space used by any of these components, which is O(n) + O(min(k, n)). Since k is the constraint on distinct numbers and can be at most n, the dominant term is O(n). Hence, the space complexity of the entire function is O(n).

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which two pointer techniques do you use to check if a string is a palindrome?

Fast-slow pointers

Two pointers moving in opposite direction

Sliding window

Prefix sum

I don't know