992. Subarrays with K Different Integers
Problem Description
The problem requires us to find the total number of subarrays from a given integer array nums
such that each subarray has exactly k
distinct integers. A subarray is defined as a contiguous sequence of elements within an array. For instance, in the array [1,2,3,1,2]
, one subarray could be [1,2,3]
, which contains 3 distinct integers, and if k
equals 3, this would be considered a "good subarray."
Intuition
To solve this problem, we can use a two-pointer technique commonly used for sliding window problems. The core idea behind the solution is to create a sliding window that counts the occurrence of each number in the nums
array and track when the number of distinct integers within the window reaches k
. The function f(k)
helps manage this by marking for each index i
the furthest left position j
such that the subarray nums[j:i+1]
contains at most k
distinct integers.
The counter cnt
maintains the frequency of each number in the current window, and the array pos
is used to record the starting position for the window that satisfies our condition for each index i
.
The trick here is to call the helper function f(k)
twice: once for k
and once for k-1
to find the number of windows with exactly k
distinct integers and subtract the number of windows with exactly k-1
distinct integers.
The reason behind this is that by finding the difference between these two counts, we effectively calculate the number of new subarrays that have exactly k
distinct integers, as any window with k-1
distinct integers cannot contribute to the count of windows with exactly k
.
Finally, the summation sum(a - b for a, b in zip(f(k - 1), f(k)))
calculates the total number of "good subarrays," by taking the difference of starting positions of subarrays with k
and k-1
distinct elements for each index, which corresponds to the count of "good subarrays" that end at that index.
Learn more about Sliding Window patterns.
Solution Approach
The solution uses a helper function f(k)
to identify the last starting position of a subarray ending at each index i
that contains at most k
distinct integers. This function is called twice, once with k
and another with k-1
.
Hereโs the detailed algorithm:
-
Initialize
pos
, which is an array that will hold for each indexi
the furthest left positionj
such thatnums[j:i+1]
has at mostk
distinct elements. -
Create a
Counter
objectcnt
that will help us count the frequency of each element in the window[j, i]
. -
Iterate through
nums
usingi
as the index for the current end of the window. For eachi
, increment the count ofnums[i]
incnt
. -
Use another pointer
j
to keep track of the start of the window. If the count of distinct numbers incnt
exceedsk
, remove elements from the start (incrementj
) until we're back within the limit ofk
distinct integers. Decrement the count of the elementnums[j]
incnt
, if the count drops to zero, remove that element fromcnt
. -
Set the current
pos[i]
toj
, which at this point is the left-most index for which the window[j, i]
contains at mostk
distinct integers. -
The function
f(k)
returns thepos
array which we use to calculate the number of good subarrays.
For the final count, we subtract the positions found for k-1
from the positions found for k
. The zip
function is used to iterate over positions of both k
and k-1
simultaneously, and for each pair of positions, we subtract the position for k-1
from the position for k
to get the number of new subarrays that have exactly k
distinct integers.
- The comprehension
sum(a - b for a, b in zip(f(k - 1), f(k)))
iterates through pairs of positions(a, b)
frompos
arrays returned byf(k-1)
andf(k)
respectively and calculates the difference. This difference represents the number of subarrays that end at each index having exactlyk
distinct integers. Thesum
function adds up these differences to get the total count of good subarrays.
In summary, the algorithm effectively combines a sliding window technique with a difference array approach to efficiently calculate the number of good subarrays. The sliding window ensures we consider only valid subarrays, and the difference between f(k-1)
and f(k)
positions allows us to count exactly those subarrays with the required number of distinct integers.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach with an example. Consider the array nums = [1, 2, 1, 3, 4]
and k = 3
. We're looking for the total number of subarrays with exactly k
distinct integers.
Following our algorithm:
-
We initialize
pos = [0, 0, 0, 0, 0]
which will store the furthest left starting positions for subarrays with at mostk
distinct integers. -
We create a frequency counter
cnt
which is initially empty. -
Then, we iterate over
nums
. Let's walk through a few iterations:- For
i = 0
(element 1),pos[0] = 0
because[1]
has 1 distinct integer. - For
i = 1
(element 2),pos[1] = 0
because[1, 2]
has 2 distinct integers. - For
i = 2
(element 1), since 1 is already incnt
,pos[2] = 0
because[1, 2, 1]
still has 2 distinct integers.
- For
-
When
i = 3
(element 3),cnt
shows we have 3 distinct integers so far (1, 2, 3
). We do not need to adjustj
, sopos[3] = 0
. -
When
i = 4
(element 4),cnt
would now have 4 distinct integers. Since our limit isk
, we incrementj
to ensure we don't exceedk
distinct integers.- Our count becomes
k+1
wheni = 4
. We then incrementj
to1
to discard the first element (1
), updating the counter. Now we have 3 distinct numbers again (2, 3, 4
), andpos[4] = 1
.
- Our count becomes
-
Finally, after calling function
f(k)
using ournums
array andk = 3
, we get apos
array telling us the starting positions that form valid windows, which are[0, 0, 0, 0, 1]
. -
For
k-1 = 2
, calling functionf(k-1)
, we would similarly get[0, 0, 0, 1, 3]
. -
We then zip these arrays and subtract the second from the first for each position:
[0-0, 0-0, 0-0, 0-1, 1-3] = [0, 0, 0, -1, -2]
. -
The sum of these differences is
0 + 0 + 0 - 1 - 2 = -3
. However, we need to take its absolute value because we're interested in the count, not the signed difference:abs(-3) = 3
. These differences represent how many new "good subarrays" we get at each index as we extend our window.
So, in the example nums = [1, 2, 1, 3, 4]
with k = 3
, there are 3 subarrays with exactly 3 distinct integers: [1, 2, 1, 3]
, [2, 1, 3, 4]
, and [1, 2, 3, 4]
. And this matches our final count of 3. This demonstrates the use of the sliding window to isolate the regions of interest and the subtraction of counts to find a precise number of qualifying subarrays.
Solution Implementation
1from collections import Counter
2from typing import List
3
4class Solution:
5 def subarraysWithKDistinct(self, nums: List[int], k: int) -> int:
6 # Helper function to calculate the number of subarrays with at most k distinct elements
7 def at_most_k_distinct(k):
8 # Initialize the position list to store the starting index of subarrays
9 start_positions = [0] * len(nums)
10 # Counter to keep track of the frequencies of elements in the current window
11 count = Counter()
12 left = 0 # Initialize the left pointer of the window
13 # Iterate through the 'nums' list
14 for right, value in enumerate(nums):
15 count[value] += 1 # Increment the frequency of the current number
16 # Shrink the window from the left if there are more than 'k' distinct elements
17 while len(count) > k:
18 count[nums[left]] -= 1
19 # Remove the leftmost element from the counter if its frequency drops to zero
20 if count[nums[left]] == 0:
21 del count[nums[left]]
22 left += 1 # Move the left pointer to the right
23 # Store the latest valid starting position where the window contains at most 'k' distinct elements
24 start_positions[right] = left
25 return start_positions
26
27 # Calculate number of subarrays with exactly 'k' distinct elements by subtracting
28 # the number of subarrays with at most 'k-1' distinct elements from those with at most 'k'
29 return sum(end_at_most_k - end_at_most_k_minus_one for end_at_most_k, end_at_most_k_minus_one in zip(at_most_k_distinct(k), at_most_k_distinct(k - 1)))
30
31# Example usage:
32# sol = Solution()
33# result = sol.subarraysWithKDistinct([1,2,1,2,3], 2)
34# print(result) # Output: 7
35
1class Solution {
2 public int subarraysWithKDistinct(int[] nums, int k) {
3 // Find the positions with exactly k distinct elements and k-1 distinct elements
4 int[] positionsWithKDistinct = findPositionsWithDistinctElements(nums, k);
5 int[] positionsWithKMinusOneDistinct = findPositionsWithDistinctElements(nums, k - 1);
6
7 // Initialize answer to hold the total number of subarrays with k distinct elements
8 int answer = 0;
9
10 // Calculate the difference between positions to get the count of subarrays with exactly k distinct elements.
11 for (int i = 0; i < nums.length; i++) {
12 answer += positionsWithKDistinct[i] - positionsWithKMinusOneDistinct[i];
13 }
14
15 return answer;
16 }
17
18 private int[] findPositionsWithDistinctElements(int[] nums, int k) {
19 int n = nums.length; // Total number of elements in the input array
20 int[] count = new int[n + 1]; // An array to keep track of counts of each distinct element
21 int[] positions = new int[n]; // An array to store positions
22
23 int distinctCount = 0; // A variable to keep track of current number of distinct elements
24
25 // Two pointers - 'start' and 'end' to keep track of current window
26 for (int end = 0, start = 0; end < n; end++) {
27 if (++count[nums[end]] == 1) { // If it's a new distinct element
28 distinctCount++;
29 }
30 // Shrink the window from the left if we have more than 'k' distinct elements
31 while (distinctCount > k) {
32 if (--count[nums[start]] == 0) { // If we've removed one distinct element
33 distinctCount--;
34 }
35 start++;
36 }
37 positions[end] = start; // Update the start position for current window size
38 }
39 return positions; // Return the positions array that helps in calculating the result
40 }
41}
42
1#include <vector>
2#include <cstring>
3
4class Solution {
5public:
6 // Function to calculate the number of subarrays with exactly k distinct elements
7 int subarraysWithKDistinct(vector<int>& nums, int k) {
8 vector<int> subarrayStartsWithK = countSubarraysStartingPoint(nums, k);
9 vector<int> subarrayStartsWithKMinusOne = countSubarraysStartingPoint(nums, k - 1);
10 int totalSubarrays = 0;
11
12 // Calculate the difference between the number of subarrays starting with k and k-1 distinct numbers
13 for (int i = 0; i < nums.size(); ++i) {
14 totalSubarrays += subarrayStartsWithKMinusOne[i] - subarrayStartsWithK[i];
15 }
16
17 return totalSubarrays;
18 }
19
20 // Helper function to find the earliest starting point of subarrays with at most k distinct elements for each ending point
21 vector<int> countSubarraysStartingPoint(vector<int>& nums, int k) {
22 int n = nums.size(); // Size of the input array
23 vector<int> startPos(n); // Vector to store the starting positions
24 int count[n + 1]; // Array to store the count of each number
25 memset(count, 0, sizeof(count)); // Initialize the count array with zeros
26 int distinctNums = 0; // Number of distinct elements
27
28 // Two pointers technique: 'i' is the end pointer, 'j' is the start pointer
29 for (int i = 0, j = 0; i < n; ++i) {
30 // If we encounter a new element (count is 0), increase the number of distinct elements
31 if (++count[nums[i]] == 1) {
32 ++distinctNums;
33 }
34
35 // If distinct elements exceed k, move the start pointer to reduce the number of distinct elements
36 for (; distinctNums > k; ++j) {
37 // If after decrement count goes to zero, then one distinct element is removed
38 if (--count[nums[j]] == 0) {
39 --distinctNums;
40 }
41 }
42
43 // Record the starting position for the subarray ending at 'i' which has at most k distinct elements
44 startPos[i] = j;
45 }
46
47 return startPos;
48 }
49};
50
1// Define the function to calculate the number of subarrays with exactly k distinct elements
2function subarraysWithKDistinct(nums: number[], k: number): number {
3 let subarrayStartsWithK = countSubarraysStartingPoint(nums, k);
4 let subarrayStartsWithKMinusOne = countSubarraysStartingPoint(nums, k - 1);
5 let totalSubarrays = 0;
6
7 // Calculate the difference between the number of subarrays starting with k and k-1 distinct numbers
8 for (let i = 0; i < nums.length; ++i) {
9 totalSubarrays += subarrayStartsWithKMinusOne[i] - subarrayStartsWithK[i];
10 }
11
12 return totalSubarrays;
13}
14
15// Helper function to find the earliest starting point of subarrays with at most k distinct elements for each ending point
16function countSubarraysStartingPoint(nums: number[], k: number): number[] {
17 let n = nums.length; // Size of the input array
18 let startPos = Array(n); // Array to store the starting positions for subarrays
19 let count: number[] = Array(n + 1).fill(0); // Initialize an array to store the count of each number, filled with zeros
20 let distinctNums = 0; // Number of distinct elements
21
22 // Two pointers technique: 'end' is the end pointer, 'start' is the start pointer
23 for (let end = 0, start = 0; end < n; ++end) {
24 // If a new element is detected (count is 0), increase the number of distinct elements
25 if (++count[nums[end]] == 1) {
26 ++distinctNums;
27 }
28
29 // If distinct elements exceed k, move the start pointer to reduce the number of distinct elements
30 for (; distinctNums > k; ++start) {
31 // If after decrementing count goes to zero, then one distinct element is removed
32 if (--count[nums[start]] == 0) {
33 --distinctNums;
34 }
35 }
36
37 // Record the starting position for the subarray ending at 'end' which has at most k distinct elements
38 startPos[end] = start;
39 }
40
41 return startPos;
42}
43
Time and Space Complexity
Time Complexity
The given Python function computes the number of subarrays with exactly k
distinct elements by calculating the position of pointers for k
and k-1
distinct elements. The inner function f(k)
goes through all elements of nums
and uses a counter cnt
to keep track of the number of distinct elements. Here's the breakdown of time complexity:
-
The
f(k)
function loops over every element innums
exactly once with two nested loops. However, the inner while-loop does not start over for each outer iteration but continues from the last position, effectively visiting each element of the array only once. Thus, the time complexity for thef(k)
function isO(n)
, wheren
is the length ofnums
. -
Since
f(k)
is called twice, once withk
and once withk-1
, the total time for these calls is2 * O(n) = O(n)
. The contribution of these calls to the time complexity does not change the orderO(n)
. -
The final part of the code calculates the sum of differences between the positions for
k
andk-1
distinct elements, which takesO(n)
time as well.
Combining all parts, the final time complexity of the subarraysWithKDistinct
function is O(n)
since all operations are linear with respect to the length of nums
.
Space Complexity
The space complexity of the function depends on the size of the data structures used:
-
The counter
cnt
can hold at mostmin(k, n)
different integers, wheren
is the length ofnums
andk
is the number of distinct elements we are looking for. So, it usesO(min(k, n))
space. -
The array
pos
is of sizen
, resulting inO(n)
space. -
Auxiliary space for indices and temporary variables is
O(1)
.
So the total space complexity is the maximum space used by any of these components, which is O(n) + O(min(k, n))
. Since k
is the constraint on distinct numbers and can be at most n
, the dominant term is O(n)
. Hence, the space complexity of the entire function is O(n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm is best for finding the shortest distance between two points in an unweighted graph?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.