1093. Statistics from a Large Sample
Problem Description
You are provided with an array called count
representing a large sample of integers where each element's index (k
) corresponds to an integer, and the value at each index (count[k]
) corresponds to how many times k
occurs in the sample. The range of integers is from 0 to 255. Your task is to calculate various statistics about the sample: the minimum value (minimum
), maximum value (maximum
), mean, median, and mode of the sample. The mean is the sum of all the integer occurrences multiplied by their respective values divided by the total number of elements. The median is the middle value once the sample is sorted, or if the sample size is even, then it's the average of the two middle values. The mode is the value that appears most frequently in the sample, and it's given that there will be a single, unique mode. The result should be returned as an array of floating-point numbers containing the statistics in the order they were mentioned.
Intuition
The solution first involves finding the minimum and maximum values present in the sample. Since the count
array represents how many times each integer in the range [0, 255] occurs, we iterate through this array and look for the first and last indexes that have a non-zero count which represents the minimum and maximum, respectively.
Next, we calculate the mean by summing up each integer multiplied by its occurrences (k * count[k]
) and then dividing by the total number of elements in the sample (cnt
).
Calculating the median is a bit trickier because we need to consider whether the total number of elements is odd or even. We use a helper function find(i)
to find the i-th smallest value in the sample according to its sorted position. If the total number of elements is odd, we want the element that is in the middle position. If it is even, we average the two middle elements, which involves finding the cnt // 2
-th and the cnt // 2 + 1
-th elements if you sort the sample.
Lastly, the mode is the integer value that has the highest count in the count
array. As we iterate through the count
array to perform other operations, we also keep track of the mode by comparing the current integers' count.
Combining all these steps, the function calculates and returns the statistics as a list [minimum, maximum, mean, median, mode]
.
Learn more about Math patterns.
Solution Approach
The sampleStats
function in the provided solution uses a straightforward algorithm to calculate the various statistics for the given sample.
-
Initialization of Variables: The variables
mi
(minimum),mx
(maximum),s
(sum),cnt
(count), andmode
are initialized. Theinf
keyword in Python represents an infinite number, used here to make the initial minimum as high as possible. Themx
is initialized to -1 to later find the maximum which will certainly be higher than -1. The sums
and countcnt
are initialized to 0 to calculate the mean. The mode is initialized to 0. -
Finding Minimum, Maximum, Sum, Count, and Mode: A single loop iterates through all possible integer values (0 to 255) as indices in the
count
array. If an indexk
has a non-zero count (x
), the algorithm does several things:- Updates minimum and maximum using
min(mi, k)
andmax(mx, k)
functions. - Adds
k * x
to the sums
. - Increases the total count
cnt
byx
. - Checks if the current count
x
is greater than the count of the current mode and if so, updates the mode.
- Updates minimum and maximum using
-
Finding the Median: The median calculation depends on whether the total count
cnt
is odd or even. A helper nested functionfind(i)
is defined which returns thei
-th smallest value in the sample (if the sample were sorted). For an odd total count, it finds the middle value. For an even total count, it finds the average of the two middle values. This is done by calling thefind
function accordingly.
The helper function find
works by iterating through the count
array again. It accumulates the total number of elements seen so far in a temporary count t
and checks if it is greater than or equal to the target i
. When it reaches or passes the target, it returns the current integer value k
as the i
-th smallest value.
- Returning the Result:
Finally, the function returns a list with the calculated statistics:
[mi, mx, s / cnt, median, mode]
, corresponding to minimum, maximum, mean, median, and mode.
The pattern used here is mainly iterating through the count array to gather all the needed statistics, utilizing a single pass wherever possible to optimize performance and then leveraging aggregated data to derive mean and median.
Understanding this solution approach is mainly about recognizing that the array index itself represents the integer value in the sample, and the increment in the array represents its frequency or count in the sample.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a simple example to illustrate the solution approach. Suppose we have an array count
which has non-zero values at indices 2, 3, and 4, which represent the integers 2, 3, and 4 in the sample. The count
array looks like this:
1count = [0, 0, 1, 2, 1, 0, 0, ... 0]
This means the integer 2 occurs once, 3 occurs twice, and 4 occurs once in our sample. Given that the rest of the count
array contains zeros, we will ignore them for brevity. Now, let's walk through the solution approach:
-
Initialization of Variables: The variables
mi
,mx
,s
,cnt
, andmode
are initialized. We start withmi
set to infinity,mx
to -1,s
andcnt
to 0, andmode
to 0. -
Finding Minimum, Maximum, Sum, Count, and Mode: We iterate through the indices 0 to 255 of the
count
array. Here's the breakdown:- At index 2,
count[2]
is 1. We updatemi
to min(inf, 2) which is 2,mx
to max(-1, 2) which is 2, add2 * 1
tos
to make it 2, add 1 tocnt
to make it 1, and update the mode to 2 since 1 > 0. - At index 3,
count[3]
is 2. We updatemi
(remains 2),mx
to max(2, 3) which is 3, add3 * 2
tos
to make it 8, add 2 tocnt
to make it 3, and update mode to 3 since 2 > 1. - At index 4,
count[4]
is 1. We updatemi
(remains 2),mx
to max(3, 4) which is 4, add4 * 1
tos
to make it 12, add 1 tocnt
to make it 4, and mode remains 3 (since count of 4 is not greater than count of 3).
- At index 2,
-
Finding the Median: The total count
cnt
is 4, which is even. We use ourfind
function to locate the middle values. We need the average of the 2nd and 3rd smallest values.- Invoking
find(2)
will iterate throughcount
and sum up the counts until it equals or surpasses 2. It surpasses 2 at index 3, thus the 2nd smallest value is 3. - Invoking
find(3)
will iterate again until the sum surpasses 3, which will happen at index 3 as well. Thus, the 3rd smallest value is also 3. - The average of these is 3, so the median is 3.
- Invoking
-
Returning the Result: We calculate the mean as
s / cnt
which is12 / 4
equals 3. Our final statistics array is[mi, mx, mean, median, mode]
, which in this case is[2, 4, 3, 3, 3]
.
From this example, you see how the count array index's value is used as an integer from the sample and the array's value at that index is its frequency. By iterating through the count array, we can gather all the information we need for our statistics using well-organized loops and conditional statements.
Solution Implementation
1from math import inf
2from typing import List
3
4class Solution:
5 def sampleStats(self, count: List[int]) -> List[float]:
6 # Helper function to find the index of the i-th number in the data stream
7 def find(i: int) -> int:
8 total = 0
9 for index, frequency in enumerate(count):
10 total += frequency
11 if total >= i:
12 return index
13
14 # Initialize minimum and maximum to represent the range of the data stream
15 minimum = inf
16 maximum = -1
17 # Initialize variables for sum and count to compute mean
18 sum_values = 0
19 total_count = 0
20 # Initialize the mode value
21 mode = 0
22
23 # Loop through count array to compute min, max, sum, total_count, and mode
24 for value, frequency in enumerate(count):
25 if frequency:
26 minimum = min(minimum, value)
27 maximum = max(maximum, value)
28 sum_values += value * frequency
29 total_count += frequency
30 # Update mode if the current frequency is greater than the max frequency found so far
31 if frequency > count[mode]:
32 mode = value
33
34 # Compute median
35 if total_count & 1: # If the total number of elements is odd
36 median = find(total_count // 2 + 1)
37 else:
38 # If the number of elements is even, average the middle two elements
39 median = (find(total_count // 2) + find(total_count // 2 + 1)) / 2
40
41 # Compute mean
42 mean = sum_values / total_count
43
44 # Return a list containing the minimum, maximum, mean, median, and mode
45 return [float(minimum), float(maximum), mean, median, float(mode)]
46
1class Solution {
2 private int[] elementCounts; // This array holds the count for each number.
3
4 public double[] sampleStats(int[] count) {
5 this.elementCounts = count;
6 int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
7 long sum = 0; // Used to calculate the mean
8 int totalCount = 0; // Total number of elements
9 int mode = 0; // The number that appears the most frequently
10
11 // Iterate through the count array to find minimum, maximum, sum, total count, and mode
12 for (int number = 0; number < elementCounts.length; ++number) {
13 if (elementCounts[number] > 0) {
14 min = Math.min(min, number);
15 max = Math.max(max, number);
16 sum += (long) number * elementCounts[number];
17 totalCount += elementCounts[number];
18 if (elementCounts[number] > elementCounts[mode]) {
19 mode = number;
20 }
21 }
22 }
23
24 // Calculate median
25 double median = totalCount % 2 == 1
26 ? findKthElement(totalCount / 2 + 1)
27 : (findKthElement(totalCount / 2) +
28 findKthElement(totalCount / 2 + 1)) / 2.0;
29
30 // Calculate mean
31 double mean = sum * 1.0 / totalCount;
32
33 // Return the results as an array: [min, max, mean, median, mode]
34 return new double[] {min, max, mean, median, mode};
35 }
36
37 // Helper function to find the kth element when the count array is treated like an expanded array
38 private int findKthElement(int k) {
39 int elementsSoFar = 0;
40
41 // Iterate through elementCounts and keep adding until we reach or pass the kth element
42 for (int number = 0;; ++number) {
43 elementsSoFar += elementCounts[number];
44 if (elementsSoFar >= k) {
45 return number;
46 }
47 }
48 }
49}
50
1class Solution {
2public:
3 vector<double> sampleStats(vector<int>& count) {
4 // Lambda to find the value at the ith position when the array is considered sorted.
5 auto findValueAtPosition = [&](int position) -> int {
6 for (int value = 0, accumulated = 0;; ++value) {
7 accumulated += count[value];
8 if (accumulated >= position) {
9 return value;
10 }
11 }
12 };
13
14 // Initialize minimum and maximum values to extreme values.
15 int minimumValue = INT_MAX, maximumValue = INT_MIN;
16
17 // Initialize variables to calculate the sum and mode.
18 long long sum = 0;
19 int totalCount = 0, modeValue = 0;
20
21 // Loop through the count array to find the minimum, maximum, total count and mode.
22 for (int value = 0; value < count.size(); ++value) {
23 if (count[value] > 0) {
24 minimumValue = min(minimumValue, value);
25 maximumValue = max(maximumValue, value);
26 sum += static_cast<long long>(value) * count[value];
27 totalCount += count[value];
28 if (count[value] > count[modeValue]) {
29 modeValue = value;
30 }
31 }
32 }
33
34 // Calculate median using the findValueAtPosition lambda.
35 double median;
36 if (totalCount % 2 == 1) {
37 // If the total count is odd, select the middle value directly.
38 median = findValueAtPosition(totalCount / 2 + 1);
39 } else {
40 // If the total count is even, take the average of the two middle values.
41 median = (findValueAtPosition(totalCount / 2) + findValueAtPosition(totalCount / 2 + 1)) / 2.0;
42 }
43
44 // Calculate mean by dividing the sum by the total count.
45 double mean = sum / static_cast<double>(totalCount);
46
47 // Store the stats results as a vector of doubles and return.
48 return vector<double>{static_cast<double>(minimumValue),
49 static_cast<double>(maximumValue),
50 mean,
51 median,
52 static_cast<double>(modeValue)};
53 }
54};
55
1// Helper function to find the ith smallest number based on the cumulative frequency
2function findIthNumber(numbersCount: number[], ith: number): number {
3 for (let k = 0, total = 0; ; ++k) {
4 total += numbersCount[k];
5 if (total >= ith) {
6 return k;
7 }
8 }
9}
10
11// Main function to calculate statistics from a sample provided as an array of counts for each value
12function sampleStats(numbersCount: number[]): number[] {
13 let min = Number.MAX_SAFE_INTEGER; // Initialize minimum to a large number
14 let max = -1; // Initialize maximum to a small number
15 let sum = 0; // Sum of all numbers
16 let totalCount = 0; // Total count of all numbers
17 let mode = 0; // Mode value of the numbers
18
19 // Iterate over each count index to calculate min, max, sum, and mode
20 for (let k = 0; k < numbersCount.length; ++k) {
21 if (numbersCount[k] > 0) {
22 min = Math.min(min, k);
23 max = Math.max(max, k);
24 sum += k * numbersCount[k];
25 totalCount += numbersCount[k];
26 if (numbersCount[k] > numbersCount[mode]) {
27 mode = k;
28 }
29 }
30 }
31
32 // Calculate median based on whether the count is odd or even
33 let median =
34 totalCount % 2 === 1
35 ? findIthNumber(numbersCount, (totalCount >> 1) + 1)
36 : (findIthNumber(numbersCount, totalCount >> 1) + findIthNumber(numbersCount, (totalCount >> 1) + 1)) / 2;
37
38 // Return an array containing min, max, mean, median, mode
39 return [min, max, sum / totalCount, median, mode];
40}
41
Time and Space Complexity
Time Complexity:
The time complexity of the code can be analyzed by looking at the number of operations performed relative to the number of elements n
in the count
list:
- The loop to find the minimum (
mi
) and maximum (mx
) values, the total count (cnt
), the sum (s
), and the mode runs once for each of then
elements:O(n)
. - The
find
function is called either 2 or 3 times depending on whethercnt
is odd or even. Each call tofind
runs inO(n)
because, in the worst-case scenario, it iterates over the entirecount
list. Thus, the worst case for finding the median isO(n)
. - In summary, the overall time complexity for this piece of code is
O(n)
+O(n)
for median calculation, which simplifies toO(n)
since both are linear operations.
Space Complexity:
- The space complexity is
O(1)
because the space used does not depend on the input sizen
. The variablesmi
,mx
,s
,cnt
, andmode
use a constant amount of space. - The
find
function uses a constant amount of space as well since the variablest
andk
are of fixed size.
In conclusion, the code exhibits a linear time complexity (O(n)
) and constant space complexity (O(1)
).
Learn more about how to find time and space complexity quickly using problem constraints.
Which data structure is used in a depth first search?
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.