1093. Statistics from a Large Sample
Problem Description
You are given an array count
of length 256 where count[k]
represents how many times the integer k
appears in a large sample of integers. All integers in the sample are in the range [0, 255]
.
Your task is to calculate five statistical measures from this sample:
- Minimum: The smallest value that appears in the sample (has count > 0)
- Maximum: The largest value that appears in the sample (has count > 0)
- Mean: The average value, calculated as the sum of all elements divided by the total number of elements
- Median:
- If the total number of elements is odd, the median is the middle element when all elements are sorted
- If the total number of elements is even, the median is the average of the two middle elements when sorted
- Mode: The value that appears most frequently in the sample (guaranteed to be unique)
Return these five statistics as an array of floating-point numbers: [minimum, maximum, mean, median, mode]
.
For example, if count = [0, 1, 3, 4, 0, 0, ...]
, this means:
- Value 0 appears 0 times
- Value 1 appears 1 time
- Value 2 appears 3 times
- Value 3 appears 4 times
- And so on...
The actual sample would be [1, 2, 2, 2, 3, 3, 3, 3]
with 8 total elements.
Answers within 10^-5
of the actual answer will be accepted.
Intuition
Since we're working with a frequency array rather than the actual sample, we need to think about how to extract statistics without reconstructing the entire sample.
For minimum and maximum, we simply need to find the first and last indices where count[k] > 0
. As we iterate through the count array, we can track these values.
For mean, we don't need to reconstruct the sample. We can calculate the sum directly by multiplying each value k
by its frequency count[k]
. The total sum would be 0 * count[0] + 1 * count[1] + 2 * count[2] + ...
. We also accumulate the total count of elements. The mean is then total_sum / total_count
.
For mode, we need to find the index with the highest count. As we iterate through the array, we can keep track of which index has the maximum frequency seen so far.
The median is the trickiest part. If we were to reconstruct the sorted sample, it would look like [0, 0, ..., 1, 1, ..., 2, 2, ...]
where each value k
appears count[k]
times. To find the median:
- If total count is odd, we need the element at position
(total_count // 2) + 1
- If total count is even, we need elements at positions
total_count // 2
and(total_count // 2) + 1
, then average them
Instead of reconstructing the array, we can create a helper function that finds the k-th element in the sorted sample. This function accumulates counts until reaching the desired position. When the cumulative count reaches or exceeds our target position, we've found the value at that position.
By processing the count array in a single pass (except for median calculation), we can efficiently compute all statistics without explicitly creating the large sample array.
Learn more about Math patterns.
Solution Approach
The solution implements the intuition discussed above with a clean single-pass approach for most statistics and a helper function for finding the median.
Helper Function for Finding k-th Element:
def find(i: int) -> int:
t = 0
for k, x in enumerate(count):
t += x
if t >= i:
return k
This function finds the value at the i-th position in the sorted sample. It accumulates counts until the cumulative total t
reaches or exceeds position i
, then returns the current index k
.
Main Processing Loop:
mi, mx = inf, -1
s = cnt = 0
mode = 0
for k, x in enumerate(count):
if x: # Only process if count[k] > 0
mi = min(mi, k) # Update minimum
mx = max(mx, k) # Update maximum
s += k * x # Add to sum (value * frequency)
cnt += x # Add to total count
if x > count[mode]: # Update mode if current frequency is higher
mode = k
The loop iterates through each index k
and its count x
:
- Minimum: Initialize to infinity, update to the smallest
k
wherex > 0
- Maximum: Initialize to -1, update to the largest
k
wherex > 0
- Sum: Accumulate
k * x
for calculating mean later - Total Count: Accumulate all frequencies
- Mode: Track the index with the highest frequency
Median Calculation:
median = ( find(cnt // 2 + 1) if cnt & 1 else (find(cnt // 2) + find(cnt // 2 + 1)) / 2 )
- If total count is odd (
cnt & 1
checks the least significant bit), find the middle element at positioncnt // 2 + 1
- If total count is even, find the two middle elements at positions
cnt // 2
andcnt // 2 + 1
, then average them
Final Return:
return [mi, mx, s / cnt, median, mode]
Returns the five statistics where mean is calculated as s / cnt
.
The time complexity is O(n)
for the main loop plus O(n)
for each median lookup, where n = 256. The space complexity is O(1)
as we only use a few variables.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with count = [0, 1, 3, 4, 0, 0, ...]
(remaining values are 0).
This represents the sample: [1, 2, 2, 2, 3, 3, 3, 3]
(1 appears once, 2 appears 3 times, 3 appears 4 times).
Step 1: Initialize variables
mi = inf
,mx = -1
s = 0
,cnt = 0
mode = 0
Step 2: Process the count array
k = 0, x = 0: Skip (x = 0)
k = 1, x = 1: (x > 0, so process)
mi = min(inf, 1) = 1
mx = max(-1, 1) = 1
s = 0 + 1*1 = 1
cnt = 0 + 1 = 1
count[mode] = 0 < 1
, somode = 1
k = 2, x = 3:
mi = min(1, 2) = 1
mx = max(1, 2) = 2
s = 1 + 2*3 = 7
cnt = 1 + 3 = 4
count[mode] = 1 < 3
, somode = 2
k = 3, x = 4:
mi = min(1, 3) = 1
mx = max(2, 3) = 3
s = 7 + 3*4 = 19
cnt = 4 + 4 = 8
count[mode] = 3 < 4
, somode = 3
k = 4 to 255: All have x = 0, so skip
Step 3: Calculate mean
mean = s / cnt = 19 / 8 = 2.375
Step 4: Calculate median
- Total count is 8 (even), so we need positions 4 and 5
find(4)
: Accumulate counts: 0→1→4. When we reach k=2, cumulative = 4, which equals our target, so return 2find(5)
: Accumulate counts: 0→1→4→8. When we reach k=3, cumulative = 8 ≥ 5, so return 3median = (2 + 3) / 2 = 2.5
Step 5: Return results
[1, 3, 2.375, 2.5, 3]
This matches our expectations:
- Minimum: 1 (smallest value present)
- Maximum: 3 (largest value present)
- Mean: (1 + 2 + 2 + 2 + 3 + 3 + 3 + 3) / 8 = 2.375
- Median: Middle of sorted [1, 2, 2, 2, 3, 3, 3, 3] = average of 4th and 5th elements = (2 + 3) / 2 = 2.5
- Mode: 3 (appears 4 times, most frequent)
Solution Implementation
1class Solution:
2 def sampleStats(self, count: List[int]) -> List[float]:
3 """
4 Calculate statistics for a sample represented as a frequency array.
5
6 Args:
7 count: Array where count[i] represents the frequency of value i
8
9 Returns:
10 List containing [minimum, maximum, mean, median, mode]
11 """
12
13 def find_kth_element(k: int) -> int:
14 """
15 Find the k-th smallest element (1-indexed) in the frequency array.
16
17 Args:
18 k: The position to find (1-indexed)
19
20 Returns:
21 The value at the k-th position
22 """
23 cumulative_count = 0
24 for value, frequency in enumerate(count):
25 cumulative_count += frequency
26 if cumulative_count >= k:
27 return value
28
29 # Initialize statistics variables
30 minimum_value = float('inf')
31 maximum_value = -1
32 total_sum = 0
33 total_count = 0
34 mode_value = 0
35
36 # Single pass to calculate min, max, sum, count, and mode
37 for value, frequency in enumerate(count):
38 if frequency > 0:
39 # Update minimum and maximum
40 minimum_value = min(minimum_value, value)
41 maximum_value = max(maximum_value, value)
42
43 # Update sum and count for mean calculation
44 total_sum += value * frequency
45 total_count += frequency
46
47 # Update mode if current frequency is higher
48 if frequency > count[mode_value]:
49 mode_value = value
50
51 # Calculate median based on whether total count is odd or even
52 if total_count & 1: # Odd number of elements
53 median = find_kth_element(total_count // 2 + 1)
54 else: # Even number of elements
55 median = (find_kth_element(total_count // 2) +
56 find_kth_element(total_count // 2 + 1)) / 2.0
57
58 # Calculate mean
59 mean = total_sum / total_count
60
61 # Return all statistics as floats
62 return [float(minimum_value), float(maximum_value), mean, median, float(mode_value)]
63
1class Solution {
2 private int[] frequencyCount;
3
4 /**
5 * Calculate statistics for a large sample represented as a frequency array
6 * @param count Array where count[i] represents the frequency of value i
7 * @return Array containing [minimum, maximum, mean, median, mode]
8 */
9 public double[] sampleStats(int[] count) {
10 this.frequencyCount = count;
11
12 // Initialize minimum to large value and maximum to small value
13 int minimum = 1 << 30; // Large initial value (2^30)
14 int maximum = -1;
15
16 // Variables for calculating mean
17 long sum = 0; // Sum of all values (using long to prevent overflow)
18 int totalCount = 0; // Total number of elements
19
20 // Variable for tracking mode (most frequent value)
21 int mode = 0;
22
23 // Iterate through all possible values to collect statistics
24 for (int value = 0; value < count.length; value++) {
25 if (count[value] > 0) {
26 // Update minimum and maximum values
27 minimum = Math.min(minimum, value);
28 maximum = Math.max(maximum, value);
29
30 // Accumulate sum for mean calculation
31 sum += 1L * value * count[value]; // Cast to long to prevent overflow
32
33 // Count total number of elements
34 totalCount += count[value];
35
36 // Update mode if current value has higher frequency
37 if (count[value] > count[mode]) {
38 mode = value;
39 }
40 }
41 }
42
43 // Calculate median based on whether total count is odd or even
44 double median;
45 if (totalCount % 2 == 1) {
46 // Odd count: median is the middle element
47 median = findKthElement(totalCount / 2 + 1);
48 } else {
49 // Even count: median is average of two middle elements
50 median = (findKthElement(totalCount / 2) + findKthElement(totalCount / 2 + 1)) / 2.0;
51 }
52
53 // Calculate mean as sum divided by count
54 double mean = sum * 1.0 / totalCount;
55
56 // Return all statistics in specified order
57 return new double[] {minimum, maximum, mean, median, mode};
58 }
59
60 /**
61 * Find the k-th element (1-indexed) in the frequency distribution
62 * @param k The position of the element to find (1-indexed)
63 * @return The value at the k-th position
64 */
65 private int findKthElement(int k) {
66 int cumulativeCount = 0;
67
68 // Iterate through values and accumulate counts
69 for (int value = 0; ; value++) {
70 cumulativeCount += frequencyCount[value];
71
72 // Return value when cumulative count reaches or exceeds k
73 if (cumulativeCount >= k) {
74 return value;
75 }
76 }
77 }
78}
79
1class Solution {
2public:
3 vector<double> sampleStats(vector<int>& count) {
4 // Lambda function to find the k-th smallest element (1-indexed)
5 auto findKthElement = [&](int k) -> int {
6 int cumulativeCount = 0;
7 for (int value = 0; value < count.size(); ++value) {
8 cumulativeCount += count[value];
9 if (cumulativeCount >= k) {
10 return value;
11 }
12 }
13 return -1; // Should never reach here
14 };
15
16 // Initialize statistics variables
17 int minValue = INT_MAX;
18 int maxValue = INT_MIN;
19 long long sum = 0;
20 int totalCount = 0;
21 int modeValue = 0;
22
23 // Single pass to calculate min, max, sum, total count, and mode
24 for (int value = 0; value < count.size(); ++value) {
25 if (count[value] > 0) {
26 // Update minimum value
27 minValue = min(minValue, value);
28
29 // Update maximum value
30 maxValue = max(maxValue, value);
31
32 // Add to sum (using long long to prevent overflow)
33 sum += static_cast<long long>(value) * count[value];
34
35 // Update total count
36 totalCount += count[value];
37
38 // Update mode (value with highest frequency)
39 if (count[value] > count[modeValue]) {
40 modeValue = value;
41 }
42 }
43 }
44
45 // Calculate median
46 double median;
47 if (totalCount % 2 == 1) {
48 // Odd number of elements: median is the middle element
49 median = findKthElement(totalCount / 2 + 1);
50 } else {
51 // Even number of elements: median is the average of two middle elements
52 int leftMiddle = findKthElement(totalCount / 2);
53 int rightMiddle = findKthElement(totalCount / 2 + 1);
54 median = (leftMiddle + rightMiddle) / 2.0;
55 }
56
57 // Calculate mean
58 double mean = static_cast<double>(sum) / totalCount;
59
60 // Return all statistics: [minimum, maximum, mean, median, mode]
61 return {
62 static_cast<double>(minValue),
63 static_cast<double>(maxValue),
64 mean,
65 median,
66 static_cast<double>(modeValue)
67 };
68 }
69};
70
1/**
2 * Calculates statistical measures from a frequency count array
3 * @param count - Array where count[i] represents the frequency of value i
4 * @returns Array containing [minimum, maximum, mean, median, mode]
5 */
6function sampleStats(count: number[]): number[] {
7 /**
8 * Finds the value at the specified position in the sorted sequence
9 * @param targetPosition - The position to find (1-indexed)
10 * @returns The value at the target position
11 */
12 const findValueAtPosition = (targetPosition: number): number => {
13 let cumulativeCount = 0;
14
15 for (let value = 0; value < count.length; value++) {
16 cumulativeCount += count[value];
17
18 // When cumulative count reaches or exceeds target position, we found our value
19 if (cumulativeCount >= targetPosition) {
20 return value;
21 }
22 }
23
24 // This should never be reached given valid input
25 return -1;
26 };
27
28 // Initialize statistical variables
29 let minimum = Number.MAX_SAFE_INTEGER;
30 let maximum = -1;
31 let sum = 0;
32 let totalCount = 0;
33 let mode = 0;
34
35 // Single pass to calculate min, max, sum, total count, and mode
36 for (let value = 0; value < count.length; value++) {
37 if (count[value] > 0) {
38 // Update minimum value
39 minimum = Math.min(minimum, value);
40
41 // Update maximum value
42 maximum = Math.max(maximum, value);
43
44 // Add to sum (value * frequency)
45 sum += value * count[value];
46
47 // Add frequency to total count
48 totalCount += count[value];
49
50 // Update mode if this value has higher frequency
51 if (count[value] > count[mode]) {
52 mode = value;
53 }
54 }
55 }
56
57 // Calculate median based on whether total count is odd or even
58 let median: number;
59 if (totalCount % 2 === 1) {
60 // Odd count: median is the middle element
61 const middlePosition = Math.floor(totalCount / 2) + 1;
62 median = findValueAtPosition(middlePosition);
63 } else {
64 // Even count: median is average of two middle elements
65 const leftMiddlePosition = totalCount / 2;
66 const rightMiddlePosition = leftMiddlePosition + 1;
67 median = (findValueAtPosition(leftMiddlePosition) + findValueAtPosition(rightMiddlePosition)) / 2;
68 }
69
70 // Calculate mean
71 const mean = sum / totalCount;
72
73 // Return all statistics as an array
74 return [minimum, maximum, mean, median, mode];
75}
76
Time and Space Complexity
Time Complexity: O(n + k)
where n
is the length of the count array and k
is the total number of elements (sum of all counts).
- The first loop iterates through the entire
count
array once to calculate minimum, maximum, sum, total count, and mode, which takesO(n)
time. - The
find
function is called at most 2 times for calculating the median. Each call tofind
iterates through thecount
array in the worst case, takingO(n)
time per call. - However, the total iterations across all
find
calls is bounded by the position we're looking for, which in the worst case iscnt // 2 + 1
. Since we need to iterate through counts that sum up to at mostcnt // 2 + 1
elements, this contributesO(n)
time in the worst case. - Overall time complexity:
O(n) + O(n) = O(n)
Space Complexity: O(1)
- The algorithm uses only a constant amount of extra space for variables:
mi
,mx
,s
,cnt
,mode
,median
, andt
(inside thefind
function). - The
find
function doesn't use any additional data structures that scale with input size. - No recursive calls are made, so there's no additional stack space usage.
- The space used doesn't depend on the size of the input array or the values within it.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Off-by-One Error in Median Calculation
Pitfall: A common mistake is using 0-indexed positioning when finding the median, leading to incorrect results. Since the sample is conceptually a sorted array with 1-based indexing for positions, using 0-based indexing would select wrong elements.
Example of Wrong Implementation:
# WRONG: Using 0-based indexing if total_count & 1: median = find_kth_element(total_count // 2) # Wrong! else: median = (find_kth_element(total_count // 2 - 1) + find_kth_element(total_count // 2)) / 2.0 # Wrong!
Correct Solution:
# CORRECT: Using 1-based indexing for positions if total_count & 1: median = find_kth_element(total_count // 2 + 1) # Middle element else: median = (find_kth_element(total_count // 2) + find_kth_element(total_count // 2 + 1)) / 2.0
2. Integer Division in Mean/Median Calculation
Pitfall: In Python 2 or when not careful with types, integer division might truncate decimal values, causing precision loss.
Example of Wrong Implementation:
# WRONG: May lose precision in some Python versions mean = total_sum // total_count # Integer division median = (val1 + val2) // 2 # Integer division for even case
Correct Solution:
# CORRECT: Use float division mean = total_sum / total_count # Float division median = (val1 + val2) / 2.0 # Explicitly use float division
3. Not Handling Empty Frequency Arrays Properly
Pitfall: While the problem guarantees at least one element, not checking for empty arrays in the find_kth_element
function could cause it to return None
or raise an error if called with invalid k values.
Example of Wrong Implementation:
def find_kth_element(k: int) -> int:
cumulative_count = 0
for value, frequency in enumerate(count):
cumulative_count += frequency
if cumulative_count >= k:
return value
# No return statement if k is invalid!
Correct Solution:
def find_kth_element(k: int) -> int:
cumulative_count = 0
for value, frequency in enumerate(count):
cumulative_count += frequency
if cumulative_count >= k:
return value
return -1 # Or raise an exception for invalid k
4. Mode Initialization Problem
Pitfall: Initializing mode_value = 0
assumes that index 0 has the highest frequency initially. If count[0] = 0
but there's a non-zero count elsewhere, the comparison if frequency > count[mode_value]
would compare against 0, which works correctly. However, a more robust approach would handle edge cases explicitly.
Potential Issue:
# If all counts are 0 (empty sample), mode_value stays at 0
mode_value = 0
for value, frequency in enumerate(count):
if frequency > count[mode_value]:
mode_value = value
More Robust Solution:
mode_value = -1
max_frequency = 0
for value, frequency in enumerate(count):
if frequency > 0 and frequency > max_frequency:
max_frequency = frequency
mode_value = value
5. Floating Point Precision
Pitfall: Not returning consistent float types might cause type mismatches in some test frameworks.
Example of Wrong Implementation:
# WRONG: Mixed types return [minimum_value, maximum_value, mean, median, mode_value] # mode_value is int
Correct Solution:
# CORRECT: All values as floats
return [float(minimum_value), float(maximum_value), mean, median, float(mode_value)]
Which technique can we use to find the middle of a linked list?
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!