Facebook Pixel

1093. Statistics from a Large Sample

MediumArrayMathProbability and Statistics
Leetcode Link

Problem Description

You are given an array count of length 256 where count[k] represents how many times the integer k appears in a large sample of integers. All integers in the sample are in the range [0, 255].

Your task is to calculate five statistical measures from this sample:

  1. Minimum: The smallest value that appears in the sample (has count > 0)
  2. Maximum: The largest value that appears in the sample (has count > 0)
  3. Mean: The average value, calculated as the sum of all elements divided by the total number of elements
  4. Median:
    • If the total number of elements is odd, the median is the middle element when all elements are sorted
    • If the total number of elements is even, the median is the average of the two middle elements when sorted
  5. Mode: The value that appears most frequently in the sample (guaranteed to be unique)

Return these five statistics as an array of floating-point numbers: [minimum, maximum, mean, median, mode].

For example, if count = [0, 1, 3, 4, 0, 0, ...], this means:

  • Value 0 appears 0 times
  • Value 1 appears 1 time
  • Value 2 appears 3 times
  • Value 3 appears 4 times
  • And so on...

The actual sample would be [1, 2, 2, 2, 3, 3, 3, 3] with 8 total elements.

Answers within 10^-5 of the actual answer will be accepted.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

Since we're working with a frequency array rather than the actual sample, we need to think about how to extract statistics without reconstructing the entire sample.

For minimum and maximum, we simply need to find the first and last indices where count[k] > 0. As we iterate through the count array, we can track these values.

For mean, we don't need to reconstruct the sample. We can calculate the sum directly by multiplying each value k by its frequency count[k]. The total sum would be 0 * count[0] + 1 * count[1] + 2 * count[2] + .... We also accumulate the total count of elements. The mean is then total_sum / total_count.

For mode, we need to find the index with the highest count. As we iterate through the array, we can keep track of which index has the maximum frequency seen so far.

The median is the trickiest part. If we were to reconstruct the sorted sample, it would look like [0, 0, ..., 1, 1, ..., 2, 2, ...] where each value k appears count[k] times. To find the median:

  • If total count is odd, we need the element at position (total_count // 2) + 1
  • If total count is even, we need elements at positions total_count // 2 and (total_count // 2) + 1, then average them

Instead of reconstructing the array, we can create a helper function that finds the k-th element in the sorted sample. This function accumulates counts until reaching the desired position. When the cumulative count reaches or exceeds our target position, we've found the value at that position.

By processing the count array in a single pass (except for median calculation), we can efficiently compute all statistics without explicitly creating the large sample array.

Learn more about Math patterns.

Solution Approach

The solution implements the intuition discussed above with a clean single-pass approach for most statistics and a helper function for finding the median.

Helper Function for Finding k-th Element:

def find(i: int) -> int:
    t = 0
    for k, x in enumerate(count):
        t += x
        if t >= i:
            return k

This function finds the value at the i-th position in the sorted sample. It accumulates counts until the cumulative total t reaches or exceeds position i, then returns the current index k.

Main Processing Loop:

mi, mx = inf, -1
s = cnt = 0
mode = 0
for k, x in enumerate(count):
    if x:  # Only process if count[k] > 0
        mi = min(mi, k)      # Update minimum
        mx = max(mx, k)      # Update maximum
        s += k * x           # Add to sum (value * frequency)
        cnt += x             # Add to total count
        if x > count[mode]:  # Update mode if current frequency is higher
            mode = k

The loop iterates through each index k and its count x:

  • Minimum: Initialize to infinity, update to the smallest k where x > 0
  • Maximum: Initialize to -1, update to the largest k where x > 0
  • Sum: Accumulate k * x for calculating mean later
  • Total Count: Accumulate all frequencies
  • Mode: Track the index with the highest frequency

Median Calculation:

median = (
    find(cnt // 2 + 1) if cnt & 1 else (find(cnt // 2) + find(cnt // 2 + 1)) / 2
)
  • If total count is odd (cnt & 1 checks the least significant bit), find the middle element at position cnt // 2 + 1
  • If total count is even, find the two middle elements at positions cnt // 2 and cnt // 2 + 1, then average them

Final Return:

return [mi, mx, s / cnt, median, mode]

Returns the five statistics where mean is calculated as s / cnt.

The time complexity is O(n) for the main loop plus O(n) for each median lookup, where n = 256. The space complexity is O(1) as we only use a few variables.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example with count = [0, 1, 3, 4, 0, 0, ...] (remaining values are 0).

This represents the sample: [1, 2, 2, 2, 3, 3, 3, 3] (1 appears once, 2 appears 3 times, 3 appears 4 times).

Step 1: Initialize variables

  • mi = inf, mx = -1
  • s = 0, cnt = 0
  • mode = 0

Step 2: Process the count array

k = 0, x = 0: Skip (x = 0)

k = 1, x = 1: (x > 0, so process)

  • mi = min(inf, 1) = 1
  • mx = max(-1, 1) = 1
  • s = 0 + 1*1 = 1
  • cnt = 0 + 1 = 1
  • count[mode] = 0 < 1, so mode = 1

k = 2, x = 3:

  • mi = min(1, 2) = 1
  • mx = max(1, 2) = 2
  • s = 1 + 2*3 = 7
  • cnt = 1 + 3 = 4
  • count[mode] = 1 < 3, so mode = 2

k = 3, x = 4:

  • mi = min(1, 3) = 1
  • mx = max(2, 3) = 3
  • s = 7 + 3*4 = 19
  • cnt = 4 + 4 = 8
  • count[mode] = 3 < 4, so mode = 3

k = 4 to 255: All have x = 0, so skip

Step 3: Calculate mean

  • mean = s / cnt = 19 / 8 = 2.375

Step 4: Calculate median

  • Total count is 8 (even), so we need positions 4 and 5
  • find(4): Accumulate counts: 0→1→4. When we reach k=2, cumulative = 4, which equals our target, so return 2
  • find(5): Accumulate counts: 0→1→4→8. When we reach k=3, cumulative = 8 ≥ 5, so return 3
  • median = (2 + 3) / 2 = 2.5

Step 5: Return results [1, 3, 2.375, 2.5, 3]

This matches our expectations:

  • Minimum: 1 (smallest value present)
  • Maximum: 3 (largest value present)
  • Mean: (1 + 2 + 2 + 2 + 3 + 3 + 3 + 3) / 8 = 2.375
  • Median: Middle of sorted [1, 2, 2, 2, 3, 3, 3, 3] = average of 4th and 5th elements = (2 + 3) / 2 = 2.5
  • Mode: 3 (appears 4 times, most frequent)

Solution Implementation

1class Solution:
2    def sampleStats(self, count: List[int]) -> List[float]:
3        """
4        Calculate statistics for a sample represented as a frequency array.
5      
6        Args:
7            count: Array where count[i] represents the frequency of value i
8          
9        Returns:
10            List containing [minimum, maximum, mean, median, mode]
11        """
12      
13        def find_kth_element(k: int) -> int:
14            """
15            Find the k-th smallest element (1-indexed) in the frequency array.
16          
17            Args:
18                k: The position to find (1-indexed)
19              
20            Returns:
21                The value at the k-th position
22            """
23            cumulative_count = 0
24            for value, frequency in enumerate(count):
25                cumulative_count += frequency
26                if cumulative_count >= k:
27                    return value
28      
29        # Initialize statistics variables
30        minimum_value = float('inf')
31        maximum_value = -1
32        total_sum = 0
33        total_count = 0
34        mode_value = 0
35      
36        # Single pass to calculate min, max, sum, count, and mode
37        for value, frequency in enumerate(count):
38            if frequency > 0:
39                # Update minimum and maximum
40                minimum_value = min(minimum_value, value)
41                maximum_value = max(maximum_value, value)
42              
43                # Update sum and count for mean calculation
44                total_sum += value * frequency
45                total_count += frequency
46              
47                # Update mode if current frequency is higher
48                if frequency > count[mode_value]:
49                    mode_value = value
50      
51        # Calculate median based on whether total count is odd or even
52        if total_count & 1:  # Odd number of elements
53            median = find_kth_element(total_count // 2 + 1)
54        else:  # Even number of elements
55            median = (find_kth_element(total_count // 2) + 
56                     find_kth_element(total_count // 2 + 1)) / 2.0
57      
58        # Calculate mean
59        mean = total_sum / total_count
60      
61        # Return all statistics as floats
62        return [float(minimum_value), float(maximum_value), mean, median, float(mode_value)]
63
1class Solution {
2    private int[] frequencyCount;
3
4    /**
5     * Calculate statistics for a large sample represented as a frequency array
6     * @param count Array where count[i] represents the frequency of value i
7     * @return Array containing [minimum, maximum, mean, median, mode]
8     */
9    public double[] sampleStats(int[] count) {
10        this.frequencyCount = count;
11      
12        // Initialize minimum to large value and maximum to small value
13        int minimum = 1 << 30;  // Large initial value (2^30)
14        int maximum = -1;
15      
16        // Variables for calculating mean
17        long sum = 0;           // Sum of all values (using long to prevent overflow)
18        int totalCount = 0;     // Total number of elements
19      
20        // Variable for tracking mode (most frequent value)
21        int mode = 0;
22      
23        // Iterate through all possible values to collect statistics
24        for (int value = 0; value < count.length; value++) {
25            if (count[value] > 0) {
26                // Update minimum and maximum values
27                minimum = Math.min(minimum, value);
28                maximum = Math.max(maximum, value);
29              
30                // Accumulate sum for mean calculation
31                sum += 1L * value * count[value];  // Cast to long to prevent overflow
32              
33                // Count total number of elements
34                totalCount += count[value];
35              
36                // Update mode if current value has higher frequency
37                if (count[value] > count[mode]) {
38                    mode = value;
39                }
40            }
41        }
42      
43        // Calculate median based on whether total count is odd or even
44        double median;
45        if (totalCount % 2 == 1) {
46            // Odd count: median is the middle element
47            median = findKthElement(totalCount / 2 + 1);
48        } else {
49            // Even count: median is average of two middle elements
50            median = (findKthElement(totalCount / 2) + findKthElement(totalCount / 2 + 1)) / 2.0;
51        }
52      
53        // Calculate mean as sum divided by count
54        double mean = sum * 1.0 / totalCount;
55      
56        // Return all statistics in specified order
57        return new double[] {minimum, maximum, mean, median, mode};
58    }
59
60    /**
61     * Find the k-th element (1-indexed) in the frequency distribution
62     * @param k The position of the element to find (1-indexed)
63     * @return The value at the k-th position
64     */
65    private int findKthElement(int k) {
66        int cumulativeCount = 0;
67      
68        // Iterate through values and accumulate counts
69        for (int value = 0; ; value++) {
70            cumulativeCount += frequencyCount[value];
71          
72            // Return value when cumulative count reaches or exceeds k
73            if (cumulativeCount >= k) {
74                return value;
75            }
76        }
77    }
78}
79
1class Solution {
2public:
3    vector<double> sampleStats(vector<int>& count) {
4        // Lambda function to find the k-th smallest element (1-indexed)
5        auto findKthElement = [&](int k) -> int {
6            int cumulativeCount = 0;
7            for (int value = 0; value < count.size(); ++value) {
8                cumulativeCount += count[value];
9                if (cumulativeCount >= k) {
10                    return value;
11                }
12            }
13            return -1; // Should never reach here
14        };
15      
16        // Initialize statistics variables
17        int minValue = INT_MAX;
18        int maxValue = INT_MIN;
19        long long sum = 0;
20        int totalCount = 0;
21        int modeValue = 0;
22      
23        // Single pass to calculate min, max, sum, total count, and mode
24        for (int value = 0; value < count.size(); ++value) {
25            if (count[value] > 0) {
26                // Update minimum value
27                minValue = min(minValue, value);
28              
29                // Update maximum value
30                maxValue = max(maxValue, value);
31              
32                // Add to sum (using long long to prevent overflow)
33                sum += static_cast<long long>(value) * count[value];
34              
35                // Update total count
36                totalCount += count[value];
37              
38                // Update mode (value with highest frequency)
39                if (count[value] > count[modeValue]) {
40                    modeValue = value;
41                }
42            }
43        }
44      
45        // Calculate median
46        double median;
47        if (totalCount % 2 == 1) {
48            // Odd number of elements: median is the middle element
49            median = findKthElement(totalCount / 2 + 1);
50        } else {
51            // Even number of elements: median is the average of two middle elements
52            int leftMiddle = findKthElement(totalCount / 2);
53            int rightMiddle = findKthElement(totalCount / 2 + 1);
54            median = (leftMiddle + rightMiddle) / 2.0;
55        }
56      
57        // Calculate mean
58        double mean = static_cast<double>(sum) / totalCount;
59      
60        // Return all statistics: [minimum, maximum, mean, median, mode]
61        return {
62            static_cast<double>(minValue),
63            static_cast<double>(maxValue),
64            mean,
65            median,
66            static_cast<double>(modeValue)
67        };
68    }
69};
70
1/**
2 * Calculates statistical measures from a frequency count array
3 * @param count - Array where count[i] represents the frequency of value i
4 * @returns Array containing [minimum, maximum, mean, median, mode]
5 */
6function sampleStats(count: number[]): number[] {
7    /**
8     * Finds the value at the specified position in the sorted sequence
9     * @param targetPosition - The position to find (1-indexed)
10     * @returns The value at the target position
11     */
12    const findValueAtPosition = (targetPosition: number): number => {
13        let cumulativeCount = 0;
14      
15        for (let value = 0; value < count.length; value++) {
16            cumulativeCount += count[value];
17          
18            // When cumulative count reaches or exceeds target position, we found our value
19            if (cumulativeCount >= targetPosition) {
20                return value;
21            }
22        }
23      
24        // This should never be reached given valid input
25        return -1;
26    };
27  
28    // Initialize statistical variables
29    let minimum = Number.MAX_SAFE_INTEGER;
30    let maximum = -1;
31    let sum = 0;
32    let totalCount = 0;
33    let mode = 0;
34  
35    // Single pass to calculate min, max, sum, total count, and mode
36    for (let value = 0; value < count.length; value++) {
37        if (count[value] > 0) {
38            // Update minimum value
39            minimum = Math.min(minimum, value);
40          
41            // Update maximum value
42            maximum = Math.max(maximum, value);
43          
44            // Add to sum (value * frequency)
45            sum += value * count[value];
46          
47            // Add frequency to total count
48            totalCount += count[value];
49          
50            // Update mode if this value has higher frequency
51            if (count[value] > count[mode]) {
52                mode = value;
53            }
54        }
55    }
56  
57    // Calculate median based on whether total count is odd or even
58    let median: number;
59    if (totalCount % 2 === 1) {
60        // Odd count: median is the middle element
61        const middlePosition = Math.floor(totalCount / 2) + 1;
62        median = findValueAtPosition(middlePosition);
63    } else {
64        // Even count: median is average of two middle elements
65        const leftMiddlePosition = totalCount / 2;
66        const rightMiddlePosition = leftMiddlePosition + 1;
67        median = (findValueAtPosition(leftMiddlePosition) + findValueAtPosition(rightMiddlePosition)) / 2;
68    }
69  
70    // Calculate mean
71    const mean = sum / totalCount;
72  
73    // Return all statistics as an array
74    return [minimum, maximum, mean, median, mode];
75}
76

Time and Space Complexity

Time Complexity: O(n + k) where n is the length of the count array and k is the total number of elements (sum of all counts).

  • The first loop iterates through the entire count array once to calculate minimum, maximum, sum, total count, and mode, which takes O(n) time.
  • The find function is called at most 2 times for calculating the median. Each call to find iterates through the count array in the worst case, taking O(n) time per call.
  • However, the total iterations across all find calls is bounded by the position we're looking for, which in the worst case is cnt // 2 + 1. Since we need to iterate through counts that sum up to at most cnt // 2 + 1 elements, this contributes O(n) time in the worst case.
  • Overall time complexity: O(n) + O(n) = O(n)

Space Complexity: O(1)

  • The algorithm uses only a constant amount of extra space for variables: mi, mx, s, cnt, mode, median, and t (inside the find function).
  • The find function doesn't use any additional data structures that scale with input size.
  • No recursive calls are made, so there's no additional stack space usage.
  • The space used doesn't depend on the size of the input array or the values within it.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Off-by-One Error in Median Calculation

Pitfall: A common mistake is using 0-indexed positioning when finding the median, leading to incorrect results. Since the sample is conceptually a sorted array with 1-based indexing for positions, using 0-based indexing would select wrong elements.

Example of Wrong Implementation:

# WRONG: Using 0-based indexing
if total_count & 1:
    median = find_kth_element(total_count // 2)  # Wrong!
else:
    median = (find_kth_element(total_count // 2 - 1) + 
              find_kth_element(total_count // 2)) / 2.0  # Wrong!

Correct Solution:

# CORRECT: Using 1-based indexing for positions
if total_count & 1:
    median = find_kth_element(total_count // 2 + 1)  # Middle element
else:
    median = (find_kth_element(total_count // 2) + 
              find_kth_element(total_count // 2 + 1)) / 2.0

2. Integer Division in Mean/Median Calculation

Pitfall: In Python 2 or when not careful with types, integer division might truncate decimal values, causing precision loss.

Example of Wrong Implementation:

# WRONG: May lose precision in some Python versions
mean = total_sum // total_count  # Integer division
median = (val1 + val2) // 2      # Integer division for even case

Correct Solution:

# CORRECT: Use float division
mean = total_sum / total_count           # Float division
median = (val1 + val2) / 2.0            # Explicitly use float division

3. Not Handling Empty Frequency Arrays Properly

Pitfall: While the problem guarantees at least one element, not checking for empty arrays in the find_kth_element function could cause it to return None or raise an error if called with invalid k values.

Example of Wrong Implementation:

def find_kth_element(k: int) -> int:
    cumulative_count = 0
    for value, frequency in enumerate(count):
        cumulative_count += frequency
        if cumulative_count >= k:
            return value
    # No return statement if k is invalid!

Correct Solution:

def find_kth_element(k: int) -> int:
    cumulative_count = 0
    for value, frequency in enumerate(count):
        cumulative_count += frequency
        if cumulative_count >= k:
            return value
    return -1  # Or raise an exception for invalid k

4. Mode Initialization Problem

Pitfall: Initializing mode_value = 0 assumes that index 0 has the highest frequency initially. If count[0] = 0 but there's a non-zero count elsewhere, the comparison if frequency > count[mode_value] would compare against 0, which works correctly. However, a more robust approach would handle edge cases explicitly.

Potential Issue:

# If all counts are 0 (empty sample), mode_value stays at 0
mode_value = 0
for value, frequency in enumerate(count):
    if frequency > count[mode_value]:
        mode_value = value

More Robust Solution:

mode_value = -1
max_frequency = 0
for value, frequency in enumerate(count):
    if frequency > 0 and frequency > max_frequency:
        max_frequency = frequency
        mode_value = value

5. Floating Point Precision

Pitfall: Not returning consistent float types might cause type mismatches in some test frameworks.

Example of Wrong Implementation:

# WRONG: Mixed types
return [minimum_value, maximum_value, mean, median, mode_value]  # mode_value is int

Correct Solution:

# CORRECT: All values as floats
return [float(minimum_value), float(maximum_value), mean, median, float(mode_value)]
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which technique can we use to find the middle of a linked list?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More