Facebook Pixel

1086. High Five 🔒

Problem Description

You are given a list of student scores where each entry contains a student ID and a score. Multiple scores can exist for the same student. Your task is to calculate each student's "top five average" score.

The top five average for a student is calculated by:

  1. Taking their 5 highest scores
  2. Summing these 5 scores
  3. Dividing by 5 using integer division (discarding any remainder)

The input items is a list where each element is [IDáµ¢, scoreáµ¢] representing a score for a student with that ID.

You need to return a list of pairs [IDâ±¼, topFiveAverageâ±¼] where:

  • Each pair represents a student ID and their calculated top five average
  • The results must be sorted by student ID in increasing order
  • Only students who have at least one score should appear in the output

For example, if a student with ID 1 has scores [90, 85, 95, 88, 92, 87, 91], their top 5 scores would be [95, 92, 91, 90, 88], giving a sum of 456, and their top five average would be 456 // 5 = 91.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we need to group all scores by student ID first, since the input list contains scores scattered across different students. Once we have all scores for each student grouped together, we can easily find their top 5 scores.

We can think of this problem in two phases:

  1. Collecting phase: As we iterate through the input, we gather all scores for each student. A dictionary (or hashmap) is perfect for this - we use the student ID as the key and maintain a list of all their scores as the value.

  2. Processing phase: For each student, we need their 5 highest scores. Instead of sorting all their scores (which would take O(n log n) time), we can use a more efficient approach with nlargest(5, scores) which finds the 5 largest elements in O(n log 5) = O(n) time using a heap internally.

The reason we track the maximum student ID during the collecting phase is to ensure we process students in order from ID 1 to the maximum ID. This guarantees our output is sorted by student ID as required.

For each student ID from 1 to max, we check if they have any scores. If they do, we:

  • Extract their top 5 scores using nlargest(5, xs)
  • Sum these scores and perform integer division by 5
  • Add the [ID, average] pair to our result

This approach is efficient because we only sort/process the scores we need (top 5) rather than sorting all scores for each student.

Learn more about Sorting and Heap (Priority Queue) patterns.

Solution Approach

The implementation uses a dictionary-based approach to group and process student scores:

Step 1: Initialize Data Structures

  • Create a defaultdict(list) to store scores for each student ID. This allows us to append scores without checking if the key exists.
  • Initialize a variable m to track the maximum student ID encountered.

Step 2: Group Scores by Student ID

for i, x in items:
    d[i].append(x)
    m = max(m, i)
  • Iterate through each [student_id, score] pair in the input
  • Append each score to the list associated with that student's ID in the dictionary
  • Update the maximum student ID seen

Step 3: Calculate Top Five Averages

for i in range(1, m + 1):
    if xs := d[i]:
        avg = sum(nlargest(5, xs)) // 5
        ans.append([i, avg])
  • Iterate through all possible student IDs from 1 to the maximum ID found
  • For each ID, check if the student has any scores using the walrus operator :=
  • If scores exist:
    • Use nlargest(5, xs) to efficiently find the 5 highest scores. This function uses a min-heap internally and runs in O(n) time where n is the number of scores for that student
    • Sum these top 5 scores
    • Perform integer division by 5 to get the average
    • Append [student_id, average] to the result list

Key Implementation Details:

  • The nlargest function from the heapq module is crucial here - it's more efficient than sorting all scores when we only need the top 5
  • Integer division // ensures we get the floor of the average as required
  • By iterating from 1 to max ID, we automatically ensure the output is sorted by student ID
  • The walrus operator := allows us to both assign and check if a student has scores in a single expression

Time Complexity: O(n + m*k) where n is the total number of scores, m is the number of unique students, and k is the average number of scores per student.

Space Complexity: O(n) to store all scores in the dictionary.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate the solution approach.

Input: items = [[1, 90], [1, 85], [2, 88], [1, 95], [2, 92], [1, 88], [2, 87], [1, 92], [2, 91], [1, 87], [2, 85]]

Step 1: Group scores by student ID

We iterate through each [student_id, score] pair and build our dictionary:

Processing [1, 90]: d = {1: [90]}, m = 1
Processing [1, 85]: d = {1: [90, 85]}, m = 1
Processing [2, 88]: d = {1: [90, 85], 2: [88]}, m = 2
Processing [1, 95]: d = {1: [90, 85, 95], 2: [88]}, m = 2
Processing [2, 92]: d = {1: [90, 85, 95], 2: [88, 92]}, m = 2
Processing [1, 88]: d = {1: [90, 85, 95, 88], 2: [88, 92]}, m = 2
Processing [2, 87]: d = {1: [90, 85, 95, 88], 2: [88, 92, 87]}, m = 2
Processing [1, 92]: d = {1: [90, 85, 95, 88, 92], 2: [88, 92, 87]}, m = 2
Processing [2, 91]: d = {1: [90, 85, 95, 88, 92], 2: [88, 92, 87, 91]}, m = 2
Processing [1, 87]: d = {1: [90, 85, 95, 88, 92, 87], 2: [88, 92, 87, 91]}, m = 2
Processing [2, 85]: d = {1: [90, 85, 95, 88, 92, 87], 2: [88, 92, 87, 91, 85]}, m = 2

After grouping:

  • Student 1 has scores: [90, 85, 95, 88, 92, 87]
  • Student 2 has scores: [88, 92, 87, 91, 85]
  • Maximum student ID: m = 2

Step 2: Calculate top five averages for each student

We iterate from ID 1 to ID 2:

For Student ID 1:

  • Scores: [90, 85, 95, 88, 92, 87]
  • Apply nlargest(5, scores) → [95, 92, 90, 88, 87]
  • Sum: 95 + 92 + 90 + 88 + 87 = 452
  • Average: 452 // 5 = 90
  • Add to result: [[1, 90]]

For Student ID 2:

  • Scores: [88, 92, 87, 91, 85]
  • Apply nlargest(5, scores) → [92, 91, 88, 87, 85] (all 5 scores)
  • Sum: 92 + 91 + 88 + 87 + 85 = 443
  • Average: 443 // 5 = 88
  • Add to result: [[1, 90], [2, 88]]

Final Output: [[1, 90], [2, 88]]

The result is automatically sorted by student ID since we processed IDs in increasing order from 1 to 2.

Solution Implementation

1from collections import defaultdict
2from heapq import nlargest
3from typing import List
4
5class Solution:
6    def highFive(self, items: List[List[int]]) -> List[List[int]]:
7        # Dictionary to store scores for each student ID
8        student_scores = defaultdict(list)
9        max_student_id = 0
10      
11        # Group scores by student ID and track the maximum ID
12        for student_id, score in items:
13            student_scores[student_id].append(score)
14            max_student_id = max(max_student_id, student_id)
15      
16        # List to store the final results
17        result = []
18      
19        # Process each student from ID 1 to max_student_id
20        for student_id in range(1, max_student_id + 1):
21            # Check if this student has any scores
22            scores = student_scores[student_id]
23            if scores:
24                # Calculate average of top 5 scores (integer division)
25                top_five_scores = nlargest(5, scores)
26                average = sum(top_five_scores) // 5
27                result.append([student_id, average])
28      
29        return result
30
1class Solution {
2    public int[][] highFive(int[][] items) {
3        // Array to store min-heaps for each student ID (IDs range from 1 to 100)
4        PriorityQueue<Integer>[] studentScores = new PriorityQueue[101];
5        int uniqueStudents = 0;
6        final int TOP_SCORES_COUNT = 5;
7      
8        // Process each score entry
9        for (int[] item : items) {
10            int studentId = item[0];
11            int score = item[1];
12          
13            // Initialize min-heap for new student
14            if (studentScores[studentId] == null) {
15                uniqueStudents++;
16                studentScores[studentId] = new PriorityQueue<>(TOP_SCORES_COUNT);
17            }
18          
19            // Add score to student's min-heap
20            studentScores[studentId].offer(score);
21          
22            // Keep only top 5 scores by removing the smallest when size exceeds 5
23            if (studentScores[studentId].size() > TOP_SCORES_COUNT) {
24                studentScores[studentId].poll();
25            }
26        }
27      
28        // Create result array with size equal to number of unique students
29        int[][] result = new int[uniqueStudents][2];
30        int resultIndex = 0;
31      
32        // Calculate average of top 5 scores for each student
33        for (int studentId = 0; studentId < 101; studentId++) {
34            if (studentScores[studentId] == null) {
35                continue;
36            }
37          
38            // Calculate average of the top 5 scores
39            int average = calculateSum(studentScores[studentId]) / TOP_SCORES_COUNT;
40          
41            // Store student ID and average in result
42            result[resultIndex][0] = studentId;
43            result[resultIndex][1] = average;
44            resultIndex++;
45        }
46      
47        return result;
48    }
49  
50    /**
51     * Calculates the sum of all elements in the priority queue
52     * Note: This method empties the queue
53     * 
54     * @param queue The priority queue containing integers
55     * @return The sum of all elements
56     */
57    private int calculateSum(PriorityQueue<Integer> queue) {
58        int sum = 0;
59        while (!queue.isEmpty()) {
60            sum += queue.poll();
61        }
62        return sum;
63    }
64}
65
1class Solution {
2public:
3    vector<vector<int>> highFive(vector<vector<int>>& items) {
4        // Array to store scores for each student ID (1-indexed, max ID is 1000)
5        vector<int> studentScores[1001];
6      
7        // Group scores by student ID
8        for (auto& item : items) {
9            int studentId = item[0];
10            int score = item[1];
11            studentScores[studentId].push_back(score);
12        }
13      
14        vector<vector<int>> result;
15      
16        // Process each student's scores
17        for (int studentId = 1; studentId <= 1000; ++studentId) {
18            if (!studentScores[studentId].empty()) {
19                // Sort scores in descending order to get top 5
20                sort(studentScores[studentId].begin(), 
21                     studentScores[studentId].end(), 
22                     greater<int>());
23              
24                // Calculate sum of top 5 scores
25                int sumOfTopFive = 0;
26                for (int j = 0; j < 5; ++j) {
27                    sumOfTopFive += studentScores[studentId][j];
28                }
29              
30                // Calculate average and add to result
31                int average = sumOfTopFive / 5;
32                result.push_back({studentId, average});
33            }
34        }
35      
36        return result;
37    }
38};
39
1/**
2 * Calculates the average of the top 5 scores for each student ID.
3 * @param items - Array of [studentId, score] pairs
4 * @returns Array of [studentId, averageOfTop5Scores] pairs sorted by studentId
5 */
6function highFive(items: number[][]): number[][] {
7    // Initialize array to store scores for each student ID (1-1000)
8    const scoresById: number[][] = Array(1001)
9        .fill(0)
10        .map(() => Array<number>());
11  
12    // Group scores by student ID
13    for (const [studentId, score] of items) {
14        scoresById[studentId].push(score);
15    }
16  
17    // Calculate average of top 5 scores for each student
18    const result: number[][] = [];
19  
20    for (let studentId = 1; studentId <= 1000; studentId++) {
21        if (scoresById[studentId].length > 0) {
22            // Sort scores in descending order
23            scoresById[studentId].sort((a, b) => b - a);
24          
25            // Take top 5 scores and calculate their sum
26            const top5Scores = scoresById[studentId].slice(0, 5);
27            const sumOfTop5 = top5Scores.reduce((sum, score) => sum + score, 0);
28          
29            // Calculate average (integer division)
30            const averageOfTop5 = Math.floor(sumOfTop5 / 5);
31          
32            // Add student result to output
33            result.push([studentId, averageOfTop5]);
34        }
35    }
36  
37    return result;
38}
39

Time and Space Complexity

Time Complexity: O(n + m * k log k) where n is the number of items, m is the number of unique students, and k is the number of scores per student.

  • Iterating through all items to build the dictionary: O(n)
  • For each unique student (m students), we use nlargest(5, xs) which has complexity O(k log 5) = O(k) for finding the 5 largest elements from k scores using a min-heap approach
  • Since we iterate through all student IDs from 1 to m: O(m * k)
  • Overall: O(n + m * k)

In the worst case where each student has many scores, this could be simplified to O(n log n) if we consider k could be proportional to n/m.

Space Complexity: O(n)

  • The dictionary d stores all n scores across all students: O(n)
  • The ans list stores at most m student-average pairs: O(m)
  • The nlargest function uses internal space of O(5) = O(1) for maintaining a heap of size 5
  • Since m ≤ n (number of students cannot exceed number of items), the overall space complexity is O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Assuming Students Have At Least 5 Scores

One critical pitfall in this problem is assuming every student has at least 5 scores. The problem statement doesn't guarantee this, and the code could fail or produce incorrect results if a student has fewer than 5 scores.

The Issue: When using nlargest(5, scores), if a student has fewer than 5 scores (say only 3), the function will return all available scores. The subsequent division by 5 would then be incorrect.

Example of the Problem:

# If student 1 has only 3 scores: [90, 85, 88]
top_five_scores = nlargest(5, [90, 85, 88])  # Returns [90, 88, 85]
average = sum([90, 88, 85]) // 5  # 263 // 5 = 52 (incorrect!)
# The correct average should be 263 // 3 = 87

Solution:

def highFive(self, items: List[List[int]]) -> List[List[int]]:
    student_scores = defaultdict(list)
    max_student_id = 0
  
    for student_id, score in items:
        student_scores[student_id].append(score)
        max_student_id = max(max_student_id, student_id)
  
    result = []
  
    for student_id in range(1, max_student_id + 1):
        scores = student_scores[student_id]
        if scores:
            # Handle cases where student has fewer than 5 scores
            top_scores = nlargest(5, scores)
            num_scores = min(len(scores), 5)
            average = sum(top_scores) // num_scores
            result.append([student_id, average])
  
    return result

2. Inefficient Sorting Approach

Another common mistake is sorting all scores for each student when you only need the top 5.

The Issue:

# Inefficient approach
for student_id, scores_list in student_scores.items():
    scores_list.sort(reverse=True)  # O(k log k) where k is number of scores
    average = sum(scores_list[:5]) // 5

This sorts all scores even if a student has hundreds of scores, when we only need the top 5.

Better Approach: Using nlargest(5, scores) is more efficient as it uses a min-heap internally and runs in O(k) time for finding the 5 largest elements, rather than O(k log k) for a full sort.

3. Not Handling Sparse Student IDs

If student IDs are not consecutive (e.g., IDs are 1, 3, 100), iterating from 1 to max_id creates unnecessary iterations.

More Efficient Solution for Sparse IDs:

def highFive(self, items: List[List[int]]) -> List[List[int]]:
    student_scores = defaultdict(list)
  
    for student_id, score in items:
        student_scores[student_id].append(score)
  
    result = []
  
    # Iterate only through existing student IDs
    for student_id in sorted(student_scores.keys()):
        scores = student_scores[student_id]
        top_scores = nlargest(5, scores)
        num_scores = min(len(scores), 5)
        average = sum(top_scores) // num_scores
        result.append([student_id, average])
  
    return result

This approach only processes students that actually exist in the input, making it more efficient when student IDs are sparse.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

In a binary min heap, the minimum element can be found in:


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More