1086. High Five 🔒
Problem Description
You are given a list of student scores where each entry contains a student ID and a score. Multiple scores can exist for the same student. Your task is to calculate each student's "top five average" score.
The top five average for a student is calculated by:
- Taking their 5 highest scores
- Summing these 5 scores
- Dividing by 5 using integer division (discarding any remainder)
The input items
is a list where each element is [IDáµ¢, scoreáµ¢]
representing a score for a student with that ID.
You need to return a list of pairs [IDâ±¼, topFiveAverageâ±¼]
where:
- Each pair represents a student ID and their calculated top five average
- The results must be sorted by student ID in increasing order
- Only students who have at least one score should appear in the output
For example, if a student with ID 1 has scores [90, 85, 95, 88, 92, 87, 91], their top 5 scores would be [95, 92, 91, 90, 88], giving a sum of 456, and their top five average would be 456 // 5 = 91
.
Intuition
The key insight is that we need to group all scores by student ID first, since the input list contains scores scattered across different students. Once we have all scores for each student grouped together, we can easily find their top 5 scores.
We can think of this problem in two phases:
-
Collecting phase: As we iterate through the input, we gather all scores for each student. A dictionary (or hashmap) is perfect for this - we use the student ID as the key and maintain a list of all their scores as the value.
-
Processing phase: For each student, we need their 5 highest scores. Instead of sorting all their scores (which would take
O(n log n)
time), we can use a more efficient approach withnlargest(5, scores)
which finds the 5 largest elements inO(n log 5) = O(n)
time using a heap internally.
The reason we track the maximum student ID during the collecting phase is to ensure we process students in order from ID 1 to the maximum ID. This guarantees our output is sorted by student ID as required.
For each student ID from 1 to max, we check if they have any scores. If they do, we:
- Extract their top 5 scores using
nlargest(5, xs)
- Sum these scores and perform integer division by 5
- Add the
[ID, average]
pair to our result
This approach is efficient because we only sort/process the scores we need (top 5) rather than sorting all scores for each student.
Learn more about Sorting and Heap (Priority Queue) patterns.
Solution Approach
The implementation uses a dictionary-based approach to group and process student scores:
Step 1: Initialize Data Structures
- Create a
defaultdict(list)
to store scores for each student ID. This allows us to append scores without checking if the key exists. - Initialize a variable
m
to track the maximum student ID encountered.
Step 2: Group Scores by Student ID
for i, x in items:
d[i].append(x)
m = max(m, i)
- Iterate through each
[student_id, score]
pair in the input - Append each score to the list associated with that student's ID in the dictionary
- Update the maximum student ID seen
Step 3: Calculate Top Five Averages
for i in range(1, m + 1):
if xs := d[i]:
avg = sum(nlargest(5, xs)) // 5
ans.append([i, avg])
- Iterate through all possible student IDs from 1 to the maximum ID found
- For each ID, check if the student has any scores using the walrus operator
:=
- If scores exist:
- Use
nlargest(5, xs)
to efficiently find the 5 highest scores. This function uses a min-heap internally and runs inO(n)
time where n is the number of scores for that student - Sum these top 5 scores
- Perform integer division by 5 to get the average
- Append
[student_id, average]
to the result list
- Use
Key Implementation Details:
- The
nlargest
function from theheapq
module is crucial here - it's more efficient than sorting all scores when we only need the top 5 - Integer division
//
ensures we get the floor of the average as required - By iterating from 1 to max ID, we automatically ensure the output is sorted by student ID
- The walrus operator
:=
allows us to both assign and check if a student has scores in a single expression
Time Complexity: O(n + m*k)
where n is the total number of scores, m is the number of unique students, and k is the average number of scores per student.
Space Complexity: O(n)
to store all scores in the dictionary.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate the solution approach.
Input: items = [[1, 90], [1, 85], [2, 88], [1, 95], [2, 92], [1, 88], [2, 87], [1, 92], [2, 91], [1, 87], [2, 85]]
Step 1: Group scores by student ID
We iterate through each [student_id, score]
pair and build our dictionary:
Processing [1, 90]: d = {1: [90]}, m = 1 Processing [1, 85]: d = {1: [90, 85]}, m = 1 Processing [2, 88]: d = {1: [90, 85], 2: [88]}, m = 2 Processing [1, 95]: d = {1: [90, 85, 95], 2: [88]}, m = 2 Processing [2, 92]: d = {1: [90, 85, 95], 2: [88, 92]}, m = 2 Processing [1, 88]: d = {1: [90, 85, 95, 88], 2: [88, 92]}, m = 2 Processing [2, 87]: d = {1: [90, 85, 95, 88], 2: [88, 92, 87]}, m = 2 Processing [1, 92]: d = {1: [90, 85, 95, 88, 92], 2: [88, 92, 87]}, m = 2 Processing [2, 91]: d = {1: [90, 85, 95, 88, 92], 2: [88, 92, 87, 91]}, m = 2 Processing [1, 87]: d = {1: [90, 85, 95, 88, 92, 87], 2: [88, 92, 87, 91]}, m = 2 Processing [2, 85]: d = {1: [90, 85, 95, 88, 92, 87], 2: [88, 92, 87, 91, 85]}, m = 2
After grouping:
- Student 1 has scores:
[90, 85, 95, 88, 92, 87]
- Student 2 has scores:
[88, 92, 87, 91, 85]
- Maximum student ID:
m = 2
Step 2: Calculate top five averages for each student
We iterate from ID 1 to ID 2:
For Student ID 1:
- Scores:
[90, 85, 95, 88, 92, 87]
- Apply
nlargest(5, scores)
→[95, 92, 90, 88, 87]
- Sum:
95 + 92 + 90 + 88 + 87 = 452
- Average:
452 // 5 = 90
- Add to result:
[[1, 90]]
For Student ID 2:
- Scores:
[88, 92, 87, 91, 85]
- Apply
nlargest(5, scores)
→[92, 91, 88, 87, 85]
(all 5 scores) - Sum:
92 + 91 + 88 + 87 + 85 = 443
- Average:
443 // 5 = 88
- Add to result:
[[1, 90], [2, 88]]
Final Output: [[1, 90], [2, 88]]
The result is automatically sorted by student ID since we processed IDs in increasing order from 1 to 2.
Solution Implementation
1from collections import defaultdict
2from heapq import nlargest
3from typing import List
4
5class Solution:
6 def highFive(self, items: List[List[int]]) -> List[List[int]]:
7 # Dictionary to store scores for each student ID
8 student_scores = defaultdict(list)
9 max_student_id = 0
10
11 # Group scores by student ID and track the maximum ID
12 for student_id, score in items:
13 student_scores[student_id].append(score)
14 max_student_id = max(max_student_id, student_id)
15
16 # List to store the final results
17 result = []
18
19 # Process each student from ID 1 to max_student_id
20 for student_id in range(1, max_student_id + 1):
21 # Check if this student has any scores
22 scores = student_scores[student_id]
23 if scores:
24 # Calculate average of top 5 scores (integer division)
25 top_five_scores = nlargest(5, scores)
26 average = sum(top_five_scores) // 5
27 result.append([student_id, average])
28
29 return result
30
1class Solution {
2 public int[][] highFive(int[][] items) {
3 // Array to store min-heaps for each student ID (IDs range from 1 to 100)
4 PriorityQueue<Integer>[] studentScores = new PriorityQueue[101];
5 int uniqueStudents = 0;
6 final int TOP_SCORES_COUNT = 5;
7
8 // Process each score entry
9 for (int[] item : items) {
10 int studentId = item[0];
11 int score = item[1];
12
13 // Initialize min-heap for new student
14 if (studentScores[studentId] == null) {
15 uniqueStudents++;
16 studentScores[studentId] = new PriorityQueue<>(TOP_SCORES_COUNT);
17 }
18
19 // Add score to student's min-heap
20 studentScores[studentId].offer(score);
21
22 // Keep only top 5 scores by removing the smallest when size exceeds 5
23 if (studentScores[studentId].size() > TOP_SCORES_COUNT) {
24 studentScores[studentId].poll();
25 }
26 }
27
28 // Create result array with size equal to number of unique students
29 int[][] result = new int[uniqueStudents][2];
30 int resultIndex = 0;
31
32 // Calculate average of top 5 scores for each student
33 for (int studentId = 0; studentId < 101; studentId++) {
34 if (studentScores[studentId] == null) {
35 continue;
36 }
37
38 // Calculate average of the top 5 scores
39 int average = calculateSum(studentScores[studentId]) / TOP_SCORES_COUNT;
40
41 // Store student ID and average in result
42 result[resultIndex][0] = studentId;
43 result[resultIndex][1] = average;
44 resultIndex++;
45 }
46
47 return result;
48 }
49
50 /**
51 * Calculates the sum of all elements in the priority queue
52 * Note: This method empties the queue
53 *
54 * @param queue The priority queue containing integers
55 * @return The sum of all elements
56 */
57 private int calculateSum(PriorityQueue<Integer> queue) {
58 int sum = 0;
59 while (!queue.isEmpty()) {
60 sum += queue.poll();
61 }
62 return sum;
63 }
64}
65
1class Solution {
2public:
3 vector<vector<int>> highFive(vector<vector<int>>& items) {
4 // Array to store scores for each student ID (1-indexed, max ID is 1000)
5 vector<int> studentScores[1001];
6
7 // Group scores by student ID
8 for (auto& item : items) {
9 int studentId = item[0];
10 int score = item[1];
11 studentScores[studentId].push_back(score);
12 }
13
14 vector<vector<int>> result;
15
16 // Process each student's scores
17 for (int studentId = 1; studentId <= 1000; ++studentId) {
18 if (!studentScores[studentId].empty()) {
19 // Sort scores in descending order to get top 5
20 sort(studentScores[studentId].begin(),
21 studentScores[studentId].end(),
22 greater<int>());
23
24 // Calculate sum of top 5 scores
25 int sumOfTopFive = 0;
26 for (int j = 0; j < 5; ++j) {
27 sumOfTopFive += studentScores[studentId][j];
28 }
29
30 // Calculate average and add to result
31 int average = sumOfTopFive / 5;
32 result.push_back({studentId, average});
33 }
34 }
35
36 return result;
37 }
38};
39
1/**
2 * Calculates the average of the top 5 scores for each student ID.
3 * @param items - Array of [studentId, score] pairs
4 * @returns Array of [studentId, averageOfTop5Scores] pairs sorted by studentId
5 */
6function highFive(items: number[][]): number[][] {
7 // Initialize array to store scores for each student ID (1-1000)
8 const scoresById: number[][] = Array(1001)
9 .fill(0)
10 .map(() => Array<number>());
11
12 // Group scores by student ID
13 for (const [studentId, score] of items) {
14 scoresById[studentId].push(score);
15 }
16
17 // Calculate average of top 5 scores for each student
18 const result: number[][] = [];
19
20 for (let studentId = 1; studentId <= 1000; studentId++) {
21 if (scoresById[studentId].length > 0) {
22 // Sort scores in descending order
23 scoresById[studentId].sort((a, b) => b - a);
24
25 // Take top 5 scores and calculate their sum
26 const top5Scores = scoresById[studentId].slice(0, 5);
27 const sumOfTop5 = top5Scores.reduce((sum, score) => sum + score, 0);
28
29 // Calculate average (integer division)
30 const averageOfTop5 = Math.floor(sumOfTop5 / 5);
31
32 // Add student result to output
33 result.push([studentId, averageOfTop5]);
34 }
35 }
36
37 return result;
38}
39
Time and Space Complexity
Time Complexity: O(n + m * k log k)
where n
is the number of items, m
is the number of unique students, and k
is the number of scores per student.
- Iterating through all items to build the dictionary:
O(n)
- For each unique student (m students), we use
nlargest(5, xs)
which has complexityO(k log 5)
=O(k)
for finding the 5 largest elements from k scores using a min-heap approach - Since we iterate through all student IDs from 1 to m:
O(m * k)
- Overall:
O(n + m * k)
In the worst case where each student has many scores, this could be simplified to O(n log n)
if we consider k could be proportional to n/m.
Space Complexity: O(n)
- The dictionary
d
stores all n scores across all students:O(n)
- The
ans
list stores at most m student-average pairs:O(m)
- The
nlargest
function uses internal space ofO(5)
=O(1)
for maintaining a heap of size 5 - Since m ≤ n (number of students cannot exceed number of items), the overall space complexity is
O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Assuming Students Have At Least 5 Scores
One critical pitfall in this problem is assuming every student has at least 5 scores. The problem statement doesn't guarantee this, and the code could fail or produce incorrect results if a student has fewer than 5 scores.
The Issue:
When using nlargest(5, scores)
, if a student has fewer than 5 scores (say only 3), the function will return all available scores. The subsequent division by 5 would then be incorrect.
Example of the Problem:
# If student 1 has only 3 scores: [90, 85, 88]
top_five_scores = nlargest(5, [90, 85, 88]) # Returns [90, 88, 85]
average = sum([90, 88, 85]) // 5 # 263 // 5 = 52 (incorrect!)
# The correct average should be 263 // 3 = 87
Solution:
def highFive(self, items: List[List[int]]) -> List[List[int]]:
student_scores = defaultdict(list)
max_student_id = 0
for student_id, score in items:
student_scores[student_id].append(score)
max_student_id = max(max_student_id, student_id)
result = []
for student_id in range(1, max_student_id + 1):
scores = student_scores[student_id]
if scores:
# Handle cases where student has fewer than 5 scores
top_scores = nlargest(5, scores)
num_scores = min(len(scores), 5)
average = sum(top_scores) // num_scores
result.append([student_id, average])
return result
2. Inefficient Sorting Approach
Another common mistake is sorting all scores for each student when you only need the top 5.
The Issue:
# Inefficient approach
for student_id, scores_list in student_scores.items():
scores_list.sort(reverse=True) # O(k log k) where k is number of scores
average = sum(scores_list[:5]) // 5
This sorts all scores even if a student has hundreds of scores, when we only need the top 5.
Better Approach:
Using nlargest(5, scores)
is more efficient as it uses a min-heap internally and runs in O(k) time for finding the 5 largest elements, rather than O(k log k) for a full sort.
3. Not Handling Sparse Student IDs
If student IDs are not consecutive (e.g., IDs are 1, 3, 100), iterating from 1 to max_id creates unnecessary iterations.
More Efficient Solution for Sparse IDs:
def highFive(self, items: List[List[int]]) -> List[List[int]]:
student_scores = defaultdict(list)
for student_id, score in items:
student_scores[student_id].append(score)
result = []
# Iterate only through existing student IDs
for student_id in sorted(student_scores.keys()):
scores = student_scores[student_id]
top_scores = nlargest(5, scores)
num_scores = min(len(scores), 5)
average = sum(top_scores) // num_scores
result.append([student_id, average])
return result
This approach only processes students that actually exist in the input, making it more efficient when student IDs are sparse.
In a binary min heap, the minimum element can be found in:
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
https assets algo monster cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue is a data structure
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!