1086. High Five


Problem Description

In this problem, we are given a list of students' scores in the form items, where every element items[i] = [ID_i, score_i] corresponds to a score from a student with ID_i. Our task is to compute the average of the top five scores for each student. We must then return an array of pairs result, where result[j] = [ID_j, topFiveAverage_j] signifies the student with ID_j and their average of top five scores. It is important to note that if a student has fewer than five scores, we calculate the average of the scores they have. The result array should be ordered in ascending order by student ID.

To calculate the average, we sum up the top five scores of each student and then perform integer division by 5. Integer division means that after dividing, if we have a decimal value, we drop the remainder and keep only the integer part.

Intuition

When tackling this problem, we can start by thinking about how we can pair students with their scores and how to efficiently compute the top five average. Here is a step-by-step approach to the problem:

  1. Organizing Scores by Students: We can use a dictionary to map each student's ID to a list of their scores. A defaultdict from Python's collections module is perfect for this because it allows us to append scores to each student's list without initialization.

  2. Finding Top Scores: We are only interested in the top five scores for each student. The nlargest function from Python's heapq module can help us to find the top five scores efficiently.

  3. Computing Averages: Once we have the top five scores, we can simply sum them up and then divide by 5 to get the average. This must be integer division as specified by the problem.

  4. Creating the Result List: We need the resulting list to be sorted by student ID. We start with the smallest ID and go up to the largest ID we encountered, checking if the ID has any scores mapped to it. If it has, we compute the average and add it to the result list in a [ID_i, topFiveAverage_i] format.

  5. Returning the Result: After processing all student IDs, we end up with a list of ID and average pairs, which we return as the final answer.

By keeping these steps in mind, we can write a Python program that is both efficient and easy to understand, which is what the given solution code is doing.

Solution Approach

The implementation of the solution follows the intuition and can be understood in multiple steps:

  1. Creating a Dictionary with Default Values: A defaultdict is created to store the scores for each student. The choice of defaultdict is crucial as it initializes a new list automatically if a key (student ID) is not found, avoiding key errors or the need for manual initialization.

    1d = defaultdict(list)
  2. Storing Scores by Student IDs: The solution iterates over each score in the items list, appending the score to the list associated with the student's ID in the dictionary. During this process, we keep track of the maximum student ID encountered to know the range we need to consider for our final result.

    1for i, x in items:
    2    d[i].append(x)
    3    m = max(m, i)
  3. Iterating Over Possible Student IDs: We iterate from 1 to the maximum student ID m + 1 to ensure we cover all possible IDs. We check for each ID if there are scores associated with it.

    1for i in range(1, m + 1):
    2    if xs := d[i]:
  4. Calculating the Top Five Average: For each student with scores, we use the nlargest function to fetch the top five scores efficiently (in O(n log 5) time where n is the number of scores per student). We then sum these scores and perform integer division by 5 to get the average.

    1avg = sum(nlargest(5, xs)) // 5
  5. Storing the Results: The calculated averages along with the student IDs are stored in the result list ans using the append method. This ensures that the results are stored in the ascending order of the student IDs as the loop already iterates in ascending order.

    1ans.append([i, avg])
  6. Returning the Sorted Results: Since our loop iteration ensures IDs are already in ascending order, we can directly return the ans list as our final sorted result.

    1return ans

The provided solution is efficient because it utilizes a dictionary to relate scores with students' IDs, and a heap-based nlargest method to retrieve the top scores. This method minimizes sorting overhead since it does not require sorting all the scores. Furthermore, by keeping a running maximum of encountered student IDs, the solution only looks at IDs that have scores without unnecessarily processing unused IDs, thereby optimizing the performance.

Example Walkthrough

Let's illustrate the solution approach with a small example. Assume our input items is:

1items = [[1, 91], [1, 92], [2, 93], [2, 97], [1, 60], [2, 77], [1, 65], [1, 87], [1, 100], [2, 100], [2, 76]]

This items list consists of scores for two students with IDs 1 and 2. Let's apply our solution approach step-by-step.

  1. Creating a Dictionary with Default Values:

    First, we create a defaultdict to store the scores indexed by student IDs.

    1from collections import defaultdict
    2d = defaultdict(list)
  2. Storing Scores by Student IDs:

    We go through each item in the list. We add each score to the list of scores for the corresponding student ID. Suppose the max_id variable keeps track of the maximum ID encountered so far.

    1max_id = 0
    2for i, x in items:
    3    d[i].append(x)
    4    max_id = max(max_id, i)
    5# Now, d = {1: [91, 92, 60, 65, 87, 100], 2: [93, 97, 77, 100, 76]}
  3. Iterating Over Possible Student IDs:

    Since we've seen that the maximum ID is 2, we will iterate over student IDs from 1 to 2.

    1for i in range(1, max_id + 1):
    2    if xs := d[i]:
  4. Calculating the Top Five Average:

    If a student has scores, we will select the top five scores, sum them, and divide by 5 to get the average.

    1from heapq import nlargest
    2ans = []
    3for i in range(1, max_id + 1):
    4    if xs := d[i]:
    5        avg = sum(nlargest(5, xs)) // 5
    6        # For student 1: nlargest(5, [91, 92, 60, 65, 87, 100]) -> [100, 92, 91, 87, 65], sum -> 435, average -> 435//5 = 87
    7        # For student 2: nlargest(5, [93, 97, 77, 100, 76]) -> [100, 97, 93, 77, 76], sum -> 443, average -> 443//5 = 88
    8        ans.append([i, avg])
  5. Storing the Results:

    We have calculated averages for both student IDs, so we add them to our answer list.

    1ans.append([i, avg])
    2# After iteration, ans = [[1, 87], [2, 88]]
  6. Returning the Sorted Results:

    Lastly, we return the list ans, which is already sorted by student IDs, as our final result.

    1return ans
    2# Result: [[1, 87], [2, 88]]

In this small example, student 1 had six scores and student 2 had five. After taking the top five scores for each, the averages of their top five scores are 87 and 88, respectively. Thus, the final result reflecting each student's ID and their calculated top-five average is:

1[[1, 87], [2, 88]]

Python Solution

1from collections import defaultdict
2from heapq import nlargest
3
4class Solution:
5    def highFive(self, items: List[List[int]]) -> List[List[int]]:
6        # Dictionary to store scores for each ID
7        scores_dict = defaultdict(list)
8        # Variable to keep track of the highest ID number seen
9        max_id = 0
10      
11        # Iterate through the items to populate the scores dictionary
12        for student_id, score in items:
13            # Append the score to the list of scores for the current student_id
14            scores_dict[student_id].append(score)
15            # Update max_id if the current student_id is greater
16            max_id = max(max_id, student_id)
17      
18        # List to store the result
19        result = []
20      
21        # Iterate through all possible IDs from 1 to max_id (inclusive)
22        for i in range(1, max_id + 1):
23            # Check if there are scores for the current ID
24            if scores := scores_dict[i]:
25                # Calculate the average of the top 5 scores
26                average = sum(nlargest(5, scores)) // 5
27                # Append [student_id, average_score] to the result list
28                result.append([i, average])
29      
30        # Return the final list of averages
31        return result
32

Java Solution

1class Solution {
2  
3    public int[][] highFive(int[][] items) {
4        // Initialize an array of lists to store scores for each student
5        List<Integer>[] scoresPerStudent = new List[1001];
6        Arrays.setAll(scoresPerStudent, k -> new ArrayList<>());
7      
8        // Populate the lists with scores. Each student is represented at index i, and score is x
9        for (var item : items) {
10            int studentId = item[0];
11            int score = item[1];
12            scoresPerStudent[studentId].add(score);
13        }
14      
15        // Sort the scores in descending order for each student
16        for (var scores : scoresPerStudent) {
17            scores.sort((a, b) -> b - a);
18        }
19      
20        // Prepare a list to hold the answer
21        List<int[]> averageTopFiveScores = new ArrayList<>();
22      
23        // Iterate through each student's list of scores
24        for (int i = 1; i <= 1000; ++i) {
25            var scores = scoresPerStudent[i];
26            // Ensure that the student has at least one score
27            if (!scores.isEmpty()) {
28                int sum = 0;
29                // Get the top five scores, or all scores if fewer than five, and calculate their sum
30                for (int j = 0; j < Math.min(5, scores.size()); ++j) {
31                    sum += scores.get(j);
32                }
33                // Compute and add the average of the top five scores for the student to the result list
34                averageTopFiveScores.add(new int[] {i, sum / Math.min(5, scores.size())});
35            }
36        }
37      
38        // Convert the list of average scores to a 2D array and return
39        return averageTopFiveScores.toArray(new int[0][]);
40    }
41}
42

C++ Solution

1class Solution {
2public:
3    // Function to calculate the average of the top five scores for each student
4    vector<vector<int>> highFive(vector<vector<int>>& items) {
5        vector<int> scores[1001]; // Array of vectors to store scores for each student
6      
7        // Iterate through all the score items
8        for (auto& item : items) {
9            int studentId = item[0]; // Extract the student ID
10            int score = item[1];     // Extract the score
11            scores[studentId].push_back(score); // Add score to the corresponding student's list
12        }
13      
14        vector<vector<int>> result; // Resultant vector to store the averages of top five scores
15      
16        // Process the scores for each student
17        for (int i = 1; i <= 1000; ++i) { // Assuming student IDs are in the range 1 to 1000
18            if (!scores[i].empty()) { // Ensure that the student has scores recorded
19                sort(scores[i].begin(), scores[i].end(), greater<int>()); // Sort scores in descending order
20              
21                int sumTopFiveScores = 0; // Variable to store the sum of the top five scores
22                for (int j = 0; j < 5; ++j) { // Calculate the sum of top 5 scores
23                    sumTopFiveScores += scores[i][j];
24                }
25                int averageTopFiveScores = sumTopFiveScores / 5; // Compute the average of top five scores
26              
27                result.push_back({i, averageTopFiveScores}); // Add the student ID and average to the result
28            }
29        }
30      
31        return result; // Return the final result
32    }
33};
34

Typescript Solution

1// This function calculates the average of the top 5 scores for each unique ID.
2function highFive(items: number[][]): number[][] {
3    // Initialize an array to store the scores grouped by ID.
4    const scoresById: number[][] = Array(1001)
5        .fill(0)
6        .map(() => Array());
7
8    // Loop through each item (id, score) and group the scores by their ID.
9    for (const [id, score] of items) {
10        scoresById[id].push(score);
11    }
12
13    const averages: number[][] = []; // This will hold the final result.
14
15    // Iterate through the potential range of IDs.
16    for (let i = 1; i <= 1000; ++i) {
17        // Check if the current ID has any scores.
18        if (scoresById[i].length > 0) {
19            // Sort the scores for the current ID in descending order.
20            scoresById[i].sort((a, b) => b - a);
21
22            // Calculate the sum of the top 5 scores.
23            const sumTopFive = scoresById[i].slice(0, 5).reduce((a, b) => a + b, 0);
24
25            // Calculate the average, floor it, and push the ID and average to the result array.
26            averages.push([i, Math.floor(sumTopFive / 5)]);
27        }
28    }
29
30    // Return the array containing ID and average of top 5 scores.
31    return averages;
32}
33

Time and Space Complexity

Time Complexity

The time complexity of the function is determined by several factors:

  1. The time complexity of iterating over items, which has n elements: O(n).
  2. Appending each item's score to the corresponding list in the dictionary: O(1) (amortized) for each insertion.
  3. The max function inside the loop, which has constant time O(1) because it's just comparing two integers.
  4. The final iteration over the range from 1 to m + 1, which is O(m) where m is the maximum id seen.
  5. For each student, the nlargest function is used to obtain the top 5 scores which have a complexity of O(k * log(m)) for each student (k is 5 here, and m is the number of scores for that student).
  6. Summing the scores and calculating the average has constant time complexity O(1).

Since we don't know the relationship between m and n exactly, we combine the terms to reflect the worst case. We can assume each student has at least 5 scores and at most n scores. Thus, the nlargest step ends up with a time complexity of O(n * log(n)) due to at most n calls to nlargest with a list of maximum length n.

Combining these, our overall time complexity is O(n + n * log(n)), which simplifies to O(n * log(n)).

Space Complexity

For space complexity, we have:

  1. A dictionary that holds up to m keys, with each key holding a list of scores. If every score is unique to a student, the space complexity is O(n).
  2. An output list ans that will hold m elements: O(m).

Combining these, we get O(n) + O(m). However, since m cannot exceed n (as there cannot be more student IDs than individual scores), the space complexity simplifies to O(n).


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫