1772. Sort Features by Popularity

MediumArrayHash TableStringSorting
Leetcode Link

Problem Description

In the given problem, we have two lists: features and responses.

features is a list of single-word strings where each word represents a unique feature of a product.

responses is a list of strings where each string may contain several words expressing features liked by a user. These strings represent user feedback from a survey.

The goal is to sort the features array in order of decreasing popularity. Popularity of a feature is defined by the number of responses that mention the feature at least once. One thing to note is that even if a response mentions a feature multiple times, it still only counts once towards the feature's popularity.

Furthermore, if two features have the same popularity, our sorting should prioritize them based on the order in which they appear in the features list.

Finally, we need to return the features sorted by the rules described above.

Intuition

For the solution, we need to do the following:

  1. Count how many times each feature is mentioned in the responses list. To ensure that each feature is counted only once per response, we can create a set of words from each response. This will automatically remove duplicates.

  2. After we have the counts of each feature as per the responses, we need to sort the features list. The sorting is not regular alphabetical or numerical sorting; it is based on the popularity of the features.

Therefore, we need a custom sort that:

  • Sorts features in descending order of their popularity.
  • When features have the same popularity count, they should retain their original relative order as given in the input features array.
  1. To implement this, we use a sorting function with a custom key. In Python, this can be achieved via the sorted function with a lambda function as the key. The lambda function returns the negative count of how many times a feature occurs, thus enabling sorting in descending order based on popularity.

By following this approach, we can efficiently sort the features according to the problem's requirements.

Solution Approach

To implement the solution, the code utilizes a couple of Python-specific techniques and data structures. Here's a step-by-step walkthrough of the solution:

  1. Import the Counter class from the collections module. A Counter is a dictionary subclass designed for counting hashable objects. It's an unordered collection where elements are stored as dictionary keys and their counts as dictionary values.

  2. Initialize the Counter by the variable name cnt. It's going to be used to store the number of times each feature appears in the responses.

  3. Iterate through each response in the responses list using a for loop.

  4. Inside the loop, split the current response r using r.split(), converting it into a list of words. Then convert this list into a set ws to ensure that each word is unique and duplicate mentions of a feature in a single response are eliminated.

  5. Iterate through each word s in the set ws. If it's a word that represents a feature, it should be counted. Increment its count in our Counter by doing cnt[s] += 1.

  6. Now, we need to sort the features list by the count of each feature while preserving the order of features with the same popularity. To do this, we use the sorted function with a custom lambda function as the sorting key: lambda x: -cnt[x]. This lambda function returns the count from the Counter for each element x in features, negated so that the sorting is in descending order.

  7. The sorted function with the custom lambda function will sort the features by descending popularity, and in case of a tie, it will preserve the original order, as it's stable sort (it maintains the relative order of records with equal values).

Here is the final sorted list:

1sorted_features = sorted(features, key=lambda x: -cnt[x])

This Python code gives us the desired output where the features are sorted according to the specified rules.

Example Walkthrough

Let's walk through an example to illustrate the solution approach. Suppose we have the following features list and responses list:

1features = ["storage", "color", "battery", "screen"]
2responses = [
3    "I wish the battery life was longer",
4    "Loving the new screen and high storage capacity",
5    "The color is so vibrant and the screen is the perfect size",
6    "Battery life is great, screen is great, storage just what I needed",
7    "I wish the color options were more vibrant"
8]

Now, let's apply our solution approach step by step:

  1. Count Feature Mentions: First, we need to count how many times each feature is mentioned across all responses. According to the given approach, for each response, we convert it into a set of words so that duplicates within the same response are eliminated.

  2. Initialize Counter: We initialize a Counter object called cnt to keep the counts.

    1from collections import Counter
    2cnt = Counter()
  3. Process Responses: Let's loop through each response and update the Counter with words only if they are in our features list.

    1for r in responses:
    2   ws = set(r.split())
    3   for s in ws:
    4      if s in features:
    5         cnt[s] += 1

    After processing all responses, cnt might look like this:

    1cnt = Counter({'battery': 2, 'screen': 3, 'storage': 2, 'color': 2})

    Here, 'screen' was mentioned in 3 different responses, while 'battery', 'storage', and 'color' were mentioned in 2.

  4. Sort Features: Next, we sort the features by descending popularity using the sorted function with a lambda key that uses the Counter counts.

    1sorted_features = sorted(features, key=lambda x: -cnt[x])

    The sorted_features list will look like this:

    1['screen', 'battery', 'storage', 'color']

    The feature 'screen' comes first because it has the highest count. 'battery', 'storage', and 'color' all have the same count. However, 'battery' appears before 'storage' and 'color' in the original list, so it comes next. Likewise, 'storage' comes before 'color', which is why 'color' is last.

  5. Result: Finally, we have our features sorted by popularity while preserving the order in which they appear in the features list when they have the same popularity:

    1sorted_features = ["screen", "battery", "storage", "color"]

    This sorted list aligns with our problem's requirement of sorting the features by their popularity, ensuring that the ones with the same frequency of mentions maintain the order from the original features list.

Python Solution

1from typing import List
2from collections import Counter
3
4class Solution:
5    def sortFeatures(self, features: List[str], responses: List[str]) -> List[str]:
6        # Create a counter to keep track of the number of times each word appears in the responses
7        counter = Counter()
8      
9        # Iterate through each response in the list of responses
10        for response in responses:
11            # Split the response into words and convert it to a set to remove duplicates
12            unique_words = set(response.split())
13          
14            # Update the count for each unique word in the current response
15            for word in unique_words:
16                counter[word] += 1
17      
18        # Sort the list of features based on their counts in a descending order
19        # Features that do not appear in the counter will have a default count of 0
20        sorted_features = sorted(features, key=lambda feature: -counter[feature])
21      
22        # Return the list of features sorted by their popularity
23        return sorted_features
24
25# Example usage:
26# sol = Solution()
27# features = ["cooler", "lock", "touch"]
28# responses = ["I love the cooler cooler", "lock touch cool", "locker like touch"]
29# print(sol.sortFeatures(features, responses)) # Output: ["touch", "cooler", "lock"]
30

Java Solution

1class Solution {
2    public String[] sortFeatures(String[] features, String[] responses) {
3        // Create a map to count the occurrence of each word in the responses
4        Map<String, Integer> featureCounts = new HashMap<>();
5        for (String response : responses) {
6            // Create a set to store unique words in the current response
7            Set<String> uniqueWords = new HashSet<>();
8            // Split the response into words
9            for (String word : response.split(" ")) {
10                uniqueWords.add(word);
11            }
12            // Increment the count for each unique word
13            for (String word : uniqueWords) {
14                featureCounts.put(word, featureCounts.getOrDefault(word, 0) + 1);
15            }
16        }
17
18        int numFeatures = features.length;
19        // Initialize an array of indexes corresponding to the features array
20        Integer[] indexes = new Integer[numFeatures];
21        for (int i = 0; i < numFeatures; ++i) {
22            indexes[i] = i;
23        }
24
25        // Sort the indexes based on the feature occurrence counts and their original order
26        Arrays.sort(indexes, (index1, index2) -> {
27            int countDifference = featureCounts.getOrDefault(features[index2], 0) - 
28                                  featureCounts.getOrDefault(features[index1], 0);
29            // If counts are equal, sort by the original order
30            return countDifference == 0 ? index1 - index2 : countDifference;
31        });
32
33        // Create an array to hold the sorted features
34        String[] sortedFeatures = new String[numFeatures];
35        for (int i = 0; i < numFeatures; ++i) {
36            sortedFeatures[i] = features[indexes[i]];
37        }
38
39        // Return the sorted array of features
40        return sortedFeatures;
41    }
42}
43

C++ Solution

1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <unordered_set>
5#include <sstream>
6#include <algorithm>
7#include <numeric>
8
9class Solution {
10public:
11    std::vector<std::string> sortFeatures(std::vector<std::string>& features, std::vector<std::string>& responses) {
12        // Map to store the count of each feature mentioned in the responses
13        std::unordered_map<std::string, int> featureCounts;
14      
15        // Iterate through all the responses
16        for (auto& response : responses) {
17            std::stringstream ss(response);
18            std::string word;
19            std::unordered_set<std::string> uniqueWords;
20          
21            // Extract each word from the response and add to a set to ensure uniqueness
22            while (ss >> word) {
23                uniqueWords.insert(word);
24            }
25          
26            // Increase the count for each unique word found in the responses that match a feature
27            for (auto& word : uniqueWords) {
28                featureCounts[word]++;
29            }
30        }
31      
32        int numFeatures = features.size();
33        std::vector<int> indices(numFeatures);
34        // Initialize the indices sequence
35        std::iota(indices.begin(), indices.end(), 0);
36      
37        // Sort the indices based on the counts of the features
38        std::sort(indices.begin(), indices.end(), [&](int index1, int index2) -> bool {
39            int countDifference = featureCounts[features[index1]] - featureCounts[features[index2]];
40            // If counts differ, sort in descending order; if counts are same, sort by index
41            return countDifference > 0 || (countDifference == 0 && index1 < index2);
42        });
43      
44        // Prepare the sorted list of features based on the sorted indices
45        std::vector<std::string> sortedFeatures(numFeatures);
46        for (int i = 0; i < numFeatures; ++i) {
47            sortedFeatures[i] = features[indices[i]];
48        }
49      
50        // Return the sorted list of features
51        return sortedFeatures;
52    }
53};
54

Typescript Solution

1// Required imports or type definitions (if any) would be placed here
2
3let featureCounts: { [feature: string]: number } = {};
4
5// Function to sort features based on their frequency in the given responses
6function sortFeatures(features: string[], responses: string[]): string[] {
7    featureCounts = {}; // Resetting the feature counts for a fresh start
8
9    // Iterate through all the responses to count feature occurrences
10    responses.forEach(response => {
11        let uniqueWords: Set<string> = new Set<string>(); // To ensure uniqueness of words in a response
12        response.split(/\s+/).forEach(word => {
13            uniqueWords.add(word); // Add each word to the set
14        });
15
16        // Update the count for each unique word that matches a feature
17        uniqueWords.forEach(word => {
18            if (features.includes(word)) {
19                if (!featureCounts[word]) featureCounts[word] = 0;
20                featureCounts[word]++;
21            }
22        });
23    });
24
25    // Initialize indices for sorting
26    let indices: number[] = features.map((_, index) => index);
27
28    // Sort indices based on feature occurrence count and their original order
29    indices.sort((index1, index2) => {
30        let count1 = featureCounts[features[index1]] || 0; // Fallback to 0 if not found
31        let count2 = featureCounts[features[index2]] || 0;
32        return count2 - count1 || index1 - index2; // Descending by count, then by original index
33    });
34
35    // Prepare the sorted list of features
36    let sortedFeatures: string[] = indices.map(index => features[index]);
37
38    // Return the list of sorted features
39    return sortedFeatures;
40}
41
42// Example use:
43// const sorted = sortFeatures(["feature1", "feature2"], ["I love feature2", "feature1 is great"]);
44// console.log(sorted); // Logs feature names sorted by their frequency and original order
45

Time and Space Complexity

Time Complexity

The time complexity of the given code snippet is O(N + U + FlogF), where:

  • N is the total number of words across all responses.
  • U is the number of unique words across all responses.
  • F is the number of features.

Here's how the time complexity breaks down:

  1. Splitting each response and adding words to the set has a complexity of O(N), where N is the total number of words. This includes the cost of splitting the response into words and processing each word.
  2. Counting occurrences of each unique word involves a constant-time operation per unique word, resulting in a complexity of O(U).
  3. Sorting the features list is O(FlogF) because we are sorting F features using a comparison sort where features are compared based on their counts.

Space Complexity

The space complexity of the given code snippet is O(U + F), where:

  • U is the number of unique words across all responses. This is because the counter holds counts for each unique word.
  • F is the number of features, which accounts for the space to store the features list.

We have to store the counts for all unique words and the final sorted features list, contributing to the space complexity.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫