Facebook Pixel

1170. Compare Strings by Frequency of the Smallest Character

Problem Description

You need to understand a special function f(s) that counts how many times the alphabetically smallest character appears in a string s. For example, in the string "dcce", the smallest character is 'c' and it appears 2 times, so f("dcce") = 2.

Given two arrays:

  • words: an array of strings
  • queries: an array of query strings

For each query string in queries, you need to count how many strings in words have a higher f value than the query string's f value.

In other words, for each queries[i], count how many words W in words satisfy the condition: f(queries[i]) < f(W).

Return an array answer where answer[i] contains the count for the i-th query.

Example breakdown:

  • If queries[0] = "abc" with f("abc") = 1 (character 'a' appears once)
  • And words = ["aa", "aaa", "aaaa"] with f values [2, 3, 4] respectively
  • All three words have f values greater than 1, so answer[0] = 3

The solution uses sorting and binary search to efficiently find how many words have f values greater than each query's f value.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation is that we need to repeatedly answer "how many values are greater than X?" for different X values. This is a classic pattern that suggests preprocessing and binary search.

If we process each query naively by comparing it against all words every time, we'd have O(queries.length * words.length) comparisons, which could be inefficient for large inputs.

Instead, we can precompute all f values for the words array once and sort them. Why sorting? Because once sorted, all values greater than a certain threshold will be grouped together at the end of the array. This allows us to use binary search to quickly find where this group starts.

For example, if we have sorted f values [1, 2, 2, 3, 5, 7] and we want to count how many are greater than 2, we can use binary search to find that values greater than 2 start at index 3. Therefore, there are 6 - 3 = 3 values greater than 2.

The function bisect_right finds the rightmost position where we can insert a value while maintaining sorted order. This gives us the index where values greater than our query start. Subtracting this index from the total length gives us the count of greater values.

This preprocessing approach reduces our per-query time from O(words.length) to O(log(words.length)), making the solution much more efficient when handling multiple queries.

Learn more about Binary Search and Sorting patterns.

Solution Approach

The solution implements the approach using sorting and binary search as outlined:

Step 1: Implement the function f(s)

The function uses Python's Counter to count character frequencies and iterates through lowercase letters in alphabetical order to find the first character that appears in the string:

def f(s: str) -> int:
    cnt = Counter(s)
    return next(cnt[c] for c in ascii_lowercase if cnt[c])

This guarantees we get the frequency of the lexicographically smallest character.

Step 2: Preprocess the words array

Calculate f(w) for each word w in words and sort these values:

n = len(words)
nums = sorted(f(w) for w in words)

After sorting, nums contains all f values in ascending order, making it suitable for binary search.

Step 3: Process each query

For each query q, we:

  1. Calculate f(q)
  2. Use bisect_right(nums, f(q)) to find the insertion point for f(q) in the sorted array
  3. This insertion point tells us how many values are less than or equal to f(q)
  4. Subtract from n to get the count of values greater than f(q)
return [n - bisect_right(nums, f(q)) for q in queries]

Time Complexity Analysis:

  • Computing f for all words: O(m * k) where m is the number of words and k is the average word length
  • Sorting: O(m * log(m))
  • For each query: O(k + log(m)) where k is for computing f(q) and log(m) is for binary search
  • Total: O(m * k + m * log(m) + p * (k + log(m))) where p is the number of queries

Space Complexity: O(m) for storing the sorted nums array.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate the solution approach.

Given:

  • words = ["bbb", "cc", "a", "aa"]
  • queries = ["aaa", "bb"]

Step 1: Calculate f values for words

For each word, we find the lexicographically smallest character and count its occurrences:

  • f("bbb"): smallest char is 'b', appears 3 times → f = 3
  • f("cc"): smallest char is 'c', appears 2 times → f = 2
  • f("a"): smallest char is 'a', appears 1 time → f = 1
  • f("aa"): smallest char is 'a', appears 2 times → f = 2

So our f values for words are: [3, 2, 1, 2]

Step 2: Sort the f values

After sorting: nums = [1, 2, 2, 3]

This sorted array enables efficient binary search to find how many values exceed any given threshold.

Step 3: Process each query

Query 1: "aaa"

  • Calculate f("aaa"): smallest char is 'a', appears 3 times → f = 3
  • Use binary search (bisect_right) to find where 3 would be inserted in [1, 2, 2, 3]
  • bisect_right([1, 2, 2, 3], 3) = 4 (would insert after the existing 3)
  • Count of values greater than 3: 4 - 4 = 0
  • No words have f value greater than 3

Query 2: "bb"

  • Calculate f("bb"): smallest char is 'b', appears 2 times → f = 2
  • Use binary search to find where 2 would be inserted in [1, 2, 2, 3]
  • bisect_right([1, 2, 2, 3], 2) = 3 (would insert after both 2's)
  • Count of values greater than 2: 4 - 3 = 1
  • One word ("bbb" with f=3) has f value greater than 2

Final Answer: [0, 1]

The key insight is that bisect_right tells us exactly how many elements are less than or equal to our target value. By subtracting this from the total count, we get the number of elements strictly greater than our target - which is exactly what we need for each query.

Solution Implementation

1from typing import List
2from collections import Counter
3from string import ascii_lowercase
4from bisect import bisect_right
5
6class Solution:
7    def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
8        def calculate_smallest_char_frequency(s: str) -> int:
9            """
10            Calculate the frequency of the lexicographically smallest character in string s.
11          
12            Args:
13                s: Input string
14          
15            Returns:
16                Frequency count of the smallest character
17            """
18            # Count frequency of each character in the string
19            char_counter = Counter(s)
20          
21            # Find the first character in alphabetical order that exists in the string
22            # and return its frequency
23            for char in ascii_lowercase:
24                if char_counter[char] > 0:
25                    return char_counter[char]
26          
27            return 0  # Should never reach here for non-empty strings
28      
29        # Get the total number of words
30        word_count = len(words)
31      
32        # Calculate f(w) for each word and sort the results
33        # This allows us to use binary search later
34        word_frequencies = sorted(calculate_smallest_char_frequency(word) for word in words)
35      
36        # For each query, find how many words have f(w) > f(query)
37        # Using binary search to find the rightmost position where f(query) can be inserted
38        # word_count - position gives us the count of elements greater than f(query)
39        result = []
40        for query in queries:
41            query_frequency = calculate_smallest_char_frequency(query)
42            position = bisect_right(word_frequencies, query_frequency)
43            result.append(word_count - position)
44      
45        return result
46
1class Solution {
2    /**
3     * For each query, count how many words have f(word) > f(query)
4     * where f(s) returns the frequency of the smallest character in string s
5     * 
6     * @param queries Array of query strings
7     * @param words Array of word strings to compare against
8     * @return Array where result[i] is the count of words with f(word) > f(queries[i])
9     */
10    public int[] numSmallerByFrequency(String[] queries, String[] words) {
11        int wordCount = words.length;
12      
13        // Calculate f(word) for each word and store in array
14        int[] wordFrequencies = new int[wordCount];
15        for (int i = 0; i < wordCount; i++) {
16            wordFrequencies[i] = f(words[i]);
17        }
18      
19        // Sort word frequencies for binary search
20        Arrays.sort(wordFrequencies);
21      
22        int queryCount = queries.length;
23        int[] result = new int[queryCount];
24      
25        // For each query, find how many words have greater frequency
26        for (int i = 0; i < queryCount; i++) {
27            int queryFrequency = f(queries[i]);
28          
29            // Binary search to find first position where wordFrequencies[mid] > queryFrequency
30            int left = 0;
31            int right = wordCount;
32          
33            while (left < right) {
34                int mid = (left + right) >> 1;  // Equivalent to (left + right) / 2
35              
36                if (wordFrequencies[mid] > queryFrequency) {
37                    // Found a word with greater frequency, search left half
38                    right = mid;
39                } else {
40                    // Current word frequency <= query frequency, search right half
41                    left = mid + 1;
42                }
43            }
44          
45            // All elements from index 'left' to end have frequency > queryFrequency
46            result[i] = wordCount - left;
47        }
48      
49        return result;
50    }
51  
52    /**
53     * Calculate the frequency of the lexicographically smallest character in a string
54     * 
55     * @param s Input string
56     * @return Frequency of the smallest character
57     */
58    private int f(String s) {
59        // Count frequency of each character
60        int[] charCount = new int[26];
61      
62        for (int i = 0; i < s.length(); i++) {
63            charCount[s.charAt(i) - 'a']++;
64        }
65      
66        // Return frequency of the first (smallest) character that appears
67        for (int frequency : charCount) {
68            if (frequency > 0) {
69                return frequency;
70            }
71        }
72      
73        // Should never reach here for non-empty strings
74        return 0;
75    }
76}
77
1class Solution {
2public:
3    vector<int> numSmallerByFrequency(vector<string>& queries, vector<string>& words) {
4        // Lambda function to calculate frequency of smallest character in a string
5        auto calculateSmallestCharFrequency = [](const string& str) -> int {
6            // Count frequency of each character
7            int charCount[26] = {0};
8            for (char ch : str) {
9                charCount[ch - 'a']++;
10            }
11          
12            // Return frequency of the first (smallest) character that appears
13            for (int frequency : charCount) {
14                if (frequency > 0) {
15                    return frequency;
16                }
17            }
18            return 0;
19        };
20      
21        // Calculate f(word) for each word and store in array
22        int wordCount = words.size();
23        vector<int> wordFrequencies(wordCount);
24        for (int i = 0; i < wordCount; i++) {
25            wordFrequencies[i] = calculateSmallestCharFrequency(words[i]);
26        }
27      
28        // Sort word frequencies for binary search
29        sort(wordFrequencies.begin(), wordFrequencies.end());
30      
31        // Process each query
32        vector<int> result;
33        for (const auto& query : queries) {
34            // Calculate f(query)
35            int queryFrequency = calculateSmallestCharFrequency(query);
36          
37            // Find number of words with f(word) > f(query) using binary search
38            auto position = upper_bound(wordFrequencies.begin(), wordFrequencies.end(), queryFrequency);
39            int countGreater = wordCount - (position - wordFrequencies.begin());
40          
41            result.push_back(countGreater);
42        }
43      
44        return result;
45    }
46};
47
1/**
2 * Counts how many words have a higher frequency of their smallest character
3 * than each query's smallest character frequency
4 * @param queries - Array of query strings to compare
5 * @param words - Array of word strings to compare against
6 * @returns Array where each element represents the count for corresponding query
7 */
8function numSmallerByFrequency(queries: string[], words: string[]): number[] {
9    /**
10     * Calculates the frequency of the lexicographically smallest character in a string
11     * @param str - Input string to analyze
12     * @returns Frequency count of the smallest character
13     */
14    const getSmallestCharFrequency = (str: string): number => {
15        // Initialize frequency array for all 26 lowercase letters
16        const charFrequencies = new Array(26).fill(0);
17      
18        // Count frequency of each character
19        for (const char of str) {
20            const charIndex = char.charCodeAt(0) - 'a'.charCodeAt(0);
21            charFrequencies[charIndex]++;
22        }
23      
24        // Find and return the first non-zero frequency (smallest character's frequency)
25        return charFrequencies.find(frequency => frequency > 0);
26    };
27  
28    // Calculate frequencies for all words and sort in ascending order
29    const wordFrequencies = words.map(getSmallestCharFrequency).sort((a, b) => a - b);
30  
31    // Result array to store counts for each query
32    const result: number[] = [];
33  
34    // Process each query
35    for (const query of queries) {
36        // Get frequency of smallest character in current query
37        const queryFrequency = getSmallestCharFrequency(query);
38      
39        // Binary search to find the first word frequency greater than query frequency
40        let left = 0;
41        let right = wordFrequencies.length;
42      
43        while (left < right) {
44            const mid = (left + right) >> 1; // Equivalent to Math.floor((left + right) / 2)
45          
46            if (wordFrequencies[mid] > queryFrequency) {
47                // Mid element is greater, search in left half
48                right = mid;
49            } else {
50                // Mid element is less than or equal, search in right half
51                left = mid + 1;
52            }
53        }
54      
55        // Count of words with frequency greater than query frequency
56        result.push(wordFrequencies.length - left);
57    }
58  
59    return result;
60}
61

Time and Space Complexity

Time Complexity: O((n + q) × M + n × log n)

The time complexity consists of several parts:

  • Computing f(w) for all words in words: Each call to f() takes O(M) time where M is the maximum string length (creating Counter and iterating through it). This is done n times, resulting in O(n × M).
  • Sorting the nums array: O(n × log n) where n is the length of words.
  • Computing f(q) for all queries and performing binary search: For each of the q queries, we compute f(q) in O(M) time and perform bisect_right in O(log n) time. This gives us O(q × M + q × log n).
  • Total: O(n × M + n × log n + q × M + q × log n) = O((n + q) × M + n × log n + q × log n)

Since typically M < log n for reasonable inputs, the dominant term is O((n + q) × M + n × log n). However, the reference answer simplifies this to O((n + q) × M), which suggests either the sorting cost is considered negligible compared to string processing, or there's an assumption that n × log n is bounded by n × M.

Space Complexity: O(n)

The space complexity includes:

  • nums array storing n frequency values: O(n)
  • Temporary Counter objects created in function f(): O(26) = O(1) (only lowercase letters)
  • Result list: O(q) (not counted in auxiliary space as it's the output)
  • Total auxiliary space: O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrectly Finding the Lexicographically Smallest Character

A common mistake is using min(s) to find the smallest character, which works but then counting its frequency separately can be inefficient or error-prone:

Incorrect approach:

def f(s: str) -> int:
    smallest = min(s)
    return s.count(smallest)

While this works, iterating through the string multiple times (once for min, once for count) is less efficient than necessary.

Better approach:

def f(s: str) -> int:
    cnt = Counter(s)
    for c in ascii_lowercase:
        if cnt[c] > 0:
            return cnt[c]
    return 0

2. Using bisect_left Instead of bisect_right

This is a critical error that leads to incorrect counts. The difference matters when the query's f-value equals some word's f-value:

Incorrect:

position = bisect_left(word_frequencies, query_frequency)
return word_count - position

Why it's wrong: bisect_left finds the leftmost position to insert, so if query_frequency = 2 and word_frequencies = [1, 2, 2, 3], bisect_left returns 1, giving count = 4 - 1 = 3. But we want values strictly greater than 2, which should be 1 (only the value 3).

Correct:

position = bisect_right(word_frequencies, query_frequency)
return word_count - position

With bisect_right, we get position 3, giving count = 4 - 3 = 1, which is correct.

3. Not Sorting the Word Frequencies

Forgetting to sort the frequencies before using binary search will produce incorrect results:

Incorrect:

word_frequencies = [f(word) for word in words]  # Missing sort!
# Binary search on unsorted array gives wrong results

Correct:

word_frequencies = sorted(f(word) for word in words)

4. Recalculating f-values Multiple Times

Computing the f-value for words inside the query loop is inefficient:

Inefficient:

result = []
for query in queries:
    count = 0
    for word in words:
        if f(query) < f(word):  # Recalculating f(word) every time
            count += 1
    result.append(count)

Efficient: Pre-compute and store all f-values once, then reuse them for all queries.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which data structure is used in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More