1170. Compare Strings by Frequency of the Smallest Character
Problem Description
You need to understand a special function f(s)
that counts how many times the alphabetically smallest character appears in a string s
. For example, in the string "dcce"
, the smallest character is 'c'
and it appears 2 times, so f("dcce") = 2
.
Given two arrays:
words
: an array of stringsqueries
: an array of query strings
For each query string in queries
, you need to count how many strings in words
have a higher f
value than the query string's f
value.
In other words, for each queries[i]
, count how many words W
in words
satisfy the condition: f(queries[i]) < f(W)
.
Return an array answer
where answer[i]
contains the count for the i-th
query.
Example breakdown:
- If
queries[0] = "abc"
withf("abc") = 1
(character'a'
appears once) - And
words = ["aa", "aaa", "aaaa"]
withf
values[2, 3, 4]
respectively - All three words have
f
values greater than 1, soanswer[0] = 3
The solution uses sorting and binary search to efficiently find how many words have f
values greater than each query's f
value.
Intuition
The key observation is that we need to repeatedly answer "how many values are greater than X?" for different X values. This is a classic pattern that suggests preprocessing and binary search.
If we process each query naively by comparing it against all words every time, we'd have O(queries.length * words.length)
comparisons, which could be inefficient for large inputs.
Instead, we can precompute all f
values for the words array once and sort them. Why sorting? Because once sorted, all values greater than a certain threshold will be grouped together at the end of the array. This allows us to use binary search to quickly find where this group starts.
For example, if we have sorted f
values [1, 2, 2, 3, 5, 7]
and we want to count how many are greater than 2
, we can use binary search to find that values greater than 2
start at index 3
. Therefore, there are 6 - 3 = 3
values greater than 2
.
The function bisect_right
finds the rightmost position where we can insert a value while maintaining sorted order. This gives us the index where values greater than our query start. Subtracting this index from the total length gives us the count of greater values.
This preprocessing approach reduces our per-query time from O(words.length)
to O(log(words.length))
, making the solution much more efficient when handling multiple queries.
Learn more about Binary Search and Sorting patterns.
Solution Approach
The solution implements the approach using sorting and binary search as outlined:
Step 1: Implement the function f(s)
The function uses Python's Counter
to count character frequencies and iterates through lowercase letters in alphabetical order to find the first character that appears in the string:
def f(s: str) -> int:
cnt = Counter(s)
return next(cnt[c] for c in ascii_lowercase if cnt[c])
This guarantees we get the frequency of the lexicographically smallest character.
Step 2: Preprocess the words array
Calculate f(w)
for each word w
in words
and sort these values:
n = len(words)
nums = sorted(f(w) for w in words)
After sorting, nums
contains all f
values in ascending order, making it suitable for binary search.
Step 3: Process each query
For each query q
, we:
- Calculate
f(q)
- Use
bisect_right(nums, f(q))
to find the insertion point forf(q)
in the sorted array - This insertion point tells us how many values are less than or equal to
f(q)
- Subtract from
n
to get the count of values greater thanf(q)
return [n - bisect_right(nums, f(q)) for q in queries]
Time Complexity Analysis:
- Computing
f
for all words:O(m * k)
wherem
is the number of words andk
is the average word length - Sorting:
O(m * log(m))
- For each query:
O(k + log(m))
wherek
is for computingf(q)
andlog(m)
is for binary search - Total:
O(m * k + m * log(m) + p * (k + log(m)))
wherep
is the number of queries
Space Complexity: O(m)
for storing the sorted nums
array.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate the solution approach.
Given:
words = ["bbb", "cc", "a", "aa"]
queries = ["aaa", "bb"]
Step 1: Calculate f values for words
For each word, we find the lexicographically smallest character and count its occurrences:
f("bbb")
: smallest char is 'b', appears 3 times →f = 3
f("cc")
: smallest char is 'c', appears 2 times →f = 2
f("a")
: smallest char is 'a', appears 1 time →f = 1
f("aa")
: smallest char is 'a', appears 2 times →f = 2
So our f values for words are: [3, 2, 1, 2]
Step 2: Sort the f values
After sorting: nums = [1, 2, 2, 3]
This sorted array enables efficient binary search to find how many values exceed any given threshold.
Step 3: Process each query
Query 1: "aaa"
- Calculate
f("aaa")
: smallest char is 'a', appears 3 times →f = 3
- Use binary search (
bisect_right
) to find where 3 would be inserted in[1, 2, 2, 3]
bisect_right([1, 2, 2, 3], 3) = 4
(would insert after the existing 3)- Count of values greater than 3:
4 - 4 = 0
- No words have f value greater than 3
Query 2: "bb"
- Calculate
f("bb")
: smallest char is 'b', appears 2 times →f = 2
- Use binary search to find where 2 would be inserted in
[1, 2, 2, 3]
bisect_right([1, 2, 2, 3], 2) = 3
(would insert after both 2's)- Count of values greater than 2:
4 - 3 = 1
- One word ("bbb" with f=3) has f value greater than 2
Final Answer: [0, 1]
The key insight is that bisect_right
tells us exactly how many elements are less than or equal to our target value. By subtracting this from the total count, we get the number of elements strictly greater than our target - which is exactly what we need for each query.
Solution Implementation
1from typing import List
2from collections import Counter
3from string import ascii_lowercase
4from bisect import bisect_right
5
6class Solution:
7 def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
8 def calculate_smallest_char_frequency(s: str) -> int:
9 """
10 Calculate the frequency of the lexicographically smallest character in string s.
11
12 Args:
13 s: Input string
14
15 Returns:
16 Frequency count of the smallest character
17 """
18 # Count frequency of each character in the string
19 char_counter = Counter(s)
20
21 # Find the first character in alphabetical order that exists in the string
22 # and return its frequency
23 for char in ascii_lowercase:
24 if char_counter[char] > 0:
25 return char_counter[char]
26
27 return 0 # Should never reach here for non-empty strings
28
29 # Get the total number of words
30 word_count = len(words)
31
32 # Calculate f(w) for each word and sort the results
33 # This allows us to use binary search later
34 word_frequencies = sorted(calculate_smallest_char_frequency(word) for word in words)
35
36 # For each query, find how many words have f(w) > f(query)
37 # Using binary search to find the rightmost position where f(query) can be inserted
38 # word_count - position gives us the count of elements greater than f(query)
39 result = []
40 for query in queries:
41 query_frequency = calculate_smallest_char_frequency(query)
42 position = bisect_right(word_frequencies, query_frequency)
43 result.append(word_count - position)
44
45 return result
46
1class Solution {
2 /**
3 * For each query, count how many words have f(word) > f(query)
4 * where f(s) returns the frequency of the smallest character in string s
5 *
6 * @param queries Array of query strings
7 * @param words Array of word strings to compare against
8 * @return Array where result[i] is the count of words with f(word) > f(queries[i])
9 */
10 public int[] numSmallerByFrequency(String[] queries, String[] words) {
11 int wordCount = words.length;
12
13 // Calculate f(word) for each word and store in array
14 int[] wordFrequencies = new int[wordCount];
15 for (int i = 0; i < wordCount; i++) {
16 wordFrequencies[i] = f(words[i]);
17 }
18
19 // Sort word frequencies for binary search
20 Arrays.sort(wordFrequencies);
21
22 int queryCount = queries.length;
23 int[] result = new int[queryCount];
24
25 // For each query, find how many words have greater frequency
26 for (int i = 0; i < queryCount; i++) {
27 int queryFrequency = f(queries[i]);
28
29 // Binary search to find first position where wordFrequencies[mid] > queryFrequency
30 int left = 0;
31 int right = wordCount;
32
33 while (left < right) {
34 int mid = (left + right) >> 1; // Equivalent to (left + right) / 2
35
36 if (wordFrequencies[mid] > queryFrequency) {
37 // Found a word with greater frequency, search left half
38 right = mid;
39 } else {
40 // Current word frequency <= query frequency, search right half
41 left = mid + 1;
42 }
43 }
44
45 // All elements from index 'left' to end have frequency > queryFrequency
46 result[i] = wordCount - left;
47 }
48
49 return result;
50 }
51
52 /**
53 * Calculate the frequency of the lexicographically smallest character in a string
54 *
55 * @param s Input string
56 * @return Frequency of the smallest character
57 */
58 private int f(String s) {
59 // Count frequency of each character
60 int[] charCount = new int[26];
61
62 for (int i = 0; i < s.length(); i++) {
63 charCount[s.charAt(i) - 'a']++;
64 }
65
66 // Return frequency of the first (smallest) character that appears
67 for (int frequency : charCount) {
68 if (frequency > 0) {
69 return frequency;
70 }
71 }
72
73 // Should never reach here for non-empty strings
74 return 0;
75 }
76}
77
1class Solution {
2public:
3 vector<int> numSmallerByFrequency(vector<string>& queries, vector<string>& words) {
4 // Lambda function to calculate frequency of smallest character in a string
5 auto calculateSmallestCharFrequency = [](const string& str) -> int {
6 // Count frequency of each character
7 int charCount[26] = {0};
8 for (char ch : str) {
9 charCount[ch - 'a']++;
10 }
11
12 // Return frequency of the first (smallest) character that appears
13 for (int frequency : charCount) {
14 if (frequency > 0) {
15 return frequency;
16 }
17 }
18 return 0;
19 };
20
21 // Calculate f(word) for each word and store in array
22 int wordCount = words.size();
23 vector<int> wordFrequencies(wordCount);
24 for (int i = 0; i < wordCount; i++) {
25 wordFrequencies[i] = calculateSmallestCharFrequency(words[i]);
26 }
27
28 // Sort word frequencies for binary search
29 sort(wordFrequencies.begin(), wordFrequencies.end());
30
31 // Process each query
32 vector<int> result;
33 for (const auto& query : queries) {
34 // Calculate f(query)
35 int queryFrequency = calculateSmallestCharFrequency(query);
36
37 // Find number of words with f(word) > f(query) using binary search
38 auto position = upper_bound(wordFrequencies.begin(), wordFrequencies.end(), queryFrequency);
39 int countGreater = wordCount - (position - wordFrequencies.begin());
40
41 result.push_back(countGreater);
42 }
43
44 return result;
45 }
46};
47
1/**
2 * Counts how many words have a higher frequency of their smallest character
3 * than each query's smallest character frequency
4 * @param queries - Array of query strings to compare
5 * @param words - Array of word strings to compare against
6 * @returns Array where each element represents the count for corresponding query
7 */
8function numSmallerByFrequency(queries: string[], words: string[]): number[] {
9 /**
10 * Calculates the frequency of the lexicographically smallest character in a string
11 * @param str - Input string to analyze
12 * @returns Frequency count of the smallest character
13 */
14 const getSmallestCharFrequency = (str: string): number => {
15 // Initialize frequency array for all 26 lowercase letters
16 const charFrequencies = new Array(26).fill(0);
17
18 // Count frequency of each character
19 for (const char of str) {
20 const charIndex = char.charCodeAt(0) - 'a'.charCodeAt(0);
21 charFrequencies[charIndex]++;
22 }
23
24 // Find and return the first non-zero frequency (smallest character's frequency)
25 return charFrequencies.find(frequency => frequency > 0);
26 };
27
28 // Calculate frequencies for all words and sort in ascending order
29 const wordFrequencies = words.map(getSmallestCharFrequency).sort((a, b) => a - b);
30
31 // Result array to store counts for each query
32 const result: number[] = [];
33
34 // Process each query
35 for (const query of queries) {
36 // Get frequency of smallest character in current query
37 const queryFrequency = getSmallestCharFrequency(query);
38
39 // Binary search to find the first word frequency greater than query frequency
40 let left = 0;
41 let right = wordFrequencies.length;
42
43 while (left < right) {
44 const mid = (left + right) >> 1; // Equivalent to Math.floor((left + right) / 2)
45
46 if (wordFrequencies[mid] > queryFrequency) {
47 // Mid element is greater, search in left half
48 right = mid;
49 } else {
50 // Mid element is less than or equal, search in right half
51 left = mid + 1;
52 }
53 }
54
55 // Count of words with frequency greater than query frequency
56 result.push(wordFrequencies.length - left);
57 }
58
59 return result;
60}
61
Time and Space Complexity
Time Complexity: O((n + q) × M + n × log n)
The time complexity consists of several parts:
- Computing
f(w)
for all words inwords
: Each call tof()
takesO(M)
time whereM
is the maximum string length (creating Counter and iterating through it). This is donen
times, resulting inO(n × M)
. - Sorting the
nums
array:O(n × log n)
wheren
is the length ofwords
. - Computing
f(q)
for all queries and performing binary search: For each of theq
queries, we computef(q)
inO(M)
time and performbisect_right
inO(log n)
time. This gives usO(q × M + q × log n)
. - Total:
O(n × M + n × log n + q × M + q × log n)
=O((n + q) × M + n × log n + q × log n)
Since typically M < log n
for reasonable inputs, the dominant term is O((n + q) × M + n × log n)
. However, the reference answer simplifies this to O((n + q) × M)
, which suggests either the sorting cost is considered negligible compared to string processing, or there's an assumption that n × log n
is bounded by n × M
.
Space Complexity: O(n)
The space complexity includes:
nums
array storingn
frequency values:O(n)
- Temporary Counter objects created in function
f()
:O(26)
=O(1)
(only lowercase letters) - Result list:
O(q)
(not counted in auxiliary space as it's the output) - Total auxiliary space:
O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrectly Finding the Lexicographically Smallest Character
A common mistake is using min(s)
to find the smallest character, which works but then counting its frequency separately can be inefficient or error-prone:
Incorrect approach:
def f(s: str) -> int:
smallest = min(s)
return s.count(smallest)
While this works, iterating through the string multiple times (once for min
, once for count
) is less efficient than necessary.
Better approach:
def f(s: str) -> int:
cnt = Counter(s)
for c in ascii_lowercase:
if cnt[c] > 0:
return cnt[c]
return 0
2. Using bisect_left
Instead of bisect_right
This is a critical error that leads to incorrect counts. The difference matters when the query's f-value equals some word's f-value:
Incorrect:
position = bisect_left(word_frequencies, query_frequency) return word_count - position
Why it's wrong: bisect_left
finds the leftmost position to insert, so if query_frequency = 2
and word_frequencies = [1, 2, 2, 3]
, bisect_left
returns 1, giving count = 4 - 1 = 3. But we want values strictly greater than 2, which should be 1 (only the value 3).
Correct:
position = bisect_right(word_frequencies, query_frequency) return word_count - position
With bisect_right
, we get position 3, giving count = 4 - 3 = 1, which is correct.
3. Not Sorting the Word Frequencies
Forgetting to sort the frequencies before using binary search will produce incorrect results:
Incorrect:
word_frequencies = [f(word) for word in words] # Missing sort! # Binary search on unsorted array gives wrong results
Correct:
word_frequencies = sorted(f(word) for word in words)
4. Recalculating f-values Multiple Times
Computing the f-value for words inside the query loop is inefficient:
Inefficient:
result = [] for query in queries: count = 0 for word in words: if f(query) < f(word): # Recalculating f(word) every time count += 1 result.append(count)
Efficient: Pre-compute and store all f-values once, then reuse them for all queries.
Which data structure is used in a depth first search?
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!