1170. Compare Strings by Frequency of the Smallest Character
Problem Description
The problem presents a function f(s)
that calculates the frequency of the lexicographically smallest character within a non-empty string s
. For instance, if s = "dcce"
, then f(s)
equals 2
because the smallest character ('c') appears twice in the string.
Given two arrays, one containing strings words
and the other containing query strings queries
, the task is to determine how many words in words
have a higher f
value than the f
value of each query string in queries
. Specifically, for each query queries[i]
, you need to count the number of strings in words
where f(queries[i])
is less than f(W)
for each string W
in words
.
The expected output is an array answer
, where answer[i]
corresponds to the count for queries[i]
. Essentially, the output tells us for each query, the count of words in words
with a smaller lexicographically smallest character frequency.
Intuition
The intuition behind the solution involves first understanding what the function f
does in terms of string processing. It counts the number of times the smallest lexicographical character appears in a string.
Approaching the solution, we consider these steps:
-
Precompute the
f
values for all strings inwords
since we will need to compare these with each query'sf
value. This precomputation helps to avoid recalculating these values for each query, which would be inefficient. -
Sort the precomputed
f
values of thewords
array. With a sorted list off
values, we can efficiently determine where a particular value would be placed: all elements larger than this value would come after it in the sorted array. -
For each query, calculate its
f
value. Then, to find out how manyf
values inwords
are larger than that of the query, we use binary search to find the insertion point in the sorted list off
values ofwords
. This gives us the position where thef
value of the query would fit if it were to be inserted into the list, effectively giving us the count off
values that are greater because all values to the right of this point are larger. -
The
bisect_right
function from Python'sbisect
module is used to perform the binary search. It returns the insertion point (index) in the sortednums
list, which allows us to subtract this index from the total number of words to get the count of words with a largerf
value than the query. -
The final result for each query is calculated by taking the total number of words
n
and subtracting the insertion index. This count is added to the result list, which is returned at the end.
The process leverages binary search for efficiency, ensuring that an overall time complexity better than O(n^2) is achieved for the solution, where n is the length of the longest array between queries
and words
.
Learn more about Binary Search and Sorting patterns.
Solution Approach
The implementation of the solution to this problem uses Python's Counter
class from the collections
module, the ascii_lowercase
string from the string
module, and the bisect_right
function from the bisect
module. Here's how these are used step-by-step:
-
Define the function
f(s)
: The functionf(s)
is where the calculation of the smallest character's frequency is implemented. It uses theCounter
class to count the occurrences of each character in strings
.Counter(s)
returns a dictionary-like object where characters are keys, and their counts are values. -
Find the smallest character's frequency: Once we have the counts, we iterate over
ascii_lowercase
(a string containing all lowercase letters in alphabetical order) to find the smallest character present ins
. As soon as we find a characterc
that is ins
(i.e.,cnt[c]
is not zero), we return its countcnt[c]
. Usingascii_lowercase
ensures we check characters in lexicographical order. -
Precompute
f
forwords
: We create a list callednums
that holds thef
value for each word inwords
. This is done with a list comprehension[f(w) for w in words]
, calculatingf(w)
for eachw
inwords
. -
Sort
nums
:nums
is sorted so that binary search can be used to efficiently find positions within it. We use Pythonโs built-in sorted method for this purpose. -
Find each query's count: For each query
q
inqueries
, we calculatef(q)
and then find the index at whichf(q)
would be inserted while maintaining the order innums
. This is wherebisect_right
comes in. It returns the insertion point to the right of existing values off(q)
. This index tells us how many words inwords
have a largerf
value thanf(q)
. -
Prepare the result: Knowing the index from
bisect_right
, the number of words with a largerf
value is found by subtracting this index from the total number of words, which islen(words)
. The comprehension[n - bisect_right(nums, f(q)) for q in queries]
creates the result list by performing this calculation for each query. -
Return the result: The resultant list is returned, which contains the count for each query indicating how many words in
words
have a higherf
value than that of the query.
In conclusion, the solution makes use of a combination of string processing, precomputation, binary search, and efficient sorting to arrive at the result. It leverages the properties of sorted arrays and binary search (bisect_right
) to perform queries in logarithmic time, after an initial sort cost, allowing for an efficient and scalable approach to solve the given problem.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's go through an example to illustrate the solution approach described above. Suppose we have a list of words, words = ["a", "aa", "aaa", "aaaa"]
and a list of queries, queries = ["a", "aa", "aaa"]
. Our goal is to count how many words have a higher frequency of the lexicographically smallest character for each query. Let's walk through each step of the approach.
-
Define and calculate
f
for each word: We first define the functionf(s)
to calculate the frequency of the lexicographically smallest character in a given strings
. For thewords
list, since all words consist of the character 'a', thef
values are simply the lengths of the strings. Thus,f("a") = 1
,f("aa") = 2
, and so on. -
Precompute
f
values forwords
: We precompute thef
values for thewords
. This gives usnums = [1, 2, 3, 4]
. -
Sort the
nums
list: We sort the precomputedf
values ofwords
. Ournums
list is already sorted:nums = [1, 2, 3, 4]
. -
Find each query's count: Now, we need to find out for each query how many
f
values inwords
are larger than the query'sf
value. Forqueries[i] = "a"
,f("a")
is1
. Using binary search, we determine the position where1
would fit in the sortednums
list (which is index1
, right after the first occurrence of1
). Since there are a total of 4 words, and1
would be inserted at index1
, there are4 - 1 = 3
words with a largerf
value.Repeating this for
queries[1] = "aa"
withf("aa") = 2
, binary search tells us that2
would be inserted at index2
innums
, leaving4 - 2 = 2
words with a largerf
value.Lastly, for
queries[2] = "aaa"
,f("aaa") = 3
, binary search gives us an insertion index of3
, so there are4 - 3 = 1
word with a largerf
value. -
Prepare the result: The counts we determined in the previous step give us our final result array:
[3, 2, 1]
. -
Return the result: The result for our query is
[3, 2, 1]
, indicating for each query inqueries
, how many words inwords
have a higherf
value.
The example confirms our approach, showing how we can effectively calculate the counts required using precomputation, sorting, and binary search.
Solution Implementation
1from collections import Counter
2from bisect import bisect_right
3from typing import List
4
5class Solution:
6 def numSmallerByFrequency(self, queries: List[str], words: List[str]) -> List[int]:
7 # Helper method to calculate the frequency of the smallest character
8 def calculate_frequency(s: str) -> int:
9 # Counts the occurrence of each character
10 char_count = Counter(s)
11 # Iterates over lowercase ascii characters to find the smallest character and return its count
12 for c in 'abcdefghijklmnopqrstuvwxyz':
13 if c in char_count:
14 return char_count[c]
15 return 0 # In case the string is empty, though the problem assumes non-empty strings
16
17 # Calculate the frequency for each word and sort the results
18 word_frequencies = sorted(calculate_frequency(word) for word in words)
19 # List to hold the result for each query
20 results = []
21 # Calculate the number of words with higher frequency than each query
22 for query in queries:
23 query_frequency = calculate_frequency(query)
24 # Use binary search to find the number of words with higher frequency
25 num_of_higher_frequency_words = len(word_frequencies) - bisect_right(word_frequencies, query_frequency)
26 # Append the result to the results list
27 results.append(num_of_higher_frequency_words)
28 return results # Returns the final list of results
29
1class Solution {
2
3 // Function to count how many strings in 'words' have a frequency smaller than the frequency of each string in 'queries'
4 public int[] numSmallerByFrequency(String[] queries, String[] words) {
5 int wordsLength = words.length;
6 int[] frequencies = new int[wordsLength];
7
8 // Calculate frequency of smallest char for each word
9 for (int i = 0; i < wordsLength; ++i) {
10 frequencies[i] = smallestCharFrequency(words[i]);
11 }
12
13 // Sort the frequencies
14 Arrays.sort(frequencies);
15
16 // For each query, count how many words have a smaller frequency
17 int queriesLength = queries.length;
18 int[] answer = new int[queriesLength];
19 for (int i = 0; i < queriesLength; ++i) {
20 int queryFrequency = smallestCharFrequency(queries[i]);
21 // Binary search to find the number of words with a larger frequency
22 int left = 0;
23 int right = wordsLength;
24 while (left < right) {
25 int mid = (left + right) >> 1;
26 if (frequencies[mid] > queryFrequency) {
27 right = mid;
28 } else {
29 left = mid + 1;
30 }
31 }
32 // The result for this query is the number of frequencies greater than the query frequency
33 answer[i] = wordsLength - left;
34 }
35
36 return answer;
37 }
38
39 // Helper function to calculate the frequency of the smallest character in a string
40 private int smallestCharFrequency(String s) {
41 int[] charCounts = new int[26]; // There are 26 lowercase English letters
42 // Count the occurrences of each character in the string
43 for (int i = 0; i < s.length(); ++i) {
44 charCounts[s.charAt(i) - 'a']++;
45 }
46 // Find the smallest non-zero frequency
47 for (int count : charCounts) {
48 if (count > 0) {
49 return count;
50 }
51 }
52 return 0; // Return 0 if the string was empty (though this should not happen given the problem constraints)
53 }
54}
55
1#include <vector>
2#include <string>
3#include <algorithm> // Needed for sorting and upper_bound
4
5using std::vector;
6using std::string;
7using std::sort;
8using std::upper_bound;
9
10class Solution {
11public:
12 // This method will take a vector of query strings and a vector of word strings
13 // and will return a vector with the count of words that have a higher frequency
14 // of the smallest character than the frequency of the smallest character in each query string.
15 vector<int> numSmallerByFrequency(vector<string>& queries, vector<string>& words) {
16 // 'f' is a lambda function that calculates the frequency of the smallest character in a string
17 auto f = [](const string &s) {
18 int count[26] = {0}; // Initialize an array to store the count of each character from 'a' to 'z'
19
20 // Count how many times each character appears in the string
21 for (char c : s) {
22 count[c - 'a']++;
23 }
24
25 // Find and return count of the first non-zero frequency (smallest character)
26 for (int frequency : count) {
27 if (frequency) {
28 return frequency;
29 }
30 }
31 return 0;
32 };
33
34 int numWords = words.size(); // Cache the number of words for efficiency
35 int wordFrequencies[numWords]; // This array will hold the frequencies of the smallest char in each word
36
37 // Calculate the frequency of each word's smallest character
38 for (int i = 0; i < numWords; ++i) {
39 wordFrequencies[i] = f(words[i]);
40 }
41
42 // Sort the frequencies in non-decreasing order to perform efficient lookups later
43 sort(wordFrequencies, wordFrequencies + numWords);
44
45 // Prepare the answer vector to hold the result
46 vector<int> answer;
47
48 // Iterate the queries to find out the respective counts
49 for (const string &query : queries) {
50 int queryFrequency = f(query); // Get frequency of current query
51
52 // Find how many words have a greater frequency than the current query frequency
53 // The number of words with a greater frequency is the total number of words minus
54 // the number of words with a frequency less than or equal to the query frequency
55 int greaterFrequencyCount = numWords - (upper_bound(wordFrequencies, wordFrequencies + numWords, queryFrequency) - wordFrequencies);
56
57 // Add the count to the answer vector
58 answer.push_back(greaterFrequencyCount);
59 }
60
61 // Return the filled answer vector
62 return answer;
63 }
64};
65
1// Function to count the smallest character frequency in a given string.
2function frequencyOfSmallestChar(s: string): number {
3 const frequencyCounter = new Array(26).fill(0); // Initialize an array for alphabets.
4 // Count frequency of each character in the string.
5 for (const char of s) {
6 frequencyCounter[char.charCodeAt(0) - 'a'.charCodeAt(0)]++;
7 }
8 // Find and return the frequency of the smallest character.
9 return frequencyCounter.find(count => count > 0);
10}
11
12// The main function that applies the frequencyOfSmallestChar function on queries compared to words.
13function numSmallerByFrequency(queries: string[], words: string[]): number[] {
14 // Map the 'words' array into an array of smallest character frequencies
15 // and sort it in ascending order.
16 const sortedWordFrequencies = words.map(frequencyOfSmallestChar).sort((a, b) => a - b);
17 const answerArray: number[] = []; // Initialize an array to store the results.
18
19 // Process each query string.
20 for (const query of queries) {
21 // Find frequency of the smallest character in the current query string.
22 const queryFrequency = frequencyOfSmallestChar(query);
23 let leftIndex = 0,
24 rightIndex = sortedWordFrequencies.length;
25
26 // Perform a binary search to find the count of elements in 'sortedWordFrequencies'
27 // that are greater than 'queryFrequency'.
28 while (leftIndex < rightIndex) {
29 const midIndex = Math.floor((leftIndex + rightIndex) / 2);
30 if (sortedWordFrequencies[midIndex] > queryFrequency) {
31 rightIndex = midIndex;
32 } else {
33 leftIndex = midIndex + 1;
34 }
35 }
36
37 // Push the count (length of the array minus the final leftIndex) to the result array.
38 answerArray.push(sortedWordFrequencies.length - leftIndex);
39 }
40 return answerArray; // Return the final result array.
41}
42
Time and Space Complexity
Time Complexity
The time complexity of the provided code involves several components:
-
Calculating the frequency of the smallest character in each word using the
f
function. This operation involves creating a Counter for each word, which has a time complexity ofO(m)
, wherem
is the average string length. This is done for all words, so forn
words, this part would have a time complexity ofO(n * m)
. -
Sorting the frequencies of words. The sorted function has a time complexity of
O(n log n)
, wheren
is the length of thewords
list. -
For each query in
queries
, the function calculates the frequency of the smallest character and uses binary search (bisect_right
) to find the position in the sorted list of word frequencies. The frequency calculation isO(m)
, and binary search in a sorted list of sizen
isO(log n)
. Therefore, forq
queries, this part has a time complexity ofO(q * (m + log n))
.
Putting it all together, the total time complexity is O(n * m) + O(n log n) + O(q * (m + log n))
.
Space Complexity
The space complexity is the additional space used by the algorithm:
-
Storing the frequencies of all words, which is
O(n)
space. -
The sorted frequency list, which again takes
O(n)
space. -
The counter created for each word, which in the worst case takes
O(m)
space; however, since the counters are not stored and are used one at a time, this is not multiplied byn
. -
The final result list which will contain
q
elements, resulting inO(q)
space.
So, the total space complexity is O(n + q)
, since n
and q
space requirements dominate the O(m)
space needed for the counter for each word or query string.
Learn more about how to find time and space complexity quickly using problem constraints.
The three-steps of Depth First Search are:
- Identify states;
- Draw the state-space tree;
- DFS on the state-space tree.
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.