2080. Range Frequency Queries
Problem Description
This problem involves creating a data structure that can efficiently find the frequency, or the number of occurrences, of a particular value within a given subarray of an integer array. A subarray is a sequence of contiguous elements from the array, and the frequency of a value is how often it appears within that sequence.
To solve this, one must implement a class RangeFreqQuery
with two functionalities:
- A constructor that takes an integer array
arr
and prepares it for subsequent queries. - A
query
method that, given a left and right index (left
andright
) and a value (value
), returns how many timesvalue
appears betweenarr[left]
andarr[right]
, inclusive.
Since the array is not modified after the data structure is created, the problem statements suggest preprocessing the array to allow for fast frequency queries.
Intuition
The solution strategy is based on preprocessing the given array to make querying efficient. We make use of the fact that the array remains static after the data structure is constructed. Here's how the solution works:
-
Preprocessing: Instead of checking the frequency of a value each time the
query
is called, we build a hashmap (or dictionary in Python) upon the class initialization. This hashmap maps each unique value in the array to a list of indices where this value occurs. Because each value is appended in the order of appearance, each list of indices is sorted. -
Answering Queries: To find the frequency of a value in a particular range, we use binary search to quickly find the position of the
left
andright
boundaries within the preprocessed list of indices for that value. Specifically, we find:- The first index in the list that is greater than or equal to
left
. Ifleft
is not in the list, we find the smallest index that is greater thanleft
. - The last index in the list that is less than or equal to
right
. Ifright
is not in the list, we find the largest index that is less than or equal toright
.
The Python module
bisect
provides thebisect_right
function, which can be used to find these indices. - The first index in the list that is greater than or equal to
-
Calculating Frequency: The frequency is then the difference between the positions found in the list for
right
andleft
. Sincebisect_right
returns a position one past the last occurrence when searching forright
, subtracting these two positions directly gives the correct frequency.
So, the intuition lies in transforming the problem from repetitive frequency counting into a single preprocessing step followed by fast binary search queries.
Learn more about Segment Tree and Binary Search patterns.
Solution Approach
The implementation of the RangeFreqQuery
class involves two core components:
Initialization __init__(self, arr: List[int])
When an object of RangeFreqQuery
is created, the constructor takes an input array arr
and preprocesses it. The constructor uses a hashmap (self.mp
) to store the frequencies. It iterates over the array using enumerate
to access both the index i
and the value x
at that index. Then, it appends the index i
to the list corresponding to the value x
in the hashmap. This creates a mapping of each unique value to a sorted list of indices where it appears in arr
.
Here's a breakdown of the initialization steps:
self.mp = defaultdict(list)
for i, x in enumerate(arr):
self.mp[x].append(i)
Querying query(self, left: int, right: int, value: int) -> int
The query
method is where we find out the frequency of value
between the left
and right
index in the array. We perform two binary searches using bisect_right
from the bisect
module:
- Search for the insertion point for
left - 1
in the sorted list of indices forvalue
. This gives usl
, the number of occurrences ofvalue
before theleft
index. - Search for the insertion point for
right
in the same sorted list. This gives usr
, the number of occurrences ofvalue
up to and including theright
index.
The frequency is then the difference r - l
, which is the count of value
in the subarray starting at left
and ending at right
.
The following is the code for the query
method:
if value not in self.mp: return 0 arr = self.mp[value] l, r = bisect_right(arr, left - 1), bisect_right(arr, right) return r - l
Notice that we return 0
when the value
is not found in self.mp
, as it indicates the value
does not appear in the array. Otherwise, we calculate the frequency using the binary search results and return it.
In summary, the solution involves a combination of hashmap for preprocessing and binary search for efficient queries. By using these algorithms and data structures, the RangeFreqQuery
class allows us to quickly compute the frequency of a value over different subarrays following a single preprocessing pass.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's say we have the following array arr = [1, 1, 2, 3, 2, 2, 4, 1, 2]
, and we want to find out how often particular values appear within specified subarrays of arr
.
We initialize the RangeFreqQuery
object with arr
.
- During the initialization, the object creates a hashmap that will look like this:
Each unique value from{ 1: [0, 1, 7], 2: [2, 4, 5, 8], 3: [3], 4: [6] }
arr
is now mapped to a sorted list of indices where it appears.
Now, let's suppose we want to find out how often the value 2
appears between indices 2 and 6 (left = 2, right = 6
). We perform the following steps:
- Call the
query
method withleft = 2
,right = 6
, andvalue = 2
. - Look for the sorted list of indices for the value
2
in our hashmap, which is[2, 4, 5, 8]
. - Using the
bisect_right
function from thebisect
module, we find the insertion point forleft - 1
, which is1
in this case.bisect_right([2, 4, 5, 8], 1)
will return0
because1
would fit at the start of the list.- This means there are
0
occurrences of2
before the index '2'.
- Next, we do a similar search for
right
, which is6
.bisect_right([2, 4, 5, 8], 6)
will return3
since6
would fit between5
and8
.- There are
3
occurrences of2
up to and including index6
.
- The frequency of value
2
between indices2
and6
is therefore3 - 0 = 3
.
Putting it all together, if we wanted to illustrate this using the RangeFreqQuery
class and the querying process, we would write the following Python code:
from collections import defaultdict
from bisect import bisect_right
class RangeFreqQuery:
def __init__(self, arr):
self.mp = defaultdict(list)
for i, x in enumerate(arr):
self.mp[x].append(i)
def query(self, left, right, value):
if value not in self.mp:
return 0
arr = self.mp[value]
l, r = bisect_right(arr, left - 1), bisect_right(arr, right)
return r - l
# Initialize with the given array
rangeFreqQuery = RangeFreqQuery([1, 1, 2, 3, 2, 2, 4, 1, 2])
# Example query
print(rangeFreqQuery.query(2, 6, 2)) # Output: 3
In this example, the print statement would output 3
, as the value 2
occurs three times in the subarray of arr
from indices 2
to 6
(inclusive). This walk-through demonstrates the efficiency and effectiveness of the solution approach.
Solution Implementation
1from collections import defaultdict
2from bisect import bisect_left, bisect_right
3from typing import List
4
5class RangeFreqQuery:
6 def __init__(self, arr: List[int]):
7 # Dictionary to store the indices of each value in the array.
8 self.index_map = defaultdict(list)
9 # Iterate through the list and map each value to its indices.
10 for index, value in enumerate(arr):
11 self.index_map[value].append(index)
12
13 def query(self, left: int, right: int, value: int) -> int:
14 # Check if the value is not in the index map, returning 0 frequency.
15 if value not in self.index_map:
16 return 0
17
18 # Find the closest index at the left and right boundary using binary search.
19 indices = self.index_map[value]
20 # bisect_left to find start position inclusively, while bisect_right for the end position exclusively.
21 start_position_inclusive = bisect_left(indices, left)
22 end_position_exclusive = bisect_right(indices, right)
23
24 # Calculate the frequency by subtracting end and start position.
25 frequency = end_position_exclusive - start_position_inclusive
26 return frequency
27
28# Example of how to instantiate and use the RangeFreqQuery class:
29# obj = RangeFreqQuery(arr)
30# freq = obj.query(left, right, value)
31
1import java.util.HashMap;
2import java.util.List;
3import java.util.ArrayList;
4import java.util.Map;
5
6class RangeFreqQuery {
7 private Map<Integer, List<Integer>> indexMap = new HashMap<>();
8
9 // Constructor to initialize the data structure with the given array.
10 // It maps each number to its indices in the array.
11 public RangeFreqQuery(int[] arr) {
12 for (int i = 0; i < arr.length; i++) {
13 indexMap.computeIfAbsent(arr[i], k -> new ArrayList<>()).add(i);
14 }
15 }
16
17 // Queries the frequency of a value in the specified range.
18 // It uses binary search to find the frequency of the value.
19 public int query(int left, int right, int value) {
20 // If the value is not present in the array, return 0 frequency
21 if (!indexMap.containsKey(value)) {
22 return 0;
23 }
24
25 List<Integer> indices = indexMap.get(value);
26 // Find the starting index where the value could be inserted to maintain sorted order.
27 int startIndex = binarySearch(indices, left - 1);
28 // Find the ending index where the value could be inserted to maintain sorted order.
29 int endIndex = binarySearch(indices, right);
30
31 // The frequency is the difference between the start and end insertion points.
32 return endIndex - startIndex;
33 }
34
35 // Helper method to perform a binary search on a list of indices.
36 // Finds the first index where 'targetValue' could be inserted.
37 private int binarySearch(List<Integer> list, int targetValue) {
38 int left = 0;
39 int right = list.size();
40
41 while (left < right) {
42 int mid = (left + right) >> 1; // Equivalent to (left + right) / 2
43 if (list.get(mid) > targetValue) {
44 right = mid;
45 } else {
46 left = mid + 1;
47 }
48 }
49 return left;
50 }
51}
52
53// The class could be used as follows:
54// RangeFreqQuery obj = new RangeFreqQuery(arr);
55// int frequency = obj.query(left, right, value);
56
1#include <vector>
2#include <unordered_map>
3#include <algorithm> // includes std::upper_bound
4
5class RangeFreqQuery {
6public:
7 // HashMap that stores each distinct number and its indices in the input array
8 std::unordered_map<int, std::vector<int>> numToIndicesMap;
9
10 // Constructor that fills the HashMap for the given array
11 RangeFreqQuery(std::vector<int>& arr) {
12 for (int i = 0; i < arr.size(); ++i)
13 numToIndicesMap[arr[i]].push_back(i); // Appends the current index to the corresponding number's vector
14 }
15
16 // Method to return the frequency of 'value' within the range [left, right]
17 int query(int left, int right, int value) {
18 // If 'value' is not found in the map, return 0 as its frequency is 0
19 if (numToIndicesMap.find(value) == numToIndicesMap.end()) return 0;
20
21 // References the vector of indices for 'value'
22 auto& indices = numToIndicesMap[value];
23
24 // Find the position in the vector for the first index greater than 'left - 1' (upper_bound is exclusive)
25 auto leftIt = std::upper_bound(indices.begin(), indices.end(), left - 1);
26
27 // Find the position in the vector for the first index greater than 'right'
28 auto rightIt = std::upper_bound(indices.begin(), indices.end(), right);
29
30 // The frequency is the number of elements in the range [leftIt, rightIt)
31 return rightIt - leftIt;
32 }
33};
34
35// The RangeFreqQuery class can be used as follows:
36// RangeFreqQuery* obj = new RangeFreqQuery(arr);
37// int frequency = obj->query(left, right, value);
38
1// Import necessary features from array and algorithm modules (hypothetical imports)
2// In TypeScript/JavaScript, there's no standard library providing an upper_bound function like C++'s <algorithm>,
3// so we'll need to create it or use a third-party library that implements algorithmic functions.
4
5let numToIndicesMap: { [key: number]: number[] } = {}; // equivalent to unordered_map in C++
6
7// Function to 'construct' the numToIndicesMap with indices
8function rangeFreqQueryConstructor(arr: number[]): void {
9 arr.forEach((value, index) => {
10 if (!numToIndicesMap[value]) {
11 numToIndicesMap[value] = [];
12 }
13 numToIndicesMap[value].push(index); // Append the current index to the corresponding number's array
14 });
15}
16
17// Helper function to mimic the 'std::upper_bound'
18function upperBound(arr: number[], target: number): number {
19 let start = 0, end = arr.length;
20 while (start < end) {
21 let mid = Math.floor((start + end) / 2);
22 if (arr[mid] <= target) {
23 start = mid + 1;
24 } else {
25 end = mid;
26 }
27 }
28 return start; // end and start both will be pointing to the first element greater than target
29}
30
31// Function to return the frequency of 'value' within the range [left, right]
32function query(left: number, right: number, value: number): number {
33 if (!(value in numToIndicesMap)) {
34 return 0; // If 'value' is not found, its frequency is 0
35 }
36
37 let indices = numToIndicesMap[value]; // Get the array of indices for 'value'
38
39 // Find the position for the first index greater than 'left - 1' (simulating upper_bound exclusive nature)
40 let leftIndex = upperBound(indices, left - 1);
41
42 // Find the position for the first index greater than 'right'
43 let rightIndex = upperBound(indices, right);
44
45 // The frequency is the count of elements in the range [leftIndex, rightIndex)
46 return rightIndex - leftIndex;
47}
48
Time and Space Complexity
Time Complexity
__init__
constructor:
- The time complexity for initializing the
RangeFreqQuery
class isO(n)
, wheren
is the length of the input array. This is because the constructor iterates through alln
elements to populate theself.mp
dictionary, which maps each unique value to a list of its indices in the array.
query
method:
- The
query
method utilizes binary search twice through thebisect_right
function to find the range of positions of a particular value between indicesleft
andright
. Therefore, the time complexity for each query isO(log k)
, wherek
is the number of occurrences of the value being queried (k
can be at mostn
).
Space Complexity
- The
self.mp
dictionary stores lists of indices for each unique value in the array. In the worst case, the space complexity isO(n)
, if all values are unique and each value appears only once. This would mean that theself.mp
dictionary would haven
keys with a single index in each list.
In summary, the time complexity is O(n)
for the constructor and O(log k)
for each query, and the space complexity is up to O(n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
How many ways can you arrange the three letters A, B and C?
Recommended Readings
Segment Tree Faster Range Queries For this article we want to introduce the idea of a Segment Tree Segment Trees allow us to quickly perform range queries as well as range updates Suppose we have an array and we want to know the sum of a particular range of numbers as well
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!