2080. Range Frequency Queries


Problem Description

This problem involves creating a data structure that can efficiently find the frequency, or the number of occurrences, of a particular value within a given subarray of an integer array. A subarray is a sequence of contiguous elements from the array, and the frequency of a value is how often it appears within that sequence.

To solve this, one must implement a class RangeFreqQuery with two functionalities:

  1. A constructor that takes an integer array arr and prepares it for subsequent queries.
  2. A query method that, given a left and right index (left and right) and a value (value), returns how many times value appears between arr[left] and arr[right], inclusive.

Since the array is not modified after the data structure is created, the problem statements suggest preprocessing the array to allow for fast frequency queries.

Intuition

The solution strategy is based on preprocessing the given array to make querying efficient. We make use of the fact that the array remains static after the data structure is constructed. Here's how the solution works:

  1. Preprocessing: Instead of checking the frequency of a value each time the query is called, we build a hashmap (or dictionary in Python) upon the class initialization. This hashmap maps each unique value in the array to a list of indices where this value occurs. Because each value is appended in the order of appearance, each list of indices is sorted.

  2. Answering Queries: To find the frequency of a value in a particular range, we use binary search to quickly find the position of the left and right boundaries within the preprocessed list of indices for that value. Specifically, we find:

    • The first index in the list that is greater than or equal to left. If left is not in the list, we find the smallest index that is greater than left.
    • The last index in the list that is less than or equal to right. If right is not in the list, we find the largest index that is less than or equal to right.

    The Python module bisect provides the bisect_right function, which can be used to find these indices.

  3. Calculating Frequency: The frequency is then the difference between the positions found in the list for right and left. Since bisect_right returns a position one past the last occurrence when searching for right, subtracting these two positions directly gives the correct frequency.

So, the intuition lies in transforming the problem from repetitive frequency counting into a single preprocessing step followed by fast binary search queries.

Learn more about Segment Tree and Binary Search patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which two pointer techniques do you use to check if a string is a palindrome?

Solution Approach

The implementation of the RangeFreqQuery class involves two core components:

Initialization __init__(self, arr: List[int])

When an object of RangeFreqQuery is created, the constructor takes an input array arr and preprocesses it. The constructor uses a hashmap (self.mp) to store the frequencies. It iterates over the array using enumerate to access both the index i and the value x at that index. Then, it appends the index i to the list corresponding to the value x in the hashmap. This creates a mapping of each unique value to a sorted list of indices where it appears in arr.

Here's a breakdown of the initialization steps:

1self.mp = defaultdict(list)
2for i, x in enumerate(arr):
3    self.mp[x].append(i)

Querying query(self, left: int, right: int, value: int) -> int

The query method is where we find out the frequency of value between the left and right index in the array. We perform two binary searches using bisect_right from the bisect module:

  1. Search for the insertion point for left - 1 in the sorted list of indices for value. This gives us l, the number of occurrences of value before the left index.
  2. Search for the insertion point for right in the same sorted list. This gives us r, the number of occurrences of value up to and including the right index.

The frequency is then the difference r - l, which is the count of value in the subarray starting at left and ending at right.

The following is the code for the query method:

1if value not in self.mp:
2    return 0
3arr = self.mp[value]
4l, r = bisect_right(arr, left - 1), bisect_right(arr, right)
5return r - l

Notice that we return 0 when the value is not found in self.mp, as it indicates the value does not appear in the array. Otherwise, we calculate the frequency using the binary search results and return it.

In summary, the solution involves a combination of hashmap for preprocessing and binary search for efficient queries. By using these algorithms and data structures, the RangeFreqQuery class allows us to quickly compute the frequency of a value over different subarrays following a single preprocessing pass.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the following is equvalent to O(3*2^n + n^3 + n!+ log n)?

Example Walkthrough

Let's say we have the following array arr = [1, 1, 2, 3, 2, 2, 4, 1, 2], and we want to find out how often particular values appear within specified subarrays of arr.

We initialize the RangeFreqQuery object with arr.

  • During the initialization, the object creates a hashmap that will look like this:
    1{
    2  1: [0, 1, 7],
    3  2: [2, 4, 5, 8],
    4  3: [3],
    5  4: [6]
    6}
    Each unique value from arr is now mapped to a sorted list of indices where it appears.

Now, let's suppose we want to find out how often the value 2 appears between indices 2 and 6 (left = 2, right = 6). We perform the following steps:

  1. Call the query method with left = 2, right = 6, and value = 2.
  2. Look for the sorted list of indices for the value 2 in our hashmap, which is [2, 4, 5, 8].
  3. Using the bisect_right function from the bisect module, we find the insertion point for left - 1, which is 1 in this case.
    • bisect_right([2, 4, 5, 8], 1) will return 0 because 1 would fit at the start of the list.
    • This means there are 0 occurrences of 2 before the index '2'.
  4. Next, we do a similar search for right, which is 6.
    • bisect_right([2, 4, 5, 8], 6) will return 3 since 6 would fit between 5 and 8.
    • There are 3 occurrences of 2 up to and including index 6.
  5. The frequency of value 2 between indices 2 and 6 is therefore 3 - 0 = 3.

Putting it all together, if we wanted to illustrate this using the RangeFreqQuery class and the querying process, we would write the following Python code:

1from collections import defaultdict
2from bisect import bisect_right
3
4class RangeFreqQuery:
5    def __init__(self, arr):
6        self.mp = defaultdict(list)
7        for i, x in enumerate(arr):
8            self.mp[x].append(i)
9
10    def query(self, left, right, value):
11        if value not in self.mp:
12            return 0
13        arr = self.mp[value]
14        l, r = bisect_right(arr, left - 1), bisect_right(arr, right)
15        return r - l
16
17# Initialize with the given array
18rangeFreqQuery = RangeFreqQuery([1, 1, 2, 3, 2, 2, 4, 1, 2])
19
20# Example query
21print(rangeFreqQuery.query(2, 6, 2))  # Output: 3

In this example, the print statement would output 3, as the value 2 occurs three times in the subarray of arr from indices 2 to 6 (inclusive). This walk-through demonstrates the efficiency and effectiveness of the solution approach.

Solution Implementation

1from collections import defaultdict
2from bisect import bisect_left, bisect_right
3from typing import List
4
5class RangeFreqQuery:
6    def __init__(self, arr: List[int]):
7        # Dictionary to store the indices of each value in the array.
8        self.index_map = defaultdict(list)
9        # Iterate through the list and map each value to its indices.
10        for index, value in enumerate(arr):
11            self.index_map[value].append(index)
12
13    def query(self, left: int, right: int, value: int) -> int:
14        # Check if the value is not in the index map, returning 0 frequency.
15        if value not in self.index_map:
16            return 0
17      
18        # Find the closest index at the left and right boundary using binary search.
19        indices = self.index_map[value]
20        # bisect_left to find start position inclusively, while bisect_right for the end position exclusively.
21        start_position_inclusive = bisect_left(indices, left)
22        end_position_exclusive = bisect_right(indices, right)
23      
24        # Calculate the frequency by subtracting end and start position.
25        frequency = end_position_exclusive - start_position_inclusive
26        return frequency
27
28# Example of how to instantiate and use the RangeFreqQuery class:
29# obj = RangeFreqQuery(arr)
30# freq = obj.query(left, right, value)
31
1import java.util.HashMap;
2import java.util.List;
3import java.util.ArrayList;
4import java.util.Map;
5
6class RangeFreqQuery {
7    private Map<Integer, List<Integer>> indexMap = new HashMap<>();
8
9    // Constructor to initialize the data structure with the given array.
10    // It maps each number to its indices in the array.
11    public RangeFreqQuery(int[] arr) {
12        for (int i = 0; i < arr.length; i++) {
13            indexMap.computeIfAbsent(arr[i], k -> new ArrayList<>()).add(i);
14        }
15    }
16
17    // Queries the frequency of a value in the specified range.
18    // It uses binary search to find the frequency of the value.
19    public int query(int left, int right, int value) {
20        // If the value is not present in the array, return 0 frequency
21        if (!indexMap.containsKey(value)) {
22            return 0;
23        }
24
25        List<Integer> indices = indexMap.get(value);
26        // Find the starting index where the value could be inserted to maintain sorted order.
27        int startIndex = binarySearch(indices, left - 1);
28        // Find the ending index where the value could be inserted to maintain sorted order.
29        int endIndex = binarySearch(indices, right);
30
31        // The frequency is the difference between the start and end insertion points.
32        return endIndex - startIndex;
33    }
34
35    // Helper method to perform a binary search on a list of indices.
36    // Finds the first index where 'targetValue' could be inserted.
37    private int binarySearch(List<Integer> list, int targetValue) {
38        int left = 0;
39        int right = list.size();
40      
41        while (left < right) {
42            int mid = (left + right) >> 1; // Equivalent to (left + right) / 2
43            if (list.get(mid) > targetValue) {
44                right = mid;
45            } else {
46                left = mid + 1;
47            }
48        }
49        return left;
50    }
51}
52
53// The class could be used as follows:
54// RangeFreqQuery obj = new RangeFreqQuery(arr);
55// int frequency = obj.query(left, right, value);
56
1#include <vector>
2#include <unordered_map>
3#include <algorithm> // includes std::upper_bound
4
5class RangeFreqQuery {
6public:
7    // HashMap that stores each distinct number and its indices in the input array
8    std::unordered_map<int, std::vector<int>> numToIndicesMap;
9
10    // Constructor that fills the HashMap for the given array
11    RangeFreqQuery(std::vector<int>& arr) {
12        for (int i = 0; i < arr.size(); ++i)
13            numToIndicesMap[arr[i]].push_back(i); // Appends the current index to the corresponding number's vector
14    }
15
16    // Method to return the frequency of 'value' within the range [left, right]
17    int query(int left, int right, int value) {
18        // If 'value' is not found in the map, return 0 as its frequency is 0
19        if (numToIndicesMap.find(value) == numToIndicesMap.end()) return 0;
20
21        // References the vector of indices for 'value'
22        auto& indices = numToIndicesMap[value];
23
24        // Find the position in the vector for the first index greater than 'left - 1' (upper_bound is exclusive)
25        auto leftIt = std::upper_bound(indices.begin(), indices.end(), left - 1);
26      
27        // Find the position in the vector for the first index greater than 'right'
28        auto rightIt = std::upper_bound(indices.begin(), indices.end(), right);
29      
30        // The frequency is the number of elements in the range [leftIt, rightIt)
31        return rightIt - leftIt;
32    }
33};
34
35// The RangeFreqQuery class can be used as follows:
36// RangeFreqQuery* obj = new RangeFreqQuery(arr);
37// int frequency = obj->query(left, right, value);
38
1// Import necessary features from array and algorithm modules (hypothetical imports)
2// In TypeScript/JavaScript, there's no standard library providing an upper_bound function like C++'s <algorithm>,
3// so we'll need to create it or use a third-party library that implements algorithmic functions.
4
5let numToIndicesMap: { [key: number]: number[] } = {}; // equivalent to unordered_map in C++
6
7// Function to 'construct' the numToIndicesMap with indices
8function rangeFreqQueryConstructor(arr: number[]): void {
9    arr.forEach((value, index) => {
10        if (!numToIndicesMap[value]) {
11            numToIndicesMap[value] = [];
12        }
13        numToIndicesMap[value].push(index); // Append the current index to the corresponding number's array
14    });
15}
16
17// Helper function to mimic the 'std::upper_bound'
18function upperBound(arr: number[], target: number): number {
19    let start = 0, end = arr.length;
20    while (start < end) {
21        let mid = Math.floor((start + end) / 2);
22        if (arr[mid] <= target) {
23            start = mid + 1;
24        } else {
25            end = mid;
26        }
27    }
28    return start; // end and start both will be pointing to the first element greater than target
29}
30
31// Function to return the frequency of 'value' within the range [left, right]
32function query(left: number, right: number, value: number): number {
33    if (!(value in numToIndicesMap)) {
34        return 0; // If 'value' is not found, its frequency is 0
35    }
36
37    let indices = numToIndicesMap[value]; // Get the array of indices for 'value'
38
39    // Find the position for the first index greater than 'left - 1' (simulating upper_bound exclusive nature)
40    let leftIndex = upperBound(indices, left - 1);
41
42    // Find the position for the first index greater than 'right'
43    let rightIndex = upperBound(indices, right);
44
45    // The frequency is the count of elements in the range [leftIndex, rightIndex)
46    return rightIndex - leftIndex;
47}
48
Not Sure What to Study? Take the 2-min Quiz:

Which of the following shows the order of node visit in a Breadth-first Search?

Time and Space Complexity

Time Complexity

__init__ constructor:

  • The time complexity for initializing the RangeFreqQuery class is O(n), where n is the length of the input array. This is because the constructor iterates through all n elements to populate the self.mp dictionary, which maps each unique value to a list of its indices in the array.

query method:

  • The query method utilizes binary search twice through the bisect_right function to find the range of positions of a particular value between indices left and right. Therefore, the time complexity for each query is O(log k), where k is the number of occurrences of the value being queried (k can be at most n).

Space Complexity

  • The self.mp dictionary stores lists of indices for each unique value in the array. In the worst case, the space complexity is O(n), if all values are unique and each value appears only once. This would mean that the self.mp dictionary would have n keys with a single index in each list.

In summary, the time complexity is O(n) for the constructor and O(log k) for each query, and the space complexity is up to O(n).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫