2168. Unique Substrings With Equal Digit Frequency

MediumHash TableStringCountingHash FunctionRolling Hash
Leetcode Link

Problem Description

The problem provides a digit string s and asks us to find the number of unique substrings where every digit appears the same number of times. A digit string means the string contains only numeric characters between '0' and '9'.

For example, if the string is 1211, there are several substrings where every digit appears an equal number of times: 1, 2, 11, 21, 12. Note that 1211 as a whole does not count because '1' appears three times while '2' appears only once.

The challenge is to calculate the count of such substrings without counting any substrings more than once, no matter where they appear in the string.

Intuition

To solve this problem, the intuition is to consider every possible substring of s and check if all digits present in the substring appear with the same frequency. A brute force approach might involve repeatedly checking the frequency of digits within each potential substring, which would be inefficient.

To optimize this, we make use of prefix sums. A prefix sum array can help us quickly determine the frequency of each digit in any substring of s. With this, we can deduce the number of times a digit appears in any substring by subtracting the prefix sum up to its starting point from the prefix sum up to its ending point.

Here's the reasoning behind the solution steps:

  1. Construct a prefix sum matrix presum where presum[i][j] gives the total count of digit j in the string s[0...i-1]. This is built for all digits from 0 to 9 and for each prefix of s.
  2. Iterate through all possible substrings of s using two nested loops. Using indexes i and j, we can define a substring s[i : j + 1].
  3. For each substring, use the prefix sums to determine the count of each digit and add it to a set only if they all have the same count, ensuring that all digits must appear the same number of times for the substring to be valid. This is checked by the check function.
  4. Since a set is used, duplicate substrings are inherently avoided.

Finally, the length of the set gives us the total count of unique valid substrings.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Given an array of 1,000,000 integers that is almost sorted, except for 2 pairs of integers. Which algorithm is fastest for sorting the array?

Solution Approach

The solution uses a few important concepts such as prefix sum arrays, set operations, and a check function to validate each substring. Here's a step-by-step implementation:

  1. Prefix Sum Array: A 2D array presum is initialized, where presum[i][j] represents the total count of the digit j from index 0 to i - 1 in the input string s. The prefix sum array greatly reduces the complexity of determining the count of each digit in a substring from O(len(substring)) to O(1) time complexity since it can be computed by a simple subtraction.

  2. Iterating Through Substrings: To generate all possible substrings of s, two nested loops are used. For all pairs of indices (i, j) such that i <= j, these indices represent the start and end of the substring s[i:j+1].

  3. Checking the Equality of Digit Frequencies: The key part of the solution is the check function, which takes two indices (i, j) and verifies if the digits in s[i:j+1] appear the same number of times. This is done by comparing the prefix sums for index j + 1 and i. If, for every digit, the difference in prefix sums is either 0 (digit not present) or equal for all present digits, the substring meets our criteria.

  4. Set for Unique Substrings: To ensure no substring is counted more than once, we use a set vis. A substring is added to the set only if it passes the check function, which implicitly handles the uniqueness constraint.

  5. Returning the Count: The total number of unique valid substrings is the size of the set vis, which is returned as the output.

This algorithm efficiently checks all possible substrings and ensures unique counts. The usage of prefix sum arrays significantly reduces the time complexity for checking digit frequency equality in substrings. The overall complexity of the algorithm is O(n^3) due to the three nested loops - two for generating substrings and one inside the check function for iterating over the 10 possible digits.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which one best describes the time complexity of the following code?

1int factorial(int n) {
2  if (n < 0) {
3    return -1;
4  } else if (n == 0) {
5    return 1;
6  } else {
7    return n * factorial(n - 1);
8  }
9}

Example Walkthrough

Let's illustrate the solution approach with a small example using the digit string s = "1122". We want to find unique substrings where every digit appears the same number of times.

Step 1: Prefix Sum Array

First, we create a prefix sum matrix presum with dimensions corresponding to the length of s plus one (for simplicity), and with 10 columns for each digit.

The initialized presum for our string s = "1122" will look like this:

1   0 1 2 3 4 5 6 7 8 9
20  0 0 0 0 0 0 0 0 0 0
31  0 1 0 0 0 0 0 0 0 0
42  0 2 0 0 0 0 0 0 0 0
53  0 2 1 0 0 0 0 0 0 0
64  0 2 2 0 0 0 0 0 0 0

Step 2: Iterating Through Substrings

Next, we iterate through all possible substrings using nested loops. For example:

  • s[0:1] is '1', count of '1' is 1
  • s[0:2] is '11', count of '1' is 2
  • s[0:3] is '112', counts of '1' and '2' are not equal
  • s[1:2] is '1', count of '1' is 1
  • … and so on

Step 3: Checking the Equality of Digit Frequencies

For each substring, we use the check function. Let's consider a check for s[2:3], which is '22':

  • We look at presum[4] - presum[2] to get [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].
  • This shows us that '2' occurs once in this range.
  • Since all digits either have a count of 0 or the same count (in this case, 1), this substring is valid.

Step 4: Set for Unique Substrings

If a substring passes the check, we add it to the set vis. Continuing the process will give us all valid substrings:

  • The substring '11' appeared twice, but because we are using a set, it will only be counted once.
  • The set vis might look like: {'1', '2', '11', '22'}.

Step 5: Returning the Count

The size of the set vis gives us the number of unique valid substrings. In this example, the count would be 4.

This walkthrough with the small example of s = "1122" illustrates the mechanism of the algorithm and how it ensures that we count only the unique valid substrings where every digit appears the same number of times.

Solution Implementation

1class Solution:
2    def equalDigitFrequency(self, s: str) -> int:
3        # Helper function to check if the substring from index i to j has equal frequency of digits.
4        def has_equal_frequency(i, j):
5            digit_frequencies = set()
6            # Check the frequency of each digit in the substring.
7            for digit in range(10):
8                # Calculate the frequency of the current digit.
9                frequency = prefix_sum[j + 1][digit] - prefix_sum[i][digit]
10              
11                # If the frequency is greater than 0, add it to the set.
12                if frequency > 0:
13                    digit_frequencies.add(frequency)
14              
15                # If we have more than one frequency, they are not all equal.
16                if len(digit_frequencies) > 1:
17                    return False
18            return True
19
20        # Calculate the length of the string.
21        length = len(s)
22        # Create a prefix sum list to keep track of the frequency of each digit up to that index.
23        prefix_sum = [[0] * 10 for _ in range(length + 1)]
24      
25        # Populate the prefix sum array.
26        for i, char in enumerate(s):
27            digit = int(char)
28            prefix_sum[i + 1][digit] += 1
29            # Add the previous prefix sums for each digit.
30            for j in range(10):
31                prefix_sum[i + 1][j] += prefix_sum[i][j]
32
33        # Use a set to collect unique substrings with equal digit frequency.
34        unique_substr = set()
35        # Check all possible substrings.
36        for i in range(length):
37            for j in range(i, length):
38                # If the substring has equal frequency of digits, add it to the set.
39                if has_equal_frequency(i, j):
40                    unique_substr.add(s[i:j+1])
41
42        # Return the number of unique substrings with equal digit frequency.
43        return len(unique_substr)
44
1class Solution {
2    public int equalDigitFrequency(String s) {
3        int length = s.length();
4        int[][] prefixSum = new int[length + 1][10];
5
6        // Build prefix sum array for each digit
7        for (int i = 0; i < length; ++i) {
8            // Increment the count of the current digit in the prefix sum
9            prefixSum[i + 1][s.charAt(i) - '0']++;
10            // Copy the previous counts to the current prefix
11            for (int digit = 0; digit < 10; ++digit) {
12                prefixSum[i + 1][digit] += prefixSum[i][digit];
13            }
14        }
15
16        Set<String> uniqueSubstrings = new HashSet<>();
17
18        // Check all possible substrings for equal frequency condition
19        for (int start = 0; start < length; ++start) {
20            for (int end = start; end < length; ++end) {
21                // If the current substring meets the condition, add it to the set
22                if (hasEqualDigitFrequency(start, end, prefixSum)) {
23                    uniqueSubstrings.add(s.substring(start, end + 1));
24                }
25            }
26        }
27
28        // Return the number of unique substrings with equal digit frequency
29        return uniqueSubstrings.size();
30    }
31
32    private boolean hasEqualDigitFrequency(int start, int end, int[][] prefixSum) {
33        Set<Integer> frequencies = new HashSet<>();
34        // Check the frequency of each digit in the substring
35        for (int digit = 0; digit < 10; ++digit) {
36            int count = prefixSum[end + 1][digit] - prefixSum[start][digit];
37            if (count > 0) {
38                // Add non-zero frequency to the set
39                frequencies.add(count);
40            }
41            // If there are more than one unique frequencies, return false
42            if (frequencies.size() > 1) {
43                return false;
44            }
45        }
46        // If there is only one frequency for all digits, return true
47        return true;
48    }
49}
50
1#include <string>
2#include <vector>
3#include <unordered_set>
4using namespace std;
5
6class Solution {
7public:
8    // Function to calculate the number of unique substrings with an equal digit frequency
9    int equalDigitFrequency(string s) {
10        int length = s.length();
11        vector<vector<int>> prefixSum(length + 1, vector<int>(10, 0));
12
13        // Build prefix sum array for each digit
14        for (int i = 0; i < length; ++i) {
15            // Increment the count of the current digit in the prefix sum
16            prefixSum[i + 1][s[i] - '0']++;
17            // Copy the previous counts to the current prefix
18            for (int digit = 0; digit < 10; ++digit) {
19                prefixSum[i + 1][digit] += prefixSum[i][digit];
20            }
21        }
22
23        unordered_set<string> uniqueSubstrings;
24
25        // Check all possible substrings for equal frequency condition
26        for (int start = 0; start < length; ++start) {
27            for (int end = start; end < length; ++end) {
28                // If the current substring meets the condition, add it to the set
29                if (hasEqualDigitFrequency(start, end, prefixSum)) {
30                    uniqueSubstrings.insert(s.substr(start, end - start + 1));
31                }
32            }
33        }
34
35        // Return the number of unique substrings with equal digit frequency
36        return uniqueSubstrings.size();
37    }
38
39private:
40    // Helper function to check if a substring has equal digit frequency
41    bool hasEqualDigitFrequency(int start, int end, const vector<vector<int>>& prefixSum) {
42        unordered_set<int> frequencies;
43        // Check the frequency of each digit in the substring
44        for (int digit = 0; digit < 10; ++digit) {
45            int count = prefixSum[end + 1][digit] - prefixSum[start][digit];
46            if (count > 0) {
47                // Add non-zero frequency to the set
48                frequencies.insert(count);
49            }
50            // If there are more than one unique frequencies, return false
51            if (frequencies.size() > 1) {
52                return false;
53            }
54        }
55        // If there is only one frequency for all digits, return true
56        return frequencies.size() == 1;
57    }
58};
59
1// Helper function to check if the substring has equal digit frequency
2function hasEqualDigitFrequency(start: number, end: number, prefixSum: number[][]): boolean {
3    const frequencies: Set<number> = new Set();
4
5    // Check the frequency of each digit in the substring
6    for (let digit = 0; digit < 10; ++digit) {
7        const count: number = prefixSum[end + 1][digit] - prefixSum[start][digit];
8        if (count > 0) {
9            // Add non-zero frequency to the set
10            frequencies.add(count);
11        }
12        // If there are more than one unique frequencies, return false
13        if (frequencies.size > 1) {
14            return false;
15        }
16    }
17    // If there is only one frequency for all digits, return true
18    return true;
19}
20
21// Function to count unique substrings with equal digit frequencies
22function equalDigitFrequency(s: string): number {
23    const length: number = s.length;
24    const prefixSum: number[][] = new Array(length + 1).fill(null).map(() => new Array(10).fill(0));
25    const uniqueSubstrings: Set<string> = new Set();
26
27    // Build prefix sum array for each digit
28    for (let i = 0; i < length; ++i) {
29        // Increment the count of the current digit in the prefix sum
30        prefixSum[i + 1][s.charAt(i) - '0']++;
31        // Copy the previous counts to the current prefix
32        for (let digit = 0; digit < 10; ++digit) {
33            prefixSum[i + 1][digit] += prefixSum[i][digit];
34        }
35    }
36  
37    // Check all possible substrings for equal frequency condition
38    for (let start = 0; start < length; ++start) {
39        for (let end = start; end < length; ++end) {
40            // If the current substring meets the condition, add it to the set
41            if (hasEqualDigitFrequency(start, end, prefixSum)) {
42                uniqueSubstrings.add(s.substring(start, end + 1));
43            }
44        }
45    }
46
47    // Return the number of unique substrings with equal digit frequency
48    return uniqueSubstrings.size;
49}
50
Not Sure What to Study? Take the 2-min Quiz:

What is the worst case running time for finding an element in a binary search tree(not necessarily balanced) of size n?

Time and Space Complexity

Time Complexity

The code includes nested loops where i ranges from 0 to n-1 and j ranges from i to n-1, leading to a O(n^2) complexity for this part. Within the innermost loop, there is a function check() that iterates once again through a constant 10 elements representing the digits 0 through 9, which do not depend on n and hence contribute a O(1) to the inner complexity. Combining the nested loops with the constant-time check() function, we get a total time complexity of O(n^2).

Before the loops, there is also another loop for constructing the presum array which takes O(n) time since it iterates over the length of the input string s and does constant work in updating the counts of digits.

Therefore, combining all these, the overall time complexity is O(n^2).

Space Complexity

The space complexity is determined by the storage requirements of the presum array and the vis set.

  1. presum an array of n+1 elements where each element is an array of 10 integers, contributes a space complexity of O(10n) or simplifying O(n) because 10 is a constant.

  2. vis is a set that holds unique substrings formed by the combinations of indices i and j. In the worst-case scenario, each pair (i, j) could potentially be unique, leading to O(n^2) space complexity.

By adding both O(n) for presum and O(n^2) for vis, the dominant term is O(n^2), making the overall space complexity O(n^2).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Which data structure is used to implement priority queue?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫