2272. Substring With Largest Variance
Problem Description
The problem focuses on finding the largest variance within any substring of a given string s
, which contains only lowercase English letters. The variance of a string is defined as the largest difference in the number of occurrences between any two distinct characters in it. It's important to consider that the characters could be the same or different.
To clarify, let's say we're given a string like "abcde". In this string, each character appears exactly once, so the variance between any two characters is zero. However, if we had a string like "aabbcc", the variance could be higher. If we choose the substring "aab", the variance would be 2 because there are two 'a' characters and none of the 'b' character (or vice versa). We need to consider all possible substrings to find the one with the largest variance.
Intuition
The intuition behind the solution lies in realizing that the variance could only be maximized if we consider pairs of different characters in the string, as those pairs are the ones that can actually have a non-zero variance. So the first step is to iterate over all possible pairs of characters (permutations of two distinct letters from the 26 lowercase English characters).
Once a pair of characters is selected (let's say 'a' and 'b'), we apply a two-pass scanning algorithm on the string to track the difference in the occurrence counts between 'a' and 'b'. We keep two counters: one (f[0]
) resets every time we encounter an 'a' after a 'b', while the other (f[1]
) continues to accumulate or starts over from f[0] - 1
when a 'b' is encountered after at least one 'a'. We look to maximize the difference (f[1]
), which is essentially the variance, while scanning.
In the end, we take the maximum recorded value during all scans as the answer. This approach works because by focusing on pairs of characters at a time, we are able to isolate the potential substrings that can provide the maximum variance, rather than getting lost in the overwhelming number of all possible substrings.
Along with tracking the occurrences of 'a' and 'b', there's also an understanding that we include a subtraction operation when we encounter a 'b', making sure we are actually counting a valid substring with at least one 'a' when 'b' appears, since variance requires both characters to be present. A -inf
is used to handle cases where a valid substring does not exist yet. By considering every single position in the string as a potential starting or ending point for substrings with maximum variance between two specific characters, the algorithm guarantees that no possible substring is overlooked, thus ensuring the correctness of the solution.
Learn more about Dynamic Programming patterns.
Solution Approach
The implementation employs a brute force strategy enhanced with a smart iteration mechanism and memory optimization to calculate the maximum variance for all substrings of the given string. The key elements of this approach involve permutations, two comparison variables, and a linear scan through the string.
Permutations of Characters
Firstly, the algorithm generates all possible permutations of two distinct characters from the set of lowercase English letters. This is done using permutations(ascii_lowercase, 2)
which loops through all pairs of two distinct characters since we're looking for the difference in occurrences between two potentially different characters.
Tracking Variance with Two Counters
In the solution, an array f
with two elements is used:
f[0]
serves as a resettable counter for occurrences of charactera
since the last occurrence ofb
.f[1]
keeps track of the accumulated variance betweena
andb
.
This dichotomy allows the algorithm to keep track of two scenarios simultaneously:
- Accumulating occurrences of
a
when nob
has interfered (handled byf[0]
). - Capturing the 'variance' until a reset is needed due to a
b
occurrence (handled byf[1]
).
Both f[0]
and f[1]
are reset to zero if a b
appears directly after a b
. However, when an a
is followed by a b
, f[1]
takes the role of capturing the current variance by taking either the current accumulated variance minus one, or the variance just after the last a
minus one, determined by max(f[1] - 1, f[0] - 1)
.
This handling ensures that at least one a
is present before a b
is counted, which is necessary for a valid substring.
Maximal Variance Calculation
As we scan the string s
, for each character c
, the algorithm performs the following actions:
- If
c
isa
, we increment bothf[0]
andf[1]
. - If
c
isb
, we decrementf[1]
and choose the larger of the two:f[1]
orf[0] - 1
, effectively either continuing with the current substring or starting a new substring that excludes the lasta
.f[0]
is reset to zero as we can't start a new substring with justb
.
After each update, we check if f[1]
exceeds the current ans
. If it does, we update ans
to the value of f[1]
. The algorithm avoids false starts with -inf
, ensuring that f[1]
only records valid substrings.
The loop through permutations of characters ensures that we consider every potential character disparity. The nested loop through the string ensures that we consider every potential substring for each character pair. Eventually, ans
will contain the maximum variance achievable among all substrings of s
.
The space complexity of the solution is O(1)
as we only maintain a few variables and a pair of counters, and the time complexity is O(26*26*n)
where n
is the length of the string, due to the nested loops—26 for each character in the permutations, and n
for the iterations over the string.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach using a small example. Suppose we are given the string s = "abbab"
. Our task is to find the largest variance within any substring of s
.
Firstly, we generate all possible permutations of two distinct characters from the set of lowercase English letters. For simplicity, let's consider only a few pairs here, such as ('a', 'b') and ('b', 'a'), since these are the only characters in our example string.
We initiate a variable ans
to store the maximum variance found. Initially, ans = -inf
.
Pair ('a', 'b'):
- Initialize the counters
f[0]
andf[1]
to 0. - Loop through each character
c
ins = "abbab"
:- For
c = 'a'
: Increment bothf[0]
andf[1]
(since we initially consider the occurrence of 'a' without 'b' involved). - For
c = 'b'
: Decrementf[1]
and then setf[1]
to the maximum off[1]
andf[0] - 1
. Resetf[0]
to 0.
- For
Let's go through the string "abbab":
- Start with 'a':
f[0] = 1
,f[1] = 1
. - Next, 'b':
f[1] = max(f[1] - 1, f[0] - 1) = 0
,f[0] = 0
. Updateans = max(ans, f[1]) = 1
. - Next, 'b':
f[1] = max(f[1] - 1, f[0] - 1) = -1
, since there are no 'a' in front, we don't updateans
. - Next, 'a':
f[0] = 1
,f[1] = max(f[1] + 1, f[0]) = 1
. - Last, 'b':
f[1] = max(f[1] - 1, f[0] - 1) = 0
. Updateans = max(ans, f[1]) = 1
.
Pair ('b', 'a'):
Following the same steps but reversing the roles of 'a' and 'b', we would not find a variance larger than what we already have from the pair ('a', 'b').
After considering these permutations, we find that the maximum variance is 1
, which we may have obtained from the substring "ab" or "ba" in the string "abbab".
In a complete implementation, we would evaluate all permutations of character pairs, but in this small example with only 'a' and 'b' present, it is unnecessary.
The process effectively scans through every substring of s
and calculates the variance for pairs of characters, updating ans
when a higher variance is found. The time complexity is mainly driven by the nested loops: there are 26*26 permutations of character pairs, and for each pair, we scan the entire string s
of length n
, resulting in an O(26*26*n)
time complexity.
Solution Implementation
1from itertools import permutations
2from string import ascii_lowercase
3from math import inf
4
5class Solution:
6 def largestVariance(self, s: str) -> int:
7 # Initialize the answer to 0
8 max_variance = 0
9
10 # Iterate through all ordered pairs of distinct lowercase ascii characters
11 for char_a, char_b in permutations(ascii_lowercase, 2):
12
13 # Initialize the frequency counters for the current pair, where:
14 # freq_before_b records the number of 'a' characters seen so far without any 'b' in between
15 # max_freq_after_b records the maximum frequency difference after at least one 'b' was found
16 freq_before_b = 0
17 max_freq_after_b = -inf
18
19 # Iterate over each character in the string
20 for current_char in s:
21 if current_char == char_a:
22 # Increment the frequency counter of 'a' when 'a' is found
23 freq_before_b += 1
24 max_freq_after_b += 1
25 elif current_char == char_b:
26 # Decrement the frequency counter of 'b' when 'b' is found; reset freq_before_b
27 max_freq_after_b = max(max_freq_after_b - 1, freq_before_b - 1)
28 freq_before_b = 0
29
30 # Update the max_variance if we find a new maximum
31 if max_variance < max_freq_after_b:
32 max_variance = max_freq_after_b
33
34 # Return the largest variance found
35 return max_variance
36
1class Solution {
2 public int largestVariance(String s) {
3 // Length of the input string.
4 int length = s.length();
5 // Variable to store the maximum variance found so far.
6 int maxVariance = 0;
7
8 // Iterate over all possible pairs of characters (a and b are different).
9 for (char firstChar = 'a'; firstChar <= 'z'; ++firstChar) {
10 for (char secondChar = 'a'; secondChar <= 'z'; ++secondChar) {
11 if (firstChar == secondChar) {
12 // If both characters are the same, skip this iteration.
13 continue;
14 }
15
16 // Array to keep track of the frequency of character 'a'
17 // f[0] is the streak of 'a's, f[1] is the max variance for the current window.
18 // Initialize with 0 for streak and -n for variance because variance cannot be less than -n.
19 int[] frequency = new int[] {0, -length};
20
21 // Iterate over each character in the string.
22 for (int i = 0; i < length; ++i) {
23 if (s.charAt(i) == firstChar) {
24 // If the current character is 'a', increase both frequencies.
25 frequency[0]++;
26 frequency[1]++;
27 } else if (s.charAt(i) == secondChar) {
28 // If the current character is 'b', calculate the variance.
29 frequency[1] = Math.max(frequency[0] - 1, frequency[1] - 1);
30 // Reset the streak of 'a's because 'b' is encountered.
31 frequency[0] = 0;
32 }
33 // Update the maximum variance found.
34 maxVariance = Math.max(maxVariance, frequency[1]);
35 }
36 }
37 }
38
39 // Return the largest variance found.
40 return maxVariance;
41 }
42}
43
1class Solution {
2public:
3 // Method to find the largest variance of a substring for a given string
4 int largestVariance(string s) {
5 // Length of the input string
6 int n = s.size();
7 // Initialize the maximum variance to 0
8 int maxVariance = 0;
9
10 // Iterate over all possible character pairs (a, b) where a != b
11 for (char a = 'a'; a <= 'z'; ++a) {
12 for (char b = 'a'; b <= 'z'; ++b) {
13 // Skip if both characters are the same
14 if (a == b) continue;
15
16 // Initialize frequency counters for characters a and b
17 // f[0] will track frequency of a, f[1] tracks max variance
18 int frequencies[2] = {0, -n};
19
20 // Iterate over each character in the string
21 for (char currentChar : s) {
22 // Increment frequency count if a matches the current character
23 if (currentChar == a) {
24 frequencies[0]++;
25 frequencies[1]++;
26 }
27 // Decrement frequency count and update variance if b matches
28 else if (currentChar == b) {
29 frequencies[1] = max(frequencies[1] - 1, frequencies[0] - 1);
30 frequencies[0] = 0;
31 }
32 // Update the maximum variance found so far
33 maxVariance = max(maxVariance, frequencies[1]);
34 }
35 }
36 }
37 // Return the final maximum variance
38 return maxVariance;
39 }
40};
41
1// Function to find the largest variance of a substring for a given string
2function largestVariance(s: string): number {
3 // Length of the input string
4 let n: number = s.length;
5 // Initialize the maximum variance to 0
6 let maxVariance: number = 0;
7
8 // Iterate over all possible character pairs (a, b) where a != b
9 for (let a = 'a'.charCodeAt(0); a <= 'z'.charCodeAt(0); ++a) {
10 for (let b = 'a'.charCodeAt(0); b <= 'z'.charCodeAt(0); ++b) {
11 // Skip if both characters are the same
12 if (a === b) continue;
13
14 // Initialize frequency counters for characters 'a' and 'b'
15 let frequencies: number[] = [0, -n];
16
17 // Iterate over each character in the string
18 for (let currentChar of s) {
19 // Increment frequency count if 'a' matches the current character
20 if (currentChar.charCodeAt(0) === a) {
21 frequencies[0]++;
22 frequencies[1]++;
23 }
24 // Decrement frequency count and update variance if 'b' matches
25 else if (currentChar.charCodeAt(0) === b) {
26 frequencies[1] = Math.max(frequencies[1] - 1, frequencies[0] - 1);
27 frequencies[0] = 0;
28 }
29 // Update the maximum variance found so far
30 maxVariance = Math.max(maxVariance, frequencies[1]);
31 }
32 }
33 }
34 // Return the final maximum variance
35 return maxVariance;
36}
37
Time and Space Complexity
Time Complexity
The time complexity of the code can be analyzed based on the operations it performs:
- The function
permutations(ascii_lowercase, 2)
generates all possible permutations of two distinct lowercase letters from the English alphabet. Since there are 26 letters in the English alphabet, there are26 * 25
such permutations because after choosing the first letter, there are 25 remaining letters to form a pair. - The function then iterates through each character in the string
s
for every pair of letters (a
,b
). The length of the strings
isn
. - Within the nested loop, there are constant-time operations (comparisons, assignments,
max
, addition, and subtraction).
Given the above steps, the time complexity is O(26 * 25 * n)
, which simplifies to O(n)
because the number of permutations (26 * 25
) is a constant.
Space Complexity
The space complexity of the code is determined by:
- Variables
ans
,f[0]
, andf[1]
, which are used to keep track of the current and maximum variance. - No additional data structures that grow with input size are used.
Thus, the space complexity of the algorithm is O(1)
as it only uses a constant amount of space.
Learn more about how to find time and space complexity quickly using problem constraints.
How would you design a stack which has a function min
that returns the minimum element in the stack, in addition to push
and pop
? All push
, pop
, min
should have running time O(1)
.
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!