467. Unique Substrings in Wraparound String
Problem Description
We have an infinite string base
that consists of the alphabet "abcdefghijklmnopqrstuvwxyz" repeated infinitely. So base
looks like: "...zabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd...."
.
Given a string s
, we need to find how many unique non-empty substrings of s
can be found in the infinite string base
.
For example, if s = "zab"
, the substrings are:
- Single characters:
"z"
,"a"
,"b"
- all present inbase
- Two characters:
"za"
,"ab"
- both present inbase
(note that"za"
appears when wrapping from 'z' to 'a') - Three characters:
"zab"
- present inbase
The key insight is that valid substrings in base
must be consecutive letters in alphabetical order, with the special case that 'z' can be followed by 'a' (wrapping around).
The solution uses dynamic programming to track the longest valid substring ending with each character. For each character in s
, if it follows the previous character in alphabetical order (or wraps from 'z' to 'a'), we extend the current consecutive sequence. Otherwise, we start a new sequence of length 1. We keep track of the maximum length substring ending with each letter to avoid counting duplicates.
The final answer is the sum of the maximum lengths for all distinct ending characters, as this represents all unique substrings that can be found in base
.
Intuition
The key observation is that any valid substring in base
must consist of consecutive letters in alphabetical order (with wrapping from 'z' to 'a'). This means we're looking for substrings where each character is exactly one position after the previous character in the alphabet.
Let's think about how to count unique substrings efficiently. If we have a valid consecutive sequence like "abc"
, it contains the substrings "a"
, "b"
, "c"
, "ab"
, "bc"
, and "abc"
. Notice that all substrings ending with 'c' are: "c"
, "bc"
, "abc"
- exactly 3 substrings, which equals the length of the consecutive sequence ending at 'c'.
This leads to an important insight: instead of tracking all possible substrings, we can track the longest consecutive sequence ending with each letter. Why the longest? Because if we have "abc"
somewhere and "bc"
somewhere else, the longer sequence "abc"
already accounts for all substrings that "bc"
would contribute (both "bc"
and "c"
).
For each character in the alphabet, we only need to remember the maximum length of a consecutive sequence ending with that character. This avoids counting duplicates - if we see 'c' multiple times in different contexts, we only count the substrings from the longest sequence ending with 'c'.
The algorithm naturally follows: as we scan through the string, we maintain the current consecutive sequence length k
. When we see a character that continues the sequence (differs from the previous by 1 in the circular alphabet), we extend k
. Otherwise, we start fresh with k = 1
. We then update our record for the maximum sequence length ending with the current character.
The final count is the sum of all maximum lengths, as each maximum length for character c
represents the number of unique substrings ending with c
.
Solution Approach
We implement the solution using dynamic programming with a hash map to track the maximum length of consecutive sequences ending with each character.
Data Structure:
- We use a dictionary
f
(implemented asdefaultdict(int)
) wheref[c]
stores the maximum length of a consecutive substring ending with characterc
. - A variable
k
tracks the current consecutive sequence length.
Algorithm Steps:
-
Initialize tracking variables:
- Create
f
as a defaultdict to store maximum lengths for each character - Set
k = 0
to track the current consecutive sequence length
- Create
-
Iterate through the string: For each character
c
at positioni
:- Check if it continues the sequence:
- Calculate
(ord(c) - ord(s[i - 1])) % 26
- If this equals
1
, the current character follows the previous one in the circular alphabet - For example: 'b' after 'a' gives
(98 - 97) % 26 = 1
- Special case: 'a' after 'z' gives
(97 - 122) % 26 = 1
- Calculate
- Update the consecutive length
k
:- If consecutive: increment
k
by 1 - Otherwise: reset
k
to 1 (start a new sequence)
- If consecutive: increment
- Update the maximum for this character:
- Set
f[c] = max(f[c], k)
- This ensures we keep only the longest sequence ending with
c
- Set
- Check if it continues the sequence:
-
Calculate the result:
- Return
sum(f.values())
- Each value in
f
represents the count of unique substrings ending with that character
- Return
Why this works:
- A consecutive sequence of length
n
ending with characterc
contributes exactlyn
unique substrings (all suffixes ending withc
) - By keeping only the maximum length for each character, we avoid counting duplicate substrings
- The modulo 26 operation handles the wraparound from 'z' to 'a' elegantly
Time Complexity: O(n)
where n
is the length of string s
Space Complexity: O(1)
as the dictionary can have at most 26 entries
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the example s = "zabc"
to illustrate the solution approach.
Initial Setup:
f = {}
(empty dictionary to track max lengths)k = 0
(current consecutive sequence length)
Step 1: Process 'z' (index 0)
- First character, so we start a new sequence
k = 1
f['z'] = 1
- State:
f = {'z': 1}
Step 2: Process 'a' (index 1)
- Check if 'a' follows 'z':
(ord('a') - ord('z')) % 26 = (97 - 122) % 26 = -25 % 26 = 1
✓ - It's consecutive! So
k = k + 1 = 2
f['a'] = max(f.get('a', 0), 2) = 2
- State:
f = {'z': 1, 'a': 2}
- This means we found substrings "a" and "za" ending with 'a'
Step 3: Process 'b' (index 2)
- Check if 'b' follows 'a':
(ord('b') - ord('a')) % 26 = (98 - 97) % 26 = 1
✓ - It's consecutive! So
k = k + 1 = 3
f['b'] = max(f.get('b', 0), 3) = 3
- State:
f = {'z': 1, 'a': 2, 'b': 3}
- This means we found substrings "b", "ab", and "zab" ending with 'b'
Step 4: Process 'c' (index 3)
- Check if 'c' follows 'b':
(ord('c') - ord('b')) % 26 = (99 - 98) % 26 = 1
✓ - It's consecutive! So
k = k + 1 = 4
f['c'] = max(f.get('c', 0), 4) = 4
- Final state:
f = {'z': 1, 'a': 2, 'b': 3, 'c': 4}
- This means we found substrings "c", "bc", "abc", and "zabc" ending with 'c'
Calculate Result:
- Sum all values:
1 + 2 + 3 + 4 = 10
Unique substrings found:
- Ending with 'z': "z" (1 substring)
- Ending with 'a': "a", "za" (2 substrings)
- Ending with 'b': "b", "ab", "zab" (3 substrings)
- Ending with 'c': "c", "bc", "abc", "zabc" (4 substrings)
Total: 10 unique substrings that exist in the infinite base string.
Solution Implementation
1from collections import defaultdict
2
3class Solution:
4 def findSubstringInWraproundString(self, s: str) -> int:
5 # Dictionary to store the maximum length of substring ending with each character
6 max_length_ending_with = defaultdict(int)
7
8 # Current consecutive substring length
9 consecutive_length = 0
10
11 for index, char in enumerate(s):
12 # Check if current character follows the previous one in alphabetical order
13 # (considering wraparound: 'z' -> 'a')
14 if index > 0 and (ord(char) - ord(s[index - 1])) % 26 == 1:
15 # Extend the current consecutive substring
16 consecutive_length += 1
17 else:
18 # Start a new consecutive substring
19 consecutive_length = 1
20
21 # Update the maximum length of substring ending with current character
22 # This avoids counting duplicate substrings
23 max_length_ending_with[char] = max(max_length_ending_with[char], consecutive_length)
24
25 # Sum all maximum lengths to get total unique substrings
26 return sum(max_length_ending_with.values())
27
1class Solution {
2 public int findSubstringInWraproundString(String s) {
3 // Array to store the maximum length of valid substring ending with each letter
4 // maxLengthEndingWith[0] corresponds to 'a', maxLengthEndingWith[1] to 'b', etc.
5 int[] maxLengthEndingWith = new int[26];
6
7 int stringLength = s.length();
8
9 // Iterate through the string to find all valid substrings
10 for (int currentIndex = 0, currentSubstringLength = 0; currentIndex < stringLength; currentIndex++) {
11 // Check if current character continues the sequence from previous character
12 // Valid sequences: a->b, b->c, ..., z->a (wraparound)
13 if (currentIndex > 0 && isConsecutiveInWraparound(s.charAt(currentIndex - 1), s.charAt(currentIndex))) {
14 // Extend the current valid substring
15 currentSubstringLength++;
16 } else {
17 // Start a new valid substring with length 1
18 currentSubstringLength = 1;
19 }
20
21 // Update the maximum length of valid substring ending with current character
22 // We only keep the maximum because longer substrings contain all shorter ones
23 char currentChar = s.charAt(currentIndex);
24 int charIndex = currentChar - 'a';
25 maxLengthEndingWith[charIndex] = Math.max(maxLengthEndingWith[charIndex], currentSubstringLength);
26 }
27
28 // Sum all maximum lengths to get total count of unique valid substrings
29 return Arrays.stream(maxLengthEndingWith).sum();
30 }
31
32 /**
33 * Helper method to check if two characters are consecutive in wraparound order
34 * @param prevChar The previous character
35 * @param currChar The current character
36 * @return true if currChar follows prevChar in wraparound order
37 */
38 private boolean isConsecutiveInWraparound(char prevChar, char currChar) {
39 // Calculate difference with wraparound handling
40 // Examples: 'b' - 'a' = 1, 'a' - 'z' = -25, but (-25 + 26) % 26 = 1
41 return (currChar - prevChar + 26) % 26 == 1;
42 }
43}
44
1class Solution {
2public:
3 int findSubstringInWraproundString(string s) {
4 // Array to store the maximum length of valid substring ending with each letter
5 // maxLength[i] represents the max length of substring ending with character ('a' + i)
6 int maxLength[26] = {0};
7
8 int stringLength = s.length();
9
10 // Iterate through the string
11 for (int i = 0, currentLength = 0; i < stringLength; ++i) {
12 // Check if current character follows the previous one in circular alphabet order
13 // (s[i] - s[i-1] + 26) % 26 == 1 handles both normal succession (b after a)
14 // and wraparound case (a after z)
15 if (i > 0 && (s[i] - s[i - 1] + 26) % 26 == 1) {
16 // Extend the current valid substring
17 ++currentLength;
18 } else {
19 // Start a new valid substring
20 currentLength = 1;
21 }
22
23 // Update the maximum length for substrings ending with current character
24 int charIndex = s[i] - 'a';
25 maxLength[charIndex] = max(maxLength[charIndex], currentLength);
26 }
27
28 // Sum up all maximum lengths to get total unique substrings
29 // Each maxLength[i] represents unique substrings ending with that character
30 return accumulate(begin(maxLength), end(maxLength), 0);
31 }
32};
33
1/**
2 * Finds the number of unique non-empty substrings of s that are present in the infinite wraparound string.
3 * The wraparound string is "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz..."
4 *
5 * @param s - The input string containing only lowercase English letters
6 * @returns The count of unique substrings of s found in the wraparound string
7 */
8function findSubstringInWraproundString(s: string): number {
9 // Helper function to convert a character to its index (0-25)
10 const getCharIndex = (char: string): number => {
11 return char.charCodeAt(0) - 97;
12 };
13
14 // Array to store the maximum length of substring ending with each letter
15 // maxLengthEndingWith[i] represents the max length of valid substring ending with character (i + 'a')
16 const maxLengthEndingWith: number[] = Array(26).fill(0);
17
18 const stringLength: number = s.length;
19
20 // Iterate through the string to find all valid substrings
21 for (let i = 0, currentLength = 0; i < stringLength; ++i) {
22 const currentCharIndex: number = getCharIndex(s[i]);
23
24 // Check if current character continues the sequence from previous character
25 // In wraparound string, each character follows the previous one (z -> a is also valid)
26 if (i > 0 && (currentCharIndex - getCharIndex(s[i - 1]) + 26) % 26 === 1) {
27 // Extend the current valid substring length
28 ++currentLength;
29 } else {
30 // Start a new valid substring
31 currentLength = 1;
32 }
33
34 // Update the maximum length for substrings ending with current character
35 maxLengthEndingWith[currentCharIndex] = Math.max(
36 maxLengthEndingWith[currentCharIndex],
37 currentLength
38 );
39 }
40
41 // Sum up all maximum lengths to get total unique substrings
42 return maxLengthEndingWith.reduce((total, length) => total + length, 0);
43}
44
Time and Space Complexity
The time complexity is O(n)
, where n
is the length of the string s
. The algorithm iterates through the string exactly once using a single for loop with enumerate, performing constant-time operations for each character (checking conditions, updating variables, and dictionary operations).
The space complexity is O(|Σ|)
, where Σ
is the character set. In this case, since we're dealing with lowercase English letters, |Σ| = 26
. The defaultdict f
stores at most one entry for each unique character in the alphabet, regardless of the input string length. Therefore, the space used by the dictionary is bounded by the size of the alphabet rather than the input size.
Common Pitfalls
Pitfall 1: Incorrect Wraparound Check
Issue: A common mistake is incorrectly checking if characters are consecutive, especially for the wraparound case from 'z' to 'a'.
Incorrect approach:
# Wrong: This doesn't handle wraparound correctly
if ord(char) - ord(s[index - 1]) == 1:
consecutive_length += 1
Why it fails: When transitioning from 'z' to 'a', ord('a') - ord('z') = -25
, not 1. This would miss valid consecutive sequences like "zab".
Correct approach:
# Use modulo 26 to handle wraparound
if (ord(char) - ord(s[index - 1])) % 26 == 1:
consecutive_length += 1
Pitfall 2: Not Tracking Maximum Length Per Character
Issue: Simply counting all substrings without tracking the maximum length for each ending character leads to counting duplicates.
Incorrect approach:
# Wrong: This counts duplicates
total_count = 0
for i in range(len(s)):
if consecutive:
consecutive_length += 1
else:
consecutive_length = 1
total_count += consecutive_length # Adds all substrings
Why it fails: If "abc" appears twice in the string, this would count "a", "ab", "abc" twice, leading to incorrect results.
Correct approach:
# Keep only the maximum length for each ending character
max_length_ending_with[char] = max(max_length_ending_with[char], consecutive_length)
# Then sum all maximum values
return sum(max_length_ending_with.values())
Pitfall 3: Starting Index Edge Case
Issue: Forgetting to handle the first character properly or checking index bounds incorrectly.
Incorrect approach:
# Wrong: Doesn't handle i=0 case
for i in range(len(s)):
if (ord(s[i]) - ord(s[i-1])) % 26 == 1: # Error when i=0
consecutive_length += 1
Correct approach:
# Check index > 0 before comparing with previous character
for index, char in enumerate(s):
if index > 0 and (ord(char) - ord(s[index - 1])) % 26 == 1:
consecutive_length += 1
else:
consecutive_length = 1
Pitfall 4: Understanding What Constitutes a Valid Substring
Issue: Misunderstanding that only consecutively increasing alphabetical sequences (with wraparound) are valid in the infinite base string.
Example: The substring "ca" is NOT valid in base because 'c' doesn't immediately follow 'a' in the alphabet. Only patterns like "abc", "xyz", "zabc" are valid.
Key insight: A substring is valid if and only if each character follows the previous one in alphabetical order (considering 'z' wraps to 'a').
What does the following code do?
1def f(arr1, arr2):
2 i, j = 0, 0
3 new_arr = []
4 while i < len(arr1) and j < len(arr2):
5 if arr1[i] < arr2[j]:
6 new_arr.append(arr1[i])
7 i += 1
8 else:
9 new_arr.append(arr2[j])
10 j += 1
11 new_arr.extend(arr1[i:])
12 new_arr.extend(arr2[j:])
13 return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2 int i = 0, j = 0;
3 List<Integer> newArr = new ArrayList<>();
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.add(arr1[i]);
8 i++;
9 } else {
10 newArr.add(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.add(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.add(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
1function f(arr1, arr2) {
2 let i = 0, j = 0;
3 let newArr = [];
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.push(arr1[i]);
8 i++;
9 } else {
10 newArr.push(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.push(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.push(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!