3104. Find Longest Self-Contained Substring 🔒
Problem Description
Given a string s
, you need to find the length of the longest self-contained substring.
A substring t
of string s
is called self-contained if:
t != s
(the substring cannot be the entire string)- For every character in
t
, that character does not appear anywhere else ins
outside oft
In other words, a self-contained substring is a portion of the string where all occurrences of every character within that substring are completely contained within the substring itself - none of those characters appear in the rest of the string.
For example:
- If
s = "abcabc"
, the substring"abc"
(from index 0 to 2) is self-contained because the characters 'a', 'b', and 'c' at positions 0, 1, 2 don't appear anywhere after index 2 in the remaining string. - If
s = "abcab"
, there is no self-contained substring because every character appears both inside and outside any potential substring.
Return the length of the longest self-contained substring if one exists, otherwise return -1.
The key insight is that a self-contained substring must start at the first occurrence of some character and must include all occurrences of every character it contains. The algorithm tracks the first and last positions of each character, then checks potential substrings starting from each character's first occurrence to find valid self-contained substrings.
Intuition
The key observation is that if a substring is self-contained, it must include all occurrences of every character that appears in it. This means if character 'a' appears in our substring, we must include every 'a' in the entire string for the substring to be self-contained.
This leads to an important insight: a self-contained substring can only start at a position where a character appears for the first time in the string. Why? Because if we start at the second or later occurrence of a character, we've already left out at least one occurrence of that character, violating the self-contained property.
Consider the string "abcabc"
. If we want a substring containing 'a', we must start at index 0 (first 'a') and extend at least to index 3 (last 'a'). Starting at any other position would leave some 'a' outside our substring.
The approach then becomes:
- For each character, record where it first appears and where it last appears
- Try starting a substring at each character's first occurrence position
- As we extend the substring, keep track of the rightmost position we must reach to include all occurrences of characters we've seen so far
- If we encounter a character whose first occurrence is before our starting position, this substring cannot be self-contained
- If we successfully reach a point where we've included all necessary occurrences and haven't used the entire string, we have a valid self-contained substring
This greedy expansion approach works because once we commit to including a character, we must include all its occurrences, which might force us to include more characters, creating a cascading effect until we either find a valid boundary or determine the substring cannot be self-contained.
Learn more about Binary Search and Prefix Sum patterns.
Solution Approach
First, we create two hash tables (or dictionaries) first
and last
to record the first and last occurrence positions of each character in the string:
first, last = {}, {}
for i, c in enumerate(s):
if c not in first:
first[c] = i
last[c] = i
Next, we enumerate each character c
as a potential starting point for our self-contained substring. We only consider positions where characters appear for the first time (stored in first
):
for c, i in first.items(): mx = last[c] # Initialize mx with the last occurrence of character c
Here, i
is the starting position of our potential substring, and mx
tracks the rightmost position we must reach to include all occurrences of characters we've encountered.
For each starting position i
, we traverse from position i
to the end of the string:
for j in range(i, n):
a, b = first[s[j]], last[s[j]]
At each position j
, we get:
a
: the first occurrence position of characters[j]
b
: the last occurrence position of characters[j]
If a < i
, it means character s[j]
has already appeared before our starting position i
. This violates the self-contained property, so we break out of the inner loop:
if a < i: break
Otherwise, we update mx
to ensure we include all occurrences of s[j]
:
mx = max(mx, b)
If mx == j
, it means we've reached a position where we've included all necessary occurrences of all characters seen so far. If additionally j - i + 1 < n
(ensuring the substring is not the entire string), we have found a valid self-contained substring:
if mx == j and j - i + 1 < n:
ans = max(ans, j - i + 1)
The algorithm continues this process for all possible starting positions and returns the maximum length found, or -1 if no valid self-contained substring exists.
The time complexity is O(n²)
where n
is the length of the string, as we potentially examine each substring starting from each character's first occurrence. The space complexity is O(k)
where k
is the number of unique characters in the string.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the string s = "abcabc"
.
Step 1: Build first and last occurrence maps
- Iterate through the string to record positions:
- 'a': first = 0, last = 3
- 'b': first = 1, last = 4
- 'c': first = 2, last = 5
Step 2: Try each character's first occurrence as a starting point
Starting from 'a' (position 0):
- Initialize:
i = 0
,mx = 3
(last position of 'a') - Position 0: character 'a'
- First occurrence of 'a' = 0 (not < 0, so continue)
- Update
mx = max(3, 3) = 3
- Check:
mx (3) ≠j (0)
, continue
- Position 1: character 'b'
- First occurrence of 'b' = 1 (not < 0, so continue)
- Update
mx = max(3, 4) = 4
- Check:
mx (4) ≠j (1)
, continue
- Position 2: character 'c'
- First occurrence of 'c' = 2 (not < 0, so continue)
- Update
mx = max(4, 5) = 5
- Check:
mx (5) ≠j (2)
, continue
- Position 3: character 'a'
- First occurrence of 'a' = 0 (not < 0, so continue)
- Update
mx = max(5, 3) = 5
- Check:
mx (5) ≠j (3)
, continue
- Position 4: character 'b'
- First occurrence of 'b' = 1 (not < 0, so continue)
- Update
mx = max(5, 4) = 5
- Check:
mx (5) ≠j (4)
, continue
- Position 5: character 'c'
- First occurrence of 'c' = 2 (not < 0, so continue)
- Update
mx = max(5, 5) = 5
- Check:
mx (5) = j (5)
✓ and5 - 0 + 1 = 6 = n
, not valid (equals entire string)
Starting from 'b' (position 1):
- Initialize:
i = 1
,mx = 4
(last position of 'b') - Position 1: character 'b'
- First occurrence of 'b' = 1 (not < 1, so continue)
- Update
mx = max(4, 4) = 4
- Check:
mx (4) ≠j (1)
, continue
- Position 2: character 'c'
- First occurrence of 'c' = 2 (not < 1, so continue)
- Update
mx = max(4, 5) = 5
- Check:
mx (5) ≠j (2)
, continue
- Position 3: character 'a'
- First occurrence of 'a' = 0 (0 < 1) ✗ Break!
- Character 'a' appears before our start position, invalid
Starting from 'c' (position 2):
- Initialize:
i = 2
,mx = 5
(last position of 'c') - Position 2: character 'c'
- First occurrence of 'c' = 2 (not < 2, so continue)
- Update
mx = max(5, 5) = 5
- Check:
mx (5) ≠j (2)
, continue
- Position 3: character 'a'
- First occurrence of 'a' = 0 (0 < 2) ✗ Break!
- Character 'a' appears before our start position, invalid
Step 3: Return result No valid self-contained substring was found, so return -1.
Let's try another example where a valid substring exists: s = "abcdefabc"
Step 1: Build maps
- 'a': first = 0, last = 6
- 'b': first = 1, last = 7
- 'c': first = 2, last = 8
- 'd': first = 3, last = 3
- 'e': first = 4, last = 4
- 'f': first = 5, last = 5
Step 2: Try starting from 'd' (position 3):
- Initialize:
i = 3
,mx = 3
- Position 3: character 'd'
- First occurrence = 3 (not < 3, continue)
- Update
mx = max(3, 3) = 3
- Check:
mx (3) = j (3)
✓ and3 - 3 + 1 = 1 < 9
✓ - Found valid substring "d" with length 1
Continuing to check 'e' (position 4):
- Similar process yields "e" with length 1
Continuing to check 'f' (position 5):
- Similar process yields "f" with length 1
The longest self-contained substring has length 1 (any of "d", "e", or "f").
Solution Implementation
1class Solution:
2 def maxSubstringLength(self, s: str) -> int:
3 # Build dictionaries to track first and last occurrence of each character
4 first_occurrence = {}
5 last_occurrence = {}
6
7 # Populate the occurrence dictionaries
8 for index, char in enumerate(s):
9 if char not in first_occurrence:
10 first_occurrence[char] = index
11 last_occurrence[char] = index
12
13 # Initialize result and get string length
14 max_length = -1
15 string_length = len(s)
16
17 # Try each character as a potential starting point for a valid substring
18 for char, start_index in first_occurrence.items():
19 # Track the maximum index we need to include to satisfy all character occurrences
20 required_end = last_occurrence[char]
21
22 # Expand from start_index and check if we can form a valid substring
23 for current_index in range(start_index, string_length):
24 current_char = s[current_index]
25 char_first = first_occurrence[current_char]
26 char_last = last_occurrence[current_char]
27
28 # If current character's first occurrence is before our start, invalid substring
29 if char_first < start_index:
30 break
31
32 # Update the required end to include all occurrences of current character
33 required_end = max(required_end, char_last)
34
35 # Check if we've reached a valid substring endpoint
36 # Valid if: we've covered all required occurrences AND it's not the entire string
37 if required_end == current_index and current_index - start_index + 1 < string_length:
38 max_length = max(max_length, current_index - start_index + 1)
39
40 return max_length
41
1class Solution {
2 public int maxSubstringLength(String s) {
3 // Arrays to store the first and last occurrence index of each character
4 int[] firstOccurrence = new int[26];
5 int[] lastOccurrence = new int[26];
6
7 // Initialize first occurrence array with -1 (indicating character not found yet)
8 Arrays.fill(firstOccurrence, -1);
9
10 int stringLength = s.length();
11
12 // Record first and last occurrence of each character
13 for (int i = 0; i < stringLength; ++i) {
14 int charIndex = s.charAt(i) - 'a';
15
16 // If this is the first time we see this character
17 if (firstOccurrence[charIndex] == -1) {
18 firstOccurrence[charIndex] = i;
19 }
20 // Update last occurrence (will be updated for every occurrence)
21 lastOccurrence[charIndex] = i;
22 }
23
24 int maxLength = -1;
25
26 // Try starting a substring from each character's first occurrence
27 for (int charIdx = 0; charIdx < 26; ++charIdx) {
28 int startPos = firstOccurrence[charIdx];
29
30 // Skip if this character doesn't exist in the string
31 if (startPos == -1) {
32 continue;
33 }
34
35 // Track the rightmost position we need to include
36 int requiredEndPos = lastOccurrence[charIdx];
37
38 // Expand from start position and check if we can form a valid substring
39 for (int currentPos = startPos; currentPos < stringLength; ++currentPos) {
40 int currentCharIndex = s.charAt(currentPos) - 'a';
41 int currentCharFirstPos = firstOccurrence[currentCharIndex];
42 int currentCharLastPos = lastOccurrence[currentCharIndex];
43
44 // If current character's first occurrence is before our start position,
45 // we cannot form a valid substring
46 if (currentCharFirstPos < startPos) {
47 break;
48 }
49
50 // Update the required end position to include all occurrences
51 // of characters we've seen so far
52 requiredEndPos = Math.max(requiredEndPos, currentCharLastPos);
53
54 // If we've reached the required end position and the substring
55 // is not the entire string, update the maximum length
56 if (requiredEndPos == currentPos && currentPos - startPos + 1 < stringLength) {
57 maxLength = Math.max(maxLength, currentPos - startPos + 1);
58 }
59 }
60 }
61
62 return maxLength;
63 }
64}
65
1class Solution {
2public:
3 int maxSubstringLength(string s) {
4 // Store the first occurrence index of each character ('a' to 'z')
5 // Initialize with -1 to indicate character hasn't appeared yet
6 vector<int> firstOccurrence(26, -1);
7
8 // Store the last occurrence index of each character ('a' to 'z')
9 vector<int> lastOccurrence(26);
10
11 int stringLength = s.length();
12
13 // Populate first and last occurrence arrays for each character
14 for (int i = 0; i < stringLength; ++i) {
15 int charIndex = s[i] - 'a'; // Convert character to 0-based index
16
17 // Record first occurrence if not seen before
18 if (firstOccurrence[charIndex] == -1) {
19 firstOccurrence[charIndex] = i;
20 }
21
22 // Always update last occurrence
23 lastOccurrence[charIndex] = i;
24 }
25
26 int maxLength = -1; // Maximum valid substring length found
27
28 // Try each character as a potential starting point
29 for (int charIdx = 0; charIdx < 26; ++charIdx) {
30 int startPos = firstOccurrence[charIdx];
31
32 // Skip if this character doesn't exist in the string
33 if (startPos == -1) {
34 continue;
35 }
36
37 // Track the maximum required ending position for current substring
38 int maxEndPos = lastOccurrence[charIdx];
39
40 // Expand substring from startPos, checking validity
41 for (int currentPos = startPos; currentPos < stringLength; ++currentPos) {
42 int currentCharIndex = s[currentPos] - 'a';
43 int currentCharFirstPos = firstOccurrence[currentCharIndex];
44 int currentCharLastPos = lastOccurrence[currentCharIndex];
45
46 // If current character appears before our start position,
47 // we cannot form a valid substring
48 if (currentCharFirstPos < startPos) {
49 break;
50 }
51
52 // Update maximum required ending position
53 maxEndPos = max(maxEndPos, currentCharLastPos);
54
55 // Check if we've reached a valid substring endpoint:
56 // - All required characters are included (maxEndPos == currentPos)
57 // - The substring is not the entire string (length < stringLength)
58 if (maxEndPos == currentPos && currentPos - startPos + 1 < stringLength) {
59 maxLength = max(maxLength, currentPos - startPos + 1);
60 }
61 }
62 }
63
64 return maxLength;
65 }
66};
67
1/**
2 * Finds the maximum length of a substring where all characters in the substring
3 * appear only within that substring and nowhere else in the string.
4 *
5 * @param s - The input string containing only lowercase letters
6 * @returns The maximum length of such a substring, or -1 if no valid substring exists
7 */
8function maxSubstringLength(s: string): number {
9 // Arrays to track first and last occurrence indices for each letter (a-z)
10 const firstOccurrence: number[] = Array(26).fill(-1);
11 const lastOccurrence: number[] = Array(26).fill(0);
12 const stringLength: number = s.length;
13
14 // Record the first and last occurrence index for each character
15 for (let i = 0; i < stringLength; ++i) {
16 const charIndex: number = s.charCodeAt(i) - 97; // Convert 'a'-'z' to 0-25
17
18 if (firstOccurrence[charIndex] === -1) {
19 firstOccurrence[charIndex] = i;
20 }
21 lastOccurrence[charIndex] = i;
22 }
23
24 let maxLength: number = -1;
25
26 // Try each character as a potential starting point for a valid substring
27 for (let charCode = 0; charCode < 26; ++charCode) {
28 const startIndex: number = firstOccurrence[charCode];
29
30 // Skip if this character doesn't exist in the string
31 if (startIndex === -1) {
32 continue;
33 }
34
35 // Track the maximum required ending position for the current substring
36 let maxEndIndex: number = lastOccurrence[charCode];
37
38 // Expand the substring from startIndex, checking each character
39 for (let currentIndex = startIndex; currentIndex < stringLength; ++currentIndex) {
40 const currentCharIndex: number = s.charCodeAt(currentIndex) - 97;
41 const currentCharFirstOccurrence: number = firstOccurrence[currentCharIndex];
42
43 // If character appears before our start, this substring is invalid
44 if (currentCharFirstOccurrence < startIndex) {
45 break;
46 }
47
48 // Update the required ending position to include all occurrences
49 const currentCharLastOccurrence: number = lastOccurrence[currentCharIndex];
50 maxEndIndex = Math.max(maxEndIndex, currentCharLastOccurrence);
51
52 // Check if we've found a complete valid substring
53 // (current position matches required end and it's not the entire string)
54 if (maxEndIndex === currentIndex && currentIndex - startIndex + 1 < stringLength) {
55 maxLength = Math.max(maxLength, currentIndex - startIndex + 1);
56 }
57 }
58 }
59
60 return maxLength;
61}
62
Time and Space Complexity
Time Complexity: O(n × |Σ|)
The algorithm consists of two main parts:
- The first loop iterates through the string once to build the
first
andlast
dictionaries, takingO(n)
time. - The second nested loop structure:
- The outer loop iterates through each unique character in the
first
dictionary, which can be at most|Σ|
characters - For each character, the inner loop can iterate up to
n
positions in the worst case - Therefore, this part takes
O(|Σ| × n)
time
- The outer loop iterates through each unique character in the
The overall time complexity is O(n) + O(|Σ| × n) = O(n × |Σ|)
, where n
is the length of string s
and |Σ|
represents the size of the character set (26 for lowercase letters).
Space Complexity: O(|Σ|)
The algorithm uses:
- Two dictionaries
first
andlast
to store the first and last occurrence indices of each character - Each dictionary can store at most
|Σ|
key-value pairs (one for each unique character in the string) - A few additional variables (
ans
,n
,mx
, etc.) that use constant space
Therefore, the space complexity is O(|Σ|)
, where |Σ| = 26
for lowercase English letters.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Not Breaking Early When Character Appears Before Start Position
A critical pitfall is forgetting to break the inner loop when encountering a character that has already appeared before the current starting position. This would lead to incorrect results by considering invalid substrings.
Incorrect Implementation:
for current_index in range(start_index, string_length):
current_char = s[current_index]
char_first = first_occurrence[current_char]
char_last = last_occurrence[current_char]
# MISSING: Should break if char_first < start_index
required_end = max(required_end, char_last)
# ... rest of code
Why This Fails:
Without the break condition, the algorithm would continue processing even when it encounters characters that violate the self-contained property. For example, with s = "abcabc"
, when starting from index 3 (second 'a'), it would incorrectly consider the substring "abc" at indices 3-5 as valid, even though 'a', 'b', 'c' also appear at indices 0-2.
Solution: Always include the break condition:
if char_first < start_index: break
2. Incorrect Validation of Substring Length
Another common mistake is using <=
instead of <
when checking if the substring length is less than the total string length.
Incorrect Implementation:
if required_end == current_index and current_index - start_index + 1 <= string_length:
max_length = max(max_length, current_index - start_index + 1)
Why This Fails:
Using <=
would allow the entire string to be considered as a valid self-contained substring, which violates the constraint that t != s
. This would return the length of the entire string instead of -1 for strings like s = "aaa"
where no proper self-contained substring exists.
Solution: Use strict inequality:
if required_end == current_index and current_index - start_index + 1 < string_length:
max_length = max(max_length, current_index - start_index + 1)
3. Not Initializing required_end
Correctly
Forgetting to initialize required_end
with the last occurrence of the starting character can cause the algorithm to miss valid substrings.
Incorrect Implementation:
for char, start_index in first_occurrence.items(): required_end = start_index # Wrong: should be last_occurrence[char] # ... rest of code
Why This Fails:
If a character appears multiple times, we must ensure all its occurrences are included in the substring. Starting with required_end = start_index
would fail to account for later occurrences of the starting character itself.
Solution: Initialize with the last occurrence of the starting character:
required_end = last_occurrence[char]
Is the following code DFS or BFS?
void search(Node root) { if (!root) return; visit(root); root.visited = true; for (Node node in root.adjacent) { if (!node.visited) { search(node); } } }
Recommended Readings
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
Prefix Sum The prefix sum is an incredibly powerful and straightforward technique Its primary goal is to allow for constant time range sum queries on an array What is Prefix Sum The prefix sum of an array at index i is the sum of all numbers from index 0 to i By
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!