Facebook Pixel

3104. Find Longest Self-Contained Substring 🔒

Problem Description

Given a string s, you need to find the length of the longest self-contained substring.

A substring t of string s is called self-contained if:

  1. t != s (the substring cannot be the entire string)
  2. For every character in t, that character does not appear anywhere else in s outside of t

In other words, a self-contained substring is a portion of the string where all occurrences of every character within that substring are completely contained within the substring itself - none of those characters appear in the rest of the string.

For example:

  • If s = "abcabc", the substring "abc" (from index 0 to 2) is self-contained because the characters 'a', 'b', and 'c' at positions 0, 1, 2 don't appear anywhere after index 2 in the remaining string.
  • If s = "abcab", there is no self-contained substring because every character appears both inside and outside any potential substring.

Return the length of the longest self-contained substring if one exists, otherwise return -1.

The key insight is that a self-contained substring must start at the first occurrence of some character and must include all occurrences of every character it contains. The algorithm tracks the first and last positions of each character, then checks potential substrings starting from each character's first occurrence to find valid self-contained substrings.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key observation is that if a substring is self-contained, it must include all occurrences of every character that appears in it. This means if character 'a' appears in our substring, we must include every 'a' in the entire string for the substring to be self-contained.

This leads to an important insight: a self-contained substring can only start at a position where a character appears for the first time in the string. Why? Because if we start at the second or later occurrence of a character, we've already left out at least one occurrence of that character, violating the self-contained property.

Consider the string "abcabc". If we want a substring containing 'a', we must start at index 0 (first 'a') and extend at least to index 3 (last 'a'). Starting at any other position would leave some 'a' outside our substring.

The approach then becomes:

  1. For each character, record where it first appears and where it last appears
  2. Try starting a substring at each character's first occurrence position
  3. As we extend the substring, keep track of the rightmost position we must reach to include all occurrences of characters we've seen so far
  4. If we encounter a character whose first occurrence is before our starting position, this substring cannot be self-contained
  5. If we successfully reach a point where we've included all necessary occurrences and haven't used the entire string, we have a valid self-contained substring

This greedy expansion approach works because once we commit to including a character, we must include all its occurrences, which might force us to include more characters, creating a cascading effect until we either find a valid boundary or determine the substring cannot be self-contained.

Learn more about Binary Search and Prefix Sum patterns.

Solution Approach

First, we create two hash tables (or dictionaries) first and last to record the first and last occurrence positions of each character in the string:

first, last = {}, {}
for i, c in enumerate(s):
    if c not in first:
        first[c] = i
    last[c] = i

Next, we enumerate each character c as a potential starting point for our self-contained substring. We only consider positions where characters appear for the first time (stored in first):

for c, i in first.items():
    mx = last[c]  # Initialize mx with the last occurrence of character c

Here, i is the starting position of our potential substring, and mx tracks the rightmost position we must reach to include all occurrences of characters we've encountered.

For each starting position i, we traverse from position i to the end of the string:

for j in range(i, n):
    a, b = first[s[j]], last[s[j]]

At each position j, we get:

  • a: the first occurrence position of character s[j]
  • b: the last occurrence position of character s[j]

If a < i, it means character s[j] has already appeared before our starting position i. This violates the self-contained property, so we break out of the inner loop:

if a < i:
    break

Otherwise, we update mx to ensure we include all occurrences of s[j]:

mx = max(mx, b)

If mx == j, it means we've reached a position where we've included all necessary occurrences of all characters seen so far. If additionally j - i + 1 < n (ensuring the substring is not the entire string), we have found a valid self-contained substring:

if mx == j and j - i + 1 < n:
    ans = max(ans, j - i + 1)

The algorithm continues this process for all possible starting positions and returns the maximum length found, or -1 if no valid self-contained substring exists.

The time complexity is O(n²) where n is the length of the string, as we potentially examine each substring starting from each character's first occurrence. The space complexity is O(k) where k is the number of unique characters in the string.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string s = "abcabc".

Step 1: Build first and last occurrence maps

  • Iterate through the string to record positions:
    • 'a': first = 0, last = 3
    • 'b': first = 1, last = 4
    • 'c': first = 2, last = 5

Step 2: Try each character's first occurrence as a starting point

Starting from 'a' (position 0):

  • Initialize: i = 0, mx = 3 (last position of 'a')
  • Position 0: character 'a'
    • First occurrence of 'a' = 0 (not < 0, so continue)
    • Update mx = max(3, 3) = 3
    • Check: mx (3) ≠ j (0), continue
  • Position 1: character 'b'
    • First occurrence of 'b' = 1 (not < 0, so continue)
    • Update mx = max(3, 4) = 4
    • Check: mx (4) ≠ j (1), continue
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 0, so continue)
    • Update mx = max(4, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (not < 0, so continue)
    • Update mx = max(5, 3) = 5
    • Check: mx (5) ≠ j (3), continue
  • Position 4: character 'b'
    • First occurrence of 'b' = 1 (not < 0, so continue)
    • Update mx = max(5, 4) = 5
    • Check: mx (5) ≠ j (4), continue
  • Position 5: character 'c'
    • First occurrence of 'c' = 2 (not < 0, so continue)
    • Update mx = max(5, 5) = 5
    • Check: mx (5) = j (5) ✓ and 5 - 0 + 1 = 6 = n, not valid (equals entire string)

Starting from 'b' (position 1):

  • Initialize: i = 1, mx = 4 (last position of 'b')
  • Position 1: character 'b'
    • First occurrence of 'b' = 1 (not < 1, so continue)
    • Update mx = max(4, 4) = 4
    • Check: mx (4) ≠ j (1), continue
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 1, so continue)
    • Update mx = max(4, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (0 < 1) ✗ Break!
    • Character 'a' appears before our start position, invalid

Starting from 'c' (position 2):

  • Initialize: i = 2, mx = 5 (last position of 'c')
  • Position 2: character 'c'
    • First occurrence of 'c' = 2 (not < 2, so continue)
    • Update mx = max(5, 5) = 5
    • Check: mx (5) ≠ j (2), continue
  • Position 3: character 'a'
    • First occurrence of 'a' = 0 (0 < 2) ✗ Break!
    • Character 'a' appears before our start position, invalid

Step 3: Return result No valid self-contained substring was found, so return -1.


Let's try another example where a valid substring exists: s = "abcdefabc"

Step 1: Build maps

  • 'a': first = 0, last = 6
  • 'b': first = 1, last = 7
  • 'c': first = 2, last = 8
  • 'd': first = 3, last = 3
  • 'e': first = 4, last = 4
  • 'f': first = 5, last = 5

Step 2: Try starting from 'd' (position 3):

  • Initialize: i = 3, mx = 3
  • Position 3: character 'd'
    • First occurrence = 3 (not < 3, continue)
    • Update mx = max(3, 3) = 3
    • Check: mx (3) = j (3) ✓ and 3 - 3 + 1 = 1 < 9 ✓
    • Found valid substring "d" with length 1

Continuing to check 'e' (position 4):

  • Similar process yields "e" with length 1

Continuing to check 'f' (position 5):

  • Similar process yields "f" with length 1

The longest self-contained substring has length 1 (any of "d", "e", or "f").

Solution Implementation

1class Solution:
2    def maxSubstringLength(self, s: str) -> int:
3        # Build dictionaries to track first and last occurrence of each character
4        first_occurrence = {}
5        last_occurrence = {}
6      
7        # Populate the occurrence dictionaries
8        for index, char in enumerate(s):
9            if char not in first_occurrence:
10                first_occurrence[char] = index
11            last_occurrence[char] = index
12      
13        # Initialize result and get string length
14        max_length = -1
15        string_length = len(s)
16      
17        # Try each character as a potential starting point for a valid substring
18        for char, start_index in first_occurrence.items():
19            # Track the maximum index we need to include to satisfy all character occurrences
20            required_end = last_occurrence[char]
21          
22            # Expand from start_index and check if we can form a valid substring
23            for current_index in range(start_index, string_length):
24                current_char = s[current_index]
25                char_first = first_occurrence[current_char]
26                char_last = last_occurrence[current_char]
27              
28                # If current character's first occurrence is before our start, invalid substring
29                if char_first < start_index:
30                    break
31              
32                # Update the required end to include all occurrences of current character
33                required_end = max(required_end, char_last)
34              
35                # Check if we've reached a valid substring endpoint
36                # Valid if: we've covered all required occurrences AND it's not the entire string
37                if required_end == current_index and current_index - start_index + 1 < string_length:
38                    max_length = max(max_length, current_index - start_index + 1)
39      
40        return max_length
41
1class Solution {
2    public int maxSubstringLength(String s) {
3        // Arrays to store the first and last occurrence index of each character
4        int[] firstOccurrence = new int[26];
5        int[] lastOccurrence = new int[26];
6      
7        // Initialize first occurrence array with -1 (indicating character not found yet)
8        Arrays.fill(firstOccurrence, -1);
9      
10        int stringLength = s.length();
11      
12        // Record first and last occurrence of each character
13        for (int i = 0; i < stringLength; ++i) {
14            int charIndex = s.charAt(i) - 'a';
15          
16            // If this is the first time we see this character
17            if (firstOccurrence[charIndex] == -1) {
18                firstOccurrence[charIndex] = i;
19            }
20            // Update last occurrence (will be updated for every occurrence)
21            lastOccurrence[charIndex] = i;
22        }
23      
24        int maxLength = -1;
25      
26        // Try starting a substring from each character's first occurrence
27        for (int charIdx = 0; charIdx < 26; ++charIdx) {
28            int startPos = firstOccurrence[charIdx];
29          
30            // Skip if this character doesn't exist in the string
31            if (startPos == -1) {
32                continue;
33            }
34          
35            // Track the rightmost position we need to include
36            int requiredEndPos = lastOccurrence[charIdx];
37          
38            // Expand from start position and check if we can form a valid substring
39            for (int currentPos = startPos; currentPos < stringLength; ++currentPos) {
40                int currentCharIndex = s.charAt(currentPos) - 'a';
41                int currentCharFirstPos = firstOccurrence[currentCharIndex];
42                int currentCharLastPos = lastOccurrence[currentCharIndex];
43              
44                // If current character's first occurrence is before our start position,
45                // we cannot form a valid substring
46                if (currentCharFirstPos < startPos) {
47                    break;
48                }
49              
50                // Update the required end position to include all occurrences
51                // of characters we've seen so far
52                requiredEndPos = Math.max(requiredEndPos, currentCharLastPos);
53              
54                // If we've reached the required end position and the substring
55                // is not the entire string, update the maximum length
56                if (requiredEndPos == currentPos && currentPos - startPos + 1 < stringLength) {
57                    maxLength = Math.max(maxLength, currentPos - startPos + 1);
58                }
59            }
60        }
61      
62        return maxLength;
63    }
64}
65
1class Solution {
2public:
3    int maxSubstringLength(string s) {
4        // Store the first occurrence index of each character ('a' to 'z')
5        // Initialize with -1 to indicate character hasn't appeared yet
6        vector<int> firstOccurrence(26, -1);
7      
8        // Store the last occurrence index of each character ('a' to 'z')
9        vector<int> lastOccurrence(26);
10      
11        int stringLength = s.length();
12      
13        // Populate first and last occurrence arrays for each character
14        for (int i = 0; i < stringLength; ++i) {
15            int charIndex = s[i] - 'a';  // Convert character to 0-based index
16          
17            // Record first occurrence if not seen before
18            if (firstOccurrence[charIndex] == -1) {
19                firstOccurrence[charIndex] = i;
20            }
21          
22            // Always update last occurrence
23            lastOccurrence[charIndex] = i;
24        }
25      
26        int maxLength = -1;  // Maximum valid substring length found
27      
28        // Try each character as a potential starting point
29        for (int charIdx = 0; charIdx < 26; ++charIdx) {
30            int startPos = firstOccurrence[charIdx];
31          
32            // Skip if this character doesn't exist in the string
33            if (startPos == -1) {
34                continue;
35            }
36          
37            // Track the maximum required ending position for current substring
38            int maxEndPos = lastOccurrence[charIdx];
39          
40            // Expand substring from startPos, checking validity
41            for (int currentPos = startPos; currentPos < stringLength; ++currentPos) {
42                int currentCharIndex = s[currentPos] - 'a';
43                int currentCharFirstPos = firstOccurrence[currentCharIndex];
44                int currentCharLastPos = lastOccurrence[currentCharIndex];
45              
46                // If current character appears before our start position,
47                // we cannot form a valid substring
48                if (currentCharFirstPos < startPos) {
49                    break;
50                }
51              
52                // Update maximum required ending position
53                maxEndPos = max(maxEndPos, currentCharLastPos);
54              
55                // Check if we've reached a valid substring endpoint:
56                // - All required characters are included (maxEndPos == currentPos)
57                // - The substring is not the entire string (length < stringLength)
58                if (maxEndPos == currentPos && currentPos - startPos + 1 < stringLength) {
59                    maxLength = max(maxLength, currentPos - startPos + 1);
60                }
61            }
62        }
63      
64        return maxLength;
65    }
66};
67
1/**
2 * Finds the maximum length of a substring where all characters in the substring
3 * appear only within that substring and nowhere else in the string.
4 * 
5 * @param s - The input string containing only lowercase letters
6 * @returns The maximum length of such a substring, or -1 if no valid substring exists
7 */
8function maxSubstringLength(s: string): number {
9    // Arrays to track first and last occurrence indices for each letter (a-z)
10    const firstOccurrence: number[] = Array(26).fill(-1);
11    const lastOccurrence: number[] = Array(26).fill(0);
12    const stringLength: number = s.length;
13  
14    // Record the first and last occurrence index for each character
15    for (let i = 0; i < stringLength; ++i) {
16        const charIndex: number = s.charCodeAt(i) - 97; // Convert 'a'-'z' to 0-25
17      
18        if (firstOccurrence[charIndex] === -1) {
19            firstOccurrence[charIndex] = i;
20        }
21        lastOccurrence[charIndex] = i;
22    }
23  
24    let maxLength: number = -1;
25  
26    // Try each character as a potential starting point for a valid substring
27    for (let charCode = 0; charCode < 26; ++charCode) {
28        const startIndex: number = firstOccurrence[charCode];
29      
30        // Skip if this character doesn't exist in the string
31        if (startIndex === -1) {
32            continue;
33        }
34      
35        // Track the maximum required ending position for the current substring
36        let maxEndIndex: number = lastOccurrence[charCode];
37      
38        // Expand the substring from startIndex, checking each character
39        for (let currentIndex = startIndex; currentIndex < stringLength; ++currentIndex) {
40            const currentCharIndex: number = s.charCodeAt(currentIndex) - 97;
41            const currentCharFirstOccurrence: number = firstOccurrence[currentCharIndex];
42          
43            // If character appears before our start, this substring is invalid
44            if (currentCharFirstOccurrence < startIndex) {
45                break;
46            }
47          
48            // Update the required ending position to include all occurrences
49            const currentCharLastOccurrence: number = lastOccurrence[currentCharIndex];
50            maxEndIndex = Math.max(maxEndIndex, currentCharLastOccurrence);
51          
52            // Check if we've found a complete valid substring
53            // (current position matches required end and it's not the entire string)
54            if (maxEndIndex === currentIndex && currentIndex - startIndex + 1 < stringLength) {
55                maxLength = Math.max(maxLength, currentIndex - startIndex + 1);
56            }
57        }
58    }
59  
60    return maxLength;
61}
62

Time and Space Complexity

Time Complexity: O(n × |Σ|)

The algorithm consists of two main parts:

  1. The first loop iterates through the string once to build the first and last dictionaries, taking O(n) time.
  2. The second nested loop structure:
    • The outer loop iterates through each unique character in the first dictionary, which can be at most |Σ| characters
    • For each character, the inner loop can iterate up to n positions in the worst case
    • Therefore, this part takes O(|Σ| × n) time

The overall time complexity is O(n) + O(|Σ| × n) = O(n × |Σ|), where n is the length of string s and |Σ| represents the size of the character set (26 for lowercase letters).

Space Complexity: O(|Σ|)

The algorithm uses:

  • Two dictionaries first and last to store the first and last occurrence indices of each character
  • Each dictionary can store at most |Σ| key-value pairs (one for each unique character in the string)
  • A few additional variables (ans, n, mx, etc.) that use constant space

Therefore, the space complexity is O(|Σ|), where |Σ| = 26 for lowercase English letters.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Breaking Early When Character Appears Before Start Position

A critical pitfall is forgetting to break the inner loop when encountering a character that has already appeared before the current starting position. This would lead to incorrect results by considering invalid substrings.

Incorrect Implementation:

for current_index in range(start_index, string_length):
    current_char = s[current_index]
    char_first = first_occurrence[current_char]
    char_last = last_occurrence[current_char]
  
    # MISSING: Should break if char_first < start_index
  
    required_end = max(required_end, char_last)
    # ... rest of code

Why This Fails: Without the break condition, the algorithm would continue processing even when it encounters characters that violate the self-contained property. For example, with s = "abcabc", when starting from index 3 (second 'a'), it would incorrectly consider the substring "abc" at indices 3-5 as valid, even though 'a', 'b', 'c' also appear at indices 0-2.

Solution: Always include the break condition:

if char_first < start_index:
    break

2. Incorrect Validation of Substring Length

Another common mistake is using <= instead of < when checking if the substring length is less than the total string length.

Incorrect Implementation:

if required_end == current_index and current_index - start_index + 1 <= string_length:
    max_length = max(max_length, current_index - start_index + 1)

Why This Fails: Using <= would allow the entire string to be considered as a valid self-contained substring, which violates the constraint that t != s. This would return the length of the entire string instead of -1 for strings like s = "aaa" where no proper self-contained substring exists.

Solution: Use strict inequality:

if required_end == current_index and current_index - start_index + 1 < string_length:
    max_length = max(max_length, current_index - start_index + 1)

3. Not Initializing required_end Correctly

Forgetting to initialize required_end with the last occurrence of the starting character can cause the algorithm to miss valid substrings.

Incorrect Implementation:

for char, start_index in first_occurrence.items():
    required_end = start_index  # Wrong: should be last_occurrence[char]
    # ... rest of code

Why This Fails: If a character appears multiple times, we must ensure all its occurrences are included in the substring. Starting with required_end = start_index would fail to account for later occurrences of the starting character itself.

Solution: Initialize with the last occurrence of the starting character:

required_end = last_occurrence[char]
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Is the following code DFS or BFS?

void search(Node root) {
  if (!root) return;
  visit(root);
  root.visited = true;
  for (Node node in root.adjacent) {
    if (!node.visited) {
      search(node);
    }
  }
}

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More