Facebook Pixel

2707. Extra Characters in a String

Problem Description

You are given a string s (0-indexed) and a dictionary containing a list of valid words. Your task is to break the string s into substrings where each substring must be a word from the dictionary. The substrings cannot overlap with each other.

However, it may not be possible to completely break down the entire string using only dictionary words. Some characters might be left over as "extra characters" that don't belong to any valid dictionary word substring.

Your goal is to find the optimal way to break the string such that the number of these extra characters is minimized.

For example:

  • If s = "leetscode" and dictionary = ["leet", "code", "leetcode"]
  • One way to break it: use "leet" (covers indices 0-3) and "code" (covers indices 4-7), leaving "s" as an extra character
  • Another way: use "leetcode" (covers indices 0-7), leaving "s" as an extra character
  • Both ways result in 1 extra character

The function should return the minimum number of extra characters that will be left over after optimally breaking the string using dictionary words.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we look at this problem, we need to make decisions at each position in the string - should we treat a character as extra, or should we try to form a word ending at this position?

Think of it this way: as we scan through the string from left to right, at each position i, we have already solved the problem for all positions before i. Now we need to figure out the minimum extra characters for the substring from index 0 to i.

At position i, we have two choices:

  1. Treat the character at position i-1 as an extra character. If we do this, the total extra characters would be the minimum extra characters up to position i-1 plus 1.
  2. Try to form a valid word that ends at position i. We can check all possible starting positions j (where j < i) and see if the substring from j to i forms a valid dictionary word. If it does, then the minimum extra characters would be the same as the minimum extra characters up to position j.

This naturally leads to a dynamic programming approach where f[i] represents the minimum number of extra characters in the first i characters of the string. We build up the solution incrementally - once we know the answer for smaller substrings, we can use that information to solve for larger substrings.

The key insight is that we don't need to track which words we've used or their positions - we only care about the minimum number of extra characters at each position. By trying all possible ways to form words ending at each position and taking the minimum, we ensure we find the optimal solution.

Using a hash set for the dictionary allows us to quickly check if any substring is a valid word in O(1) time after extracting the substring.

Learn more about Trie and Dynamic Programming patterns.

Solution Approach

We implement the solution using a hash table combined with dynamic programming.

Step 1: Initialize Data Structures

  • Convert the dictionary list into a hash set ss for O(1) lookup time when checking if a substring exists in the dictionary
  • Create a DP array f of size n + 1 where n is the length of string s
  • Initialize f[0] = 0 since there are no extra characters when we have processed 0 characters

Step 2: Build the DP Solution

For each position i from 1 to n:

  1. First assumption: Treat the character at position i-1 as an extra character

    • Set f[i] = f[i - 1] + 1
    • This represents the case where we don't use the current character as part of any dictionary word
  2. Check for valid words ending at position i:

    • Iterate through all possible starting positions j from 0 to i-1
    • Extract the substring s[j:i] (from index j to i-1 inclusive)
    • If this substring exists in our hash set ss:
      • Update f[i] = min(f[i], f[j])
      • This means we can form a valid word from position j to i, so the minimum extra characters at position i would be the same as at position j

State Transition Formula:

f[i]=min{f[i1]+1,minj[0,i1]f[j]}f[i] = \min \{ f[i - 1] + 1, \min_{j \in [0, i - 1]} f[j] \}

where the second term is only considered when s[j:i] is in the dictionary.

Step 3: Return the Result

  • The answer is f[n], which represents the minimum number of extra characters when we've processed all n characters of the string

Time Complexity: O(n²) where n is the length of the string, as we have two nested loops Space Complexity: O(n + m) where m is the total number of characters in all dictionary words (for the hash set) and n for the DP array

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Example: s = "sayhello" and dictionary = ["say", "hello", "he"]

Step 1: Initialize

  • Convert dictionary to hash set: ss = {"say", "hello", "he"}
  • Create DP array f of size 9 (length of "sayhello" + 1)
  • Initialize f[0] = 0

Step 2: Build DP Solution

i = 1 (processing 's'):

  • Start with f[1] = f[0] + 1 = 1 (treat 's' as extra)
  • Check substring "s" (j=0 to i=1): not in dictionary
  • Final: f[1] = 1

i = 2 (processing 'sa'):

  • Start with f[2] = f[1] + 1 = 2 (treat 'a' as extra)
  • Check "sa" (j=0): not in dictionary
  • Check "a" (j=1): not in dictionary
  • Final: f[2] = 2

i = 3 (processing 'say'):

  • Start with f[3] = f[2] + 1 = 3 (treat 'y' as extra)
  • Check "say" (j=0): in dictionary!f[3] = min(3, f[0]) = 0
  • Check "ay" (j=1): not in dictionary
  • Check "y" (j=2): not in dictionary
  • Final: f[3] = 0

i = 4 (processing 'sayh'):

  • Start with f[4] = f[3] + 1 = 1
  • Check all substrings ending at position 4: none in dictionary
  • Final: f[4] = 1

i = 5 (processing 'sayhe'):

  • Start with f[5] = f[4] + 1 = 2
  • Check "sayhe" (j=0): not in dictionary
  • Check "ayhe" (j=1): not in dictionary
  • Check "yhe" (j=2): not in dictionary
  • Check "he" (j=3): in dictionary!f[5] = min(2, f[3]) = 0
  • Check "e" (j=4): not in dictionary
  • Final: f[5] = 0

i = 6 (processing 'sayhel'):

  • Start with f[6] = f[5] + 1 = 1
  • Check all substrings: none in dictionary
  • Final: f[6] = 1

i = 7 (processing 'sayhell'):

  • Start with f[7] = f[6] + 1 = 2
  • Check all substrings: none in dictionary
  • Final: f[7] = 2

i = 8 (processing 'sayhello'):

  • Start with f[8] = f[7] + 1 = 3
  • Check "sayhello" (j=0): not in dictionary
  • Check "ayhello" (j=1): not in dictionary
  • Check "yhello" (j=2): not in dictionary
  • Check "hello" (j=3): in dictionary!f[8] = min(3, f[3]) = 0
  • Continue checking other substrings: none match
  • Final: f[8] = 0

Result: f[8] = 0, meaning we can break "sayhello" perfectly into "say" + "hello" with 0 extra characters.

The DP array progression: [0, 1, 2, 0, 1, 0, 1, 2, 0]

This shows how at each position, we either add 1 to the previous minimum (treating current character as extra) or find a valid word ending at the current position that gives us a better result.

Solution Implementation

1class Solution:
2    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
3        # Convert dictionary list to set for O(1) lookup
4        word_set = set(dictionary)
5        n = len(s)
6      
7        # dp[i] represents minimum extra characters needed for s[0:i]
8        # dp[0] = 0 means empty string needs 0 extra characters
9        dp = [0] * (n + 1)
10      
11        # Process each position in the string
12        for i in range(1, n + 1):
13            # Option 1: Treat s[i-1] as an extra character
14            # This means we take the previous result and add 1
15            dp[i] = dp[i - 1] + 1
16          
17            # Option 2: Check if any substring ending at position i is in dictionary
18            # Try all possible starting positions j for substring s[j:i]
19            for j in range(i):
20                # If substring s[j:i] is a valid word in dictionary
21                # We can potentially use dp[j] (no extra chars needed for this word)
22                if s[j:i] in word_set:
23                    # Update dp[i] with the minimum value
24                    dp[i] = min(dp[i], dp[j])
25      
26        # Return minimum extra characters needed for entire string
27        return dp[n]
28
1class Solution {
2    public int minExtraChar(String s, String[] dictionary) {
3        // Convert dictionary array to HashSet for O(1) lookup
4        Set<String> dictionarySet = new HashSet<>();
5        for (String word : dictionary) {
6            dictionarySet.add(word);
7        }
8      
9        int stringLength = s.length();
10      
11        // dp[i] represents the minimum number of extra characters in s[0...i-1]
12        // dp[0] means empty string, so we need array of size stringLength + 1
13        int[] dp = new int[stringLength + 1];
14      
15        // Base case: empty string has 0 extra characters
16        dp[0] = 0;
17      
18        // Fill the dp array for each position
19        for (int i = 1; i <= stringLength; ++i) {
20            // Option 1: Treat character at position i-1 as extra (not part of any dictionary word)
21            dp[i] = dp[i - 1] + 1;
22          
23            // Option 2: Check if substring ending at position i matches any dictionary word
24            for (int j = 0; j < i; ++j) {
25                // Check if substring from j to i exists in dictionary
26                if (dictionarySet.contains(s.substring(j, i))) {
27                    // If found, we can use this word, so extra chars = dp[j]
28                    dp[i] = Math.min(dp[i], dp[j]);
29                }
30            }
31        }
32      
33        // Return minimum extra characters for the entire string
34        return dp[stringLength];
35    }
36}
37
1class Solution {
2public:
3    int minExtraChar(string s, vector<string>& dictionary) {
4        // Convert dictionary to hash set for O(1) lookup
5        unordered_set<string> dictSet(dictionary.begin(), dictionary.end());
6      
7        int n = s.size();
8      
9        // dp[i] represents the minimum number of extra characters 
10        // when considering the first i characters of string s
11        vector<int> dp(n + 1);
12      
13        // Base case: no characters means no extra characters
14        dp[0] = 0;
15      
16        // Fill the dp array for each position
17        for (int i = 1; i <= n; ++i) {
18            // Option 1: Treat current character as extra (not part of any dictionary word)
19            dp[i] = dp[i - 1] + 1;
20          
21            // Option 2: Try to match dictionary words ending at position i
22            for (int j = 0; j < i; ++j) {
23                // Check if substring from j to i exists in dictionary
24                string substring = s.substr(j, i - j);
25                if (dictSet.count(substring)) {
26                    // If word found, update dp[i] with minimum extra characters
27                    dp[i] = min(dp[i], dp[j]);
28                }
29            }
30        }
31      
32        // Return minimum extra characters for the entire string
33        return dp[n];
34    }
35};
36
1/**
2 * Finds the minimum number of extra characters left over if you break up the string optimally
3 * such that each substring is in the dictionary
4 * @param s - The input string to be broken up
5 * @param dictionary - Array of valid words that can be used
6 * @returns The minimum number of extra characters
7 */
8function minExtraChar(s: string, dictionary: string[]): number {
9    // Convert dictionary array to Set for O(1) lookup
10    const dictionarySet: Set<string> = new Set(dictionary);
11  
12    // Length of the input string
13    const stringLength: number = s.length;
14  
15    // dp[i] represents minimum extra characters for substring s[0...i-1]
16    // Initialize with zeros, dp[0] = 0 means empty string has 0 extra characters
17    const dp: number[] = new Array(stringLength + 1).fill(0);
18  
19    // Process each position in the string
20    for (let endIndex = 1; endIndex <= stringLength; ++endIndex) {
21        // Default case: current character is extra (not part of any dictionary word)
22        dp[endIndex] = dp[endIndex - 1] + 1;
23      
24        // Check all possible starting positions for substrings ending at current position
25        for (let startIndex = 0; startIndex < endIndex; ++startIndex) {
26            // Extract substring from startIndex to endIndex (exclusive)
27            const currentSubstring: string = s.substring(startIndex, endIndex);
28          
29            // If substring exists in dictionary, update minimum extra characters
30            if (dictionarySet.has(currentSubstring)) {
31                dp[endIndex] = Math.min(dp[endIndex], dp[startIndex]);
32            }
33        }
34    }
35  
36    // Return minimum extra characters for the entire string
37    return dp[stringLength];
38}
39

Time and Space Complexity

Time Complexity: O(n^3 + L)

The time complexity consists of two parts:

  • Converting the dictionary list to a set takes O(L) time, where L is the sum of lengths of all words in the dictionary
  • The nested loops have complexity O(n^2) for iterating through all pairs (i, j) where 1 ≤ i ≤ n and 0 ≤ j < i
  • For each pair (i, j), creating the substring s[j:i] takes O(i - j) time, which in the worst case is O(n)
  • Checking if the substring exists in the set takes O(i - j) time for hash computation, which is also O(n) in the worst case
  • Therefore, the total time for the nested loops is O(n^2) × O(n) = O(n^3)
  • Combined: O(n^3 + L)

Space Complexity: O(n + L)

The space complexity consists of:

  • The set ss stores all dictionary words, requiring O(L) space where L is the sum of lengths of all words
  • The DP array f of size n + 1 requires O(n) space
  • The substring operations create temporary strings but they don't accumulate, so they don't affect the overall space complexity
  • Combined: O(n + L)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Inefficient Substring Checking with Long Dictionary Words

The Pitfall: When the dictionary contains very long words or when the string s is long, checking all possible substrings s[j:i] can become inefficient. Even though each substring check is O(1) with a hash set, creating the substring itself takes O(length) time, and we're creating O(n²) substrings in total.

Example scenario:

  • If s has length 1000 and the dictionary contains words up to length 100
  • We're potentially creating and checking many long substrings that will never match

Solution: Add an optimization to limit the substring length based on the maximum word length in the dictionary:

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        word_set = set(dictionary)
        n = len(s)
      
        # Find the maximum word length in dictionary to optimize
        max_word_len = max(len(word) for word in dictionary) if dictionary else 0
      
        dp = [0] * (n + 1)
      
        for i in range(1, n + 1):
            dp[i] = dp[i - 1] + 1
          
            # Only check substrings up to max_word_len
            # Start from max(0, i - max_word_len) instead of 0
            for j in range(max(0, i - max_word_len), i):
                if s[j:i] in word_set:
                    dp[i] = min(dp[i], dp[j])
      
        return dp[n]

2. Memory Issues with Very Large Dictionaries

The Pitfall: When converting a large dictionary to a set, we might encounter memory issues if the dictionary contains millions of words or very long strings.

Solution: Use a Trie data structure instead of a hash set for more memory-efficient storage:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        # Build Trie
        root = TrieNode()
        for word in dictionary:
            node = root
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
            node.is_word = True
      
        n = len(s)
        dp = [0] * (n + 1)
      
        for i in range(1, n + 1):
            dp[i] = dp[i - 1] + 1
          
            # Check using Trie
            node = root
            for j in range(i - 1, -1, -1):
                if s[j] not in node.children:
                    break
                node = node.children[s[j]]
                if node.is_word:
                    dp[i] = min(dp[i], dp[j])
      
        return dp[n]

3. Integer Overflow in Other Languages

The Pitfall: While Python handles arbitrary-precision integers automatically, implementing this solution in languages like C++ or Java might face integer overflow if not careful with initialization.

Solution: In other languages, be careful with initialization values. Instead of using INT_MAX or similar large values, the incremental approach (starting with dp[i] = dp[i-1] + 1) naturally avoids overflow issues.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which data structure is used in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More