2707. Extra Characters in a String
Problem Description
You are given a string s
(0-indexed) and a dictionary containing a list of valid words. Your task is to break the string s
into substrings where each substring must be a word from the dictionary. The substrings cannot overlap with each other.
However, it may not be possible to completely break down the entire string using only dictionary words. Some characters might be left over as "extra characters" that don't belong to any valid dictionary word substring.
Your goal is to find the optimal way to break the string such that the number of these extra characters is minimized.
For example:
- If
s = "leetscode"
anddictionary = ["leet", "code", "leetcode"]
- One way to break it: use "leet" (covers indices 0-3) and "code" (covers indices 4-7), leaving "s" as an extra character
- Another way: use "leetcode" (covers indices 0-7), leaving "s" as an extra character
- Both ways result in 1 extra character
The function should return the minimum number of extra characters that will be left over after optimally breaking the string using dictionary words.
Intuition
When we look at this problem, we need to make decisions at each position in the string - should we treat a character as extra, or should we try to form a word ending at this position?
Think of it this way: as we scan through the string from left to right, at each position i
, we have already solved the problem for all positions before i
. Now we need to figure out the minimum extra characters for the substring from index 0 to i
.
At position i
, we have two choices:
- Treat the character at position
i-1
as an extra character. If we do this, the total extra characters would be the minimum extra characters up to positioni-1
plus 1. - Try to form a valid word that ends at position
i
. We can check all possible starting positionsj
(wherej < i
) and see if the substring fromj
toi
forms a valid dictionary word. If it does, then the minimum extra characters would be the same as the minimum extra characters up to positionj
.
This naturally leads to a dynamic programming approach where f[i]
represents the minimum number of extra characters in the first i
characters of the string. We build up the solution incrementally - once we know the answer for smaller substrings, we can use that information to solve for larger substrings.
The key insight is that we don't need to track which words we've used or their positions - we only care about the minimum number of extra characters at each position. By trying all possible ways to form words ending at each position and taking the minimum, we ensure we find the optimal solution.
Using a hash set for the dictionary allows us to quickly check if any substring is a valid word in O(1)
time after extracting the substring.
Learn more about Trie and Dynamic Programming patterns.
Solution Approach
We implement the solution using a hash table combined with dynamic programming.
Step 1: Initialize Data Structures
- Convert the dictionary list into a hash set
ss
forO(1)
lookup time when checking if a substring exists in the dictionary - Create a DP array
f
of sizen + 1
wheren
is the length of strings
- Initialize
f[0] = 0
since there are no extra characters when we have processed 0 characters
Step 2: Build the DP Solution
For each position i
from 1 to n
:
-
First assumption: Treat the character at position
i-1
as an extra character- Set
f[i] = f[i - 1] + 1
- This represents the case where we don't use the current character as part of any dictionary word
- Set
-
Check for valid words ending at position
i
:- Iterate through all possible starting positions
j
from 0 toi-1
- Extract the substring
s[j:i]
(from indexj
toi-1
inclusive) - If this substring exists in our hash set
ss
:- Update
f[i] = min(f[i], f[j])
- This means we can form a valid word from position
j
toi
, so the minimum extra characters at positioni
would be the same as at positionj
- Update
- Iterate through all possible starting positions
State Transition Formula:
where the second term is only considered when s[j:i]
is in the dictionary.
Step 3: Return the Result
- The answer is
f[n]
, which represents the minimum number of extra characters when we've processed alln
characters of the string
Time Complexity: O(n²)
where n
is the length of the string, as we have two nested loops
Space Complexity: O(n + m)
where m
is the total number of characters in all dictionary words (for the hash set) and n
for the DP array
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Example: s = "sayhello"
and dictionary = ["say", "hello", "he"]
Step 1: Initialize
- Convert dictionary to hash set:
ss = {"say", "hello", "he"}
- Create DP array
f
of size 9 (length of "sayhello" + 1) - Initialize
f[0] = 0
Step 2: Build DP Solution
i = 1 (processing 's'):
- Start with
f[1] = f[0] + 1 = 1
(treat 's' as extra) - Check substring "s" (j=0 to i=1): not in dictionary
- Final:
f[1] = 1
i = 2 (processing 'sa'):
- Start with
f[2] = f[1] + 1 = 2
(treat 'a' as extra) - Check "sa" (j=0): not in dictionary
- Check "a" (j=1): not in dictionary
- Final:
f[2] = 2
i = 3 (processing 'say'):
- Start with
f[3] = f[2] + 1 = 3
(treat 'y' as extra) - Check "say" (j=0): in dictionary! →
f[3] = min(3, f[0]) = 0
- Check "ay" (j=1): not in dictionary
- Check "y" (j=2): not in dictionary
- Final:
f[3] = 0
i = 4 (processing 'sayh'):
- Start with
f[4] = f[3] + 1 = 1
- Check all substrings ending at position 4: none in dictionary
- Final:
f[4] = 1
i = 5 (processing 'sayhe'):
- Start with
f[5] = f[4] + 1 = 2
- Check "sayhe" (j=0): not in dictionary
- Check "ayhe" (j=1): not in dictionary
- Check "yhe" (j=2): not in dictionary
- Check "he" (j=3): in dictionary! →
f[5] = min(2, f[3]) = 0
- Check "e" (j=4): not in dictionary
- Final:
f[5] = 0
i = 6 (processing 'sayhel'):
- Start with
f[6] = f[5] + 1 = 1
- Check all substrings: none in dictionary
- Final:
f[6] = 1
i = 7 (processing 'sayhell'):
- Start with
f[7] = f[6] + 1 = 2
- Check all substrings: none in dictionary
- Final:
f[7] = 2
i = 8 (processing 'sayhello'):
- Start with
f[8] = f[7] + 1 = 3
- Check "sayhello" (j=0): not in dictionary
- Check "ayhello" (j=1): not in dictionary
- Check "yhello" (j=2): not in dictionary
- Check "hello" (j=3): in dictionary! →
f[8] = min(3, f[3]) = 0
- Continue checking other substrings: none match
- Final:
f[8] = 0
Result: f[8] = 0
, meaning we can break "sayhello" perfectly into "say" + "hello" with 0 extra characters.
The DP array progression: [0, 1, 2, 0, 1, 0, 1, 2, 0]
This shows how at each position, we either add 1 to the previous minimum (treating current character as extra) or find a valid word ending at the current position that gives us a better result.
Solution Implementation
1class Solution:
2 def minExtraChar(self, s: str, dictionary: List[str]) -> int:
3 # Convert dictionary list to set for O(1) lookup
4 word_set = set(dictionary)
5 n = len(s)
6
7 # dp[i] represents minimum extra characters needed for s[0:i]
8 # dp[0] = 0 means empty string needs 0 extra characters
9 dp = [0] * (n + 1)
10
11 # Process each position in the string
12 for i in range(1, n + 1):
13 # Option 1: Treat s[i-1] as an extra character
14 # This means we take the previous result and add 1
15 dp[i] = dp[i - 1] + 1
16
17 # Option 2: Check if any substring ending at position i is in dictionary
18 # Try all possible starting positions j for substring s[j:i]
19 for j in range(i):
20 # If substring s[j:i] is a valid word in dictionary
21 # We can potentially use dp[j] (no extra chars needed for this word)
22 if s[j:i] in word_set:
23 # Update dp[i] with the minimum value
24 dp[i] = min(dp[i], dp[j])
25
26 # Return minimum extra characters needed for entire string
27 return dp[n]
28
1class Solution {
2 public int minExtraChar(String s, String[] dictionary) {
3 // Convert dictionary array to HashSet for O(1) lookup
4 Set<String> dictionarySet = new HashSet<>();
5 for (String word : dictionary) {
6 dictionarySet.add(word);
7 }
8
9 int stringLength = s.length();
10
11 // dp[i] represents the minimum number of extra characters in s[0...i-1]
12 // dp[0] means empty string, so we need array of size stringLength + 1
13 int[] dp = new int[stringLength + 1];
14
15 // Base case: empty string has 0 extra characters
16 dp[0] = 0;
17
18 // Fill the dp array for each position
19 for (int i = 1; i <= stringLength; ++i) {
20 // Option 1: Treat character at position i-1 as extra (not part of any dictionary word)
21 dp[i] = dp[i - 1] + 1;
22
23 // Option 2: Check if substring ending at position i matches any dictionary word
24 for (int j = 0; j < i; ++j) {
25 // Check if substring from j to i exists in dictionary
26 if (dictionarySet.contains(s.substring(j, i))) {
27 // If found, we can use this word, so extra chars = dp[j]
28 dp[i] = Math.min(dp[i], dp[j]);
29 }
30 }
31 }
32
33 // Return minimum extra characters for the entire string
34 return dp[stringLength];
35 }
36}
37
1class Solution {
2public:
3 int minExtraChar(string s, vector<string>& dictionary) {
4 // Convert dictionary to hash set for O(1) lookup
5 unordered_set<string> dictSet(dictionary.begin(), dictionary.end());
6
7 int n = s.size();
8
9 // dp[i] represents the minimum number of extra characters
10 // when considering the first i characters of string s
11 vector<int> dp(n + 1);
12
13 // Base case: no characters means no extra characters
14 dp[0] = 0;
15
16 // Fill the dp array for each position
17 for (int i = 1; i <= n; ++i) {
18 // Option 1: Treat current character as extra (not part of any dictionary word)
19 dp[i] = dp[i - 1] + 1;
20
21 // Option 2: Try to match dictionary words ending at position i
22 for (int j = 0; j < i; ++j) {
23 // Check if substring from j to i exists in dictionary
24 string substring = s.substr(j, i - j);
25 if (dictSet.count(substring)) {
26 // If word found, update dp[i] with minimum extra characters
27 dp[i] = min(dp[i], dp[j]);
28 }
29 }
30 }
31
32 // Return minimum extra characters for the entire string
33 return dp[n];
34 }
35};
36
1/**
2 * Finds the minimum number of extra characters left over if you break up the string optimally
3 * such that each substring is in the dictionary
4 * @param s - The input string to be broken up
5 * @param dictionary - Array of valid words that can be used
6 * @returns The minimum number of extra characters
7 */
8function minExtraChar(s: string, dictionary: string[]): number {
9 // Convert dictionary array to Set for O(1) lookup
10 const dictionarySet: Set<string> = new Set(dictionary);
11
12 // Length of the input string
13 const stringLength: number = s.length;
14
15 // dp[i] represents minimum extra characters for substring s[0...i-1]
16 // Initialize with zeros, dp[0] = 0 means empty string has 0 extra characters
17 const dp: number[] = new Array(stringLength + 1).fill(0);
18
19 // Process each position in the string
20 for (let endIndex = 1; endIndex <= stringLength; ++endIndex) {
21 // Default case: current character is extra (not part of any dictionary word)
22 dp[endIndex] = dp[endIndex - 1] + 1;
23
24 // Check all possible starting positions for substrings ending at current position
25 for (let startIndex = 0; startIndex < endIndex; ++startIndex) {
26 // Extract substring from startIndex to endIndex (exclusive)
27 const currentSubstring: string = s.substring(startIndex, endIndex);
28
29 // If substring exists in dictionary, update minimum extra characters
30 if (dictionarySet.has(currentSubstring)) {
31 dp[endIndex] = Math.min(dp[endIndex], dp[startIndex]);
32 }
33 }
34 }
35
36 // Return minimum extra characters for the entire string
37 return dp[stringLength];
38}
39
Time and Space Complexity
Time Complexity: O(n^3 + L)
The time complexity consists of two parts:
- Converting the dictionary list to a set takes
O(L)
time, whereL
is the sum of lengths of all words in the dictionary - The nested loops have complexity
O(n^2)
for iterating through all pairs(i, j)
where1 ≤ i ≤ n
and0 ≤ j < i
- For each pair
(i, j)
, creating the substrings[j:i]
takesO(i - j)
time, which in the worst case isO(n)
- Checking if the substring exists in the set takes
O(i - j)
time for hash computation, which is alsoO(n)
in the worst case - Therefore, the total time for the nested loops is
O(n^2) × O(n) = O(n^3)
- Combined:
O(n^3 + L)
Space Complexity: O(n + L)
The space complexity consists of:
- The set
ss
stores all dictionary words, requiringO(L)
space whereL
is the sum of lengths of all words - The DP array
f
of sizen + 1
requiresO(n)
space - The substring operations create temporary strings but they don't accumulate, so they don't affect the overall space complexity
- Combined:
O(n + L)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Inefficient Substring Checking with Long Dictionary Words
The Pitfall:
When the dictionary contains very long words or when the string s
is long, checking all possible substrings s[j:i]
can become inefficient. Even though each substring check is O(1) with a hash set, creating the substring itself takes O(length) time, and we're creating O(n²) substrings in total.
Example scenario:
- If
s
has length 1000 and the dictionary contains words up to length 100 - We're potentially creating and checking many long substrings that will never match
Solution: Add an optimization to limit the substring length based on the maximum word length in the dictionary:
class Solution:
def minExtraChar(self, s: str, dictionary: List[str]) -> int:
word_set = set(dictionary)
n = len(s)
# Find the maximum word length in dictionary to optimize
max_word_len = max(len(word) for word in dictionary) if dictionary else 0
dp = [0] * (n + 1)
for i in range(1, n + 1):
dp[i] = dp[i - 1] + 1
# Only check substrings up to max_word_len
# Start from max(0, i - max_word_len) instead of 0
for j in range(max(0, i - max_word_len), i):
if s[j:i] in word_set:
dp[i] = min(dp[i], dp[j])
return dp[n]
2. Memory Issues with Very Large Dictionaries
The Pitfall: When converting a large dictionary to a set, we might encounter memory issues if the dictionary contains millions of words or very long strings.
Solution: Use a Trie data structure instead of a hash set for more memory-efficient storage:
class TrieNode:
def __init__(self):
self.children = {}
self.is_word = False
class Solution:
def minExtraChar(self, s: str, dictionary: List[str]) -> int:
# Build Trie
root = TrieNode()
for word in dictionary:
node = root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_word = True
n = len(s)
dp = [0] * (n + 1)
for i in range(1, n + 1):
dp[i] = dp[i - 1] + 1
# Check using Trie
node = root
for j in range(i - 1, -1, -1):
if s[j] not in node.children:
break
node = node.children[s[j]]
if node.is_word:
dp[i] = min(dp[i], dp[j])
return dp[n]
3. Integer Overflow in Other Languages
The Pitfall: While Python handles arbitrary-precision integers automatically, implementing this solution in languages like C++ or Java might face integer overflow if not careful with initialization.
Solution:
In other languages, be careful with initialization values. Instead of using INT_MAX
or similar large values, the incremental approach (starting with dp[i] = dp[i-1] + 1
) naturally avoids overflow issues.
Which data structure is used in a depth first search?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Don’t Miss This!