1698. Number of Distinct Substrings in a String π
Problem Description
Given a string s
, you need to find and return the total number of distinct (unique) substrings that can be formed from it.
A substring is a contiguous sequence of characters within the string. You can obtain a substring by removing any number of characters (including zero) from the beginning of the string and any number of characters (including zero) from the end of the string.
For example, if s = "abc"
, the possible substrings are:
- Length 1:
"a"
,"b"
,"c"
- Length 2:
"ab"
,"bc"
- Length 3:
"abc"
All these substrings are distinct, so the answer would be 6.
The solution uses a brute force approach by generating all possible substrings using nested loops. The outer loop i
represents the starting position of the substring, and the inner loop j
represents the ending position (exclusive). By using s[i:j]
for all valid combinations of i
and j
where 0 β€ i < j β€ n
, we generate all possible substrings. These substrings are stored in a set, which automatically handles duplicates, keeping only distinct values. The final answer is the size of this set, representing the count of distinct substrings.
Intuition
When we need to count distinct substrings, the key insight is that we need to examine every possible substring and keep track of which ones we've seen before. Since a substring is defined by its starting and ending positions in the original string, we can systematically generate all possibilities.
Think of it this way: for a string of length n
, each substring can be uniquely identified by two things:
- Where it starts (position
i
) - Where it ends (position
j
)
This naturally leads us to use two nested loops - one to fix the starting position and another to extend the ending position. For each starting position i
, we can create substrings ending at positions i+1
, i+2
, ... up to n
.
The challenge is handling duplicates. For instance, in the string "aaa"
, the substring "a"
appears at multiple positions, but we should count it only once. This is where Python's set
data structure becomes perfect - it automatically handles uniqueness for us. By adding each substring s[i:j]
to a set, duplicate substrings are ignored, and we're left with only the distinct ones.
The beauty of this approach is its simplicity: generate all substrings using slice notation s[i:j]
, let the set handle deduplication, and return the size of the set. While this may not be the most efficient solution for very long strings, it's straightforward to understand and implement, making it an excellent starting point for solving this problem.
Learn more about Trie patterns.
Solution Approach
The solution uses a brute force enumeration strategy to find all distinct substrings. Let's walk through the implementation step by step:
-
Get the string length: First, we store the length of the input string
s
in variablen
for convenience. -
Generate all substrings using nested loops: The core of the solution uses a set comprehension with two nested loops:
- The outer loop:
for i in range(n)
iterates through all possible starting positions (from index 0 to n-1) - The inner loop:
for j in range(i + 1, n + 1)
iterates through all possible ending positions for each starting positioni
- The range
(i + 1, n + 1)
ensures that:j
is always greater thani
(so we get valid substrings)j
can go up ton
(to include substrings that end at the last character)
- The outer loop:
-
Extract substrings using slicing: For each pair of
(i, j)
, we extract the substring using Python's slice notations[i:j]
. This gives us the substring starting at indexi
and ending at indexj-1
. -
Use a set for automatic deduplication: All generated substrings are placed in a set using the set comprehension
{s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
. The set data structure automatically handles duplicate removal - if the same substring appears multiple times, it's only stored once. -
Return the count: Finally, we use
len()
to get the size of the set, which gives us the count of distinct substrings.
Time Complexity: O(nΒ²)
for generating all substrings, where each substring extraction takes O(n)
time in the worst case, leading to overall O(nΒ³)
.
Space Complexity: O(nΒ²)
in the worst case when all substrings are distinct, as we need to store them in the set.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the string s = "aba"
.
Step 1: Initialize
- String:
s = "aba"
- Length:
n = 3
Step 2: Generate all substrings using nested loops
We'll systematically generate substrings by fixing a starting position i
and varying the ending position j
:
-
When i = 0 (starting from 'a' at index 0):
- j = 1:
s[0:1] = "a"
- j = 2:
s[0:2] = "ab"
- j = 3:
s[0:3] = "aba"
- j = 1:
-
When i = 1 (starting from 'b' at index 1):
- j = 2:
s[1:2] = "b"
- j = 3:
s[1:3] = "ba"
- j = 2:
-
When i = 2 (starting from 'a' at index 2):
- j = 3:
s[2:3] = "a"
- j = 3:
Step 3: Collect in a set
All generated substrings: ["a", "ab", "aba", "b", "ba", "a"]
When we add these to a set, the duplicate "a"
(which appears twice - once from position 0 and once from position 2) is automatically removed.
Final set: {"a", "ab", "aba", "b", "ba"}
Step 4: Count distinct substrings
The set contains 5 distinct substrings, so the answer is 5.
This example demonstrates how the nested loops generate all possible substrings and how the set automatically handles deduplication when we encounter the same substring at different positions.
Solution Implementation
1class Solution:
2 def countDistinct(self, s: str) -> int:
3 """
4 Count the number of distinct substrings in a given string.
5
6 Args:
7 s: Input string
8
9 Returns:
10 Number of distinct substrings
11 """
12 n = len(s)
13
14 # Generate all possible substrings using set comprehension
15 # i represents the starting index (inclusive)
16 # j represents the ending index (exclusive)
17 # s[i:j] extracts substring from index i to j-1
18 distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
19
20 # Return the count of unique substrings
21 return len(distinct_substrings)
22
1class Solution {
2 /**
3 * Counts the number of distinct substrings in a given string.
4 *
5 * @param s the input string
6 * @return the count of distinct substrings
7 */
8 public int countDistinct(String s) {
9 // Use a HashSet to store unique substrings
10 Set<String> distinctSubstrings = new HashSet<>();
11
12 // Get the length of the input string
13 int length = s.length();
14
15 // Iterate through all possible starting positions
16 for (int startIndex = 0; startIndex < length; ++startIndex) {
17 // For each starting position, iterate through all possible ending positions
18 for (int endIndex = startIndex + 1; endIndex <= length; ++endIndex) {
19 // Extract substring from startIndex to endIndex (exclusive)
20 // and add it to the set (duplicates are automatically handled)
21 distinctSubstrings.add(s.substring(startIndex, endIndex));
22 }
23 }
24
25 // Return the number of distinct substrings
26 return distinctSubstrings.size();
27 }
28}
29
1class Solution {
2public:
3 int countDistinct(string s) {
4 // Use unordered_set to store unique substrings
5 // string_view provides a lightweight view without copying
6 unordered_set<string_view> uniqueSubstrings;
7
8 // Get the length of the input string
9 int stringLength = s.size();
10
11 // Create a string_view of the entire input string for efficient substring operations
12 string_view stringView = s;
13
14 // Iterate through all possible starting positions
15 for (int startIndex = 0; startIndex < stringLength; ++startIndex) {
16 // For each starting position, iterate through all possible ending positions
17 for (int endIndex = startIndex + 1; endIndex <= stringLength; ++endIndex) {
18 // Extract substring from startIndex with length (endIndex - startIndex)
19 string_view currentSubstring = stringView.substr(startIndex, endIndex - startIndex);
20
21 // Insert the substring into the set (duplicates are automatically handled)
22 uniqueSubstrings.insert(currentSubstring);
23 }
24 }
25
26 // Return the count of distinct substrings
27 return uniqueSubstrings.size();
28 }
29};
30
1function countDistinct(s: string): number {
2 // Use a Set to store unique substrings
3 // Set automatically handles duplicates
4 const uniqueSubstrings: Set<string> = new Set();
5
6 // Get the length of the input string
7 const stringLength: number = s.length;
8
9 // Iterate through all possible starting positions
10 for (let startIndex = 0; startIndex < stringLength; startIndex++) {
11 // For each starting position, iterate through all possible ending positions
12 for (let endIndex = startIndex + 1; endIndex <= stringLength; endIndex++) {
13 // Extract substring from startIndex to endIndex (exclusive)
14 // substring() method takes start and end indices
15 const currentSubstring: string = s.substring(startIndex, endIndex);
16
17 // Add the substring to the set (duplicates are automatically handled)
18 uniqueSubstrings.add(currentSubstring);
19 }
20 }
21
22 // Return the count of distinct substrings
23 return uniqueSubstrings.size;
24}
25
Time and Space Complexity
Time Complexity: O(n^3)
The code uses nested loops to generate all possible substrings. The outer loop runs n
times (for each starting position i
), and the inner loop runs up to n
times for each i
(for each ending position j
). This gives us O(n^2)
substring generations. However, each substring extraction s[i:j]
takes O(j-i)
time, which in the worst case is O(n)
. Additionally, adding each substring to the set requires hashing, which takes O(length of substring)
, also up to O(n)
in the worst case. Therefore, the overall time complexity is O(n^2) * O(n) = O(n^3)
.
Space Complexity: O(n^2)
The set stores all distinct substrings. In the worst case where all characters are unique, the number of distinct substrings is n * (n + 1) / 2
, which is O(n^2)
. Each substring can have a maximum length of n
, so theoretically the total space for storing all substrings could be O(n^3)
. However, since the problem typically considers the space complexity in terms of the number of distinct elements stored rather than their total size, and given the reference answer, the space complexity is O(n^2)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Off-by-One Error in the Inner Loop Range
One of the most common mistakes is incorrectly setting the range for the inner loop j
. Developers often write:
Incorrect:
distinct_substrings = {s[i:j] for i in range(n) for j in range(i, n)}
Why it's wrong:
- When
j = i
, the slices[i:i]
produces an empty string""
, which is not a valid substring - When
j = n-1
, we miss substrings that include the last character
Correct approach:
distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
i + 1
ensures we skip empty strings (substring must have at least one character)n + 1
ensures we include substrings ending at the last character (since Python slicing is exclusive of the end index)
2. Using a List Instead of a Set
Incorrect:
distinct_substrings = [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]
return len(distinct_substrings)
Why it's wrong: Lists don't automatically remove duplicates. If the string has repeating patterns (like "aab"
), the substring "a"
appears twice, and using a list would count it twice.
Solution: Always use a set to automatically handle deduplication:
distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
3. Memory Issues with Very Long Strings
For strings with length approaching 10^4 or more, storing all substrings can cause memory issues since there are O(nΒ²)
possible substrings.
Alternative approach for memory-constrained scenarios:
def countDistinct(self, s: str) -> int:
n = len(s)
distinct_substrings = set()
for i in range(n):
for j in range(i + 1, n + 1):
distinct_substrings.add(s[i:j])
return len(distinct_substrings)
While this doesn't fundamentally reduce memory usage, it allows for potential optimizations like:
- Early termination if a memory limit is reached
- Batch processing of substrings
- Using a more memory-efficient data structure like a Trie for very large inputs
4. Forgetting Edge Cases
Developers sometimes forget to handle edge cases properly:
Edge cases to consider:
- Empty string:
s = ""
should return 0 - Single character:
s = "a"
should return 1 - All same characters:
s = "aaaa"
has only 4 distinct substrings: "a", "aa", "aaa", "aaaa"
Robust solution:
def countDistinct(self, s: str) -> int:
if not s: # Handle empty string
return 0
n = len(s)
distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
return len(distinct_substrings)
What's the output of running the following function using the following tree as input?
1def serialize(root):
2 res = []
3 def dfs(root):
4 if not root:
5 res.append('x')
6 return
7 res.append(root.val)
8 dfs(root.left)
9 dfs(root.right)
10 dfs(root)
11 return ' '.join(res)
12
1import java.util.StringJoiner;
2
3public static String serialize(Node root) {
4 StringJoiner res = new StringJoiner(" ");
5 serializeDFS(root, res);
6 return res.toString();
7}
8
9private static void serializeDFS(Node root, StringJoiner result) {
10 if (root == null) {
11 result.add("x");
12 return;
13 }
14 result.add(Integer.toString(root.val));
15 serializeDFS(root.left, result);
16 serializeDFS(root.right, result);
17}
18
1function serialize(root) {
2 let res = [];
3 serialize_dfs(root, res);
4 return res.join(" ");
5}
6
7function serialize_dfs(root, res) {
8 if (!root) {
9 res.push("x");
10 return;
11 }
12 res.push(root.val);
13 serialize_dfs(root.left, res);
14 serialize_dfs(root.right, res);
15}
16
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!