Facebook Pixel

1698. Number of Distinct Substrings in a String πŸ”’

MediumTrieStringSuffix ArrayHash FunctionRolling Hash
Leetcode Link

Problem Description

Given a string s, you need to find and return the total number of distinct (unique) substrings that can be formed from it.

A substring is a contiguous sequence of characters within the string. You can obtain a substring by removing any number of characters (including zero) from the beginning of the string and any number of characters (including zero) from the end of the string.

For example, if s = "abc", the possible substrings are:

  • Length 1: "a", "b", "c"
  • Length 2: "ab", "bc"
  • Length 3: "abc"

All these substrings are distinct, so the answer would be 6.

The solution uses a brute force approach by generating all possible substrings using nested loops. The outer loop i represents the starting position of the substring, and the inner loop j represents the ending position (exclusive). By using s[i:j] for all valid combinations of i and j where 0 ≀ i < j ≀ n, we generate all possible substrings. These substrings are stored in a set, which automatically handles duplicates, keeping only distinct values. The final answer is the size of this set, representing the count of distinct substrings.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we need to count distinct substrings, the key insight is that we need to examine every possible substring and keep track of which ones we've seen before. Since a substring is defined by its starting and ending positions in the original string, we can systematically generate all possibilities.

Think of it this way: for a string of length n, each substring can be uniquely identified by two things:

  1. Where it starts (position i)
  2. Where it ends (position j)

This naturally leads us to use two nested loops - one to fix the starting position and another to extend the ending position. For each starting position i, we can create substrings ending at positions i+1, i+2, ... up to n.

The challenge is handling duplicates. For instance, in the string "aaa", the substring "a" appears at multiple positions, but we should count it only once. This is where Python's set data structure becomes perfect - it automatically handles uniqueness for us. By adding each substring s[i:j] to a set, duplicate substrings are ignored, and we're left with only the distinct ones.

The beauty of this approach is its simplicity: generate all substrings using slice notation s[i:j], let the set handle deduplication, and return the size of the set. While this may not be the most efficient solution for very long strings, it's straightforward to understand and implement, making it an excellent starting point for solving this problem.

Learn more about Trie patterns.

Solution Approach

The solution uses a brute force enumeration strategy to find all distinct substrings. Let's walk through the implementation step by step:

  1. Get the string length: First, we store the length of the input string s in variable n for convenience.

  2. Generate all substrings using nested loops: The core of the solution uses a set comprehension with two nested loops:

    • The outer loop: for i in range(n) iterates through all possible starting positions (from index 0 to n-1)
    • The inner loop: for j in range(i + 1, n + 1) iterates through all possible ending positions for each starting position i
    • The range (i + 1, n + 1) ensures that:
      • j is always greater than i (so we get valid substrings)
      • j can go up to n (to include substrings that end at the last character)
  3. Extract substrings using slicing: For each pair of (i, j), we extract the substring using Python's slice notation s[i:j]. This gives us the substring starting at index i and ending at index j-1.

  4. Use a set for automatic deduplication: All generated substrings are placed in a set using the set comprehension {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}. The set data structure automatically handles duplicate removal - if the same substring appears multiple times, it's only stored once.

  5. Return the count: Finally, we use len() to get the size of the set, which gives us the count of distinct substrings.

Time Complexity: O(nΒ²) for generating all substrings, where each substring extraction takes O(n) time in the worst case, leading to overall O(nΒ³).

Space Complexity: O(nΒ²) in the worst case when all substrings are distinct, as we need to store them in the set.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string s = "aba".

Step 1: Initialize

  • String: s = "aba"
  • Length: n = 3

Step 2: Generate all substrings using nested loops

We'll systematically generate substrings by fixing a starting position i and varying the ending position j:

  • When i = 0 (starting from 'a' at index 0):

    • j = 1: s[0:1] = "a"
    • j = 2: s[0:2] = "ab"
    • j = 3: s[0:3] = "aba"
  • When i = 1 (starting from 'b' at index 1):

    • j = 2: s[1:2] = "b"
    • j = 3: s[1:3] = "ba"
  • When i = 2 (starting from 'a' at index 2):

    • j = 3: s[2:3] = "a"

Step 3: Collect in a set

All generated substrings: ["a", "ab", "aba", "b", "ba", "a"]

When we add these to a set, the duplicate "a" (which appears twice - once from position 0 and once from position 2) is automatically removed.

Final set: {"a", "ab", "aba", "b", "ba"}

Step 4: Count distinct substrings

The set contains 5 distinct substrings, so the answer is 5.

This example demonstrates how the nested loops generate all possible substrings and how the set automatically handles deduplication when we encounter the same substring at different positions.

Solution Implementation

1class Solution:
2    def countDistinct(self, s: str) -> int:
3        """
4        Count the number of distinct substrings in a given string.
5      
6        Args:
7            s: Input string
8          
9        Returns:
10            Number of distinct substrings
11        """
12        n = len(s)
13      
14        # Generate all possible substrings using set comprehension
15        # i represents the starting index (inclusive)
16        # j represents the ending index (exclusive)
17        # s[i:j] extracts substring from index i to j-1
18        distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
19      
20        # Return the count of unique substrings
21        return len(distinct_substrings)
22
1class Solution {
2    /**
3     * Counts the number of distinct substrings in a given string.
4     * 
5     * @param s the input string
6     * @return the count of distinct substrings
7     */
8    public int countDistinct(String s) {
9        // Use a HashSet to store unique substrings
10        Set<String> distinctSubstrings = new HashSet<>();
11      
12        // Get the length of the input string
13        int length = s.length();
14      
15        // Iterate through all possible starting positions
16        for (int startIndex = 0; startIndex < length; ++startIndex) {
17            // For each starting position, iterate through all possible ending positions
18            for (int endIndex = startIndex + 1; endIndex <= length; ++endIndex) {
19                // Extract substring from startIndex to endIndex (exclusive)
20                // and add it to the set (duplicates are automatically handled)
21                distinctSubstrings.add(s.substring(startIndex, endIndex));
22            }
23        }
24      
25        // Return the number of distinct substrings
26        return distinctSubstrings.size();
27    }
28}
29
1class Solution {
2public:
3    int countDistinct(string s) {
4        // Use unordered_set to store unique substrings
5        // string_view provides a lightweight view without copying
6        unordered_set<string_view> uniqueSubstrings;
7      
8        // Get the length of the input string
9        int stringLength = s.size();
10      
11        // Create a string_view of the entire input string for efficient substring operations
12        string_view stringView = s;
13      
14        // Iterate through all possible starting positions
15        for (int startIndex = 0; startIndex < stringLength; ++startIndex) {
16            // For each starting position, iterate through all possible ending positions
17            for (int endIndex = startIndex + 1; endIndex <= stringLength; ++endIndex) {
18                // Extract substring from startIndex with length (endIndex - startIndex)
19                string_view currentSubstring = stringView.substr(startIndex, endIndex - startIndex);
20              
21                // Insert the substring into the set (duplicates are automatically handled)
22                uniqueSubstrings.insert(currentSubstring);
23            }
24        }
25      
26        // Return the count of distinct substrings
27        return uniqueSubstrings.size();
28    }
29};
30
1function countDistinct(s: string): number {
2    // Use a Set to store unique substrings
3    // Set automatically handles duplicates
4    const uniqueSubstrings: Set<string> = new Set();
5  
6    // Get the length of the input string
7    const stringLength: number = s.length;
8  
9    // Iterate through all possible starting positions
10    for (let startIndex = 0; startIndex < stringLength; startIndex++) {
11        // For each starting position, iterate through all possible ending positions
12        for (let endIndex = startIndex + 1; endIndex <= stringLength; endIndex++) {
13            // Extract substring from startIndex to endIndex (exclusive)
14            // substring() method takes start and end indices
15            const currentSubstring: string = s.substring(startIndex, endIndex);
16          
17            // Add the substring to the set (duplicates are automatically handled)
18            uniqueSubstrings.add(currentSubstring);
19        }
20    }
21  
22    // Return the count of distinct substrings
23    return uniqueSubstrings.size;
24}
25

Time and Space Complexity

Time Complexity: O(n^3)

The code uses nested loops to generate all possible substrings. The outer loop runs n times (for each starting position i), and the inner loop runs up to n times for each i (for each ending position j). This gives us O(n^2) substring generations. However, each substring extraction s[i:j] takes O(j-i) time, which in the worst case is O(n). Additionally, adding each substring to the set requires hashing, which takes O(length of substring), also up to O(n) in the worst case. Therefore, the overall time complexity is O(n^2) * O(n) = O(n^3).

Space Complexity: O(n^2)

The set stores all distinct substrings. In the worst case where all characters are unique, the number of distinct substrings is n * (n + 1) / 2, which is O(n^2). Each substring can have a maximum length of n, so theoretically the total space for storing all substrings could be O(n^3). However, since the problem typically considers the space complexity in terms of the number of distinct elements stored rather than their total size, and given the reference answer, the space complexity is O(n^2).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Off-by-One Error in the Inner Loop Range

One of the most common mistakes is incorrectly setting the range for the inner loop j. Developers often write:

Incorrect:

distinct_substrings = {s[i:j] for i in range(n) for j in range(i, n)}

Why it's wrong:

  • When j = i, the slice s[i:i] produces an empty string "", which is not a valid substring
  • When j = n-1, we miss substrings that include the last character

Correct approach:

distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
  • i + 1 ensures we skip empty strings (substring must have at least one character)
  • n + 1 ensures we include substrings ending at the last character (since Python slicing is exclusive of the end index)

2. Using a List Instead of a Set

Incorrect:

distinct_substrings = [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]
return len(distinct_substrings)

Why it's wrong: Lists don't automatically remove duplicates. If the string has repeating patterns (like "aab"), the substring "a" appears twice, and using a list would count it twice.

Solution: Always use a set to automatically handle deduplication:

distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}

3. Memory Issues with Very Long Strings

For strings with length approaching 10^4 or more, storing all substrings can cause memory issues since there are O(nΒ²) possible substrings.

Alternative approach for memory-constrained scenarios:

def countDistinct(self, s: str) -> int:
    n = len(s)
    distinct_substrings = set()
  
    for i in range(n):
        for j in range(i + 1, n + 1):
            distinct_substrings.add(s[i:j])
          
    return len(distinct_substrings)

While this doesn't fundamentally reduce memory usage, it allows for potential optimizations like:

  • Early termination if a memory limit is reached
  • Batch processing of substrings
  • Using a more memory-efficient data structure like a Trie for very large inputs

4. Forgetting Edge Cases

Developers sometimes forget to handle edge cases properly:

Edge cases to consider:

  • Empty string: s = "" should return 0
  • Single character: s = "a" should return 1
  • All same characters: s = "aaaa" has only 4 distinct substrings: "a", "aa", "aaa", "aaaa"

Robust solution:

def countDistinct(self, s: str) -> int:
    if not s:  # Handle empty string
        return 0
      
    n = len(s)
    distinct_substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    return len(distinct_substrings)
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What's the output of running the following function using the following tree as input?

1def serialize(root):
2    res = []
3    def dfs(root):
4        if not root:
5            res.append('x')
6            return
7        res.append(root.val)
8        dfs(root.left)
9        dfs(root.right)
10    dfs(root)
11    return ' '.join(res)
12
1import java.util.StringJoiner;
2
3public static String serialize(Node root) {
4    StringJoiner res = new StringJoiner(" ");
5    serializeDFS(root, res);
6    return res.toString();
7}
8
9private static void serializeDFS(Node root, StringJoiner result) {
10    if (root == null) {
11        result.add("x");
12        return;
13    }
14    result.add(Integer.toString(root.val));
15    serializeDFS(root.left, result);
16    serializeDFS(root.right, result);
17}
18
1function serialize(root) {
2    let res = [];
3    serialize_dfs(root, res);
4    return res.join(" ");
5}
6
7function serialize_dfs(root, res) {
8    if (!root) {
9        res.push("x");
10        return;
11    }
12    res.push(root.val);
13    serialize_dfs(root.left, res);
14    serialize_dfs(root.right, res);
15}
16

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More