1698. Number of Distinct Substrings in a String
Problem Description
The problem requires calculating the total number of distinct substrings that can be formed from a given string s
. A substring is defined as any sequence of characters that can be derived from the string by deleting zero or more characters from the beginning and zero or more characters from the end. For example, in the string "abc"
, some of the substrings include "a"
, "ab"
, "abc"
, and "b"
. The key here is to ensure that each substring is counted only once, even if it occurs multiple times in different positions of the string.
Intuition
The intuitive approach to solving this problem relies on generating all possible substrings of the given string and then counting the unique ones. To achieve this, two nested loops can be used where the outer loop iterates over the start position of a substring and the inner loop iterates over the end position. For each pair of start and end positions, a substring is extracted and stored in a set, which inherently eliminates duplicates due to its unordered collection of unique elements. After all possible substrings are added to the set, the number of distinct substrings can be determined by simply taking the size of the set.
This approach is easy to implement and understand, and although not the most efficient in terms of time complexity, it works well for strings of reasonable length.
Learn more about Trie patterns.
Solution Approach
The solution's implementation relies on the concept of hash sets and comprehensions in Python. Here's a step-by-step explanation of the code provided:
- Initialize the length of the string
n = len(s)
. This is required to control the loops' range. - A set is created using a set comprehension (
{}
) that iterates through all possible start (i
) and end (j
) positions of the substrings. In Python, sets store only unique elements which means any duplicate substrings generated will not be added to the set. - The outer loop begins at the first character (
i
starts at 0) and continues until the end of the string (i < n
). - The inner loop begins at the character immediately following the start position (
j
starts ati + 1
) and continues to the end of the string (j <= n
). This ensures that we consider all possible end positions for substrings starting at positioni
. - During each iteration of the inner loop, a new substring
s[i:j]
is sliced from the original strings
, withi
inclusive andj
exclusive. - The sliced substring is then added to the set. If the substring already exists in the set, it's ignored, as sets do not store duplicates.
- After the looping, the set will contain all possible unique substrings of the original string.
- The total number of distinct substrings is the size of the set, which is obtained using the
len()
function on the set.
To summarize, the algorithm leverages a brute-force approach to generate all substrings and a data structure (set) that naturally handles the uniqueness constraint. It's excellent for its simplicity and ease of understanding. However, for large strings, this approach may lead to time-consuming operations because the number of substrings grows quadratically with the length of the input string.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let us consider a simple example using the string "abc"
to illustrate the solution approach.
- First, we initialize the length of the string
n = len(s)
. In our case,n
would be3
, since"abc"
is three characters long. - We then create a set to store our substrings. This set will only keep unique elements.
- We start our outer loop, where
i
is our starting point of the substring. For our string"abc"
,i
will take values0
,1
, and2
. - For each value of
i
, the inner loop starts fromi + 1
and continues until the end of the string. This is to ensure we cover all possible substrings starting at indexi
. - As the loops execute, we generate substrings using
s[i:j]
and add them to our set. Fori = 0
,j
will go from1
to3
, generating"a"
,"ab"
, and"abc"
. - We then move to
i = 1
, wherej
will go from2
to3
, giving us substrings"b"
and"bc"
. - Lastly, for
i = 2
,j
will only be3
, which gives us the substring"c"
. - All these substrings are added to our set:
{"a", "ab", "abc", "b", "bc", "c"}
. - After both loops have completed, we end up with a set containing all unique substrings.
- The total number of distinct substrings is simply the size of this set. Here, we have
6
unique substrings. Thus, the answer is6
.
Through this example, we can see how the proposed solution methodically enumerates all the substrings while enforcing uniqueness by utilizing the properties of a set.
Solution Implementation
1class Solution:
2 def countDistinct(self, text: str) -> int:
3 # Calculate the length of the input string
4 length_of_text = len(text)
5
6 # Use a set comprehension to generate all unique substrings
7 # The set will automatically eliminate duplicate substrings
8 unique_substrings = {
9 text[start:end] for start in range(length_of_text)
10 for end in range(start + 1, length_of_text + 1)
11 }
12
13 # The length of the set of unique substrings gives us the count of distinct substrings
14 return len(unique_substrings)
15
1class Solution {
2 public int countDistinct(String s) {
3 // Create a HashSet to store the unique substrings
4 Set<String> uniqueSubstrings = new HashSet<>();
5 int lengthOfString = s.length(); // Get the length of the input string
6
7 // Iterate over all possible starting points for the substrings
8 for (int start = 0; start < lengthOfString; ++start) {
9 // Iterate over all possible ending points for the substrings
10 for (int end = start + 1; end <= lengthOfString; ++end) {
11 // Extract the substring from start to end index (exclusive) and add it to the Set
12 uniqueSubstrings.add(s.substring(start, end));
13 }
14 }
15
16 // Return the size of the Set, which represents the count of distinct substrings
17 return uniqueSubstrings.size();
18 }
19}
20
1#include <string>
2#include <unordered_set>
3
4class Solution {
5public:
6 // Function to count the number of distinct substrings in a given string
7 int countDistinct(std::string s) {
8 // Using an unordered set to store unique substrings for O(1) average time complexity for insertions and lookups
9 std::unordered_set<std::string_view> uniqueSubstrings;
10
11 // Size of the input string
12 int length = s.size();
13 // Variable to represent the current substring
14 std::string_view currentSubstring;
15 // String_view of the input string to optimize substring operations
16 std::string_view stringView = s;
17
18 // Iterate over all possible starting points for substrings in s
19 for (int i = 0; i < length; ++i) {
20 // Iterate over all possible ending points for substrings, starting from the next character
21 for (int j = i + 1; j <= length; ++j) {
22 // Create a substring from the current starting and ending points
23 currentSubstring = stringView.substr(i, j - i);
24 // Insert the current substring into the set of unique substrings
25 uniqueSubstrings.insert(currentSubstring);
26 }
27 }
28
29 // The size of the set represents the number of distinct substrings in s
30 return uniqueSubstrings.size();
31 }
32};
33
1// Importing Set from JavaScript's standard library
2import { Set } from 'core-js';
3
4// Function to count the number of distinct substrings in a given string
5function countDistinct(s: string): number {
6 // Using a Set to store unique substrings for efficient insertions and uniqueness checks
7 let uniqueSubstrings: Set<string> = new Set<string>();
8
9 // Length of the input string
10 let length: number = s.length;
11
12 // Iterating over all possible starting indexes for substrings in s
13 for (let i = 0; i < length; ++i) {
14 // Iterating over all possible ending indexes for substrings, starting from the next character
15 for (let j = i + 1; j <= length; ++j) {
16 // Extracting a substring from the current starting and ending indexes
17 let currentSubstring: string = s.substring(i, j);
18 // Inserting the current substring into the set of unique substrings
19 uniqueSubstrings.add(currentSubstring);
20 }
21 }
22
23 // The size of the set represents the number of distinct substrings in s
24 return uniqueSubstrings.size;
25}
26
27export { countDistinct }; // Exporting countDistinct for other modules to use
28
Time and Space Complexity
The given Python code snippet defines a function that counts the distinct substrings of a given string s
. To analyze the complexity, let's consider n
to be the length of the input string s
.
Time Complexity: O(n^3)
The time complexity of the given function is O(n^3)
for the following reasons:
-
Two nested loops iterate over the string to generate all possible substrings, which leads to
O(n^2)
complexity. -
For each iteration, a substring is created using slicing
s[i:j]
. In Python, slicing a string isO(k)
wherek
is the number of characters in the substring. In the worst case,k
can ben
leading toO(n)
for the slicing operation. -
These two combined give us a total of
O(n^2) * O(n) = O(n^3)
complexity since for each pair(i, j)
, a new substring of the strings
can be created withO(n)
complexity.
Space Complexity: O(n^2)
The space complexity of the given function is O(n^2)
for the following reasons:
-
A set is used to store the distinct substrings. In the worst case, the number of distinct substrings of a string of length
n
is the sum of the series 1 + 2 + 3 + ... + n, which results inn(n + 1)/2
substrings, equating to anO(n^2)
space complexity. -
Even though string slicing itself does not take additional space since it's creating new strings that reference the same characters within the original string, in Python, slicing creates new objects and the total space in the set would be the sum of the length of all unique substrings.
Considering these points, the complexities are as follows:
- Time Complexity:
O(n^3)
- Space Complexity:
O(n^2)
Learn more about how to find time and space complexity quickly using problem constraints.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80]
is 4
and LIS is
[3, 7, 40, 80]
.
What is the recurrence relation?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!