Facebook Pixel

3760. Maximum Substrings With Distinct Start

MediumHash TableString
LeetCode ↗

Problem Description

You are given a string s consisting of lowercase English letters.

Your task is to split s into substrings (contiguous, non-empty parts of the string). When splitting, every character of s must belong to exactly one substring, and the substrings together form the original string s in order.

The constraint is that each substring must start with a distinct character. In other words, if you look at the first character of every substring you create, no two of these first characters can be the same.

Return an integer representing the maximum number of substrings you can split s into while satisfying this rule.

Example reasoning:

Suppose s = "abcab". You could split it as "a", "b", "c", "ab", giving you 4 substrings starting with 'a', 'b', 'c', and 'a'. But this is invalid because two substrings start with 'a'. A valid split that maximizes the count would be "a", "b", "cab", which starts with 'a', 'b', and 'c' — all distinct — producing 3 substrings.

Since each substring's starting character must be unique, the maximum number of substrings you can ever create is limited by the number of distinct characters present in s. You can always achieve this maximum by greedily starting a new substring each time you encounter a character you have not used as a starting character before, and merging any repeated characters into the previous substring.

Therefore, the answer is simply the count of distinct characters in the string s.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

How We Pick the Algorithm

Why Hash Table / Counting?

This problem maps to Hash Table / Counting through a short path in the full flowchart.

Fastlookup orcounting?yesSlidingwindow?noHash Table /Counting

The answer reduces to counting distinct starting characters, which is simply the count of unique letters in the string.

Open in Flowchart

Intuition

The key insight comes from focusing on what truly limits how many substrings we can create: the starting characters must all be distinct.

Since the alphabet has only 26 lowercase letters, and every substring needs a unique starting character, the number of substrings can never exceed the number of different characters that appear in s. If a character does not appear in s at all, it can never be used as a starting character, so only the distinct characters actually present matter.

Now the question becomes: can we always reach this upper bound? The answer is yes. Imagine scanning the string from left to right and greedily cutting a new substring whenever we meet a character we have not yet used to begin a substring. Every distinct character will eventually appear at some position, and at the moment it first appears we can start a fresh substring with it. Any repeated character that we have already used as a starter can simply be absorbed into the current substring instead of forcing a new cut. This way, no starting character is ever wasted or duplicated.

Because we can always achieve a separate substring for each distinct character, and we can never exceed that count, the maximum number of substrings is exactly equal to the number of distinct characters in s. This reduces the entire problem to counting unique letters, which is precisely what len(set(s)) computes.

Solution Approach

The implementation follows directly from the intuition: the answer equals the number of distinct characters in s.

Data structure used — a set:

A set is the natural choice here because it automatically discards duplicate elements and keeps only unique values. By converting the string s into a set with set(s), every repeated character collapses into a single entry, leaving us with exactly the distinct characters present in the string.

Step-by-step walkthrough:

  1. Take the input string s.
  2. Apply set(s). This iterates over every character of s and inserts it into a set. Characters that already exist are ignored, so the resulting set holds only the unique letters.
  3. Apply len(...) to the set to count how many distinct characters it contains.
  4. Return this count as the final answer.

The entire logic fits in a single line:

return len(set(s))

Complexity analysis:

  • Time complexity: O(n), where n is the length of s. Building the set requires a single pass over all characters, and each insertion takes average O(1) time.
  • Space complexity: O(|Σ|), where |Σ| is the size of the character alphabet. Since s contains only lowercase English letters, the set can hold at most 26 characters, so the space used is bounded by a constant.

This approach works because the greedy splitting strategy guarantees we can always create one substring per distinct starting character, making the count of unique characters both the achievable result and the theoretical maximum.

Example Walkthrough

Let's trace through the solution approach using the small example s = "abcab".

Goal: Split s into the maximum number of substrings such that every substring begins with a distinct character.

Step 1 — Take the input string.

s = "abcab"
       indices: 0='a', 1='b', 2='c', 3='a', 4='b'

Step 2 — Simulate the greedy split (to build intuition).

We scan left to right and start a new substring only when we hit a character we have not yet used as a starter:

PositionCharUsed starters so farActionSubstrings so far
0a{}New starter a → begin substring "a"["a"]
1b{a}New starter b → begin substring "b"["a", "b"]
2c{a, b}New starter c → begin substring "c"["a", "b", "c"]
3a{a, b, c}a already used → absorb into current substring["a", "b", "ca"]
4b{a, b, c}b already used → absorb into current substring["a", "b", "cab"]

Final split: "a", "b", "cab" → starting characters are a, b, c, all distinct. Count = 3.

Step 3 — Apply the formula set(s).

set("abcab") = {'a', 'b', 'c'}

The repeated 'a' (index 3) and 'b' (index 4) collapse into the existing entries, leaving only the unique letters.

Step 4 — Count with len(...).

len({'a', 'b', 'c'}) = 3

Result: The greedy simulation produced 3 substrings, which exactly matches len(set(s)) = 3. This confirms that the answer is simply the number of distinct characters in s — every distinct character becomes the unique start of one substring, and repeated characters are merged into whatever substring is currently open.

Solution Implementation

1class Solution:
2    def maxDistinct(self, s: str) -> int:
3        # Convert the string into a set to remove duplicate characters,
4        # then return the count of unique characters.
5        unique_chars = set(s)
6        return len(unique_chars)
7
1class Solution {
2    /**
3     * Counts the number of distinct characters in the given string.
4     *
5     * @param s the input string consisting of lowercase letters
6     * @return the count of distinct characters
7     */
8    public int maxDistinct(String s) {
9        // Tracks how many distinct characters we have seen so far
10        int distinctCount = 0;
11
12        // Frequency array for the 26 lowercase English letters
13        int[] frequency = new int[26];
14
15        // Iterate over every character in the string
16        for (int i = 0; i < s.length(); i++) {
17            // Map the current character to an index in [0, 25]
18            int index = s.charAt(i) - 'a';
19
20            // Increment its frequency; if this is the first occurrence,
21            // it contributes a new distinct character
22            if (++frequency[index] == 1) {
23                distinctCount++;
24            }
25        }
26
27        // Return the total number of distinct characters
28        return distinctCount;
29    }
30}
31
1class Solution {
2public:
3    int maxDistinct(string s) {
4        int ans = 0;                  // Count of distinct characters
5        int cnt[26]{};                // Frequency table for 'a' to 'z', zero-initialized
6
7        // Traverse every character in the string
8        for (char c : s) {
9            // Increment this character's frequency.
10            // If it becomes 1, it is the first occurrence,
11            // so it contributes a new distinct character.
12            if (++cnt[c - 'a'] == 1) {
13                ++ans;
14            }
15        }
16
17        return ans;                   // Return the total number of distinct characters
18    }
19};
20
1/**
2 * Counts the number of distinct characters in the given string.
3 *
4 * @param s - The input string consisting of lowercase English letters.
5 * @returns The count of distinct characters present in the string.
6 */
7function maxDistinct(s: string): number {
8    // Tracks how many distinct characters have been seen so far.
9    let distinctCount = 0;
10
11    // Frequency table for the 26 lowercase English letters.
12    const frequency: number[] = Array(26).fill(0);
13
14    // Iterate over each character in the string.
15    for (const char of s) {
16        // Map the character to an index in the range [0, 25].
17        const index = char.charCodeAt(0) - 97;
18
19        // The first time a character appears, increment the distinct count.
20        if (++frequency[index] === 1) {
21            ++distinctCount;
22        }
23    }
24
25    return distinctCount;
26}
27

Time and Space Complexity

Time Complexity: O(n)

The function constructs a set from the input string s via set(s), which requires iterating over all n characters of the string once. Each insertion into the set is O(1) on average, so building the set takes O(n) time. The subsequent len() call on the set runs in O(1). Therefore, the overall time complexity is O(n), where n is the length of the string s.

Space Complexity: O(k)

The set stores only the distinct characters of s, so the space used is proportional to the number of unique characters k. In the worst case, all characters are distinct and k = n, giving O(n) space. If the character set is bounded (e.g., lowercase English letters), k is at most the size of the alphabet, making it O(1). Thus the space complexity is O(k), bounded by min(n, alphabet size).

Pattern Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall 1: Over-engineering the splitting logic

The most common mistake is taking the problem statement too literally and attempting to actually construct the substrings. Many people read "split into substrings" and immediately reach for a complex greedy simulation:

class Solution:
    def maxDistinct(self, s: str) -> int:
        seen = set()
        count = 0
        current = ""
        for ch in s:
            current += ch
            if current[0] not in seen:
                # try to close this substring...
                seen.add(current[0])
                count += 1
                current = ""
        return count

This kind of code is buggy and overly complicated. It misinterprets when a substring should be "closed," mishandles trailing characters, and produces incorrect results (e.g., on "abcab" it may overcount or leave dangling characters).

Why it's wrong: The key insight is that the answer is mathematically bounded by the number of distinct characters, and that bound is always achievable. There is no need to physically build any substring.

Solution: Recognize the reduction. Each distinct character can serve as the starting character of exactly one substring, and you can always merge repeated characters into the preceding substring. Hence the answer collapses to len(set(s)).

class Solution:
    def maxDistinct(self, s: str) -> int:
        return len(set(s))

Pitfall 2: Confusing "distinct starting characters" with "distinct substrings"

Another trap is assuming the constraint is about the substrings themselves being unique, rather than just their first characters. This leads people to count distinct substrings or distinct contiguous segments, which is a completely different (and much harder) problem.

Why it's wrong: The constraint only restricts the first character of each piece. The remainder of each substring is irrelevant to the count.

Solution: Focus strictly on first characters. Since only the leading character matters and there are at most 26 lowercase letters, the maximum number of valid pieces is precisely the count of unique leading characters available — i.e., the distinct characters in s.


Pitfall 3: Forgetting edge cases (empty input)

If the problem ever allowed an empty string, code that assumes at least one character (e.g., indexing s[0] in a simulation) would throw an IndexError.

Why it's safe with the set approach: len(set("")) simply returns 0 with no special handling.

Solution: The one-liner return len(set(s)) is inherently robust to empty or single-character inputs, requiring no additional guards:

class Solution:
    def maxDistinct(self, s: str) -> int:
        return len(set(s))   # handles "", "a", "aaaa" gracefully

Ready to land your dream job?

Unlock your dream job with a 5-minute quiz for a personalized study roadmap!

Get My Roadmap
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Get a Personalized Study Roadmap:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More