1163. Last Substring in Lexicographical Order


Problem Description

Given a string s, the goal is to find the substring that is the largest in lexicographical order and return it. A lexicographical order is essentially dictionary order, so for example, "b" is greater than "a", and "ba" is greater than "ab". When determining the lexicographical order of two strings, the comparison is made character by character starting from the first character. As soon as a difference is found, the string with the greater character at that position is considered larger. If all compared characters are equal and the strings are of different lengths, the longer string is considered to be larger.
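
To make the ordering concrete, here is a minimal brute-force sketch that checks every substring. It is only meant to illustrate the definition, not to be efficient; the function name `brute_force_last_substring` is a hypothetical helper, and it relies on Python's built-in string comparison, which follows the same character-by-character rule described above.

```python
def brute_force_last_substring(s: str) -> str:
    """Return the largest substring by checking every substring (illustration only)."""
    best = ""
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            candidate = s[start:end]
            # Python compares strings character by character, and a proper
            # prefix compares as smaller than the longer string.
            if candidate > best:
                best = candidate
    return best


print(brute_force_last_substring("abab"))      # bab
print(brute_force_last_substring("leetcode"))  # tcode
```

This version does far too much work for long inputs, which is what motivates the linear-time approach in the rest of the article.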

Intuition

To solve this problem, we can leverage a two-pointer approach to iterate through the string and track potential substrings that could be the largest in lexicographical order. Starting with the first character, we treat it as the initial largest substring. As we move through the string with another pointer, we compare new potential substrings with the current largest one.

Each time the substring starting at the second pointer turns out to be larger, the first pointer jumps forward, either to the second pointer or to the position just past the mismatch, whichever is further, because every start position it skips has already been beaten. If the first pointer's substring is larger, we simply move the second pointer past the mismatch to explore further options. By comparing characters at an offset (k) from both starting positions, we find where the two substrings first differ without re-comparing entire substrings each time.

This relies on the fact that the answer is always a suffix of s: extending a substring to the end of the string can only make it lexicographically larger, so only the starting position matters. We keep going until we've traversed the entire string; the first pointer then marks the beginning of the largest substring in lexicographical order, which we return by slicing the string from this position to the end.
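
The observation that only whole suffixes need to be considered can be checked with a small sketch. Both function names below are hypothetical helpers, and the code assumes a non-empty input string. It still takes quadratic time in the worst case, so it illustrates the reduction rather than the final algorithm.

```python
def last_substring_via_suffixes(s: str) -> str:
    """Pick the largest suffix; assumes s is non-empty."""
    # Every substring s[a:b] is a prefix of the suffix s[a:], so s[a:] >= s[a:b].
    return max(s[start:] for start in range(len(s)))


def largest_of_all_substrings(s: str) -> str:
    """Reference check over every substring, for comparison only."""
    return max(
        s[start:end]
        for start in range(len(s))
        for end in range(start + 1, len(s) + 1)
    )


for sample in ["abab", "leetcode", "zzz", "banana"]:
    assert last_substring_via_suffixes(sample) == largest_of_all_substrings(sample)
```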

Solution Approach

The solution uses two pointers, i and j, that scan through the string to find the largest lexicographical substring. These pointers mark the starting positions of the two substrings currently being compared: i is the best candidate so far and j is the challenger. A third variable, k, is the offset from both i and j at which characters are compared. No additional data structures are needed beyond the input string s.

We initialize i at 0, indicating the start of the current largest substring and set j to 1, which is the start of the substring we want to compare with the current largest. The variable k is initialized to 0, and it indicates the offset from i and j where the current comparison is happening.

The algorithm proceeds as follows:

  1. While j + k is less than the length of s, we are not done comparing.
  2. Compare the characters at s[i + k] and s[j + k].
  3. If s[i + k] is equal to s[j + k], we have not yet found a difference, so we increment k to continue the comparison at the next character.
  4. If s[i + k] is less than s[j + k], we have found a larger substring starting at j. Every start position from i through i + k loses to the matching position from j through j + k, so we update i to max(i + k + 1, j), set j to i + 1, and reset k to 0 to start comparisons at the beginning of the new substrings.
  5. If s[i + k] is greater than s[j + k], we keep our current largest substring starting at i intact, simply move j ahead by k + 1 to skip the lesser substring, and reset k to 0.

This process ensures that we are always aware of the largest substring seen so far (marked by i) and efficiently skip over parts of the string that cannot contribute to a larger lexicographical substring.
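
The skipping in steps 4 and 5 rests on one property: if the two substrings agree on their first k characters and then s[i + k] < s[j + k], every start position i + t with 0 <= t <= k is beaten by the matching position j + t. Below is a small sketch that checks this property on one example; the helper name `skipped_starts_are_beaten` and the sample string are assumptions chosen for illustration.

```python
def skipped_starts_are_beaten(s: str, i: int, j: int, k: int) -> bool:
    """Check that s[i+t:] < s[j+t:] for every t in 0..k.

    Assumes s[i:i+k] == s[j:j+k] and s[i+k] < s[j+k], which is the state
    the algorithm is in when it decides to move i forward.
    """
    assert s[i:i + k] == s[j:j + k] and s[i + k] < s[j + k]
    return all(s[i + t:] < s[j + t:] for t in range(k + 1))


# "ab" matches "ab", then 'c' < 'd', so starts 0, 1 and 2 all lose
# to starts 3, 4 and 5 respectively.
print(skipped_starts_are_beaten("abcabd", 0, 3, 2))  # True
```

The mirrored argument applies to step 5: when s[i + k] is greater, the positions j through j + k are the ones that lose, which is why j can jump ahead by k + 1.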

In terms of complexity, this approach has a worst-case time complexity of O(n), where n is the length of the string. This is because the sum i + j + k grows on every iteration and stays below 2n, so the loop runs fewer than 2n times.

Finally, when the loop completes, the index i points to the start of the last substring in lexicographical order, and we return s[i:], the substring from i to the end of s.

This two-pointer approach with character comparison is key because it strategically narrows down the search for the largest lexicographical substring without redundant comparisons or the use of extra space, making it both efficient and elegant.

Example Walkthrough

Let's say our string s is "ababc", and we want to find the substring that is the largest in lexicographical order.

  • Initialize i, j, and k to 0, 1, and 0, respectively. Our initial largest substring is "a" at index 0.

  • Compare characters at i + k and j + k, which means comparing s[0] with s[1] ("a" vs. "b"). Since "b" is greater, we update i to j, which is 1, so now i = 1. Reset j to i + 1 which makes j = 2 and reset k to 0.

  • Now, our largest substring candidate starts at index 1 with "b". Compare s[i + k] with s[j + k] again, which is now "b" vs. "a". Here, "b" is greater so we just move j ahead to j + k + 1, which is 3, and reset k to 0.

  • Continue with the comparisons. We have "b" vs. "b" (s[1] vs. s[3]), which are equal, so we increment k to 1 to compare the next characters.

  • Next comparison is "a" vs. "c" (s[1 + k] vs. s[3 + k]). Here "c" is greater, so we find a new largest substring starting at index 3. We update i to j, which is 3, reset j to i + 1, which makes j = 4, and k to 0.

  • Our last comparison is "b" vs. "c" (s[3] vs. s[4]). "c" is greater, so i moves to 4, j becomes 5, and k stays at 0.

Now j + k is equal to the length of s, and we exit the loop. The last index i was updated to is 4, so the largest lexicographical substring of s is s[i:] which is "c".

Ultimately, the algorithm efficiently deduced that "c" is the largest substring without having to do an exhaustive search or comparison of all potential substrings.
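
The walkthrough can be reproduced with a small tracing sketch that records (i, j, k) after every step. The function name `trace_last_substring` is a hypothetical helper, and the update rule it uses for the "less than" case, i = max(i + k + 1, j), is the one from the Python solution in this article.

```python
def trace_last_substring(s: str):
    """Run the two-pointer scan and record (i, j, k) after each step."""
    i, j, k = 0, 1, 0
    states = [(i, j, k)]
    while j + k < len(s):
        if s[i + k] == s[j + k]:
            k += 1                     # still matching, look one character further
        elif s[i + k] < s[j + k]:
            i = max(i + k + 1, j)      # challenger wins, skip beaten start positions
            j = i + 1
            k = 0
        else:
            j += k + 1                 # candidate wins, skip beaten challengers
            k = 0
        states.append((i, j, k))
    return s[i:], states


result, states = trace_last_substring("ababc")
print(result)  # c
print(states)  # [(0, 1, 0), (1, 2, 0), (1, 3, 0), (1, 3, 1), (3, 4, 0), (4, 5, 0)]
```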

Python Solution

class Solution:
    def lastSubstring(self, string: str) -> str:
        # current_start (i): start of the current best candidate substring.
        # compare_start (j): start of the substring being compared against it.
        # offset (k): number of characters matched so far from both starts.
        current_start = 0
        compare_start = 1
        offset = 0

        # Iterate until the comparison runs off the end of the string.
        while compare_start + offset < len(string):
            # If characters at the current offset are equal, increase the offset.
            if string[current_start + offset] == string[compare_start + offset]:
                offset += 1
            # If the character in the comparison substring is greater, the
            # candidate moves to compare_start or just past the mismatch,
            # whichever is further.
            elif string[current_start + offset] < string[compare_start + offset]:
                current_start = max(current_start + offset + 1, compare_start)
                # compare_start always restarts right after the new candidate.
                compare_start = current_start + 1
                offset = 0
            # If the character in the candidate substring is greater,
            # skip past the mismatch by moving compare_start.
            else:
                compare_start += offset + 1
                offset = 0

        # Return the last substring starting from the candidate position.
        return string[current_start:]

Java Solution

class Solution {
    public String lastSubstring(String str) {
        int length = str.length(); // Length of the string
        int maxCharIndex = 0; // Start index of the current best candidate substring
        int currentIndex = 1; // Start index of the substring being compared against the candidate
        int compareIndex = 0; // Offset from both start indices at which characters are compared

        // Loop until the comparison runs off the end of the string
        while (currentIndex + compareIndex < length) {
            // Compare the characters at the same offset from both start indices
            int diff = str.charAt(maxCharIndex + compareIndex) - str.charAt(currentIndex + compareIndex);

            if (diff == 0) { // Characters are equal, move to the next character for comparison
                compareIndex++;
            } else if (diff < 0) { // Challenger is larger: skip every start position already beaten
                maxCharIndex = Math.max(maxCharIndex + compareIndex + 1, currentIndex);
                currentIndex = maxCharIndex + 1;
                compareIndex = 0; // Reset compareIndex
            } else { // Challenger is smaller: move past the compared characters
                currentIndex += compareIndex + 1;
                compareIndex = 0; // Reset compareIndex
            }
        }

        // Create and return the substring starting from maxCharIndex to the end of the string
        return str.substring(maxCharIndex);
    }
}

C++ Solution

#include <string>
using std::string;

class Solution {
public:
    // Function to find the lexicographically largest substring of 's'
    string lastSubstring(string s) {
        int strSize = s.size();      // The size of the input string
        int startIndex = 0;          // The starting index of the current candidate for the result

        // 'nextIndex' - The index of the next potential candidate
        // 'offset' - The offset from both 'startIndex' and 'nextIndex' to compare characters
        for (int nextIndex = 1, offset = 0; nextIndex + offset < strSize;) {
            // If characters at the current offset are the same, just go to next character
            if (s[startIndex + offset] == s[nextIndex + offset]) {
                ++offset;
            }
            // If the character at 'nextIndex' + 'offset' is greater,
            // every start from 'startIndex' to 'startIndex' + 'offset' is beaten
            else if (s[startIndex + offset] < s[nextIndex + offset]) {
                startIndex += offset + 1;  // Skip all the beaten start positions
                offset = 0;                // Reset 'offset' since we have a new candidate
                // Make sure that 'nextIndex' is always ahead of 'startIndex'
                if (startIndex >= nextIndex) {
                    nextIndex = startIndex + 1;
                }
            }
            // If the character at 'startIndex' + 'offset' is greater,
            // 'startIndex' remains the candidate and 'nextIndex' skips past the mismatch
            else {
                nextIndex += offset + 1;
                offset = 0;                // Reset 'offset' because we are comparing a new pair of indices
            }
        }
        // Return the substring from 'startIndex' to the end of the string,
        // as it's the lexicographically largest substring
        return s.substr(startIndex);
    }
};

TypeScript Solution

function lastSubstring(str: string): string {
    const length = str.length;  // Store the length of the string
    let startIndex = 0;  // Start index of the current best candidate substring
    // currentIndex is the start of the challenger; offset is where characters are compared
    for (let currentIndex = 1, offset = 0; currentIndex + offset < length;) {
        if (str[startIndex + offset] === str[currentIndex + offset]) {
            // If the characters are the same, increment the offset
            offset++;
        } else if (str[startIndex + offset] < str[currentIndex + offset]) {
            // Challenger is larger: skip every start position already beaten
            startIndex = Math.max(startIndex + offset + 1, currentIndex);
            currentIndex = startIndex + 1;
            offset = 0;  // Reset the offset for new comparisons
        } else {
            // Challenger is smaller: move the current index past the mismatch
            currentIndex += offset + 1;
            offset = 0;  // Reset the offset for new comparisons
        }
    }
    // Return the substring from the start index to the end of the string
    return str.slice(startIndex);
}

Time and Space Complexity

Time Complexity

The time complexity of the algorithm is O(n). Here's why: the indices i and j are the starting positions of the two substrings being compared, and k is the offset currently being compared relative to both. Consider the sum i + j + k. When the characters match, k grows by 1. When s[i + k] is greater than s[j + k], j advances by k + 1 and k resets to 0, so the sum grows by 1. When s[i + k] is less than s[j + k], i moves to at least i + k + 1 and j does not move backward, so even after k resets to 0 the sum grows by at least 1. Inside the loop, i < j and j + k < len(s), so the sum stays below 2n. The loop therefore runs fewer than 2n times, and each iteration does a constant amount of work.
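
As a rough empirical check of this bound, the sketch below counts loop iterations on a few inputs and compares the count against 2n. The function name `count_iterations` and the sample inputs are assumptions made for illustration; the loop body mirrors the Python solution above.

```python
def count_iterations(s: str) -> int:
    """Count how many times the main loop of the two-pointer scan runs."""
    i, j, k = 0, 1, 0
    iterations = 0
    while j + k < len(s):
        iterations += 1
        if s[i + k] == s[j + k]:
            k += 1
        elif s[i + k] < s[j + k]:
            i = max(i + k + 1, j)
            j = i + 1
            k = 0
        else:
            j += k + 1
            k = 0
    return iterations


# Inputs with long repeated runs are the ones that stress the skipping logic.
for sample in ["a" * 1000, "a" * 999 + "b", "ab" * 500, "zyxw" * 250]:
    assert count_iterations(sample) <= 2 * len(sample)
```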

Space Complexity

The space complexity is O(1) because the algorithm uses a fixed number of integer variables i, j, and k, regardless of the input size. There are no data structures that grow with the size of the input.
