1163. Last Substring in Lexicographical Order
Problem Description
Given a string s
, you need to find and return the last substring of s
when all possible substrings are arranged in lexicographical (dictionary) order.
For example, if s = "abab"
, all possible substrings are: "a"
, "ab"
, "aba"
, "abab"
, "b"
, "ba"
, "bab"
, "a"
, "ab"
, "b"
. When sorted lexicographically, they become: "a"
, "a"
, "ab"
, "ab"
, "aba"
, "abab"
, "b"
, "b"
, "ba"
, "bab"
. The last substring in this sorted order is "bab"
.
The key insight is that among all substrings starting from a position i
, the suffix s[i:]
(substring from position i
to the end) will always be the lexicographically largest one starting from that position. This is because any shorter substring starting from i
would be a prefix of s[i:]
, and prefixes are always lexicographically smaller than the full string when the string has more characters.
Therefore, the problem reduces to finding which suffix of the string is lexicographically largest. The solution uses a two-pointer technique to efficiently compare different suffixes and identify the one with the maximum lexicographical value.
Intuition
The first observation is that we don't need to consider all possible substrings. Why? Because for any starting position i
, the longest substring starting from i
(which is the suffix s[i:]
) will always be lexicographically larger than any shorter substring starting from the same position. This is because when comparing strings lexicographically, if one string is a prefix of another, the longer string is always greater.
So our problem simplifies to: find the lexicographically largest suffix of the string.
A naive approach would be to compare all suffixes pairwise, but that would be inefficient. Instead, we can use a two-pointer technique to eliminate multiple candidates at once.
The key insight is when we're comparing two suffixes starting at positions i
and j
. If we find that at some offset k
, s[i+k] < s[j+k]
(the first differing character), then not only is suffix s[i:]
smaller than s[j:]
, but also all suffixes s[i+1:]
, s[i+2:]
, ..., s[i+k:]
are smaller than their corresponding counterparts starting from j
. Why? Because they all share the same prefix relationship that made them smaller.
For example, if s = "abcabd"
and we're comparing suffix starting at i=0
("abcabd") with suffix starting at j=3
("abd"), we find they differ at position k=2
where s[0+2]='c'
> s[3+2]='d'
. This tells us that not only "abcabd" > "abd", but also "bcabd" > "bd" and "cabd" > "d".
This observation allows us to skip multiple positions at once: when s[i+k] < s[j+k]
, we can directly jump i
to i+k+1
, skipping all intermediate positions. Similarly, when s[i+k] > s[j+k]
, we can jump j
to j+k+1
.
This clever jumping strategy ensures we examine each character at most a constant number of times, leading to an efficient linear-time algorithm.
Learn more about Two Pointers patterns.
Solution Approach
We implement a two-pointer algorithm to find the lexicographically largest suffix. Here's how the algorithm works:
Initialization:
- Pointer
i = 0
: tracks the starting position of the current best (lexicographically largest) suffix candidate - Pointer
j = 1
: tracks the starting position of the suffix we're currently comparing against - Variable
k = 0
: tracks the offset for character-by-character comparison
Main Loop:
The algorithm continues while j + k < len(s)
, ensuring we don't go out of bounds.
At each iteration, we compare characters s[i + k]
and s[j + k]
:
-
Case 1:
s[i + k] == s[j + k]
- The characters match, so we increment
k
to compare the next pair of characters - We continue extending the comparison to find where the two suffixes differ
- The characters match, so we increment
-
Case 2:
s[i + k] < s[j + k]
- The suffix starting at
j
is lexicographically larger - We update
i = i + k + 1
, effectively skipping all suffixess[i:]
,s[i+1:]
, ...,s[i+k:]
- Why can we skip these? Because if
s[i:i+k]
equalss[j:j+k]
buts[i+k] < s[j+k]
, then any suffix starting in the range[i, i+k]
will be smaller than the corresponding suffix starting fromj
- Reset
k = 0
for the next comparison - If
i >= j
, we setj = i + 1
to ensurej
is always ahead ofi
- The suffix starting at
-
Case 3:
s[i + k] > s[j + k]
- The suffix starting at
i
is lexicographically larger - We update
j = j + k + 1
, skipping all suffixes fromj
toj+k
- Reset
k = 0
for the next comparison
- The suffix starting at
Return Value:
After the loop completes, i
points to the starting position of the lexicographically largest suffix, so we return s[i:]
.
Time Complexity: O(n) where n is the length of the string. Although it might seem like we're doing nested comparisons, each character is examined at most twice due to the jumping strategy.
Space Complexity: O(1) as we only use a constant amount of extra space for the pointers and counter.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's trace through the algorithm with s = "abcab"
.
We need to find the lexicographically largest suffix among: "abcab", "bcab", "cab", "ab", "b".
Initial State:
i = 0
(candidate: "abcab")j = 1
(comparing with: "bcab")k = 0
Iteration 1:
- Compare
s[0+0] = 'a'
withs[1+0] = 'b'
- Since
'a' < 'b'
, suffix "bcab" is larger than "abcab" - Update:
i = 0 + 0 + 1 = 1
- Since
i >= j
, setj = i + 1 = 2
- Reset
k = 0
- New state:
i = 1
(candidate: "bcab"),j = 2
(comparing with: "cab")
Iteration 2:
- Compare
s[1+0] = 'b'
withs[2+0] = 'c'
- Since
'b' < 'c'
, suffix "cab" is larger than "bcab" - Update:
i = 1 + 0 + 1 = 2
- Since
i >= j
, setj = i + 1 = 3
- Reset
k = 0
- New state:
i = 2
(candidate: "cab"),j = 3
(comparing with: "ab")
Iteration 3:
- Compare
s[2+0] = 'c'
withs[3+0] = 'a'
- Since
'c' > 'a'
, suffix "cab" is larger than "ab" - Update:
j = 3 + 0 + 1 = 4
- Reset
k = 0
- New state:
i = 2
(candidate: "cab"),j = 4
(comparing with: "b")
Iteration 4:
- Compare
s[2+0] = 'c'
withs[4+0] = 'b'
- Since
'c' > 'b'
, suffix "cab" is larger than "b" - Update:
j = 4 + 0 + 1 = 5
- Reset
k = 0
- New state:
i = 2
,j = 5
Termination:
j + k = 5 + 0 = 5
which equalslen(s)
, so the loop ends- Return
s[2:] = "cab"
The algorithm correctly identifies "cab" as the lexicographically largest suffix. When we consider all substrings of "abcab" and sort them, "cab" would indeed be the last substring in the sorted order.
Solution Implementation
1class Solution:
2 def lastSubstring(self, s: str) -> str:
3 """
4 Find the lexicographically largest substring of the given string.
5 The largest substring will always be a suffix of the original string.
6
7 Uses a two-pointer approach with comparison offset to efficiently find
8 the starting position of the lexicographically largest suffix.
9
10 Args:
11 s: Input string
12
13 Returns:
14 The lexicographically largest substring (suffix)
15 """
16 # Initialize two pointers for comparing potential starting positions
17 left_start = 0 # First candidate starting position
18 right_start = 1 # Second candidate starting position
19 offset = 0 # Current comparison offset from both starting positions
20
21 # Continue until we've examined all necessary characters
22 while right_start + offset < len(s):
23 # Case 1: Characters at current offset are equal
24 if s[left_start + offset] == s[right_start + offset]:
25 # Move to next character for comparison
26 offset += 1
27
28 # Case 2: Left substring is smaller at current offset
29 elif s[left_start + offset] < s[right_start + offset]:
30 # Skip the entire left substring and its compared portion
31 # The new left_start jumps past all compared characters
32 left_start = left_start + offset + 1
33 offset = 0 # Reset comparison offset
34
35 # Ensure right_start is always ahead of left_start
36 if left_start >= right_start:
37 right_start = left_start + 1
38
39 # Case 3: Right substring is smaller at current offset
40 else:
41 # Skip the entire right substring and its compared portion
42 right_start = right_start + offset + 1
43 offset = 0 # Reset comparison offset
44
45 # Return the suffix starting from the optimal position
46 return s[left_start:]
47
1class Solution {
2 public String lastSubstring(String s) {
3 int stringLength = s.length();
4
5 // Starting index of the current best (lexicographically largest) substring
6 int bestStartIndex = 0;
7
8 // candidateIndex: starting index of the candidate substring being compared
9 // comparisonOffset: current position offset for character-by-character comparison
10 for (int candidateIndex = 1, comparisonOffset = 0;
11 candidateIndex + comparisonOffset < stringLength;) {
12
13 // Compare characters at current positions of both substrings
14 int charDifference = s.charAt(bestStartIndex + comparisonOffset) -
15 s.charAt(candidateIndex + comparisonOffset);
16
17 if (charDifference == 0) {
18 // Characters are equal, continue comparing next characters
19 comparisonOffset++;
20 } else if (charDifference < 0) {
21 // Current best substring is smaller, update to candidate substring
22 // Skip the compared portion since any substring starting within it
23 // would be lexicographically smaller
24 bestStartIndex += comparisonOffset + 1;
25 comparisonOffset = 0;
26
27 // If bestStartIndex catches up or passes candidateIndex,
28 // move candidateIndex forward to maintain comparison order
29 if (bestStartIndex >= candidateIndex) {
30 candidateIndex = bestStartIndex + 1;
31 }
32 } else {
33 // Candidate substring is smaller, move to next candidate
34 // Skip the compared portion for the same reason as above
35 candidateIndex += comparisonOffset + 1;
36 comparisonOffset = 0;
37 }
38 }
39
40 // Return the lexicographically largest substring starting from bestStartIndex
41 return s.substring(bestStartIndex);
42 }
43}
44
1class Solution {
2public:
3 string lastSubstring(string s) {
4 int n = s.size();
5 int maxIndex = 0; // Index of the current best (lexicographically largest) substring
6
7 // Use two-pointer technique to find the lexicographically largest substring
8 for (int candidateIndex = 1, offset = 0; candidateIndex + offset < n;) {
9 // Compare characters at maxIndex + offset and candidateIndex + offset
10 if (s[maxIndex + offset] == s[candidateIndex + offset]) {
11 // Characters match, continue comparing next characters
12 ++offset;
13 }
14 else if (s[maxIndex + offset] < s[candidateIndex + offset]) {
15 // Found a better candidate starting at candidateIndex
16 // Skip all positions that we've already compared
17 maxIndex += offset + 1;
18 offset = 0;
19
20 // Ensure candidateIndex is always ahead of maxIndex
21 if (maxIndex >= candidateIndex) {
22 candidateIndex = maxIndex + 1;
23 }
24 }
25 else {
26 // Current maxIndex is still better
27 // Skip the candidate and positions we've compared
28 candidateIndex += offset + 1;
29 offset = 0;
30 }
31 }
32
33 // Return the substring starting from the best position to the end
34 return s.substr(maxIndex);
35 }
36};
37
1/**
2 * Finds the lexicographically largest substring of the given string.
3 * Uses a two-pointer approach with an offset to efficiently compare substrings.
4 *
5 * @param s - The input string
6 * @returns The lexicographically largest substring
7 */
8function lastSubstring(s: string): string {
9 const length: number = s.length;
10
11 // Starting index of the current candidate for largest substring
12 let candidateIndex: number = 0;
13
14 // Comparison index for finding potentially larger substrings
15 let comparisonIndex: number = 1;
16
17 // Offset for character-by-character comparison between two substrings
18 let offset: number = 0;
19
20 // Continue while we haven't compared all possible positions
21 while (comparisonIndex + offset < length) {
22 if (s[candidateIndex + offset] === s[comparisonIndex + offset]) {
23 // Characters match, continue comparing next characters
24 offset++;
25 } else if (s[candidateIndex + offset] < s[comparisonIndex + offset]) {
26 // Substring starting at comparisonIndex is larger
27 // Skip all positions that would be smaller and update candidate
28 candidateIndex += offset + 1;
29 offset = 0;
30
31 // Ensure comparisonIndex stays ahead of candidateIndex
32 if (candidateIndex >= comparisonIndex) {
33 comparisonIndex = candidateIndex + 1;
34 }
35 } else {
36 // Substring starting at candidateIndex is larger
37 // Skip positions at comparisonIndex that would be smaller
38 comparisonIndex += offset + 1;
39 offset = 0;
40 }
41 }
42
43 // Return the substring starting from the best candidate index
44 return s.slice(candidateIndex);
45}
46
Time and Space Complexity
Time Complexity: O(n)
, where n
is the length of string s
.
The algorithm uses a two-pointer approach with variables i
and j
to find the lexicographically largest substring. While it may appear that the nested nature of the while loop could lead to O(nΒ²)
complexity, careful analysis reveals that each character is visited at most twice:
- The pointer
j
moves forward through the string, and whens[i + k] < s[j + k]
, the pointeri
jumps ahead byk + 1
positions - When
s[i + k] > s[j + k]
, the pointerj
jumps ahead byk + 1
positions - The key insight is that when either pointer jumps, it skips over positions that have already been compared (the
k
positions), ensuring that no position is redundantly processed
Since each position in the string is examined at most a constant number of times, the total time complexity is O(n)
.
Space Complexity: O(1)
The algorithm only uses a fixed number of integer variables (i
, j
, and k
) to track positions and comparison lengths. No additional data structures are created that scale with the input size. The returned substring s[i:]
is not counted toward space complexity as it's the output. Therefore, the space complexity is constant, O(1)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Incorrect Pointer Update Logic
The Problem: A common mistake is updating the pointers incorrectly when characters don't match. Developers often write:
# WRONG: Simply incrementing by 1 if s[i + k] < s[j + k]: i = i + 1 # This misses the optimization!
Why It's Wrong: This approach doesn't leverage the information we've already gathered. If we've already compared k
characters and found them equal, we know that any suffix starting between positions i
and i+k
will be lexicographically smaller than the corresponding suffix starting from j
.
The Solution: Always jump by k + 1
to skip all the positions we've already implicitly compared:
if s[i + k] < s[j + k]: i = i + k + 1 # Correct: Skip all positions from i to i+k
Pitfall 2: Forgetting to Maintain the Invariant j > i
The Problem: After updating i
, it's possible that i
becomes greater than or equal to j
:
# WRONG: Not adjusting j after updating i if s[i + k] < s[j + k]: i = i + k + 1 k = 0 # Missing: j might now be <= i!
Why It's Wrong: The algorithm relies on comparing two different positions. If j <= i
, we'd either be comparing a position with itself or comparing in the wrong order, leading to incorrect results or infinite loops.
The Solution: Always ensure j
stays ahead of i
:
if s[i + k] < s[j + k]: i = i + k + 1 k = 0 if i >= j: j = i + 1 # Maintain j > i invariant
Pitfall 3: Not Resetting the Offset k
The Problem: Forgetting to reset k
to 0 after updating either i
or j
:
# WRONG: k not reset if s[i + k] < s[j + k]: i = i + k + 1 # Missing: k = 0
Why It's Wrong: The offset k
represents how many characters we've compared from the current starting positions. When we change either starting position, we need to begin a fresh comparison from offset 0.
The Solution: Always reset k
when changing starting positions:
if s[i + k] < s[j + k]: i = i + k + 1 k = 0 # Reset for new comparison
Pitfall 4: Misunderstanding Why This is O(n)
The Problem: Developers might think this is O(nΒ²) because of the nested nature of the comparisons (the while loop with internal offset increments).
Why It Seems Wrong: At first glance, it appears we might compare each character multiple times in different suffix comparisons.
The Clarification: Each character is examined at most twice throughout the entire algorithm:
- Once as part of the suffix starting at
i
- Once as part of the suffix starting at
j
When we update i
or j
by jumping k + 1
positions, we're effectively marking those skipped positions as "processed" and never return to them. This jumping mechanism ensures linear time complexity.
Which data structure is used in a depth first search?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!