3. Longest Substring Without Repeating Characters

MediumHash TableStringSliding Window
Leetcode Link

Problem Description

The task is to find the length of the longest substring within a given string s that does not contain any repeating characters. A substring is defined as a contiguous sequence of characters within a string. The goal is to seek out the longest possible substring where each character is unique; no character appears more than once.

Imagine you have a unique piece of string and you want to make the longest possible necklace with beads where each bead must have a different shape or color. How would you pick beads so that repeats are avoided? Similar to this, the problem requires us to find such a unique sequence of characters in the given string s.

Intuition

To solve this challenge, the approach revolves around using a sliding window to scan through the string s while keeping track of unique characters we've seen so far. This technique involves expanding and shrinking a window on different portions of the string to cover the non-repeating sequence of characters.

We use two pointers i (the start of the window) and j (the end of the window) to represent the current segment of the string we're looking at. If we see a new character that hasn't appeared in the current window, it's safe to add this character to our current sequence and move the end pointer j ahead.

However, if we find a character that's already in our current window, it means we've found a repeat and must therefore move the start pointer i forward. We do this by removing characters from the start of our window until the repeating character is eliminated from it.

During this process, we always keep track of the maximum length of the substring found that meets the condition. The length of the current unique character substring can be calculated by taking the difference of the end and start pointers j - i and adding 1 (since our count is zero-indexed).

At the end of this process, when we've checked all characters in the string, the maximum length we tracked gives us the length of the longest substring without repeating characters.

Learn more about Sliding Window patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which two pointer technique does Quick Sort use?

Solution Approach

The solution implements a sliding window algorithm, which is an efficient way to maintain a subset of data in a larger dataset such as an array or string. In this context, we're dealing with a string and aiming to find a length of substring, i.e. a continuous range of characters, with distinct values. Two pointers, often denoted as i and j, are used to represent the current window, starting from the beginning of the string and ending at the last unique character found.

The algorithm also incorporates a hash set, named ss in the code, to efficiently verify if a character has already been seen within the current window. Hash sets offer constant time complexity for add, remove, and check operations, which makes them an ideal choice for this algorithm.

Here is the step-by-step breakdown of the implementation:

  1. Initialize a hash set ss to record the characters in the current window. This will allow us to quickly check if a character is part of the current substring without having to scan all of it.

  2. Initialize two pointers, i and j. i will point to the start of the current window, and j will iterate over the string to examine each character.

  3. Initialize a variable ans to keep track of the length of the longest substring found.

  4. Iterate over the string with j. For each character c located at the jth position:

    • While c is already in the set ss (implying a repeat and therefore a violation of the unique-substring rule), remove characters starting from the ith position and move i forward; this effectively shrinks the window from the left side until c can be added without creating a duplicate.

    • Once c is not in ss, add c to ss to include it in the current window.

    • Update ans if the current window size (j - i + 1) is larger than the maximum found so far.

  5. After iterating over all characters, return the value of ans.

The solution functions within O(n) time complexity—where n is the length of the string—since each character in the string is visited once by j and at most once by i as the window is moved forward.

To demonstrate the behavior of the sliding window algorithm, consider the Java template provided in the reference solution approach. This generic algorithm pattern consists of two pointers moving over a dataset and a condition checked in a while loop that modifies one of the pointers (j) based on some condition (check(j, i)) applied to the current range (or window) that the pointers define.

1for (int i = 0, j = 0; i < n; ++i) {
2    while (j < i && check(j, i)) {
3        ++j;
4    }
5    // logic of specific problem
6}

In our solution, the "check" is finding if c is already in the set ss, and the logic after the while loop is the add-to-set and max-value-update operations.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What are the most two important steps in writing a depth first search function? (Select 2)

Example Walkthrough

Let's walk through the solution approach by using a simple example. Consider the input string s = "abcabcbb".

  1. Initialize variables:

    • A hash set ss to store characters of the current substring without repeating ones.
    • Two pointers: i (start of the window) set to 0, and j (end of the window) also set to 0.
    • A variable ans to store the length of the longest substring found, set to 0.
  2. Traverse the string with j:

    • The outer loop moves j from the left to the right across the string.
  3. Examining each character:

    • When j = 0, the character is 'a'. It's not in ss, so add it to the set and ans is updated to 1.
    • Move j to 1, now the character is 'b'. It's not in ss, so add it, and ans is updated to 2.
    • Move j to 2, the character is 'c'. It's not in ss, so add it, and ans is updated to 3.
    • Move j to 3, we find 'a' again. It's in ss, so we enter the while loop to start removing from the left side:
      • Remove 'a' from ss, and move i to 1.
    • Add 'a' back as its repeat was removed, and move j to 4.
    • Now j = 4 and the character is 'b'. Since 'b' is in ss, remove characters with the while loop:
      • Remove 'b' from ss, and move i to 2.
    • Because the repeat 'b' has been removed from ss, add the new 'b' and j moves to 5.
    • Continuing this pattern, j keeps moving to the right, adding unique characters to ss, and updating ans if we find a longer unique-substring along the way.
  4. Completing the traverse:

    • As j progresses to the end of the string (j = 8), we keep removing duplicates from the left and adding new characters to the right until our window contains unique characters only.
    • The length of each window is calculated and compared with ans, and ans is updated if a longer length is found.
  5. Return the result:

    • After the above process, we find that the longest substring without repeating characters is 'abc' with a length of 3, which is the final value of ans.

Applying this approach to our example string s = "abcabcbb", we successfully find that the longest substring without repeating characters is 3 characters long.

Not Sure What to Study? Take the 2-min Quiz:

What's the output of running the following function using input [30, 20, 10, 100, 33, 12]?

1def fun(arr: List[int]) -> List[int]:
2    import heapq
3    heapq.heapify(arr)
4    res = []
5    for i in range(3):
6        res.append(heapq.heappop(arr))
7    return res
8
1public static int[] fun(int[] arr) {
2    int[] res = new int[3];
3    PriorityQueue<Integer> heap = new PriorityQueue<>();
4    for (int i = 0; i < arr.length; i++) {
5        heap.add(arr[i]);
6    }
7    for (int i = 0; i < 3; i++) {
8        res[i] = heap.poll();
9    }
10    return res;
11}
12
1class HeapItem {
2    constructor(item, priority = item) {
3        this.item = item;
4        this.priority = priority;
5    }
6}
7
8class MinHeap {
9    constructor() {
10        this.heap = [];
11    }
12
13    push(node) {
14        // insert the new node at the end of the heap array
15        this.heap.push(node);
16        // find the correct position for the new node
17        this.bubble_up();
18    }
19
20    bubble_up() {
21        let index = this.heap.length - 1;
22
23        while (index > 0) {
24            const element = this.heap[index];
25            const parentIndex = Math.floor((index - 1) / 2);
26            const parent = this.heap[parentIndex];
27
28            if (parent.priority <= element.priority) break;
29            // if the parent is bigger than the child then swap the parent and child
30            this.heap[index] = parent;
31            this.heap[parentIndex] = element;
32            index = parentIndex;
33        }
34    }
35
36    pop() {
37        const min = this.heap[0];
38        this.heap[0] = this.heap[this.size() - 1];
39        this.heap.pop();
40        this.bubble_down();
41        return min;
42    }
43
44    bubble_down() {
45        let index = 0;
46        let min = index;
47        const n = this.heap.length;
48
49        while (index < n) {
50            const left = 2 * index + 1;
51            const right = left + 1;
52
53            if (left < n && this.heap[left].priority < this.heap[min].priority) {
54                min = left;
55            }
56            if (right < n && this.heap[right].priority < this.heap[min].priority) {
57                min = right;
58            }
59            if (min === index) break;
60            [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61            index = min;
62        }
63    }
64
65    peek() {
66        return this.heap[0];
67    }
68
69    size() {
70        return this.heap.length;
71    }
72}
73
74function fun(arr) {
75    const heap = new MinHeap();
76    for (const x of arr) {
77        heap.push(new HeapItem(x));
78    }
79    const res = [];
80    for (let i = 0; i < 3; i++) {
81        res.push(heap.pop().item);
82    }
83    return res;
84}
85

Python Solution

1class Solution:
2    def lengthOfLongestSubstring(self, s: str) -> int:
3        # Initialize a set to store unique characters of the substring
4        unique_chars = set()
5      
6        # Initialize pointers for the sliding window
7        left_pointer = 0
8        max_length = 0
9      
10        # Iterate over the string using the right_pointer
11        for right_pointer, char in enumerate(s):
12          
13            # if the current char is already in the set,
14            # remove characters from the left until the current char
15            # is no longer in the set to assure all unique substring
16            while char in unique_chars:
17                unique_chars.remove(s[left_pointer])
18                left_pointer += 1  # Shrink the window from the left side
19          
20            # Add the current char to the set as it's unique in current window
21            unique_chars.add(char)
22          
23            # Update the max_length if the current window size is greater
24            max_length = max(max_length, right_pointer - left_pointer + 1)
25      
26        # Return the length of the longest substring without repeating characters
27        return max_length
28

Java Solution

1class Solution {
2    // Method to calculate the length of the longest substring without repeating characters
3    public int lengthOfLongestSubstring(String s) {
4        // Use a HashSet to store the characters in the current window without duplicates
5        Set<Character> charSet = new HashSet<>();
6        int leftPointer = 0; // Initialize the left pointer for the sliding window
7        int maxLength = 0; // Variable to keep track of the longest substring length
8      
9        // Iterate through the string with the right pointer
10        for (int rightPointer = 0; rightPointer < s.length(); ++rightPointer) {
11            char currentChar = s.charAt(rightPointer); // Current character at the right pointer
12          
13            // If currentChar is already in the set, it means we have found a repeating character
14            // We slide the left pointer of the window to the right until the duplicate is removed
15            while (charSet.contains(currentChar)) {
16                charSet.remove(s.charAt(leftPointer++));
17            }
18          
19            // Add the current character to the set as it is now unique in the current window
20            charSet.add(currentChar);
21          
22            // Calculate the length of the current window (rightPointer - leftPointer + 1)
23            // Update the maxLength if the current window is larger
24            maxLength = Math.max(maxLength, rightPointer - leftPointer + 1);
25        }
26      
27        // Return the length of the longest substring without repeating characters
28        return maxLength;
29    }
30}
31

C++ Solution

1#include <string>
2#include <unordered_set>
3#include <algorithm>
4
5class Solution {
6public:
7    int lengthOfLongestSubstring(std::string s) {
8        // This unordered set is used to store the characters that are currently in the 
9        // longest substring without repeating characters.
10        std::unordered_set<char> charSet;
11      
12        // The starting index of the substring.
13        int start = 0;
14        // The length of the longest substring without repeating characters.
15        int maxLength = 0;
16      
17        // Iterate over the string.
18        for (int end = 0; end < s.size(); ++end) {
19            // If the character at the current ending index of the substring already exists
20            // in the character set, continue to remove characters from the start of the
21            // substring until the character is no longer in the set.
22            while (charSet.count(s[end])) {
23                charSet.erase(s[start]);
24                start += 1;
25            }
26          
27            // Insert the current character into the set.
28            charSet.insert(s[end]);
29          
30            // Calculate the length of the current substring and update maxLength
31            // if this length is the largest we've found so far.
32            maxLength = std::max(maxLength, end - start + 1);
33        }
34        // Return the length of the longest substring found.
35        return maxLength;
36    }
37};
38

Typescript Solution

1function lengthOfLongestSubstring(s: string): number {
2    // Initialize the length of the longest substring without repeating characters
3    let maxLength = 0;
4  
5    // Create a Set to store the unique characters of the current substring
6    const seenCharacters = new Set<string>();
7  
8    // Use two pointers i and j to denote the start and end of the substring
9    let i = 0;
10    let j = 0;
11  
12    while (i < s.length) {
13        // If the current character is already in the set, remove characters from the set starting from the beginning
14        // until the current character is no longer in the set
15        while (seenCharacters.has(s[i])) {
16            seenCharacters.delete(s[j]);
17            j++;
18        }
19      
20        // Add the current character to the set
21        seenCharacters.add(s[i]);
22      
23        // Calculate the length of the current substring and update the maxLength if needed
24        maxLength = Math.max(maxLength, i - j + 1);
25      
26        // Move to the next character
27        i++;
28    }
29  
30    // Return the length of the longest substring without repeating characters
31    return maxLength;
32}
33
Fast Track Your Learning with Our Quick Skills Quiz:

Which of the following is a min heap?

Time and Space Complexity

Time Complexity

The time complexity of the code is O(2n) which simplifies to O(n), where n is the length of the string s. This is because in the worst case, each character will be visited twice by the two pointers i and j - once when j encounters the character and once when i moves past the character after it has been found in the set ss. However, each character is processed only a constant number of times, hence we consider the overall time complexity to be linear.

Space Complexity

The space complexity of the code is O(min(n, m)), where n is the size of the string s and m is the size of the character set (the number of unique characters in s). In the worst case, the set ss can store all unique characters of the string if all characters in the string are distinct. However, m is the limiting factor since it represents the size of the character set that can be stored in ss. Therefore, if n is larger than m, the space complexity is limited by m rather than n.

Learn more about how to find time and space complexity quickly.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫