Facebook Pixel

28. Find the Index of the First Occurrence in a String

EasyTwo PointersStringString Matching
Leetcode Link

Problem Description

You are given two strings: haystack and needle. Your task is to find the first occurrence of the needle string within the haystack string.

The problem asks you to return the starting index position where needle first appears as a substring in haystack. If needle is not found anywhere in haystack, you should return -1.

For example:

  • If haystack = "hello" and needle = "ll", the function should return 2 because "ll" first appears at index 2 in "hello"
  • If haystack = "aaaaa" and needle = "bba", the function should return -1 because "bba" does not exist in "aaaaa"

The solution uses a straightforward sliding window approach. It iterates through the haystack string and at each position i, checks if the substring starting at i with length equal to needle matches the needle string. The iteration continues for n - m + 1 positions where n is the length of haystack and m is the length of needle, ensuring we don't go out of bounds. When a match is found, the starting index is immediately returned. If no match is found after checking all valid positions, -1 is returned.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

When we need to find a substring within a larger string, the most natural approach is to check every possible position where the substring could start. Think of it like looking for a word in a sentence - you would scan from left to right, checking if each position could be the beginning of the word you're searching for.

The key insight is that we only need to check positions where there's enough remaining characters to fit the entire needle. If haystack has length n and needle has length m, the last possible starting position would be at index n - m. Any position beyond that wouldn't have enough characters left to match the full needle.

For each valid starting position i, we can extract a substring of length m from haystack starting at position i using slice notation haystack[i : i + m]. We then directly compare this substring with needle. Python's string comparison handles the character-by-character matching for us internally.

This leads us to iterate through positions from 0 to n - m (inclusive), which gives us exactly n - m + 1 positions to check. At each position, we perform the substring extraction and comparison. As soon as we find a match, we can immediately return that index since we're looking for the first occurrence. If we complete the entire loop without finding a match, we know the needle doesn't exist in haystack and return -1.

The beauty of this approach lies in its simplicity - it directly translates our intuitive understanding of substring searching into code without requiring complex data structures or algorithms.

Learn more about Two Pointers patterns.

Solution Approach

The implementation uses a simple traversal approach to find the first occurrence of needle in haystack.

Algorithm Steps:

  1. Initialize Variables: First, we store the lengths of both strings for efficiency:

    • n = len(haystack) - length of the string we're searching in
    • m = len(needle) - length of the string we're searching for
  2. Set Up the Loop: We iterate through each valid starting position in haystack:

    • The loop runs from i = 0 to i = n - m (inclusive)
    • This gives us exactly n - m + 1 positions to check
    • We use range(n - m + 1) to generate these indices
  3. Check Each Position: At each index i:

    • Extract a substring from haystack using slice notation: haystack[i : i + m]
    • This gives us a substring of length m starting at position i
    • Compare this substring directly with needle using the equality operator ==
  4. Return on First Match:

    • If haystack[i : i + m] == needle evaluates to True, we immediately return i
    • This ensures we return the index of the first occurrence
  5. Handle No Match Case:

    • If the loop completes without finding a match, we return -1
    • This indicates that needle is not present in haystack

Example Walkthrough:

Let's trace through haystack = "hello" and needle = "ll":

  • n = 5, m = 2
  • Loop iterations: i ranges from 0 to 3 (since 5 - 2 + 1 = 4 iterations)
    • i = 0: Check "he" vs "ll" β†’ No match
    • i = 1: Check "el" vs "ll" β†’ No match
    • i = 2: Check "ll" vs "ll" β†’ Match found! Return 2

The time complexity is O((n - m + 1) Γ— m) where the outer loop runs n - m + 1 times and each string comparison takes O(m) time. The space complexity is O(1) as we only use a constant amount of extra space.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through finding needle = "abc" in haystack = "ababcab".

Initial Setup:

  • haystack = "ababcab" (length n = 7)
  • needle = "abc" (length m = 3)
  • We'll check positions from index 0 to 4 (n - m = 7 - 3 = 4)

Step-by-step Process:

Position i = 0:
haystack: a b a b c a b
          ^ ^ ^
Extract:  "aba"
Compare:  "aba" == "abc"? β†’ No match, continue

Position i = 1:
haystack: a b a b c a b
            ^ ^ ^
Extract:  "bab"
Compare:  "bab" == "abc"? β†’ No match, continue

Position i = 2:
haystack: a b a b c a b
              ^ ^ ^
Extract:  "abc"
Compare:  "abc" == "abc"? β†’ Match found! Return 2

The function returns 2 as the first occurrence of "abc" starts at index 2.

Another Example - No Match Case:

Let's try needle = "xyz" in haystack = "hello":

  • n = 5, m = 3
  • Check positions 0 to 2 (5 - 3 = 2, so 3 positions total)
i = 0: "hel" != "xyz"
i = 1: "ell" != "xyz"  
i = 2: "llo" != "xyz"

All positions checked, no match found β†’ Return -1

Solution Implementation

1class Solution:
2    def strStr(self, haystack: str, needle: str) -> int:
3        """
4        Find the first occurrence of needle in haystack.
5      
6        Args:
7            haystack: The string to search in
8            needle: The substring to search for
9          
10        Returns:
11            The index of the first occurrence of needle in haystack,
12            or -1 if needle is not found
13        """
14        # Get lengths of both strings
15        haystack_length = len(haystack)
16        needle_length = len(needle)
17      
18        # Iterate through all possible starting positions in haystack
19        # We only need to check positions where there's enough space for needle
20        for start_index in range(haystack_length - needle_length + 1):
21            # Check if substring starting at current position matches needle
22            if haystack[start_index:start_index + needle_length] == needle:
23                # Found a match, return the starting index
24                return start_index
25      
26        # No match found in the entire haystack
27        return -1
28
1class Solution {
2    /**
3     * Find the first occurrence of needle in haystack using a sliding window approach.
4     * Returns the index of the first occurrence, or -1 if needle is not found.
5     * 
6     * @param haystack the string to search in
7     * @param needle the string to search for
8     * @return the starting index of the first occurrence of needle in haystack, or -1 if not found
9     */
10    public int strStr(String haystack, String needle) {
11        // Handle empty needle case - empty string is always found at index 0
12        if (needle.isEmpty()) {
13            return 0;
14        }
15
16        int haystackLength = haystack.length();
17        int needleLength = needle.length();
18      
19        // Pointers for traversing haystack and needle respectively
20        int haystackPointer = 0;  // Current position in haystack
21        int needlePointer = 0;     // Current position in needle being matched
22      
23        // Iterate through the haystack
24        while (haystackPointer < haystackLength) {
25            // Check if current characters match
26            if (haystack.charAt(haystackPointer) == needle.charAt(needlePointer)) {
27                // Special case: needle has only one character and it matches
28                if (needleLength == 1) {
29                    return haystackPointer;
30                }
31              
32                // Move both pointers forward to continue matching
33                haystackPointer++;
34                needlePointer++;
35            } else {
36                // Characters don't match - backtrack in haystack
37                // Move haystack pointer back to the next starting position
38                // (original start position + 1)
39                haystackPointer = haystackPointer - needlePointer + 1;
40              
41                // Reset needle pointer to start matching from beginning
42                needlePointer = 0;
43            }
44
45            // Check if we've matched the entire needle
46            if (needlePointer == needleLength) {
47                // Return the starting index of the match
48                return haystackPointer - needlePointer;
49            }
50        }
51      
52        // Needle not found in haystack
53        return -1;
54    }
55}
56
1class Solution {
2private:
3    // Build the failure function (partial match table) for KMP algorithm
4    // The failure function tells us where to continue matching after a mismatch
5    vector<int> buildFailureFunction(const string& pattern) {
6        int patternLength = pattern.length();
7        vector<int> failureTable(patternLength);
8      
9        // First character always has failure value of -1
10        failureTable[0] = -1;
11      
12        int currentIndex = 0;
13        int prefixIndex = -1;
14      
15        // Build the failure table by comparing pattern with itself
16        while (currentIndex < patternLength) {
17            // If there's a mismatch, backtrack using the failure table
18            while (prefixIndex >= 0 && pattern[currentIndex] != pattern[prefixIndex]) {
19                prefixIndex = failureTable[prefixIndex];
20            }
21          
22            currentIndex++;
23            prefixIndex++;
24          
25            // Prevent out of bounds access
26            if (currentIndex >= patternLength) {
27                break;
28            }
29          
30            // Optimization: if next characters match, we can skip further
31            if (pattern[currentIndex] == pattern[prefixIndex]) {
32                failureTable[currentIndex] = failureTable[prefixIndex];
33            } else {
34                failureTable[currentIndex] = prefixIndex;
35            }
36        }
37      
38        return failureTable;
39    }
40
41public:
42    // Find the first occurrence of needle in haystack using KMP algorithm
43    // Returns the index of first occurrence, or -1 if not found
44    int strStr(string haystack, string needle) {
45        // Empty needle matches at position 0
46        if (needle.length() == 0) {
47            return 0;
48        }
49      
50        // Build the failure function for the pattern (needle)
51        vector<int> failureTable = buildFailureFunction(needle);
52      
53        // Calculate the maximum starting position for a potential match
54        int maxStartPosition = haystack.length() - needle.length() + 1;
55      
56        // Try each possible starting position
57        for (int startPos = 0; startPos < maxStartPosition; startPos++) {
58            int patternIndex = 0;  // Current position in needle
59            int textIndex = startPos;  // Current position in haystack
60          
61            // Match characters one by one
62            while (patternIndex < needle.length() && textIndex < haystack.length()) {
63                // If characters don't match
64                if (haystack[textIndex] != needle[patternIndex]) {
65                    // Use failure function to skip unnecessary comparisons
66                    if (failureTable[patternIndex] >= 0) {
67                        patternIndex = failureTable[patternIndex];
68                        continue;  // Retry with new pattern position
69                    } else {
70                        break;  // No more backtracking possible, try next start position
71                    }
72                }
73              
74                // Characters match, move both pointers forward
75                textIndex++;
76                patternIndex++;
77            }
78          
79            // Check if we found a complete match
80            if (patternIndex >= needle.length()) {
81                return textIndex - patternIndex;  // Return starting index of match
82            }
83        }
84      
85        // No match found
86        return -1;
87    }
88};
89
1/**
2 * Finds the first occurrence of needle string in haystack string
3 * @param haystack - The string to search in
4 * @param needle - The string to search for
5 * @returns The index of first occurrence of needle in haystack, or -1 if not found
6 */
7function strStr(haystack: string, needle: string): number {
8    // Get the length of both strings
9    const haystackLength: number = haystack.length;
10    const needleLength: number = needle.length;
11  
12    // Iterate through possible starting positions in haystack
13    // Stop when remaining characters are less than needle length
14    for (let startIndex: number = 0; startIndex <= haystackLength - needleLength; startIndex++) {
15        // Flag to track if current position matches the needle
16        let isMatch: boolean = true;
17      
18        // Check each character of needle against haystack starting from current position
19        for (let needleIndex: number = 0; needleIndex < needleLength; needleIndex++) {
20            // Compare characters at corresponding positions
21            if (haystack[startIndex + needleIndex] !== needle[needleIndex]) {
22                // Characters don't match, mark as not matching and exit inner loop
23                isMatch = false;
24                break;
25            }
26        }
27      
28        // If all characters matched, return the starting index
29        if (isMatch) {
30            return startIndex;
31        }
32    }
33  
34    // Needle not found in haystack, return -1
35    return -1;
36}
37

Time and Space Complexity

The time complexity of this code is O((n - m) Γ— m), where n is the length of the haystack string and m is the length of the needle string.

This is because:

  • The outer loop runs n - m + 1 times (iterating through all possible starting positions in the haystack)
  • For each iteration, the string slicing operation haystack[i : i + m] creates a substring of length m, which takes O(m) time
  • The comparison haystack[i : i + m] == needle also takes O(m) time to compare two strings of length m
  • Therefore, the total time complexity is O((n - m + 1) Γ— m), which simplifies to O((n - m) Γ— m)

The space complexity is O(1) if we consider that string slicing in Python creates a new string object temporarily for comparison, but this temporary space doesn't scale with input size beyond the fixed substring length m. Since m is part of the input rather than additional space that grows, the space complexity is constant.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Loop Boundary Calculation

One of the most frequent mistakes is setting up the wrong loop boundary, which can lead to either missing valid positions or causing index out of bounds errors.

Incorrect implementations:

# WRONG: Goes too far and causes index out of bounds
for i in range(n - m + 2):  # Should be n - m + 1
    if haystack[i:i + m] == needle:
        return i

# WRONG: Stops too early and misses valid positions
for i in range(n - m):  # Missing the +1
    if haystack[i:i + m] == needle:
        return i

Why this happens: The correct formula n - m + 1 represents the number of valid starting positions. For example, with haystack = "hello" (n=5) and needle = "lo" (m=2), we need to check positions 0, 1, 2, and 3 (which is 4 positions total = 5 - 2 + 1).

Solution: Always use range(n - m + 1) or carefully validate your boundary conditions with edge cases.

2. Not Handling Edge Cases

Failing to handle special cases like empty strings or when needle is longer than haystack.

Problematic code:

def strStr(haystack: str, needle: str) -> int:
    n = len(haystack)
    m = len(needle)
    # Missing edge case handling
    for i in range(n - m + 1):  # This could be negative!
        if haystack[i:i + m] == needle:
            return i
    return -1

Issues that arise:

  • If needle is empty, should return 0 (by convention)
  • If needle is longer than haystack, n - m + 1 becomes negative, causing unexpected behavior

Solution: Add explicit edge case checks:

def strStr(haystack: str, needle: str) -> int:
    if not needle:  # Empty needle
        return 0
    if len(needle) > len(haystack):  # Needle longer than haystack
        return -1
  
    n = len(haystack)
    m = len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1

3. Inefficient String Comparison

Using character-by-character comparison in a nested loop instead of Python's built-in string comparison.

Inefficient approach:

def strStr(haystack: str, needle: str) -> int:
    n = len(haystack)
    m = len(needle)
  
    for i in range(n - m + 1):
        match = True
        for j in range(m):  # Unnecessary nested loop
            if haystack[i + j] != needle[j]:
                match = False
                break
        if match:
            return i
    return -1

Why it's problematic: While this works, it's more verbose and potentially less efficient than using Python's optimized string slicing and comparison. Python's built-in string operations are implemented in C and are highly optimized.

Solution: Use Python's string slicing as shown in the original solution:

if haystack[i:i + m] == needle:
    return i

4. Using Built-in Methods Incorrectly

While Python provides str.find() and str.index() methods, using them incorrectly or not understanding their behavior can cause issues.

Misunderstanding built-in methods:

def strStr(haystack: str, needle: str) -> int:
    # This works but defeats the purpose of the exercise
    return haystack.find(needle)
  
    # Or using index() which raises ValueError
    try:
        return haystack.index(needle)
    except ValueError:
        return -1

Issue: While these work, they don't demonstrate understanding of the underlying algorithm, which is often the point of the exercise in interviews or learning contexts.

Solution: Implement the logic manually to show algorithmic thinking, but know that haystack.find(needle) is the most Pythonic solution for production code.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More