28. Find the Index of the First Occurrence in a String
Problem Description
You are given two strings: haystack
and needle
. Your task is to find the first occurrence of the needle
string within the haystack
string.
The problem asks you to return the starting index position where needle
first appears as a substring in haystack
. If needle
is not found anywhere in haystack
, you should return -1
.
For example:
- If
haystack = "hello"
andneedle = "ll"
, the function should return2
because "ll" first appears at index 2 in "hello" - If
haystack = "aaaaa"
andneedle = "bba"
, the function should return-1
because "bba" does not exist in "aaaaa"
The solution uses a straightforward sliding window approach. It iterates through the haystack
string and at each position i
, checks if the substring starting at i
with length equal to needle
matches the needle
string. The iteration continues for n - m + 1
positions where n
is the length of haystack
and m
is the length of needle
, ensuring we don't go out of bounds. When a match is found, the starting index is immediately returned. If no match is found after checking all valid positions, -1
is returned.
Intuition
When we need to find a substring within a larger string, the most natural approach is to check every possible position where the substring could start. Think of it like looking for a word in a sentence - you would scan from left to right, checking if each position could be the beginning of the word you're searching for.
The key insight is that we only need to check positions where there's enough remaining characters to fit the entire needle
. If haystack
has length n
and needle
has length m
, the last possible starting position would be at index n - m
. Any position beyond that wouldn't have enough characters left to match the full needle
.
For each valid starting position i
, we can extract a substring of length m
from haystack
starting at position i
using slice notation haystack[i : i + m]
. We then directly compare this substring with needle
. Python's string comparison handles the character-by-character matching for us internally.
This leads us to iterate through positions from 0
to n - m
(inclusive), which gives us exactly n - m + 1
positions to check. At each position, we perform the substring extraction and comparison. As soon as we find a match, we can immediately return that index since we're looking for the first occurrence. If we complete the entire loop without finding a match, we know the needle
doesn't exist in haystack
and return -1
.
The beauty of this approach lies in its simplicity - it directly translates our intuitive understanding of substring searching into code without requiring complex data structures or algorithms.
Learn more about Two Pointers patterns.
Solution Approach
The implementation uses a simple traversal approach to find the first occurrence of needle
in haystack
.
Algorithm Steps:
-
Initialize Variables: First, we store the lengths of both strings for efficiency:
n = len(haystack)
- length of the string we're searching inm = len(needle)
- length of the string we're searching for
-
Set Up the Loop: We iterate through each valid starting position in
haystack
:- The loop runs from
i = 0
toi = n - m
(inclusive) - This gives us exactly
n - m + 1
positions to check - We use
range(n - m + 1)
to generate these indices
- The loop runs from
-
Check Each Position: At each index
i
:- Extract a substring from
haystack
using slice notation:haystack[i : i + m]
- This gives us a substring of length
m
starting at positioni
- Compare this substring directly with
needle
using the equality operator==
- Extract a substring from
-
Return on First Match:
- If
haystack[i : i + m] == needle
evaluates toTrue
, we immediately returni
- This ensures we return the index of the first occurrence
- If
-
Handle No Match Case:
- If the loop completes without finding a match, we return
-1
- This indicates that
needle
is not present inhaystack
- If the loop completes without finding a match, we return
Example Walkthrough:
Let's trace through haystack = "hello"
and needle = "ll"
:
n = 5
,m = 2
- Loop iterations:
i
ranges from0
to3
(since5 - 2 + 1 = 4
iterations)i = 0
: Check"he"
vs"ll"
β No matchi = 1
: Check"el"
vs"ll"
β No matchi = 2
: Check"ll"
vs"ll"
β Match found! Return2
The time complexity is O((n - m + 1) Γ m)
where the outer loop runs n - m + 1
times and each string comparison takes O(m)
time. The space complexity is O(1)
as we only use a constant amount of extra space.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through finding needle = "abc"
in haystack = "ababcab"
.
Initial Setup:
haystack = "ababcab"
(length n = 7)needle = "abc"
(length m = 3)- We'll check positions from index 0 to 4 (n - m = 7 - 3 = 4)
Step-by-step Process:
Position i = 0: haystack: a b a b c a b ^ ^ ^ Extract: "aba" Compare: "aba" == "abc"? β No match, continue Position i = 1: haystack: a b a b c a b ^ ^ ^ Extract: "bab" Compare: "bab" == "abc"? β No match, continue Position i = 2: haystack: a b a b c a b ^ ^ ^ Extract: "abc" Compare: "abc" == "abc"? β Match found! Return 2
The function returns 2
as the first occurrence of "abc" starts at index 2.
Another Example - No Match Case:
Let's try needle = "xyz"
in haystack = "hello"
:
- n = 5, m = 3
- Check positions 0 to 2 (5 - 3 = 2, so 3 positions total)
i = 0: "hel" != "xyz" i = 1: "ell" != "xyz" i = 2: "llo" != "xyz"
All positions checked, no match found β Return -1
Solution Implementation
1class Solution:
2 def strStr(self, haystack: str, needle: str) -> int:
3 """
4 Find the first occurrence of needle in haystack.
5
6 Args:
7 haystack: The string to search in
8 needle: The substring to search for
9
10 Returns:
11 The index of the first occurrence of needle in haystack,
12 or -1 if needle is not found
13 """
14 # Get lengths of both strings
15 haystack_length = len(haystack)
16 needle_length = len(needle)
17
18 # Iterate through all possible starting positions in haystack
19 # We only need to check positions where there's enough space for needle
20 for start_index in range(haystack_length - needle_length + 1):
21 # Check if substring starting at current position matches needle
22 if haystack[start_index:start_index + needle_length] == needle:
23 # Found a match, return the starting index
24 return start_index
25
26 # No match found in the entire haystack
27 return -1
28
1class Solution {
2 /**
3 * Find the first occurrence of needle in haystack using a sliding window approach.
4 * Returns the index of the first occurrence, or -1 if needle is not found.
5 *
6 * @param haystack the string to search in
7 * @param needle the string to search for
8 * @return the starting index of the first occurrence of needle in haystack, or -1 if not found
9 */
10 public int strStr(String haystack, String needle) {
11 // Handle empty needle case - empty string is always found at index 0
12 if (needle.isEmpty()) {
13 return 0;
14 }
15
16 int haystackLength = haystack.length();
17 int needleLength = needle.length();
18
19 // Pointers for traversing haystack and needle respectively
20 int haystackPointer = 0; // Current position in haystack
21 int needlePointer = 0; // Current position in needle being matched
22
23 // Iterate through the haystack
24 while (haystackPointer < haystackLength) {
25 // Check if current characters match
26 if (haystack.charAt(haystackPointer) == needle.charAt(needlePointer)) {
27 // Special case: needle has only one character and it matches
28 if (needleLength == 1) {
29 return haystackPointer;
30 }
31
32 // Move both pointers forward to continue matching
33 haystackPointer++;
34 needlePointer++;
35 } else {
36 // Characters don't match - backtrack in haystack
37 // Move haystack pointer back to the next starting position
38 // (original start position + 1)
39 haystackPointer = haystackPointer - needlePointer + 1;
40
41 // Reset needle pointer to start matching from beginning
42 needlePointer = 0;
43 }
44
45 // Check if we've matched the entire needle
46 if (needlePointer == needleLength) {
47 // Return the starting index of the match
48 return haystackPointer - needlePointer;
49 }
50 }
51
52 // Needle not found in haystack
53 return -1;
54 }
55}
56
1class Solution {
2private:
3 // Build the failure function (partial match table) for KMP algorithm
4 // The failure function tells us where to continue matching after a mismatch
5 vector<int> buildFailureFunction(const string& pattern) {
6 int patternLength = pattern.length();
7 vector<int> failureTable(patternLength);
8
9 // First character always has failure value of -1
10 failureTable[0] = -1;
11
12 int currentIndex = 0;
13 int prefixIndex = -1;
14
15 // Build the failure table by comparing pattern with itself
16 while (currentIndex < patternLength) {
17 // If there's a mismatch, backtrack using the failure table
18 while (prefixIndex >= 0 && pattern[currentIndex] != pattern[prefixIndex]) {
19 prefixIndex = failureTable[prefixIndex];
20 }
21
22 currentIndex++;
23 prefixIndex++;
24
25 // Prevent out of bounds access
26 if (currentIndex >= patternLength) {
27 break;
28 }
29
30 // Optimization: if next characters match, we can skip further
31 if (pattern[currentIndex] == pattern[prefixIndex]) {
32 failureTable[currentIndex] = failureTable[prefixIndex];
33 } else {
34 failureTable[currentIndex] = prefixIndex;
35 }
36 }
37
38 return failureTable;
39 }
40
41public:
42 // Find the first occurrence of needle in haystack using KMP algorithm
43 // Returns the index of first occurrence, or -1 if not found
44 int strStr(string haystack, string needle) {
45 // Empty needle matches at position 0
46 if (needle.length() == 0) {
47 return 0;
48 }
49
50 // Build the failure function for the pattern (needle)
51 vector<int> failureTable = buildFailureFunction(needle);
52
53 // Calculate the maximum starting position for a potential match
54 int maxStartPosition = haystack.length() - needle.length() + 1;
55
56 // Try each possible starting position
57 for (int startPos = 0; startPos < maxStartPosition; startPos++) {
58 int patternIndex = 0; // Current position in needle
59 int textIndex = startPos; // Current position in haystack
60
61 // Match characters one by one
62 while (patternIndex < needle.length() && textIndex < haystack.length()) {
63 // If characters don't match
64 if (haystack[textIndex] != needle[patternIndex]) {
65 // Use failure function to skip unnecessary comparisons
66 if (failureTable[patternIndex] >= 0) {
67 patternIndex = failureTable[patternIndex];
68 continue; // Retry with new pattern position
69 } else {
70 break; // No more backtracking possible, try next start position
71 }
72 }
73
74 // Characters match, move both pointers forward
75 textIndex++;
76 patternIndex++;
77 }
78
79 // Check if we found a complete match
80 if (patternIndex >= needle.length()) {
81 return textIndex - patternIndex; // Return starting index of match
82 }
83 }
84
85 // No match found
86 return -1;
87 }
88};
89
1/**
2 * Finds the first occurrence of needle string in haystack string
3 * @param haystack - The string to search in
4 * @param needle - The string to search for
5 * @returns The index of first occurrence of needle in haystack, or -1 if not found
6 */
7function strStr(haystack: string, needle: string): number {
8 // Get the length of both strings
9 const haystackLength: number = haystack.length;
10 const needleLength: number = needle.length;
11
12 // Iterate through possible starting positions in haystack
13 // Stop when remaining characters are less than needle length
14 for (let startIndex: number = 0; startIndex <= haystackLength - needleLength; startIndex++) {
15 // Flag to track if current position matches the needle
16 let isMatch: boolean = true;
17
18 // Check each character of needle against haystack starting from current position
19 for (let needleIndex: number = 0; needleIndex < needleLength; needleIndex++) {
20 // Compare characters at corresponding positions
21 if (haystack[startIndex + needleIndex] !== needle[needleIndex]) {
22 // Characters don't match, mark as not matching and exit inner loop
23 isMatch = false;
24 break;
25 }
26 }
27
28 // If all characters matched, return the starting index
29 if (isMatch) {
30 return startIndex;
31 }
32 }
33
34 // Needle not found in haystack, return -1
35 return -1;
36}
37
Time and Space Complexity
The time complexity of this code is O((n - m) Γ m)
, where n
is the length of the haystack
string and m
is the length of the needle
string.
This is because:
- The outer loop runs
n - m + 1
times (iterating through all possible starting positions in the haystack) - For each iteration, the string slicing operation
haystack[i : i + m]
creates a substring of lengthm
, which takesO(m)
time - The comparison
haystack[i : i + m] == needle
also takesO(m)
time to compare two strings of lengthm
- Therefore, the total time complexity is
O((n - m + 1) Γ m)
, which simplifies toO((n - m) Γ m)
The space complexity is O(1)
if we consider that string slicing in Python creates a new string object temporarily for comparison, but this temporary space doesn't scale with input size beyond the fixed substring length m
. Since m
is part of the input rather than additional space that grows, the space complexity is constant.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Loop Boundary Calculation
One of the most frequent mistakes is setting up the wrong loop boundary, which can lead to either missing valid positions or causing index out of bounds errors.
Incorrect implementations:
# WRONG: Goes too far and causes index out of bounds
for i in range(n - m + 2): # Should be n - m + 1
if haystack[i:i + m] == needle:
return i
# WRONG: Stops too early and misses valid positions
for i in range(n - m): # Missing the +1
if haystack[i:i + m] == needle:
return i
Why this happens: The correct formula n - m + 1
represents the number of valid starting positions. For example, with haystack = "hello"
(n=5) and needle = "lo"
(m=2), we need to check positions 0, 1, 2, and 3 (which is 4 positions total = 5 - 2 + 1).
Solution: Always use range(n - m + 1)
or carefully validate your boundary conditions with edge cases.
2. Not Handling Edge Cases
Failing to handle special cases like empty strings or when needle is longer than haystack.
Problematic code:
def strStr(haystack: str, needle: str) -> int:
n = len(haystack)
m = len(needle)
# Missing edge case handling
for i in range(n - m + 1): # This could be negative!
if haystack[i:i + m] == needle:
return i
return -1
Issues that arise:
- If
needle
is empty, should return 0 (by convention) - If
needle
is longer thanhaystack
,n - m + 1
becomes negative, causing unexpected behavior
Solution: Add explicit edge case checks:
def strStr(haystack: str, needle: str) -> int:
if not needle: # Empty needle
return 0
if len(needle) > len(haystack): # Needle longer than haystack
return -1
n = len(haystack)
m = len(needle)
for i in range(n - m + 1):
if haystack[i:i + m] == needle:
return i
return -1
3. Inefficient String Comparison
Using character-by-character comparison in a nested loop instead of Python's built-in string comparison.
Inefficient approach:
def strStr(haystack: str, needle: str) -> int:
n = len(haystack)
m = len(needle)
for i in range(n - m + 1):
match = True
for j in range(m): # Unnecessary nested loop
if haystack[i + j] != needle[j]:
match = False
break
if match:
return i
return -1
Why it's problematic: While this works, it's more verbose and potentially less efficient than using Python's optimized string slicing and comparison. Python's built-in string operations are implemented in C and are highly optimized.
Solution: Use Python's string slicing as shown in the original solution:
if haystack[i:i + m] == needle: return i
4. Using Built-in Methods Incorrectly
While Python provides str.find()
and str.index()
methods, using them incorrectly or not understanding their behavior can cause issues.
Misunderstanding built-in methods:
def strStr(haystack: str, needle: str) -> int:
# This works but defeats the purpose of the exercise
return haystack.find(needle)
# Or using index() which raises ValueError
try:
return haystack.index(needle)
except ValueError:
return -1
Issue: While these work, they don't demonstrate understanding of the underlying algorithm, which is often the point of the exercise in interviews or learning contexts.
Solution: Implement the logic manually to show algorithmic thinking, but know that haystack.find(needle)
is the most Pythonic solution for production code.
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!