28. Find the Index of the First Occurrence in a String
Problem Description
The task is to find the first occurrence of the string needle
within another string haystack
. If needle
is found within haystack
, we should return the starting index of the first occurrence. If needle
is not found, we return -1
. It's important to note that an empty needle
results in 0
, as an empty string is considered to be a substring of any string, including an empty string itself.
Intuition
To solve this problem, the intuitive approach is to scan through the haystack
string and for each position, check whether the substring starting at that position matches the needle
. We can do this in a linear scan, considering the length of needle
is m
and the length of haystack
is n
.
We only need to scan until n - m + 1
in haystack
, since if we start matching any later than that, needle
would overflow the bounds of haystack
. For each position i
starting from 0
to n - m
, we take a substring of haystack
from i
to i + m
and compare it against needle
. If it matches, we know we've found the first occurrence, and we return the index i
. If we reach the end of the scan without finding needle
, we return -1
.
The time complexity of this approach is O((n-m) x m)
since in the worst-case scenario, for each starting position, we might compare m
characters until we find a mismatch. The space complexity is O(1)
as we are not using any extra space proportional to the input size; we are only using a few variables to store the indices and lengths.
Learn more about Two Pointers patterns.
Solution Approach
The implementation of the solution is straightforward, following the idea described in the intuition. Here's a step-by-step explanation of the algorithm and its practical realization in the given Python code:
-
First, we obtain the length of both
haystack
andneedle
to manage our loop and comparisons. Let's denote the length ofhaystack
asn
and the length ofneedle
asm
. -
We use a single loop to iterate over the
haystack
string. The end condition for our loop isn - m + 1
, which ensures that we don't attempt to matchneedle
beyond the point where it could possibly fit insidehaystack
. -
Inside the loop, we take a substring of
haystack
starting from the current indexi
and spanningm
characters (the entire length ofneedle
). In Python, the substring operation ishaystack[i : i + m]
. -
We then compare this substring with
needle
. If they are equivalent, it means we have found the first occurrence ofneedle
inhaystack
. In this case, we return the current indexi
. -
If the loop completes without finding a match, it means
needle
is not a part ofhaystack
. In this final case, we return-1
to indicate the absence ofneedle
inhaystack
.
In terms of algorithms, this approach uses a simple linear scan with a direct string comparison, making it easy to understand and implement. No additional data structures or complex patterns are used, keeping the space complexity to O(1)
.
The key part of the code that performs the above logic is:
for i in range(n - m + 1):
if haystack[i : i + m] == needle:
return i
return -1
This code reflects directly the described steps, iterating through haystack
, extracting substrings, and comparing them with needle
. It's a simple yet effective solution that leverages Python's built-in ability to handle substrings and comparisons elegantly.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Consider the following strings:
haystack = "hello"
needle = "ll"
Now, following the solution approach steps:
-
Obtain lengths of
haystack
andneedle
. Here,n = 5
(length of"hello"
) andm = 2
(length of"ll"
). -
Begin a loop to iterate over the
haystack
from index0
to5 - 2 + 1 = 4
. -
Inside the loop, extract substrings of
haystack
of lengthm
. We will have the following comparisons:i = 0
-> compare"he"
with"ll"
-> not a matchi = 1
-> compare"el"
with"ll"
-> not a matchi = 2
-> compare"ll"
with"ll"
-> this is a match!
-
Since we found the match
"ll"
inhaystack
starting at index2
, we can stop our search and return this index. -
If no match was found by the end of the loop, we would return
-1
. However, in this case, we did find theneedle
in thehaystack
, so the loop ceases with a successful outcome, returning2
.
The loop would have continued to i = 3
and compared "lo"
with "ll"
if a match had not been found at i = 2
.
Following the solution approach, this process efficiently finds the first occurrence of needle
in the haystack
, if it exists, and returns its starting index. If the needle
is not present, -1
would be the result.
Solution Implementation
1class Solution:
2 def strStr(self, haystack: str, needle: str) -> int:
3 # Length of the haystack and needle strings
4 haystack_length, needle_length = len(haystack), len(needle)
5
6 # Check all possible starting positions of needle in haystack
7 for start in range(haystack_length - needle_length + 1):
8 # If the substring matching the needle's length equals the needle, return the start index
9 if haystack[start : start + needle_length] == needle:
10 return start
11
12 # If the needle is not found in haystack, return -1
13 return -1
14
15# The method strStr is intended to find the first occurrence of the needle string
16# in the haystack string and return the index at which it begins.
17# If needle is not part of haystack, it returns -1.
18
1class Solution {
2 public int strStr(String haystack, String needle) {
3 // If needle is empty, the index to return is 0 (as per the problem statement).
4 if (needle.isEmpty()) {
5 return 0;
6 }
7
8 // Get the lengths of haystack and needle.
9 int haystackLength = haystack.length();
10 int needleLength = needle.length();
11
12 // Initialize pointers for haystack and needle.
13 int haystackPointer = 0;
14 int needlePointer = 0;
15
16 // Iterate through the haystack.
17 while (haystackPointer < haystackLength) {
18 // Check if the current characters in the haystack and needle are the same.
19 if (haystack.charAt(haystackPointer) == needle.charAt(needlePointer)) {
20 // Special case: if needle length is 1 and characters match, we found the needle.
21 if (needleLength == 1) {
22 return haystackPointer;
23 }
24 // Move both pointers forward.
25 haystackPointer++;
26 needlePointer++;
27 } else {
28 // Current characters do not match. Reset haystackPointer back by the amount
29 // needlePointer had advanced, then move forward by one to search from next position.
30 haystackPointer -= needlePointer - 1;
31 // Reset needlePointer back to the start of the needle.
32 needlePointer = 0;
33 }
34
35 // Check if the needle has been found within the haystack.
36 if (needlePointer == needleLength) {
37 // The start of the substring in haystack that matches the needle
38 // is at the difference between current haystackPointer and the length of the needle.
39 return haystackPointer - needlePointer;
40 }
41 }
42
43 // Needle was not found in the haystack. Return -1 as specified in the problem statement.
44 return -1;
45 }
46}
47
1class Solution {
2private:
3 // Constructs the 'next' array used in the KMP algorithm to optimize matching
4 vector<int> buildNextArray(string pattern) {
5 vector<int> next(pattern.length());
6 next[0] = -1; // Initialization with -1 for the first character
7 int index = 0; // Index in the pattern string
8 int prefixIndex = -1; // Index of the longest prefix that is also a suffix
9 int patternLength = pattern.length();
10 while (index < patternLength) {
11 // When there is a mismatch or it's the first iteration
12 while (prefixIndex >= 0 && pattern[index] != pattern[prefixIndex])
13 prefixIndex = next[prefixIndex];
14 index++, prefixIndex++;
15 // If we have not reached the end of the pattern
16 if (index < patternLength) {
17 // Record the length of the longest prefix which is also suffix
18 if (pattern[index] == pattern[prefixIndex])
19 next[index] = next[prefixIndex];
20 else
21 next[index] = prefixIndex;
22 }
23 }
24 return next;
25 }
26
27public:
28 // Function to find the first occurrence of `needle` in `haystack`
29 int strStr(string haystack, string needle) {
30 if (needle.empty()) // If the needle is empty, return 0 as per convention
31 return 0;
32
33 vector<int> nextArray = buildNextArray(needle);
34
35 int haystackLength = haystack.length(); // Length of the haystack string
36 int needleLength = needle.length(); // Length of the needle string
37 int len = haystackLength - needleLength + 1; // Compute the limit of searching
38 for (int i = 0; i < len; ++i) {
39 int needleIndex = 0; // Index for the needle string
40 int haystackIndex = i; // Starting index in the haystack string
41 // Search while the characters match and we are within both strings
42 while (needleIndex < needleLength && haystackIndex < haystackLength) {
43 if (haystack[haystackIndex] != needle[needleIndex]) {
44 if (nextArray[needleIndex] >= 0) {
45 needleIndex = nextArray[needleIndex];
46 continue; // Use the next array to skip some comparisons
47 } else
48 break; // Mismatch without a valid sub-prefix match
49 }
50 ++haystackIndex, ++needleIndex;
51 }
52 // When the whole needle string has been traversed, return the starting index
53 if (needleIndex == needleLength)
54 return haystackIndex - needleIndex;
55 }
56
57 return -1; // If the needle has not been found, return -1
58 }
59};
60
1/**
2 * Finds the first occurrence of the `needle` in `haystack`, and returns its index.
3 * If `needle` is not found in `haystack`, returns -1.
4 *
5 * @param haystack - The string to be searched within.
6 * @param needle - The string to find in the haystack.
7 * @returns The index at which the needle is found in the haystack, or -1 if not found.
8 */
9function strStr(haystack: string, needle: string): number {
10 // Length of the haystack and needle strings
11 const haystackLength: number = haystack.length;
12 const needleLength: number = needle.length;
13
14 // Early return if the needle is an empty string
15 if (needleLength === 0) return 0;
16
17 // Loop through each character in the haystack until there's no room left for the needle
18 for (let i = 0; i <= haystackLength - needleLength; i++) {
19 // Assume that the needle is found unless a mismatch is found
20 let isMatch: boolean = true;
21
22 // Check each character of the needle against the haystack
23 for (let j = 0; j < needleLength; j++) {
24 if (haystack[i + j] !== needle[j]) {
25 // If characters do not match, set isMatch to false and break out of the inner loop
26 isMatch = false;
27 break;
28 }
29 }
30
31 // If the needle was found in the haystack, return the starting index `i`
32 if (isMatch) {
33 return i;
34 }
35 }
36
37 // If the needle was not found in the haystack, return -1
38 return -1;
39}
40
Time and Space Complexity
The time complexity of the provided code is O((n - m + 1) * m)
, where n
is the length of the haystack string and m
is the length of the needle string. The for
loop iterates up to (n - m + 1)
times for the worst-case scenario, and the ==
operation inside the loop takes O(m)
time to compare the substring to the needle.
The space complexity of the code is O(1)
since only a few variables are used and there is no additional space allocated that is dependent on the input size.
Learn more about how to find time and space complexity quickly using problem constraints.
How does merge sort divide the problem into subproblems?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Donāt Miss This!