2788. Split Strings by Separator


Problem Description

In this problem, we have an input array words which consists of strings, and a separator, which is a single character. Our task is to take each string in words and split it into substrings based on where the separator occurs. Once we split the strings, we need to exclude any empty substrings that may result from this process. The output should be a new array of strings which represents the initial strings split by the separator, maintaining the original order of the strings in words. Importantly, the separator itself should not be included in the resulting strings.

Intuition

The intuition behind the solution is to use the built-in string function split() which is available in Python. The split() function takes a character or substring and splits the string at every occurrence of this character or substring. This works perfectly for our use case with the separator. The idea is to iterate through each string in the words array, apply the split(separator) function to divide it into substrings, and collect the resulting substrings in the new output array. However, because we should not include any empty strings in the output, we add a condition to only include substrings that are not empty (if s).

This approach allows us to construct the desired output array with each string being split by the separator and empty strings being excluded, all while keeping the order of the originally provided strings intact.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What's the output of running the following function using input [30, 20, 10, 100, 33, 12]?

1def fun(arr: List[int]) -> List[int]:
2    import heapq
3    heapq.heapify(arr)
4    res = []
5    for i in range(3):
6        res.append(heapq.heappop(arr))
7    return res
8
1public static int[] fun(int[] arr) {
2    int[] res = new int[3];
3    PriorityQueue<Integer> heap = new PriorityQueue<>();
4    for (int i = 0; i < arr.length; i++) {
5        heap.add(arr[i]);
6    }
7    for (int i = 0; i < 3; i++) {
8        res[i] = heap.poll();
9    }
10    return res;
11}
12
1class HeapItem {
2    constructor(item, priority = item) {
3        this.item = item;
4        this.priority = priority;
5    }
6}
7
8class MinHeap {
9    constructor() {
10        this.heap = [];
11    }
12
13    push(node) {
14        // insert the new node at the end of the heap array
15        this.heap.push(node);
16        // find the correct position for the new node
17        this.bubble_up();
18    }
19
20    bubble_up() {
21        let index = this.heap.length - 1;
22
23        while (index > 0) {
24            const element = this.heap[index];
25            const parentIndex = Math.floor((index - 1) / 2);
26            const parent = this.heap[parentIndex];
27
28            if (parent.priority <= element.priority) break;
29            // if the parent is bigger than the child then swap the parent and child
30            this.heap[index] = parent;
31            this.heap[parentIndex] = element;
32            index = parentIndex;
33        }
34    }
35
36    pop() {
37        const min = this.heap[0];
38        this.heap[0] = this.heap[this.size() - 1];
39        this.heap.pop();
40        this.bubble_down();
41        return min;
42    }
43
44    bubble_down() {
45        let index = 0;
46        let min = index;
47        const n = this.heap.length;
48
49        while (index < n) {
50            const left = 2 * index + 1;
51            const right = left + 1;
52
53            if (left < n && this.heap[left].priority < this.heap[min].priority) {
54                min = left;
55            }
56            if (right < n && this.heap[right].priority < this.heap[min].priority) {
57                min = right;
58            }
59            if (min === index) break;
60            [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61            index = min;
62        }
63    }
64
65    peek() {
66        return this.heap[0];
67    }
68
69    size() {
70        return this.heap.length;
71    }
72}
73
74function fun(arr) {
75    const heap = new MinHeap();
76    for (const x of arr) {
77        heap.push(new HeapItem(x));
78    }
79    const res = [];
80    for (let i = 0; i < 3; i++) {
81        res.push(heap.pop().item);
82    }
83    return res;
84}
85

Solution Approach

The solution leverages the power of list comprehension in Python, which is a compact way to process elements in a list and construct a new list based on certain criteria. There are three main parts to our list comprehension:

  1. Iterating over each word in the input words list.
  2. Splitting each word by the given separator to create a list of substrings.
  3. Filtering out any resulting empty strings.

These steps can be broken down as follows:

  • Iteration: The outer loop for w in words goes through each string w present in the input list words.

  • Split Operation: For each string w, w.split(separator) is called. This function returns a list of substrings obtained by splitting w at every occurrence of the character separator.

  • Filtering Empty Strings: Inside the list comprehension, after the split operation, there's another loop for s in w.split(separator). Here, s represents each substring result from the split. The if s part ensures that only non-empty s (substrings) get included in the final list. This filtering is important to meet the problem's requirement to exclude empty strings.

  • Creating the Output List: The list comprehension [s for w in words for s in w.split(separator) if s] then constructs the output list. This list includes all non-empty substrings obtained from splitting each of the strings in words by the separator.

Code Analysis: The given solution code uses no additional data structures other than the list to hold the final result. It's a direct application of string manipulation methods and list comprehension, which are very efficient for this kind of task. The overall time complexity is O(n * m), where n is the number of strings in the input list and m is the average length of the strings, assuming split() is O(m) in time.

In conclusion, this implementation is a straightforward and effective way of achieving the desired functionality using Python's string manipulation capabilities and list comprehension.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which technique can we use to find the middle of a linked list?

Example Walkthrough

Let's walk through the provided solution approach with a small example to better understand how it works. Suppose we have the following inputs:

  • words: ["apple,berry,cherry", "lemon##lime", "banana--melon"]
  • separator: "-"

We want our output to exclude the separator and not to include any empty strings that may come from split operation. Here's how the solution approach applies to our example:

Step 1: Iterating over each word We start by taking the first string in the words list which is "apple,berry,cherry". Since our separator is "-", this string does not get split as it does not contain the separator.

Step 2: Splitting each word We then move to the second string "lemon##lime". Again, this does not contain our separator, so it remains unsplit.

Step 3: Filtering out empty strings Next, we take the third string "banana--melon". Using the split('-') function on this string gives us ["banana", "", "melon"]. We find that there is an empty string present as a result of the split operation where two separators were adjacent.

Step 4: Applying the list comprehension As per the list comprehension [s for w in words for s in w.split(separator) if s], we construct a new list by including each non-empty substring. For our example, the list comprehension step would look like this:

1result = [s for w in ["apple,berry,cherry", "lemon##lime", "banana--melon"] for s in w.split("-") if s]

This will iterate over each word, split it by the separator, and only include the non-empty substrings. In this case, the first two words won't get split because they do not contain the separator. In the third word, "banana--melon", it will split into ["banana", "", "melon"] and exclude the empty string.

The final output array after applying the list comprehension to our example would therefore be:

1["apple,berry,cherry", "lemon##lime", "banana", "melon"]

Conclusion: The initial strings are split wherever the separator is encountered, and any strings that would have been empty after the split are removed from the output. The order of the elements is preserved from the original list to the output list.

Solution Implementation

1from typing import List
2
3class Solution:
4    def splitWordsBySeparator(self, words: List[str], separator: str) -> List[str]:
5        # Initialize an empty list to store the split words
6        split_words = []
7      
8        # Iterate through each word in the input list 'words'
9        for word in words:
10            # Split the word by the given separator
11            word_parts = word.split(separator)
12            # Append each non-empty part of the split word to 'split_words'
13            for part in word_parts:
14                if part:  # Ensure to add only non-empty strings
15                    split_words.append(part)
16      
17        # Return the list containing all split, non-empty words
18        return split_words
19
1import java.util.regex.Pattern; // Importing the Pattern class for regex operations.
2import java.util.ArrayList; // Importing the ArrayList class for dynamic array operations.
3import java.util.List; // Importing the List interface for using lists.
4
5// Class name should be capitalized and descriptive
6class WordSplitter {
7
8    // Method to split strings in a list by a given separator and return a list of substrings
9    public List<String> splitWordsBySeparator(List<String> words, char separator) {
10        // Creating a list to store the resulting substrings
11        List<String> splitWords = new ArrayList<>();
12      
13        // Iterating through each string in the list of words
14        for (String word : words) {
15            // Splitting the current word by the separator and escaping it if it's a special regex character
16            String[] parts = word.split(Pattern.quote(String.valueOf(separator)));
17          
18            // Adding each non-empty part to the result list
19            for (String part : parts) {
20                if (!part.isEmpty()) {
21                    splitWords.add(part);
22                }
23            }
24        }
25      
26        // Returning the list of split substrings
27        return splitWords;
28    }
29}
30
1#include <vector>
2#include <string>
3#include <sstream>
4
5class Solution {
6public:
7    // Splits words in the vector by a specified separator and
8    // returns a vector of the subsequently separated strings.
9    // Empty strings resulting from the split are not included in the results.
10    std::vector<std::string> splitWordsBySeparator(std::vector<std::string>& words, char separator) {
11        std::vector<std::string> results;
12      
13        // Iterate over each word in the input vector
14        for (auto& word : words) {
15            // Split the word by the provided separator
16            std::vector<std::string> splitParts = split(word, separator);
17          
18            // Add non-empty parts to the result vector
19            for (auto& part : splitParts) {
20                if (!part.empty()) {
21                    results.emplace_back(part);
22                }
23            }
24        }
25      
26        return results;
27    }
28
29    // Splits a string by a given delimiter and returns a vector of the parts.
30    std::vector<std::string> split(std::string& inputString, char delimiter) {
31        std::vector<std::string> parts;
32        std::stringstream stream(inputString);
33        std::string segment;
34      
35        // Use getline to split the string by the delimiter and add each part to the vector
36        while (getline(stream, segment, delimiter)) {
37            parts.push_back(segment);
38        }
39      
40        return parts;
41    }
42};
43
1// This function takes an array of strings and a separator, then splits each
2// string in the array by the separator, and returns a new array with the split segments.
3// Empty strings resulted from the split are excluded from the result.
4// 
5// @param words - An array of strings to be split.
6// @param separator - A string representing the separator to be used for splitting the words.
7// @returns An array of strings containing the non-empty segments after splitting.
8function splitWordsBySeparator(words: string[], separator: string): string[] {
9    // Initialize an array to store the result segments.
10    const splitWords: string[] = [];
11
12    // Loop through each word in the input array.
13    for (const word of words) {
14        // Split the current word by the given separator.
15        const splits = word.split(separator);
16
17        // Loop through the split segments of the current word.
18        for (const split of splits) {
19            // Add the non-empty segments to the result array.
20            if (split.length > 0) {
21                splitWords.push(split);
22            }
23        }
24    }
25
26    // Return the result array containing all non-empty segments.
27    return splitWords;
28}
29
Not Sure What to Study? Take the 2-min Quiz:

What's the output of running the following function using input [30, 20, 10, 100, 33, 12]?

1def fun(arr: List[int]) -> List[int]:
2    import heapq
3    heapq.heapify(arr)
4    res = []
5    for i in range(3):
6        res.append(heapq.heappop(arr))
7    return res
8
1public static int[] fun(int[] arr) {
2    int[] res = new int[3];
3    PriorityQueue<Integer> heap = new PriorityQueue<>();
4    for (int i = 0; i < arr.length; i++) {
5        heap.add(arr[i]);
6    }
7    for (int i = 0; i < 3; i++) {
8        res[i] = heap.poll();
9    }
10    return res;
11}
12
1class HeapItem {
2    constructor(item, priority = item) {
3        this.item = item;
4        this.priority = priority;
5    }
6}
7
8class MinHeap {
9    constructor() {
10        this.heap = [];
11    }
12
13    push(node) {
14        // insert the new node at the end of the heap array
15        this.heap.push(node);
16        // find the correct position for the new node
17        this.bubble_up();
18    }
19
20    bubble_up() {
21        let index = this.heap.length - 1;
22
23        while (index > 0) {
24            const element = this.heap[index];
25            const parentIndex = Math.floor((index - 1) / 2);
26            const parent = this.heap[parentIndex];
27
28            if (parent.priority <= element.priority) break;
29            // if the parent is bigger than the child then swap the parent and child
30            this.heap[index] = parent;
31            this.heap[parentIndex] = element;
32            index = parentIndex;
33        }
34    }
35
36    pop() {
37        const min = this.heap[0];
38        this.heap[0] = this.heap[this.size() - 1];
39        this.heap.pop();
40        this.bubble_down();
41        return min;
42    }
43
44    bubble_down() {
45        let index = 0;
46        let min = index;
47        const n = this.heap.length;
48
49        while (index < n) {
50            const left = 2 * index + 1;
51            const right = left + 1;
52
53            if (left < n && this.heap[left].priority < this.heap[min].priority) {
54                min = left;
55            }
56            if (right < n && this.heap[right].priority < this.heap[min].priority) {
57                min = right;
58            }
59            if (min === index) break;
60            [this.heap[min], this.heap[index]] = [this.heap[index], this.heap[min]];
61            index = min;
62        }
63    }
64
65    peek() {
66        return this.heap[0];
67    }
68
69    size() {
70        return this.heap.length;
71    }
72}
73
74function fun(arr) {
75    const heap = new MinHeap();
76    for (const x of arr) {
77        heap.push(new HeapItem(x));
78    }
79    const res = [];
80    for (let i = 0; i < 3; i++) {
81        res.push(heap.pop().item);
82    }
83    return res;
84}
85

Time and Space Complexity

Time Complexity

The time complexity of the function is determined by the number of operations it needs to perform in relation to the input size. Here, we consider both the number of words n and the average length of the words m.

The function consists of a list comprehension with a nested loop - for every word w in words, it performs a .split(separator) operation. The .split(separator) operation is O(m) where m is the length of the word being split. This is because .split() goes through each character in the word w to check if it matches the separator.

After splitting, the words are iterated over again to filter out empty strings. This operation can potentially include each character from the previous step, which keeps the complexity at the same level - O(m * n).

Therefore, the overall time complexity is O(m * n), where n is the number of words and m is the average length of those words.

Space Complexity

The space complexity of the function depends on the space required to store the output list.

In the worst case, if the separator is not present in any of the words, then the list comprehension will include all the original words. In this case, the space complexity is equal to the total number of characters in all words (plus the space required for list overhead), which is O(m * n).

However, if the separator is present, then additional strings will be added to the output for each split occurrence. In the absolute worst case, if every character is a separator, the size of the output list could be up to twice the number of characters (since you get an empty string before and after each character), which leads to a space complexity of O(2 * m * n). Since constant factors are dropped in Big O notation, this also simplifies to O(m * n).

So, the space complexity is also O(m * n) under the assumption that the separator can be arbitrarily often in the words.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Consider the classic dynamic programming of longest increasing subsequence:

Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is [3, 7, 40, 80].

What is the recurrence relation?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫