2047. Number of Valid Words in a Sentence

EasyString
Leetcode Link

Problem Description

The problem presents a scenario where we have a sentence composed of several tokens (words or punctuation) which can contain lowercase letters, hyphens, punctuation marks, and digits. However, not every token qualifies as a valid word. A token is only considered valid if it meets specific criteria:

  1. The token can only have lowercase letters, hyphens (-), and/or punctuation marks (no digits).
  2. If there's a hyphen, it should occur only once within the token and must be flanked by lowercase letters on both sides (meaning it cannot be at the start or end of a token, and cannot have spaces or punctuation around it).
  3. Punctuation marks can appear at most once per token and must be at the end of the token (so no punctuation in the middle or start of the token, and no multiple punctuation marks).

Lastly, the goal of the problem is to count how many of these valid tokens exist in the given sentence.

Intuition

To solve this problem, we need a method that can validate individual tokens according to the rules outlined. This calls for checking each token against all three conditions for a valid word.

The steps we can follow for this approach are:

  1. Split the sentence into individual tokens based on spaces, because spaces are the delimiters that separate tokens within the sentence.
  2. For each token, check whether it meets the criteria of a valid word:
    • It has no digits.
    • The hyphen placement (if present) conforms to the rules.
    • The punctuation placement (if present) conforms to the rules.

We only need to increment our count of valid words when a token passes all these checks.

The given Python code defines a function check that encapsulates the validation logic for a single token. This helper function processes each character within the token, enforcing the rules by returning False if any are violated, indicating the token is not a valid word.

After defining the check function, the primary logic of the countValidWords function is to split the input sentence into tokens and sum up the number of tokens that are valid. This is achieved with a generator expression that applies check to each token, with the sum function tallying the number of True results (True being equivalent to 1 in Python), hence giving the count of valid words.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

A person thinks of a number between 1 and 1000. You may ask any number questions to them, provided that the question can be answered with either "yes" or "no".

What is the minimum number of questions you needed to ask so that you are guaranteed to know the number that the person is thinking?

Solution Approach

The solution employs a straightforward approach that focuses on string manipulation and validation rules based on the criteria given for what constitutes a valid word within a sentence.

The algorithm can be broken down into the following steps:

  1. Tokenize the sentence: The first step involves taking the input sentence and splitting it into a list of tokens using Python's str.split() method. This method by default splits the string based on whitespace, providing us with individual tokens to validate.

  2. Define the validation logic: The check function is a custom function defined to encapsulate the validation logic for each individual token obtained from the sentence. This function iterates over each character within the token using a for loop combined with the enumerate function, which also provides the index for each character.

  3. Implement validation rules:

    • No Digits Allowed: If any character in the token is a digit, the check function immediately returns False, indicating that the token is not valid.
    • Hyphen Constraints: If a hyphen is encountered, several conditions are checked:
      • There must not be another hyphen in the token (hyphen flag is used to check this).
      • The hyphen cannot be at the start or the end of the token (i == 0 or i == len(token) - 1).
      • The character before and after the hyphen must both be lowercase letters. This is validated using islower() method on the token's characters at indexes i - 1 and i + 1.
    • Punctuation Constraints: Punctuation ('!', '.', ',') is only allowed at the end of the token. If any such punctuation character is found not at the last index (i < len(token) - 1), False is returned signaling invalidity.
  4. Count Valid Words: The countValidWords function uses a generator expression sum(check(token) for token in sentence.split()) which applies the check function to each token and sums up True (considered as 1 in Python) outcomes. Each True outcome signifies a valid token, hence the sum gives us the total number of valid words in the sentence.

Overall, the solution is elegant due to its simplicity and Python's powerful string manipulation capabilities. No special data structures are required, as everything is handled with built-in operations on strings and collections.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of these pictures shows the visit order of a depth-first search?

Example Walkthrough

Let’s apply the solution approach to a small example sentence:

1"cat. 1dog- are you-sure? end."
  1. Tokenize the sentence: The sentence is split into the following tokens based on whitespace:
1["cat.", "1dog-", "are", "you-sure?", "end."]
  1. Define the validation logic: For each token, we apply the check custom function to determine if it's valid.

  2. Implement validation rules: We check each token:

  • "cat." is a valid token because:

    • There are no digits.
    • There is no hyphen.
    • Punctuation (the period) is at the end.
  • "1dog-" is invalid because:

    • There is a digit ('1').
    • The hyphen is at the end (should be flanked by lowercase characters).
  • "are" is a valid token because:

    • There are only lowercase letters.
  • "you-sure?" is invalid because:

    • Even though the hyphen is correctly placed between lowercase letters, punctuation ('?') is not at the end due to the presence of 'e' after it.
  • "end." is a valid token because:

    • There are no digits.
    • There is no hyphen.
    • Punctuation (the period) is at the end.
  1. Count Valid Words: Applying the check function to each token, we conclude that there are three valid tokens: ["cat.", "are", "end."]. Thus, the sum(check(token) for token in sentence.split()) will result in 3, indicating the sentence contains three valid words.

By following these steps, we can iterate over any sentence and effectively count the number of valid words present according to the specified rules.

Solution Implementation

1class Solution:
2    def count_valid_words(self, sentence: str) -> int:
3        """Counts the number of valid words in a sentence according to specific rules."""
4      
5        # Define a nested function to check if a single token/word is valid
6        def is_valid(token):
7            has_hyphen = False  # Used to ensure only one hyphen is present
8            for index, char in enumerate(token):
9                # Check that there are no digits and punctuation comes only at the end
10                if char.isdigit() or (char in '!.,' and index < len(token) - 1):
11                    return False
12                # Check for valid hyphen use
13                if char == '-':
14                    # Invalid if there's already a hyphen, or it's at the start or end of the token,
15                    # or if it's not surrounded by lowercase letters
16                    if (
17                        has_hyphen
18                        or index == 0
19                        or index == len(token) - 1
20                        or not token[index - 1].islower()
21                        or not token[index + 1].islower()
22                    ):
23                        return False
24                    has_hyphen = True  # Set flag to true if a hyphen is found
25            return True  # If all conditions are met, the token is valid
26
27        # Split the sentence into tokens and count how many are valid
28        return sum(is_valid(token) for token in sentence.split())
29
30# Example usage:
31# sol = Solution()
32# result = sol.count_valid_words("Example sentence with -correctly placed hyphens, numbers like 4 or digits.")
33# print(result)  # Prints the count of valid words according to the rules provided
34
1class Solution {
2    public int countValidWords(String sentence) {
3        int validWordCount = 0;
4
5        // Split the input sentence into tokens by spaces
6        for (String token : sentence.split(" ")) {
7            // If the token is a valid word, increment the count
8            if (isValidWord(token)) {
9                ++validWordCount;
10            }
11        }
12
13        return validWordCount;
14    }
15
16    /**
17     * Check if the given token is a valid word.
18     *
19     * @param token the token to check
20     * @return true if the token is a valid word, false otherwise
21     */
22    private boolean isValidWord(String token) {
23        int length = token.length();
24
25        // Empty tokens are not valid
26        if (length == 0) {
27            return false;
28        }
29
30        // To track if a hyphen has occurred already
31        boolean hasHyphenOccurred = false;
32
33        for (int i = 0; i < length; ++i) {
34            char character = token.charAt(i);
35
36            // If the character is a digit or a punctuation mark is not at the end, it's invalid
37            if (Character.isDigit(character) || (i < length - 1 && (character == '!' || character == '.' || character == ','))) {
38                return false;
39            }
40
41            // Check the rules for hyphen appearance in the token
42            if (character == '-') {
43                // If there's already a hyphen, or it's the first/last character, or it's not surrounded by letters, it's invalid
44                if (hasHyphenOccurred || i == 0 || i == length - 1 || !Character.isLetter(token.charAt(i - 1))
45                        || !Character.isLetter(token.charAt(i + 1))) {
46                    return false;
47                }
48                // Mark that a hyphen has occurred
49                hasHyphenOccurred = true;
50            }
51        }
52
53        // If all checks passed, the token is a valid word
54        return true;
55    }
56}
57
1#include <iostream>
2#include <string>
3#include <regex>
4#include <vector>
5
6// Function declarations
7bool isValid(const std::string& str);
8std::vector<std::string> split(const std::string& str, const std::regex& re);
9
10// Counts the number of valid words in a sentence.
11int countValidWords(const std::string& sentence) {
12    // Use regex to trim whitespace from the ends and separate by one or more spaces.
13    std::regex re("\\s+");
14    std::vector<std::string> words = split(sentence, re);
15
16    int validWordCount = 0;
17
18    // Iterate through each word to check its validity.
19    for (const std::string& word : words) {
20        if (isValid(word)) {
21            validWordCount++; // Increment the count if the word is valid.
22        }
23    }
24
25    return validWordCount; // Return the total number of valid words.
26}
27
28// Checks if the given string is a valid word according to the specified rules.
29bool isValid(const std::string& str) {
30    const size_t length = str.length();
31    bool hasHyphen = false; // Flag to indicate the presence of a hyphen.
32
33    // Iterate over each character of the string.
34    for (size_t i = 0; i < length; ++i) {
35        const char& ch = str[i];
36
37        // Check for digits.
38        if (isdigit(ch)) {
39            return false; // If a digit is found, the word is invalid.
40        }
41
42        // Check for hyphens.
43        if (ch == '-') {
44            if (hasHyphen) return false; // Invalid if more than one hyphen is found.
45            hasHyphen = true; // Set the flag if a hyphen is found.
46
47            // Check the characters before and after the hyphen to ensure they are lowercase letters.
48            if (i == 0 || i == length - 1 || !islower(str[i - 1]) || !islower(str[i + 1])) {
49                return false; // Invalid if characters adjacent to hyphen are not lowercase letters.
50            }
51        }
52
53        // Check for punctuation (excluding hyphens). The punctuation must be at the end of the word to be valid.
54        if (std::ispunct(ch) && ch != '-' && i != length - 1) {
55            return false; // Invalid if punctuation occurs before the end of the word.
56        }
57    }
58    return true; // If all checks pass, the word is valid.
59}
60
61// Splits a string based on the given regex separator into a vector of strings.
62std::vector<std::string> split(const std::string& str, const std::regex& re) {
63    std::sregex_token_iterator it(str.begin(), str.end(), re, -1);
64    std::sregex_token_iterator reg_end;
65    return {it, reg_end};
66}
67
68int main() {
69    std::string sentence = "This is a test sentence - with some, punctuation. And numbers 123.";
70    int count = countValidWords(sentence);
71    std::cout << "The sentence contains " << count << " valid words." << std::endl;
72    return 0;
73}
74
1// Counts the number of valid words in a sentence.
2function countValidWords(sentence: string): number {
3    // Split the sentence into words, trimming whitespace from the ends and separating by one or more spaces.
4    const words = sentence.trim().split(/\s+/);
5    let validWordCount = 0;
6  
7    // Iterate through each word to check its validity.
8    for (const word of words) {
9        if (isValid(word)) {
10            validWordCount++; // Increment the count if the word is valid.
11        }
12    }
13
14    return validWordCount; // Return the total number of valid words.
15}
16
17// Checks if the given string is a valid word according to the specified rules.
18function isValid(str: string): boolean {
19    const length = str.length;
20    let hasHyphen = false; // Flag to indicate the presence of a hyphen.
21
22    // Iterate over each character of the string.
23    for (let i = 0; i < length; i++) {
24        const char = str.charAt(i);
25
26        // Check for digits.
27        if (/^[0-9]$/.test(char)) {
28            return false; // If a digit is found, the word is invalid.
29        }
30
31        // Check for hyphens.
32        if (char === '-') {
33            if (hasHyphen) return false; // Invalid if more than one hyphen is found.
34            hasHyphen = true; // Set the flag if a hyphen is found.
35
36            // Check the characters before and after the hyphen to ensure they are lowercase letters.
37            const prevChar = str.charAt(i - 1);
38            const nextChar = str.charAt(i + 1);
39            if (!/^[a-z]$/.test(prevChar) || !/^[a-z]$/.test(nextChar)) {
40                return false; // Invalid if characters adjacent to hyphen are not lowercase letters.
41            }
42        }
43
44        // Check for punctuation (excluding hyphens). The punctuation must be at the end of the word to be valid.
45        if (/^[\!\.\,]$/.test(char) && i !== length - 1) {
46            return false; // Invalid if punctuation occurs before the end of the word.
47        }
48    }
49    return true; // If all checks pass, the word is valid.
50}
51
Not Sure What to Study? Take the 2-min Quiz:

Which of the following is a min heap?

Time and Space Complexity

The given Python code defines a function countValidWords which takes a string and returns the number of valid tokens within the string. The validity of each token is determined by a set of rules implemented in the helper function check.

Time Complexity

The time complexity of the function is determined by several factors:

  1. Splitting the sentence: sentence.split() takes O(n) time, where n is the length of the input string since it iterates over the entire string to split it into tokens.

  2. Iteration over each token: The for loop in the check function iterates over each character of a token. In the worst case, all characters are part of a single token, resulting in O(n) time complexity. However, because split would already divide the sentence into t tokens, this iteration happens t times. Therefore, if m is the length of the longest token, the complexity of checking all tokens is O(t * m).

  3. The check function is applied to each token resulting from the split. Assuming that there are t tokens and the longest token has a length of m, the cumulative time complexity is then O(t * m). However, since t * m is bound by n (the length of the input string), the time complexity of this operation is effectively O(n).

Therefore, the total time complexity of the countValidWords function is O(n), where n is the length of the input string, as all operations are linear with respect to the length of the string.

Space Complexity

The space complexity is determined by the storage required for the tokens and the internal state during the execution:

  1. Splitting the sentence into tokens creates a list of strings. The space taken by this list is proportional to the number of tokens and their lengths. However, since this storage does not exceed the total size of the original string, the space complexity contributed by the token list is O(n).

  2. The check function does not require any additional significant space that scales with the input, using only a few boolean variables and indexes.

  3. There are no other data structures in use that grow with the size of the input.

Given these considerations, the space complexity of the function is O(n), where n is the length of the input string.

In conclusion, the time complexity of the code is O(n) and the space complexity is also O(n), where n is the length of the input string.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Breadth first search can be used to find the shortest path between two nodes in a directed graph.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫