2047. Number of Valid Words in a Sentence
Problem Description
The problem presents a scenario where we have a sentence composed of several tokens (words or punctuation) which can contain lowercase letters, hyphens, punctuation marks, and digits. However, not every token qualifies as a valid word. A token is only considered valid if it meets specific criteria:
- The token can only have lowercase letters, hyphens (-), and/or punctuation marks (no digits).
- If there's a hyphen, it should occur only once within the token and must be flanked by lowercase letters on both sides (meaning it cannot be at the start or end of a token, and cannot have spaces or punctuation around it).
- Punctuation marks can appear at most once per token and must be at the end of the token (so no punctuation in the middle or start of the token, and no multiple punctuation marks).
Lastly, the goal of the problem is to count how many of these valid tokens exist in the given sentence.
Intuition
To solve this problem, we need a method that can validate individual tokens according to the rules outlined. This calls for checking each token against all three conditions for a valid word.
The steps we can follow for this approach are:
- Split the sentence into individual tokens based on spaces, because spaces are the delimiters that separate tokens within the sentence.
- For each token, check whether it meets the criteria of a valid word:
- It has no digits.
- The hyphen placement (if present) conforms to the rules.
- The punctuation placement (if present) conforms to the rules.
We only need to increment our count of valid words when a token passes all these checks.
The given Python code defines a function check
that encapsulates the validation logic for a single token. This helper function processes each character within the token, enforcing the rules by returning False
if any are violated, indicating the token is not a valid word.
After defining the check
function, the primary logic of the countValidWords
function is to split the input sentence into tokens and sum up the number of tokens that are valid. This is achieved with a generator expression that applies check
to each token, with the sum
function tallying the number of True
results (True
being equivalent to 1
in Python), hence giving the count of valid words.
Solution Approach
The solution employs a straightforward approach that focuses on string manipulation and validation rules based on the criteria given for what constitutes a valid word within a sentence.
The algorithm can be broken down into the following steps:
-
Tokenize the sentence: The first step involves taking the input
sentence
and splitting it into a list of tokens using Python'sstr.split()
method. This method by default splits the string based on whitespace, providing us with individual tokens to validate. -
Define the validation logic: The
check
function is a custom function defined to encapsulate the validation logic for each individual token obtained from the sentence. This function iterates over each character within the token using afor
loop combined with theenumerate
function, which also provides the index for each character. -
Implement validation rules:
- No Digits Allowed: If any character in the token is a digit, the
check
function immediately returnsFalse
, indicating that the token is not valid. - Hyphen Constraints: If a hyphen is encountered, several conditions are checked:
- There must not be another hyphen in the token (
hyphen
flag is used to check this). - The hyphen cannot be at the start or the end of the token (
i == 0
ori == len(token) - 1
). - The character before and after the hyphen must both be lowercase letters. This is validated using
islower()
method on the token's characters at indexesi - 1
andi + 1
.
- There must not be another hyphen in the token (
- Punctuation Constraints: Punctuation (
'!', '.', ','
) is only allowed at the end of the token. If any such punctuation character is found not at the last index (i < len(token) - 1
),False
is returned signaling invalidity.
- No Digits Allowed: If any character in the token is a digit, the
-
Count Valid Words: The
countValidWords
function uses a generator expressionsum(check(token) for token in sentence.split())
which applies thecheck
function to each token and sums upTrue
(considered as1
in Python) outcomes. EachTrue
outcome signifies a valid token, hence the sum gives us the total number of valid words in the sentence.
Overall, the solution is elegant due to its simplicity and Python's powerful string manipulation capabilities. No special data structures are required, as everything is handled with built-in operations on strings and collections.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Letās apply the solution approach to a small example sentence:
"cat. 1dog- are you-sure? end."
- Tokenize the sentence: The sentence is split into the following tokens based on whitespace:
["cat.", "1dog-", "are", "you-sure?", "end."]
-
Define the validation logic: For each token, we apply the
check
custom function to determine if it's valid. -
Implement validation rules: We check each token:
-
"cat." is a valid token because:
- There are no digits.
- There is no hyphen.
- Punctuation (the period) is at the end.
-
"1dog-" is invalid because:
- There is a digit ('1').
- The hyphen is at the end (should be flanked by lowercase characters).
-
"are" is a valid token because:
- There are only lowercase letters.
-
"you-sure?" is invalid because:
- Even though the hyphen is correctly placed between lowercase letters, punctuation ('?') is not at the end due to the presence of 'e' after it.
-
"end." is a valid token because:
- There are no digits.
- There is no hyphen.
- Punctuation (the period) is at the end.
- Count Valid Words: Applying the
check
function to each token, we conclude that there are three valid tokens: ["cat.", "are", "end."]. Thus, thesum(check(token) for token in sentence.split())
will result in3
, indicating the sentence contains three valid words.
By following these steps, we can iterate over any sentence and effectively count the number of valid words present according to the specified rules.
Solution Implementation
1class Solution:
2 def count_valid_words(self, sentence: str) -> int:
3 """Counts the number of valid words in a sentence according to specific rules."""
4
5 # Define a nested function to check if a single token/word is valid
6 def is_valid(token):
7 has_hyphen = False # Used to ensure only one hyphen is present
8 for index, char in enumerate(token):
9 # Check that there are no digits and punctuation comes only at the end
10 if char.isdigit() or (char in '!.,' and index < len(token) - 1):
11 return False
12 # Check for valid hyphen use
13 if char == '-':
14 # Invalid if there's already a hyphen, or it's at the start or end of the token,
15 # or if it's not surrounded by lowercase letters
16 if (
17 has_hyphen
18 or index == 0
19 or index == len(token) - 1
20 or not token[index - 1].islower()
21 or not token[index + 1].islower()
22 ):
23 return False
24 has_hyphen = True # Set flag to true if a hyphen is found
25 return True # If all conditions are met, the token is valid
26
27 # Split the sentence into tokens and count how many are valid
28 return sum(is_valid(token) for token in sentence.split())
29
30# Example usage:
31# sol = Solution()
32# result = sol.count_valid_words("Example sentence with -correctly placed hyphens, numbers like 4 or digits.")
33# print(result) # Prints the count of valid words according to the rules provided
34
1class Solution {
2 public int countValidWords(String sentence) {
3 int validWordCount = 0;
4
5 // Split the input sentence into tokens by spaces
6 for (String token : sentence.split(" ")) {
7 // If the token is a valid word, increment the count
8 if (isValidWord(token)) {
9 ++validWordCount;
10 }
11 }
12
13 return validWordCount;
14 }
15
16 /**
17 * Check if the given token is a valid word.
18 *
19 * @param token the token to check
20 * @return true if the token is a valid word, false otherwise
21 */
22 private boolean isValidWord(String token) {
23 int length = token.length();
24
25 // Empty tokens are not valid
26 if (length == 0) {
27 return false;
28 }
29
30 // To track if a hyphen has occurred already
31 boolean hasHyphenOccurred = false;
32
33 for (int i = 0; i < length; ++i) {
34 char character = token.charAt(i);
35
36 // If the character is a digit or a punctuation mark is not at the end, it's invalid
37 if (Character.isDigit(character) || (i < length - 1 && (character == '!' || character == '.' || character == ','))) {
38 return false;
39 }
40
41 // Check the rules for hyphen appearance in the token
42 if (character == '-') {
43 // If there's already a hyphen, or it's the first/last character, or it's not surrounded by letters, it's invalid
44 if (hasHyphenOccurred || i == 0 || i == length - 1 || !Character.isLetter(token.charAt(i - 1))
45 || !Character.isLetter(token.charAt(i + 1))) {
46 return false;
47 }
48 // Mark that a hyphen has occurred
49 hasHyphenOccurred = true;
50 }
51 }
52
53 // If all checks passed, the token is a valid word
54 return true;
55 }
56}
57
1#include <iostream>
2#include <string>
3#include <regex>
4#include <vector>
5
6// Function declarations
7bool isValid(const std::string& str);
8std::vector<std::string> split(const std::string& str, const std::regex& re);
9
10// Counts the number of valid words in a sentence.
11int countValidWords(const std::string& sentence) {
12 // Use regex to trim whitespace from the ends and separate by one or more spaces.
13 std::regex re("\\s+");
14 std::vector<std::string> words = split(sentence, re);
15
16 int validWordCount = 0;
17
18 // Iterate through each word to check its validity.
19 for (const std::string& word : words) {
20 if (isValid(word)) {
21 validWordCount++; // Increment the count if the word is valid.
22 }
23 }
24
25 return validWordCount; // Return the total number of valid words.
26}
27
28// Checks if the given string is a valid word according to the specified rules.
29bool isValid(const std::string& str) {
30 const size_t length = str.length();
31 bool hasHyphen = false; // Flag to indicate the presence of a hyphen.
32
33 // Iterate over each character of the string.
34 for (size_t i = 0; i < length; ++i) {
35 const char& ch = str[i];
36
37 // Check for digits.
38 if (isdigit(ch)) {
39 return false; // If a digit is found, the word is invalid.
40 }
41
42 // Check for hyphens.
43 if (ch == '-') {
44 if (hasHyphen) return false; // Invalid if more than one hyphen is found.
45 hasHyphen = true; // Set the flag if a hyphen is found.
46
47 // Check the characters before and after the hyphen to ensure they are lowercase letters.
48 if (i == 0 || i == length - 1 || !islower(str[i - 1]) || !islower(str[i + 1])) {
49 return false; // Invalid if characters adjacent to hyphen are not lowercase letters.
50 }
51 }
52
53 // Check for punctuation (excluding hyphens). The punctuation must be at the end of the word to be valid.
54 if (std::ispunct(ch) && ch != '-' && i != length - 1) {
55 return false; // Invalid if punctuation occurs before the end of the word.
56 }
57 }
58 return true; // If all checks pass, the word is valid.
59}
60
61// Splits a string based on the given regex separator into a vector of strings.
62std::vector<std::string> split(const std::string& str, const std::regex& re) {
63 std::sregex_token_iterator it(str.begin(), str.end(), re, -1);
64 std::sregex_token_iterator reg_end;
65 return {it, reg_end};
66}
67
68int main() {
69 std::string sentence = "This is a test sentence - with some, punctuation. And numbers 123.";
70 int count = countValidWords(sentence);
71 std::cout << "The sentence contains " << count << " valid words." << std::endl;
72 return 0;
73}
74
1// Counts the number of valid words in a sentence.
2function countValidWords(sentence: string): number {
3 // Split the sentence into words, trimming whitespace from the ends and separating by one or more spaces.
4 const words = sentence.trim().split(/\s+/);
5 let validWordCount = 0;
6
7 // Iterate through each word to check its validity.
8 for (const word of words) {
9 if (isValid(word)) {
10 validWordCount++; // Increment the count if the word is valid.
11 }
12 }
13
14 return validWordCount; // Return the total number of valid words.
15}
16
17// Checks if the given string is a valid word according to the specified rules.
18function isValid(str: string): boolean {
19 const length = str.length;
20 let hasHyphen = false; // Flag to indicate the presence of a hyphen.
21
22 // Iterate over each character of the string.
23 for (let i = 0; i < length; i++) {
24 const char = str.charAt(i);
25
26 // Check for digits.
27 if (/^[0-9]$/.test(char)) {
28 return false; // If a digit is found, the word is invalid.
29 }
30
31 // Check for hyphens.
32 if (char === '-') {
33 if (hasHyphen) return false; // Invalid if more than one hyphen is found.
34 hasHyphen = true; // Set the flag if a hyphen is found.
35
36 // Check the characters before and after the hyphen to ensure they are lowercase letters.
37 const prevChar = str.charAt(i - 1);
38 const nextChar = str.charAt(i + 1);
39 if (!/^[a-z]$/.test(prevChar) || !/^[a-z]$/.test(nextChar)) {
40 return false; // Invalid if characters adjacent to hyphen are not lowercase letters.
41 }
42 }
43
44 // Check for punctuation (excluding hyphens). The punctuation must be at the end of the word to be valid.
45 if (/^[\!\.\,]$/.test(char) && i !== length - 1) {
46 return false; // Invalid if punctuation occurs before the end of the word.
47 }
48 }
49 return true; // If all checks pass, the word is valid.
50}
51
Time and Space Complexity
The given Python code defines a function countValidWords
which takes a string and returns the number of valid tokens within the string. The validity of each token is determined by a set of rules implemented in the helper function check
.
Time Complexity
The time complexity of the function is determined by several factors:
-
Splitting the sentence:
sentence.split()
takesO(n)
time, wheren
is the length of the input string since it iterates over the entire string to split it into tokens. -
Iteration over each token: The
for
loop in thecheck
function iterates over each character of a token. In the worst case, all characters are part of a single token, resulting inO(n)
time complexity. However, becausesplit
would already divide the sentence intot
tokens, this iteration happenst
times. Therefore, ifm
is the length of the longest token, the complexity of checking all tokens isO(t * m)
. -
The
check
function is applied to each token resulting from the split. Assuming that there aret
tokens and the longest token has a length ofm
, the cumulative time complexity is thenO(t * m)
. However, sincet * m
is bound byn
(the length of the input string), the time complexity of this operation is effectivelyO(n)
.
Therefore, the total time complexity of the countValidWords
function is O(n)
, where n
is the length of the input string, as all operations are linear with respect to the length of the string.
Space Complexity
The space complexity is determined by the storage required for the tokens and the internal state during the execution:
-
Splitting the sentence into tokens creates a list of strings. The space taken by this list is proportional to the number of tokens and their lengths. However, since this storage does not exceed the total size of the original string, the space complexity contributed by the token list is
O(n)
. -
The
check
function does not require any additional significant space that scales with the input, using only a few boolean variables and indexes. -
There are no other data structures in use that grow with the size of the input.
Given these considerations, the space complexity of the function is O(n)
, where n
is the length of the input string.
In conclusion, the time complexity of the code is O(n)
and the space complexity is also O(n)
, where n
is the length of the input string.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Donāt Miss This!