Amazon Online Assessment (OA) 2021 - Most Common Word with Exclusion List | HackerRank SHL

Given a paragraph, find the most frequently used word that is not listed in ignored_keywords. A paragraph is a single line of words that may contain punctuation marks and a mix of uppercase and lowercase letters. Word comparison is case-insensitive, and the output word must be in lowercase.
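The normalization described above (ignore case, treat punctuation as a separator) can be sketched as a small helper. This is only an illustration: the name `tokenize` is hypothetical, and it assumes that a word is a run of ASCII letters, so digits and apostrophes also act as separators.

```python
import re
from typing import List

def tokenize(paragraph: str) -> List[str]:
    # Lowercase first so that "Book" and "book" compare equal,
    # then keep only runs of ASCII letters (assumption: that is what a "word" is).
    return re.findall(r"[a-z]+", paragraph.lower())

print(tokenize("But, this Book was WAY ahead."))
```

Under that assumption, a contraction such as "don't" would be split into "don" and "t"; the problem statement does not say how such cases should be handled.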

Examples

Example 1:

Input:

paragraph = "If this book was written today in the midst of the slew of dystopian novels that come out, it may not have stood out. But, this book was way ahead of its time."
ignored_keywords = ["of", "was", "the"]

Output: "book"
Explanation:

"of" appears three times, and "was" and "the" each appear twice, but all three are in the ignored_keywords list. The next most common word is "book", which appears twice.

Constraints

There is always at least one word in the paragraph and at least one word in the list of excluded keywords. The most frequent word that is not in ignored_keywords is always unique, so there are no ties to break. Keywords in the exclusion list consist only of lowercase alphabetical characters.

Solution

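import re
from collections import Counter
from typing import List

def most_common_word(paragraph: str, banned: List[str]) -> str:
    # A set gives O(1) membership checks against the exclusion list.
    ban = set(banned)
    # Count every lowercased word that is not excluded.
    # Note: \w+ also matches digits and underscores, not only letters.
    counts = Counter(
        word
        for m in re.finditer(r'\w+', paragraph)
        for word in [m[0].lower()]
        if word not in ban
    )
    # most_common(1) returns a one-element list of (word, count) pairs.
    [(word, _)] = counts.most_common(1)
    return word

if __name__ == '__main__':
    # First input line: the paragraph.
    # Second input line: the ignored keywords, separated by spaces.
    paragraph = input()
    banned = input().split()
    res = most_common_word(paragraph, banned)
    print(res)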
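
For comparison, the same idea can be written without Counter, using a plain dictionary and max(). This is a minimal sketch rather than the reference solution: the function name `most_common_word_manual` and the sample sentence are made up for illustration, and the pattern `[a-z]+` assumes that words consist of ASCII letters only.

```python
import re
from typing import Dict, List

def most_common_word_manual(paragraph: str, ignored_keywords: List[str]) -> str:
    ignored = set(ignored_keywords)  # O(1) membership checks
    counts: Dict[str, int] = {}
    # Lowercase once, then walk over runs of letters (assumed word definition).
    for word in re.findall(r"[a-z]+", paragraph.lower()):
        if word not in ignored:
            counts[word] = counts.get(word, 0) + 1
    # Pick the key with the highest count; the constraints promise a unique winner.
    return max(counts, key=counts.get)

# Hypothetical input with a single clear winner once "hit" is excluded.
print(most_common_word_manual(
    "Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]
))  # prints "ball"
```

Both versions run in time linear in the length of the paragraph. If every word were excluded, max() would raise a ValueError on the empty dictionary, but the stated constraints appear to rule that case out.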