Amazon Online Assessment (OA) - Top K Frequently Mentioned Keywords

Find the keywords that are most frequently mentioned in a list of snippets. Return the top k keywords from the most frequent to the least frequently mentioned, ignoring case sensitivity.

A mention is defined as a word that appears at least once in a snippets, ie. if it appears more than once in a snippet it only counts once. Sort alphabetically if multiple keywords are mentioned the same number of times.

Input

k: a number

keywords: a list of words

snippets: a list of snippets containing a single contiguous paragraph each

Output

A list of words sorted from most frequenly mentioned to least frequently mentioned.

Examples

Example 1:

Input:

k = 2

keywords = ["gatsby", "american", "novel"]
snippets = [
  "Classic. Yes. The great American novel. Hmph, so I heard.",
  "Most American high school students are assigned to read this novel.",
  "The Great Gatsby is often described as a paean to the Great American Dream",
]

Output: ["american", "novel"]

Explanation:

american appears 3 times and novel appears 2 times.

Examples

Example 2:

Input:

k = 2

keywords = ["gatsby", "american", "novel"]
snippets = [
  "The opening of The Great Gatsby -- its first 3-4 pages -- ranks among the best of any novel in the English language.",
  "It is masterful, some may say the great American novel.",
  "The Great Gatsby is a 1925 novel written by American author F. Scott Fitzgerald",
]

Output: ["novel", "american"]

Explanation:

novel appears 3 times, but both american and gatsby appear twice. However, we must only return one of those, and american appears first alphabetically.

Try it yourself

Solution

1
-
from typing import List
1
+
from heapq import heappop, heappush
2
+
import re
3
+
from typing import Counter, List
4
+
5
+
class Down:
6
+
    def __init__(self, value):
7
+
        self.value = value
8
+
9
+
    def __lt__(self, other):
10
+
        return self.value > other.value
11
+
2
12
def top_mentioned(k: int, keywords: List[str], reviews: List[str]) -> List[str]:
3
-
    # WRITE YOUR BRILLIANT CODE HERE
13
+
    patt = re.compile(r'\b(:?{})\b'.format('|'.join(keywords)), flags=re.IGNORECASE)
4
-
    return []
14
+
    counts = Counter(
15
+
        word
16
+
        for review in reviews
17
+
        for word in set(match[0].lower() for match in patt.finditer(review))
18
+
    )
19
+
    queue = []
20
+
    for word, count in counts.items():
21
+
        heappush(queue, (count, Down(word)))
22
+
        if len(queue) > k:
23
+
            heappop(queue)
24
+
    return [heappop(queue)[1].value for _ in range(k)][::-1]
25
+
5
26
if __name__ == '__main__':
6
27
    k = int(input())
7
28
    keywords = input().split()
8
29
    reviews = [input() for _ in range(int(input()))]
9
30
    res = top_mentioned(k, keywords, reviews)
10
31
    print(' '.join(res))