1152. Analyze User Website Visit Pattern
Problem Description
In this problem, we are given the browsing history of multiple users stored in three arrays: username
, timestamp
, and website
. Each index i
in these arrays represents a single browsing event, where username[i]
is the user who visited website[i]
at time timestamp[i]
. The goal is to find a sequence of three websites, which we will refer to as a "pattern," that is most popular among the users. A pattern here means any three websites visited in a specific order, not necessarily different. The popularity or "score" of a pattern is determined by counting how many unique users visited the websites in the same order as the pattern. The task is to return the pattern with the highest score, and in case of a tie, return the lexicographically smallest pattern.
Intuition
The solution to this problem involves several steps. First, we need to organize the browsing events for each user in chronological order. This is essential because we are interested in patterns which depend on the sequence of website visits.
To accomplish this, we start by combining the given arrays into a single list of tuples, each consisting of a username, a timestamp, and a website. We then sort this list by timestamp to ensure that each user's visits are in the correct order.
With the data sorted, we can now create a mapping from each user to their list of visited websites in chronological order. This is done using a dictionary, where each key is a username, and the corresponding value is the list of websites they visited.
We then calculate the score for every possible pattern. To ensure we count each pattern only once per user, we use a set to collect all unique patterns visited by a user. After processing all users' data, we will have the total count of unique users for each pattern across all users.
Finally, we sort the patterns by their score in descending order and lexicographically. The first pattern in this sorted list is our answer, as it has the highest score, and in the case of a tie, it is the lexicographically smallest among those with the highest scores.
The provided solution performs these steps and correctly identifies the most visited pattern as required.
Learn more about Sorting patterns.
Solution Approach
The solution uses a combination of sorting, dictionary usage, sets, and the Counter class from the collections
module in Python to implement an efficient approach toward finding the most visited pattern. Here's a step-by-step breakdown:
-
Sorting by Timestamp: First, we create a list of tuples with the
username
,timestamp
, andwebsite
. We sort this list based on thetimestamp
to ensure that we consider the websites in the order they were visited by each user. This is achieved by the expression:sorted(zip(username, timestamp, website), key=lambda x: x[1])
-
Mapping Users to Websites: We use a defaultdict of lists from the
collections
module to map each user to their visited websites in chronological order. This data structure is useful for aggregating the list of website visits per user without having to check if the user key already exists in the dictionary. We do this by:d = defaultdict(list) for user, _, site in sorted_events: d[user].append(site)
-
Generating Patterns: As we're interested in patterns of three websites, we iterate through each user's websites to generate all possible patterns using a nested loop structure. Since we want unique patterns per user, a set named
s
is used to store these combinations without duplicates:s = set() for i in range(m - 2): for j in range(i + 1, m - 1): for k in range(j + 1, m): s.add((sites[i], sites[j], sites[k]))
-
Counting Patterns' Scores: We use a Counter object to keep track of how many times each pattern occurs across all users. For each user, we add the patterns from the set
s
to the counter, which increments the count for each pattern:cnt = Counter() for t in s: cnt[t] += 1
-
Sorting and Selecting the Top Pattern: Finally, we sort the items in the counter to find the most visited pattern. The sort criteria are the pattern's score in descending order and the pattern itself in lexicographic order for tie-breaking. The top pattern is then selected as the answer:
sorted(cnt.items(), key=lambda x: (-x[1], x[0]))[0][0]
The combination of sorting, dictionary and set usage to eliminate duplicates, and a frequency counter (Counter class) makes this approach efficient and effective for solving the problem.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach with a small example.
Suppose we have the following input data:
username
: ["Jane", "Jane", "Jane", "Alex", "Alex", "Alex"]timestamp
: [1, 2, 3, 4, 5, 6]website
: ["A", "B", "C", "A", "B", "D"]
Following the algorithm's steps:
-
Sorting by Timestamp: We combine
username
,timestamp
, andwebsite
into a list of tuples and sort it. The sorted result will be:[("Jane", 1, "A"), ("Jane", 2, "B"), ("Jane", 3, "C"), ("Alex", 4, "A"), ("Alex", 5, "B"), ("Alex", 6, "D")]
-
Mapping Users to Websites: We create a dictionary (
defaultdict
) to map each user to the list of websites they visited in order:d["Jane"] = ["A", "B", "C"] d["Alex"] = ["A", "B", "D"]
-
Generating Patterns: We generate patterns for each user. For "Jane", the pattern would be:
s["Jane"] = {("A", "B", "C")}
And for "Alex":
s["Alex"] = {("A", "B", "D")}
Note that both users have the pattern ("A", "B") in common but have different third websites visited.
-
Counting Patterns' Scores: We count the occurrences of patterns across all users. For this example:
cnt[("A", "B", "C")] = 1 cnt[("A", "B", "D")] = 1
Each pattern has been visited by one unique user.
-
Sorting and Selecting the Top Pattern: We sort patterns based on their scores and lexicographically. After sorting:
patterns_sorted = [("A", "B", "C"), ("A", "B", "D")]
Since the scores are the same, we pick the lexicographically smallest pattern, which is
("A", "B", "C")
.
Hence, the most visited pattern in this example would be ("A", "B", "C")
, as it is the lexicographically smallest among the highest scoring patterns.
Solution Implementation
1from collections import defaultdict, Counter
2from typing import List
3
4class Solution:
5 def mostVisitedPattern(
6 self, usernames: List[str], timestamps: List[int], websites: List[str]
7 ) -> List[str]:
8 # Create a dictionary to store the sites visited by each user
9 users_visits = defaultdict(list)
10 # Sort the data by timestamp and group websites by username
11 for user, _, site in sorted(zip(usernames, timestamps, websites), key=lambda x: x[1]):
12 users_visits[user].append(site)
13
14 # Counter for tracking the frequency of each 3-sequence pattern
15 patterns_count = Counter()
16
17 # Iterate through each user's visited sites
18 for sites in users_visits.values():
19 number_of_sites = len(sites)
20 unique_patterns = set() # set to store unique 3-sequence patterns
21 if number_of_sites > 2: # Check if user has visited more than 2 sites
22 # Generate all possible 3-sequence combinations
23 for i in range(number_of_sites - 2):
24 for j in range(i + 1, number_of_sites - 1):
25 for k in range(j + 1, number_of_sites):
26 unique_patterns.add((sites[i], sites[j], sites[k]))
27
28 # Update the count of each unique pattern
29 for pattern in unique_patterns:
30 patterns_count[pattern] += 1
31
32 # Sort the patterns first by frequency (descending) and then lexicographically, and return the most common pattern
33 return sorted(patterns_count.items(), key=lambda x: (-x[1], x[0]))[0][0]
34
35# The logic of the method remains unchanged.
36# The method name is kept the same as per the requirement.
37# Standard naming convention is followed for variables such as `usernames`, `timestamps`, `websites`, `users_visits`, `patterns_count`, and `number_of_sites`.
38# Comments in English are added to explain each block of code.
39
1import java.util.*;
2
3// Definition of Node as a custom data structure to hold user visit information.
4class Node {
5 String user;
6 int timestamp;
7 String website;
8
9 Node(String user, int timestamp, String website) {
10 this.user = user;
11 this.timestamp = timestamp;
12 this.website = website;
13 }
14}
15
16class Solution {
17 public List<String> mostVisitedPattern(String[] usernames, int[] timestamps, String[] websites) {
18 // Map to hold data for each user and their list of Node objects (timestamps and websites visited).
19 Map<String, List<Node>> userData = new HashMap<>();
20 int visitCount = usernames.length; // Total number of website visits
21
22 // Constructing the user data map from usernames, timestamps, and websites.
23 for (int i = 0; i < visitCount; ++i) {
24 String user = usernames[i];
25 int ts = timestamps[i];
26 String site = websites[i];
27 userData.computeIfAbsent(user, k -> new ArrayList<>()).add(new Node(user, ts, site));
28 }
29
30 // Map to hold the count of each unique 3-sequence pattern.
31 Map<String, Integer> patternFrequency = new HashMap<>();
32
33 // Process each user's site visit history to calculate the pattern frequency.
34 for (List<Node> sites : userData.values()) {
35 int visitSize = sites.size();
36 Set<String> sequences = new HashSet<>();
37
38 // Check if there are at least 3 sites visited to form a valid pattern.
39 if (visitSize > 2) {
40 // Sort the user's visit by timestamp
41 Collections.sort(sites, (a, b) -> a.timestamp - b.timestamp);
42
43 // Iterate through combinations to form unique 3-sequence keys for this user
44 for (int i = 0; i < visitSize - 2; ++i) {
45 for (int j = i + 1; j < visitSize - 1; ++j) {
46 for (int k = j + 1; k < visitSize; ++k) {
47 sequences.add(sites.get(i).website + "," +
48 sites.get(j).website + "," +
49 sites.get(k).website);
50 }
51 }
52 }
53 }
54
55 // Count frequency of each 3-sequence pattern.
56 for (String seq : sequences) {
57 patternFrequency.put(seq, patternFrequency.getOrDefault(seq, 0) + 1);
58 }
59 }
60
61 // Variables to track the maximum frequency and the corresponding pattern.
62 int maxFrequency = 0;
63 String topPattern = "";
64
65 // Iterate over the pattern frequencies to determine the most visited pattern.
66 for (Map.Entry<String, Integer> entry : patternFrequency.entrySet()) {
67 // Compare the count or lexicographical order if counts are equal.
68 if (maxFrequency < entry.getValue() ||
69 (maxFrequency == entry.getValue() && entry.getKey().compareTo(topPattern) < 0)) {
70 maxFrequency = entry.getValue();
71 topPattern = entry.getKey();
72 }
73 }
74
75 // Return the top pattern as a list of sites.
76 return Arrays.asList(topPattern.split(","));
77 }
78}
79
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <unordered_set>
5#include <sstream>
6#include <algorithm>
7
8class Solution {
9public:
10 // The function returns the 3-sequence visited by the largest number of users.
11 vector<string> mostVisitedPattern(vector<string>& usernames, vector<int>& timestamps, vector<string>& websites) {
12 // Dictionary to store user's website visits mapped with timestamps.
13 unordered_map<string, vector<pair<int, string>>> userVisits;
14 int visitsCount = usernames.size();
15
16 // Collecting the websites each user has visited along with timestamps.
17 for (int i = 0; i < visitsCount; ++i) {
18 string user = usernames[i];
19 int timestamp = timestamps[i];
20 string website = websites[i];
21 userVisits[user].emplace_back(timestamp, website);
22 }
23
24 // Dictionary to count the frequency of each 3-sequence.
25 unordered_map<string, int> sequenceCount;
26
27 // Analysing each user's visits to find all 3-sequences.
28 for (auto& pair : userVisits) {
29 vector<pair<int, string>>& sites = pair.second;
30 unordered_set<string> uniqueSequences;
31
32 // Ensure each user has at least 3 visits to form a 3-sequence.
33 if (sites.size() > 2) {
34 // Sort the websites for each user based on timestamp.
35 sort(sites.begin(), sites.end());
36
37 // Generate all possible 3-sequences.
38 for (int i = 0; i < sites.size() - 2; ++i) {
39 for (int j = i + 1; j < sites.size() - 1; ++j) {
40 for (int k = j + 1; k < sites.size(); ++k) {
41 uniqueSequences.insert(sites[i].second + "," + sites[j].second + "," + sites[k].second);
42 }
43 }
44 }
45 }
46
47 // For each unique 3-sequence, count their occurrences.
48 for (auto& sequence : uniqueSequences) {
49 sequenceCount[sequence]++;
50 }
51 }
52
53 int maxCount = 0;
54 string maxSequence;
55
56 // Find the 3-sequence with the highest frequency.
57 for (auto& pair : sequenceCount) {
58 string sequence = pair.first;
59 int count = pair.second;
60
61 // Choose the lexicographically smallest sequence in case of a tie.
62 if (count > maxCount || (count == maxCount && sequence < maxSequence)) {
63 maxCount = count;
64 maxSequence = sequence;
65 }
66 }
67
68 // Split the sequence into individual websites.
69 return split(maxSequence, ',');
70 }
71
72 // Utility function to split the sequence into website names based on commas.
73 vector<string> split(string& sequence, char delimiter) {
74 vector<string> result;
75 stringstream sequenceStream(sequence);
76 string item;
77
78 // Extract each website name into the result vector.
79 while (getline(sequenceStream, item, delimiter)) {
80 result.push_back(item);
81 }
82 return result;
83 }
84};
85
1type Timestamp = number;
2type Website = string;
3type Username = string;
4type Visit = [Timestamp, Website];
5type SequenceCount = { [sequence: string]: number };
6
7// Dictionary to store user's website visits mapped by timestamps.
8let userVisits: { [username: string]: Visit[] } = {};
9
10// Dictionary to count the frequency of each 3-sequence.
11let sequenceCount: SequenceCount = {};
12
13// The function returns the 3-sequence visited by the largest number of users.
14function mostVisitedPattern(usernames: Username[], timestamps: Timestamp[], websites: Website[]): Website[] {
15 // Initialize the user visits dictionary.
16 userVisits = {};
17 let visitsCount = usernames.length;
18
19 // Collecting the websites each user has visited along with timestamps.
20 for (let i = 0; i < visitsCount; ++i) {
21 const user = usernames[i];
22 const timestamp = timestamps[i];
23 const website = websites[i];
24
25 if (!userVisits[user]) {
26 userVisits[user] = [];
27 }
28 userVisits[user].push([timestamp, website]);
29 }
30
31 // Initialize the sequence count dictionary.
32 sequenceCount = {};
33
34 // Analyzing each user's visits to find all 3-sequences.
35 Object.keys(userVisits).forEach(user => {
36 const sites = userVisits[user];
37 const uniqueSequences = new Set<string>();
38
39 // Ensure each user has at least 3 visits to form a 3-sequence.
40 if (sites.length > 2) {
41 // Sort the websites for each user based on timestamp.
42 sites.sort((a, b) => a[0] - b[0]);
43
44 // Generate all possible 3-sequences.
45 for (let i = 0; i < sites.length - 2; ++i) {
46 for (let j = i + 1; j < sites.length - 1; ++j) {
47 for (let k = j + 1; k < sites.length; ++k) {
48 uniqueSequences.add(`${sites[i][1]},${sites[j][1]},${sites[k][1]}`);
49 }
50 }
51 }
52 }
53
54 // For each unique 3-sequence, count their occurrences.
55 uniqueSequences.forEach(sequence => {
56 if (!sequenceCount[sequence]) {
57 sequenceCount[sequence] = 0;
58 }
59 sequenceCount[sequence]++;
60 });
61 });
62
63 let maxCount = 0;
64 let maxSequence = "";
65
66 // Find the 3-sequence with the highest frequency.
67 Object.keys(sequenceCount).forEach(sequence => {
68 const count = sequenceCount[sequence];
69
70 // Choose the lexicographically smallest sequence in case of a tie.
71 if (count > maxCount || (count === maxCount && sequence < maxSequence)) {
72 maxCount = count;
73 maxSequence = sequence;
74 }
75 });
76
77 // Split the sequence into individual websites.
78 return split(maxSequence, ',');
79}
80
81// Utility function to split the sequence into website names based on the delimiter.
82function split(sequence: string, delimiter: string): string[] {
83 return sequence.split(delimiter);
84}
85
Time and Space Complexity
Time Complexity
The time complexity can be split into a few different parts:
- Sorting the combined list of
(username, timestamp, website)
bytimestamp
. If we sayn
is the number of events, this operation has a complexity ofO(n log n)
. - Constructing the dictionary (
d
). The for-loop runs through each of then
events once, so this step has a complexity ofO(n)
. - Constructing the 3-sequence set and counting. For each user, we generate all possible combinations of website visit sequences of length 3. In the worst case, every user visits
m
websites, and the number of possible combinations isO((m choose 3))
, which isO(m^3/6)
when expanded. Summing this up for all users, ifu
is the number of users, this step would have complexity ofO(u * m^3/6)
. However, sincem = n
in the worst case (a single user visiting all websites), it simplifies toO(u * n^3/6)
. - Sorting the count dictionary
cnt
items which, in the worst case, can have as many asO(n^3/6)
unique sequences. The complexity of this sort isO((n^3/6) log (n^3/6))
, which simplifies toO(n^3 log n)
.
Considering all steps, the overall time complexity would be dominated by O(n log n) + O(n) + O(u * n^3/6) + O(n^3 log n)
which simplifies to O(u * n^3/6) + O(n^3 log n)
due to the higher powers of n
.
Space Complexity
For space complexity, we consider the data structures used:
- The dictionary
d
with the list of sites for each user. In the worst case, this can store up ton
elements, givingO(n)
. - The set
s
which, in the worst case, can containO(n^3/6)
different sequences per user (again considering that a user visits all websites). Therefore, across all users, this could potentially beO(u * n^3/6)
. - The counter
cnt
, which will hold the same number of unique sequences as in sets
, so it also has a possible space complexity ofO(u * n^3/6)
.
Combining these observations, the overall space complexity would also be dictated by the counter and set, giving an upper bound of O(u * n^3/6)
. If u
is not significantly large compared to n
, we can approximate the space complexity to O(n^3)
to reflect the worst-case scenario.
Learn more about how to find time and space complexity quickly using problem constraints.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80]
is 4
and LIS is
[3, 7, 40, 80]
.
What is the recurrence relation?
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!