1152. Analyze User Website Visit Pattern


Problem Description

In this problem, we are given the browsing history of multiple users stored in three arrays: username, timestamp, and website. Each index i in these arrays represents a single browsing event, where username[i] is the user who visited website[i] at time timestamp[i]. The goal is to find a sequence of three websites, which we will refer to as a "pattern," that is most popular among the users. A pattern here means any three websites visited in a specific order, not necessarily different. The popularity or "score" of a pattern is determined by counting how many unique users visited the websites in the same order as the pattern. The task is to return the pattern with the highest score, and in case of a tie, return the lexicographically smallest pattern.

Intuition

The solution to this problem involves several steps. First, we need to organize the browsing events for each user in chronological order. This is essential because we are interested in patterns which depend on the sequence of website visits.

To accomplish this, we start by combining the given arrays into a single list of tuples, each consisting of a username, a timestamp, and a website. We then sort this list by timestamp to ensure that each user's visits are in the correct order.

With the data sorted, we can now create a mapping from each user to their list of visited websites in chronological order. This is done using a dictionary, where each key is a username, and the corresponding value is the list of websites they visited.

We then calculate the score for every possible pattern. To ensure we count each pattern only once per user, we use a set to collect all unique patterns visited by a user. After processing all users' data, we will have the total count of unique users for each pattern across all users.

Finally, we sort the patterns by their score in descending order and lexicographically. The first pattern in this sorted list is our answer, as it has the highest score, and in the case of a tie, it is the lexicographically smallest among those with the highest scores.

The provided solution performs these steps and correctly identifies the most visited pattern as required.

Solution Approach

The solution uses a combination of sorting, dictionary usage, sets, and the Counter class from the collections module in Python to implement an efficient approach toward finding the most visited pattern. Here's a step-by-step breakdown:

  1. Sorting by Timestamp: First, we create a list of tuples with the username, timestamp, and website. We sort this list based on the timestamp to ensure that we consider the websites in the order they were visited by each user. This is achieved by the expression:

    1sorted(zip(username, timestamp, website), key=lambda x: x[1])
  2. Mapping Users to Websites: We use a defaultdict of lists from the collections module to map each user to their visited websites in chronological order. This data structure is useful for aggregating the list of website visits per user without having to check if the user key already exists in the dictionary. We do this by:

    1d = defaultdict(list)
    2for user, _, site in sorted_events:
    3    d[user].append(site)
  3. Generating Patterns: As we're interested in patterns of three websites, we iterate through each user's websites to generate all possible patterns using a nested loop structure. Since we want unique patterns per user, a set named s is used to store these combinations without duplicates:

    1s = set()
    2for i in range(m - 2):
    3    for j in range(i + 1, m - 1):
    4        for k in range(j + 1, m):
    5            s.add((sites[i], sites[j], sites[k]))
  4. Counting Patterns' Scores: We use a Counter object to keep track of how many times each pattern occurs across all users. For each user, we add the patterns from the set s to the counter, which increments the count for each pattern:

    1cnt = Counter()
    2for t in s:
    3    cnt[t] += 1
  5. Sorting and Selecting the Top Pattern: Finally, we sort the items in the counter to find the most visited pattern. The sort criteria are the pattern's score in descending order and the pattern itself in lexicographic order for tie-breaking. The top pattern is then selected as the answer:

    1sorted(cnt.items(), key=lambda x: (-x[1], x[0]))[0][0]

The combination of sorting, dictionary and set usage to eliminate duplicates, and a frequency counter (Counter class) makes this approach efficient and effective for solving the problem.

Example Walkthrough

Let's illustrate the solution approach with a small example.

Suppose we have the following input data:

  • username: ["Jane", "Jane", "Jane", "Alex", "Alex", "Alex"]
  • timestamp: [1, 2, 3, 4, 5, 6]
  • website: ["A", "B", "C", "A", "B", "D"]

Following the algorithm's steps:

  1. Sorting by Timestamp: We combine username, timestamp, and website into a list of tuples and sort it. The sorted result will be:

    1[("Jane", 1, "A"), ("Jane", 2, "B"), ("Jane", 3, "C"), ("Alex", 4, "A"), ("Alex", 5, "B"), ("Alex", 6, "D")]
  2. Mapping Users to Websites: We create a dictionary (defaultdict) to map each user to the list of websites they visited in order:

    1d["Jane"] = ["A", "B", "C"]
    2d["Alex"] = ["A", "B", "D"]
  3. Generating Patterns: We generate patterns for each user. For "Jane", the pattern would be:

    1s["Jane"] = {("A", "B", "C")}

    And for "Alex":

    1s["Alex"] = {("A", "B", "D")}

    Note that both users have the pattern ("A", "B") in common but have different third websites visited.

  4. Counting Patterns' Scores: We count the occurrences of patterns across all users. For this example:

    1cnt[("A", "B", "C")] = 1
    2cnt[("A", "B", "D")] = 1

    Each pattern has been visited by one unique user.

  5. Sorting and Selecting the Top Pattern: We sort patterns based on their scores and lexicographically. After sorting:

    1patterns_sorted = [("A", "B", "C"), ("A", "B", "D")]

    Since the scores are the same, we pick the lexicographically smallest pattern, which is ("A", "B", "C").

Hence, the most visited pattern in this example would be ("A", "B", "C"), as it is the lexicographically smallest among the highest scoring patterns.

Python Solution

1from collections import defaultdict, Counter
2from typing import List
3
4class Solution:
5    def mostVisitedPattern(
6        self, usernames: List[str], timestamps: List[int], websites: List[str]
7    ) -> List[str]:
8        # Create a dictionary to store the sites visited by each user
9        users_visits = defaultdict(list)
10        # Sort the data by timestamp and group websites by username
11        for user, _, site in sorted(zip(usernames, timestamps, websites), key=lambda x: x[1]):
12            users_visits[user].append(site)
13
14        # Counter for tracking the frequency of each 3-sequence pattern
15        patterns_count = Counter()
16
17        # Iterate through each user's visited sites
18        for sites in users_visits.values():
19            number_of_sites = len(sites)
20            unique_patterns = set()  # set to store unique 3-sequence patterns
21            if number_of_sites > 2:  # Check if user has visited more than 2 sites
22                # Generate all possible 3-sequence combinations
23                for i in range(number_of_sites - 2):
24                    for j in range(i + 1, number_of_sites - 1):
25                        for k in range(j + 1, number_of_sites):
26                            unique_patterns.add((sites[i], sites[j], sites[k]))
27          
28            # Update the count of each unique pattern
29            for pattern in unique_patterns:
30                patterns_count[pattern] += 1
31      
32        # Sort the patterns first by frequency (descending) and then lexicographically, and return the most common pattern
33        return sorted(patterns_count.items(), key=lambda x: (-x[1], x[0]))[0][0]
34
35# The logic of the method remains unchanged.
36# The method name is kept the same as per the requirement.
37# Standard naming convention is followed for variables such as `usernames`, `timestamps`, `websites`, `users_visits`, `patterns_count`, and `number_of_sites`.
38# Comments in English are added to explain each block of code.
39

Java Solution

1import java.util.*;
2
3// Definition of Node as a custom data structure to hold user visit information.
4class Node {
5    String user;
6    int timestamp;
7    String website;
8
9    Node(String user, int timestamp, String website) {
10        this.user = user;
11        this.timestamp = timestamp;
12        this.website = website;
13    }
14}
15
16class Solution {
17    public List<String> mostVisitedPattern(String[] usernames, int[] timestamps, String[] websites) {
18        // Map to hold data for each user and their list of Node objects (timestamps and websites visited).
19        Map<String, List<Node>> userData = new HashMap<>();
20        int visitCount = usernames.length; // Total number of website visits
21
22        // Constructing the user data map from usernames, timestamps, and websites.
23        for (int i = 0; i < visitCount; ++i) {
24            String user = usernames[i];
25            int ts = timestamps[i];
26            String site = websites[i];
27            userData.computeIfAbsent(user, k -> new ArrayList<>()).add(new Node(user, ts, site));
28        }
29
30        // Map to hold the count of each unique 3-sequence pattern.
31        Map<String, Integer> patternFrequency = new HashMap<>();
32
33        // Process each user's site visit history to calculate the pattern frequency.
34        for (List<Node> sites : userData.values()) {
35            int visitSize = sites.size();
36            Set<String> sequences = new HashSet<>();
37
38            // Check if there are at least 3 sites visited to form a valid pattern.
39            if (visitSize > 2) {
40                // Sort the user's visit by timestamp
41                Collections.sort(sites, (a, b) -> a.timestamp - b.timestamp);
42
43                // Iterate through combinations to form unique 3-sequence keys for this user
44                for (int i = 0; i < visitSize - 2; ++i) {
45                    for (int j = i + 1; j < visitSize - 1; ++j) {
46                        for (int k = j + 1; k < visitSize; ++k) {
47                            sequences.add(sites.get(i).website + "," +
48                                          sites.get(j).website + "," +
49                                          sites.get(k).website);
50                        }
51                    }
52                }
53            }
54
55            // Count frequency of each 3-sequence pattern.
56            for (String seq : sequences) {
57                patternFrequency.put(seq, patternFrequency.getOrDefault(seq, 0) + 1);
58            }
59        }
60
61        // Variables to track the maximum frequency and the corresponding pattern.
62        int maxFrequency = 0;
63        String topPattern = "";
64
65        // Iterate over the pattern frequencies to determine the most visited pattern.
66        for (Map.Entry<String, Integer> entry : patternFrequency.entrySet()) {
67            // Compare the count or lexicographical order if counts are equal.
68            if (maxFrequency < entry.getValue() || 
69                (maxFrequency == entry.getValue() && entry.getKey().compareTo(topPattern) < 0)) {
70                maxFrequency = entry.getValue();
71                topPattern = entry.getKey();
72            }
73        }
74
75        // Return the top pattern as a list of sites.
76        return Arrays.asList(topPattern.split(","));
77    }
78}
79

C++ Solution

1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <unordered_set>
5#include <sstream>
6#include <algorithm>
7
8class Solution {
9public:
10    // The function returns the 3-sequence visited by the largest number of users.
11    vector<string> mostVisitedPattern(vector<string>& usernames, vector<int>& timestamps, vector<string>& websites) {
12        // Dictionary to store user's website visits mapped with timestamps.
13        unordered_map<string, vector<pair<int, string>>> userVisits;
14        int visitsCount = usernames.size();
15
16        // Collecting the websites each user has visited along with timestamps.
17        for (int i = 0; i < visitsCount; ++i) {
18            string user = usernames[i];
19            int timestamp = timestamps[i];
20            string website = websites[i];
21            userVisits[user].emplace_back(timestamp, website);
22        }
23
24        // Dictionary to count the frequency of each 3-sequence.
25        unordered_map<string, int> sequenceCount;
26
27        // Analysing each user's visits to find all 3-sequences.
28        for (auto& pair : userVisits) {
29            vector<pair<int, string>>& sites = pair.second;
30            unordered_set<string> uniqueSequences;
31
32            // Ensure each user has at least 3 visits to form a 3-sequence.
33            if (sites.size() > 2) {
34                // Sort the websites for each user based on timestamp.
35                sort(sites.begin(), sites.end());
36
37                // Generate all possible 3-sequences.
38                for (int i = 0; i < sites.size() - 2; ++i) {
39                    for (int j = i + 1; j < sites.size() - 1; ++j) {
40                        for (int k = j + 1; k < sites.size(); ++k) {
41                            uniqueSequences.insert(sites[i].second + "," + sites[j].second + "," + sites[k].second);
42                        }
43                    }
44                }
45            }
46
47            // For each unique 3-sequence, count their occurrences.
48            for (auto& sequence : uniqueSequences) {
49                sequenceCount[sequence]++;
50            }
51        }
52
53        int maxCount = 0;
54        string maxSequence;
55
56        // Find the 3-sequence with the highest frequency.
57        for (auto& pair : sequenceCount) {
58            string sequence = pair.first;
59            int count = pair.second;
60
61            // Choose the lexicographically smallest sequence in case of a tie.
62            if (count > maxCount || (count == maxCount && sequence < maxSequence)) {
63                maxCount = count;
64                maxSequence = sequence;
65            }
66        }
67
68        // Split the sequence into individual websites.
69        return split(maxSequence, ',');
70    }
71
72    // Utility function to split the sequence into website names based on commas.
73    vector<string> split(string& sequence, char delimiter) {
74        vector<string> result;
75        stringstream sequenceStream(sequence);
76        string item;
77      
78        // Extract each website name into the result vector.
79        while (getline(sequenceStream, item, delimiter)) {
80            result.push_back(item);
81        }
82        return result;
83    }
84};
85

Typescript Solution

1type Timestamp = number;
2type Website = string;
3type Username = string;
4type Visit = [Timestamp, Website];
5type SequenceCount = { [sequence: string]: number };
6
7// Dictionary to store user's website visits mapped by timestamps.
8let userVisits: { [username: string]: Visit[] } = {};
9
10// Dictionary to count the frequency of each 3-sequence.
11let sequenceCount: SequenceCount = {};
12
13// The function returns the 3-sequence visited by the largest number of users.
14function mostVisitedPattern(usernames: Username[], timestamps: Timestamp[], websites: Website[]): Website[] {
15    // Initialize the user visits dictionary.
16    userVisits = {};
17    let visitsCount = usernames.length;
18
19    // Collecting the websites each user has visited along with timestamps.
20    for (let i = 0; i < visitsCount; ++i) {
21        const user = usernames[i];
22        const timestamp = timestamps[i];
23        const website = websites[i];
24
25        if (!userVisits[user]) {
26            userVisits[user] = [];
27        }
28        userVisits[user].push([timestamp, website]);
29    }
30
31    // Initialize the sequence count dictionary.
32    sequenceCount = {};
33
34    // Analyzing each user's visits to find all 3-sequences.
35    Object.keys(userVisits).forEach(user => {
36        const sites = userVisits[user];
37        const uniqueSequences = new Set<string>();
38
39        // Ensure each user has at least 3 visits to form a 3-sequence.
40        if (sites.length > 2) {
41            // Sort the websites for each user based on timestamp.
42            sites.sort((a, b) => a[0] - b[0]);
43
44            // Generate all possible 3-sequences.
45            for (let i = 0; i < sites.length - 2; ++i) {
46                for (let j = i + 1; j < sites.length - 1; ++j) {
47                    for (let k = j + 1; k < sites.length; ++k) {
48                        uniqueSequences.add(`${sites[i][1]},${sites[j][1]},${sites[k][1]}`);
49                    }
50                }
51            }
52        }
53
54        // For each unique 3-sequence, count their occurrences.
55        uniqueSequences.forEach(sequence => {
56            if (!sequenceCount[sequence]) {
57                sequenceCount[sequence] = 0;
58            }
59            sequenceCount[sequence]++;
60        });
61    });
62
63    let maxCount = 0;
64    let maxSequence = "";
65
66    // Find the 3-sequence with the highest frequency.
67    Object.keys(sequenceCount).forEach(sequence => {
68        const count = sequenceCount[sequence];
69
70        // Choose the lexicographically smallest sequence in case of a tie.
71        if (count > maxCount || (count === maxCount && sequence < maxSequence)) {
72            maxCount = count;
73            maxSequence = sequence;
74        }
75    });
76
77    // Split the sequence into individual websites.
78    return split(maxSequence, ',');
79}
80
81// Utility function to split the sequence into website names based on the delimiter.
82function split(sequence: string, delimiter: string): string[] {
83    return sequence.split(delimiter);
84}
85

Time and Space Complexity

Time Complexity

The time complexity can be split into a few different parts:

  • Sorting the combined list of (username, timestamp, website) by timestamp. If we say n is the number of events, this operation has a complexity of O(n log n).
  • Constructing the dictionary (d). The for-loop runs through each of the n events once, so this step has a complexity of O(n).
  • Constructing the 3-sequence set and counting. For each user, we generate all possible combinations of website visit sequences of length 3. In the worst case, every user visits m websites, and the number of possible combinations is O((m choose 3)), which is O(m^3/6) when expanded. Summing this up for all users, if u is the number of users, this step would have complexity of O(u * m^3/6). However, since m = n in the worst case (a single user visiting all websites), it simplifies to O(u * n^3/6).
  • Sorting the count dictionary cnt items which, in the worst case, can have as many as O(n^3/6) unique sequences. The complexity of this sort is O((n^3/6) log (n^3/6)), which simplifies to O(n^3 log n).

Considering all steps, the overall time complexity would be dominated by O(n log n) + O(n) + O(u * n^3/6) + O(n^3 log n) which simplifies to O(u * n^3/6) + O(n^3 log n) due to the higher powers of n.

Space Complexity

For space complexity, we consider the data structures used:

  • The dictionary d with the list of sites for each user. In the worst case, this can store up to n elements, giving O(n).
  • The set s which, in the worst case, can contain O(n^3/6) different sequences per user (again considering that a user visits all websites). Therefore, across all users, this could potentially be O(u * n^3/6).
  • The counter cnt, which will hold the same number of unique sequences as in set s, so it also has a possible space complexity of O(u * n^3/6).

Combining these observations, the overall space complexity would also be dictated by the counter and set, giving an upper bound of O(u * n^3/6). If u is not significantly large compared to n, we can approximate the space complexity to O(n^3) to reflect the worst-case scenario.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫