1500. Design a File Sharing System


Problem Description

The problem presents a scenario where a file-sharing system is used to distribute a large file split into m chunks, each identified by a unique integer ID from 1 to m. The system must handle user interactions with the file chunks and keep track of which users own which chunks. Users are assigned a unique, reusable ID when they join the system. When they leave, their ID can be reissued to new users. The main operations to implement in the system are:

  • Assigning a unique ID to users when they join, and considering the reuse of IDs from users who have left.
  • Removing user ID information and associated chunks when a user leaves.
  • Processing requests from users for file chunks, returning a list of IDs of the users who own the requested chunk, and updating chunk ownership if a chunk is acquired.

The goal is to implement the FileSharing class with these functionalities, making sure to address certain constraints about chunk and user IDs, and ensuring efficient operation under possible system use cases as indicated in the follow-up questions.

Intuition

To solve this problem, we create a class FileSharing with the following characteristics:

  1. Tracking User IDs: We keep a counter cur to track the latest assigned user ID and a min-heap reused to keep available IDs which can be reassigned once a user leaves.
  2. Chunk Ownership: A defaultdict(set) named user_chunks is used to map each user ID to a set of chunks they own.
  3. Joining the System (join): When a user joins the system, we check if there is any reusable ID in the reused heap; if so, we assign it; otherwise, we increment the cur counter to generate a new ID. Then we add the chunks the user owns to their set in user_chunks.
  4. Leaving the System (leave): When a user leaves, their user ID is added to the reused heap to indicate it's available for future users, and their entry in user_chunks is deleted to remove record of their chunk ownership.
  5. Requesting Chunks (request): Whenever a user requests a chunk, we iterate over user_chunks to find all users who own the requested chunk and return the list sorted. If the list is not empty, it means the requesting user successfully gets the chunk, so we add the chunk to the requester's set of owned chunks.

This approach ensures all operations — join, leave, and request — are performed efficiently. It will remain efficient unless the pattern of usage leads to extreme scenarios like immense heap sizes due to users joining and leaving without requests, or if the balance between join and leave calls becomes unevenly skewed. Follow-ups suggest tweaks for different scenarios, such as when users are identified by IP addresses or for a possible extension to multiple files, pointing to potential scalability issues or optimizations that might be required as system usage evolves.

Solution Approach

To implement the file sharing system described in the problem statement, the solution uses several Python-specific features and data structures effectively.

Data Structures Used:

  1. Min-heap (reused): Python's heapq module is utilized to create a min-heap which is used to manage the reusable user IDs efficiently. The heap property allows for constant time retrieval of the smallest element, which in this case is the smallest available user ID. This ensures IDs are reused in ascending order, fulfilling the requirement of assigning the smallest positive integer not taken by any other user.

  2. Default dictionary (user_chunks): A default dictionary of sets from the collections module is used where each key is the user ID and the value is the set of owned chunks. The set data structure is chosen to store chunk ownership because it allows for quick addition, deletion, and membership checks, and the default dictionary initializes new keys with empty sets without needing explicit checks.

Algorithms and Patterns:

  1. Initialization (__init__ Method): The class is initialized by setting the counter cur to zero, which will track the next unique user ID to be assigned, storing the total number of chunks chunks, preparing an empty min-heap reused for storing reusable user IDs, and preparing a default dictionary user_chunks to map user IDs to their owned chunks.

  2. Joining the System (join Method): To handle a new user joining, we first check if there are any reusable IDs in the heap:

    • If the reused heap is not empty, pop the smallest element from the heap to use as the new user ID.
    • If the reused heap is empty, increment the cur counter to get a new user ID. After acquiring a user ID, we create a new entry in user_chunks for the user and initialize their set of chunks with ownedChunks.
  3. Leaving the System (leave Method): When a user leaves, their user ID is pushed back into the reused heap to indicate that it can be reused by a future user. Their entry in user_chunks is also removed to clean up their owned chunks.

  4. Requesting Chunks (request Method): When a user requests a particular chunk, we must return a list of user IDs that own that chunk. We loop through the user_chunks dictionary to find all users owning the requested chunkID. If the requested chunk is owned by any users, we also add that chunk to the requester's set of owned chunks to indicate they now have it as well.

    • Check if requested chunkID is within valid range, return empty list if not.
    • Iterate through user_chunks to find all users having the requested chunkID.
    • If the returned list, res, has any owners in it, add chunkID to the requester's set of owned chunks.
    • Before returning, we sort the list res to ensure the ascending order of user IDs.

This approach ensures that no user IDs are wasted, the smallest possible ID is always assigned, and chunks ownership is kept updated dynamically in an efficient manner based on user activity.

Example Walkthrough

Let's go through a small example to illustrate how the FileSharing system might operate with users joining, requesting chunks, and leaving the system. Suppose we have a system with m = 3 chunks to distribute. Here's how the solution approach would handle different operations:

  1. Initialization: A FileSharing instance is created. The cur counter is 0, reused heap is empty, and user_chunks is an empty default dictionary.

  2. User A Joins:

    • join is called with A owning no chunks initially.
    • reused heap is empty, so cur is incremented from 0 to 1. User A is given ID 1.
    • user_chunks now has {1: set()} indicating user A with an empty set of chunks.
  3. User B Joins:

    • join is called with B owning chunk 2.
    • reused is still empty, cur is incremented to 2, and user B is given ID 2.
    • user_chunks is updated to {1: set(), 2: {2}} showing B owns chunk 2.
  4. User A Requests Chunk 2:

    • request is called with A requesting chunk 2.
    • User B is found as owning chunk 2, so the system returns [2].
    • Chunk 2 is added to A's set of chunks, so user_chunks now is {1: {2}, 2: {2}}.
  5. User C Joins:

    • join is called with C owning chunks 1 and 3.
    • No IDs available in reused, cur goes to 3. User C is given ID 3.
    • user_chunks is updated to {1: {2}, 2: {2}, 3: {1, 3}}.
  6. User B Leaves:

    • leave is called for user B.
    • B's ID (2) is pushed into the reused heap indicating it's now available.
    • B's entry is removed from user_chunks, which now is {1: {2}, 3: {1, 3}}.
  7. User D Joins:

    • join is called with D owning chunk 3.
    • Reused heap contains 2, so D is assigned this ID, and heap becomes empty again.
    • user_chunks is updated to {1: {2}, 2: {3}, 3: {1, 3}} as the user_chunks for user D contains chunk 3.

This example illustrates user activity involving all three system operations. It shows how the FileSharing system assigns user IDs efficiently, manages chunk ownership, handles user requests, and deals with users leaving while keeping its state coherent and allowing for IDs to be reused.

Python Solution

1from collections import defaultdict
2from heapq import heappush, heappop
3from typing import List
4
5class FileSharing:
6    def __init__(self, m: int):
7        self.current_user_id = 0  # Tracks the current maximum user ID
8        self.total_chunks = m  # Stores the total number of chunks
9        self.available_user_ids = []  # Min-heap of reusable user IDs
10        self.user_chunks_map = defaultdict(set)  # Maps user IDs to the set of owned chunks
11
12    def join(self, owned_chunks: List[int]) -> int:
13        # If there are available user IDs in the heap, reuse the smallest one
14        if self.available_user_ids:
15            user_id = heappop(self.available_user_ids)
16        else:
17            # If not, increment the current user ID and use it
18            self.current_user_id += 1
19            user_id = self.current_user_id
20        # Assign the owned chunks to this user ID
21        self.user_chunks_map[user_id] = set(owned_chunks)
22        return user_id
23
24    def leave(self, user_id: int) -> None:
25        # Add the user ID back to the heap for future reuse
26        heappush(self.available_user_ids, user_id)
27        # Remove this user's data from the mapping
28        del self.user_chunks_map[user_id]
29
30    def request(self, user_id: int, chunk_id: int) -> List[int]:
31        # If the chunk ID is invalid, return an empty list
32        if chunk_id < 1 or chunk_id > self.total_chunks:
33            return []
34      
35        users_with_chunk = []
36        # Iterate over the users to find who owns the requested chunk
37        for uid, chunks in self.user_chunks_map.items():
38            if chunk_id in chunks:
39                users_with_chunk.append(uid)
40              
41        # If the chunk is found with any user, add the chunk to the requester's list
42        if users_with_chunk:
43            self.user_chunks_map[user_id].add(chunk_id)
44        # Return the list of users, sorted, who have the chunk
45        return sorted(users_with_chunk)
46
47
48# Example of how to instantiate and interact with FileSharing class:
49# obj = FileSharing(m)
50# user_id = obj.join(owned_chunks)
51# obj.leave(user_id)
52# list_of_users = obj.request(user_id, chunk_id)
53

Java Solution

1import java.util.*;
2
3public class FileSharing {
4
5    private int maxChunks; // Represents the total number of chunks available
6    private int currentUserID; // Counter for the current highest userID assigned
7    private TreeSet<Integer> availableUserIDs; // Stores reusable user IDs
8    private TreeMap<Integer, Set<Integer>> userToChunksMap; // Maps a user to their owned chunks
9
10    // Constructor initializes the system with the maximum number of chunks available.
11    public FileSharing(int m) {
12        currentUserID = 0;
13        maxChunks = m;
14        availableUserIDs = new TreeSet<>();
15        userToChunksMap = new TreeMap<>();
16    }
17
18    // When a user joins, assign them an ID and set their owned chunks.
19    public int join(List<Integer> ownedChunks) {
20        int userID;
21        if (availableUserIDs.isEmpty()) {
22            ++currentUserID; // Increment the counter if no reusable ID is available
23            userID = currentUserID;
24        } else {
25            userID = availableUserIDs.pollFirst(); // Reuse an available ID
26        }
27        userToChunksMap.put(userID, new HashSet<>(ownedChunks)); // Associate the user with their chunks
28        return userID; // Return the new user ID
29    }
30
31    // When a user leaves, make their user ID available for reuse and remove their chunks.
32    public void leave(int userID) {
33        availableUserIDs.add(userID); // Add the userID to the set of reusable IDs
34        userToChunksMap.remove(userID); // Remove the user's chunk association
35    }
36
37    // When a user requests a chunk, return a list of user IDs that contain the chunk, 
38    // and add the chunk to the user's owned chunks if it exists.
39    public List<Integer> request(int userID, int chunkID) {
40        // Check for a valid chunkID
41        if (chunkID < 1 || chunkID > maxChunks) {
42            return Collections.emptyList(); // Return an empty list if the chunk ID is out of bounds
43        }
44
45        List<Integer> owners = new ArrayList<>(); // Create a list for storing owner IDs of the chunk
46        for (Map.Entry<Integer, Set<Integer>> entry : userToChunksMap.entrySet()) {
47            if (entry.getValue().contains(chunkID)) {
48                owners.add(entry.getKey()); // Add the user ID to the list if they own the chunk
49            }
50        }
51        // If at least one owner is found, add the chunk to the requesting user's owned chunks
52        if (!owners.isEmpty()) {
53            userToChunksMap.computeIfAbsent(userID, k -> new HashSet<>()).add(chunkID);
54        }
55        return owners; // Return the list of owners
56    }
57}
58
59// Usage example of the FileSharing class:
60/*
61FileSharing fileSharingSystem = new FileSharing(m);
62int userID = fileSharingSystem.join(ownedChunks);
63fileSharingSystem.leave(userID);
64List<Integer> ownerUserIDs = fileSharingSystem.request(userID, chunkID);
65*/
66

C++ Solution

1#include <vector>
2#include <set>
3#include <map>
4#include <algorithm>
5#include <iterator>
6
7class FileSharing {
8    private:
9        int maxChunks; // Represents the total number of chunks available
10        int currentUserId; // Counter for the current highest userID assigned
11        std::set<int> availableUserIds; // Stores reusable user IDs
12        std::map<int, std::set<int>> userToChunksMap; // Maps a user to their owned chunks
13
14    public:
15        // Constructor initializes the system with the maximum number of chunks available.
16        FileSharing(int m) : maxChunks(m), currentUserId(0) {}
17
18        // When a user joins, assign them an ID and set their owned chunks.
19        int join(const std::vector<int>& ownedChunks) {
20            int userId;
21            if (availableUserIds.empty()) {
22                // Increment the counter if no reusable ID is available
23                ++currentUserId;
24                userId = currentUserId;
25            } else {
26                // Reuse an available ID
27                userId = *availableUserIds.begin();
28                // Erase the used ID from the set
29                availableUserIds.erase(availableUserIds.begin());
30            }
31            // Associate the user with their chunks
32            userToChunksMap[userId] = std::set<int>(ownedChunks.begin(), ownedChunks.end());
33            return userId; // Return the new user ID
34        }
35
36        // When a user leaves, make their user ID available for reuse and remove their chunks.
37        void leave(int userId) {
38            // Add the userID to the set of reusable IDs
39            availableUserIds.insert(userId);
40            // Remove the user's chunk association
41            userToChunksMap.erase(userId);
42        }
43
44        // When a user requests a chunk, return a list of user IDs that contain the chunk,
45        // and add the chunk to the user's owned chunks if it exists.
46        std::vector<int> request(int userId, int chunkId) {
47            // Check for a valid chunkID
48            std::vector<int> owners;
49            if (chunkId < 1 || chunkId > maxChunks) {
50                // Return an empty vector if the chunk ID is out of bounds
51                return owners;
52            }
53
54            for (const auto& entry : userToChunksMap) {
55                if (entry.second.count(chunkId)) {
56                    // Add the user ID to the vector if they own the chunk
57                    owners.push_back(entry.first);
58                }
59            }
60            // If at least one owner is found, add the chunk to the requesting user's owned chunks
61            if (!owners.empty()) {
62                userToChunksMap[userId].insert(chunkId);
63            }
64            return owners; // Return the vector of owners
65        }
66};
67
68// Usage example of the FileSharing class:
69/*
70FileSharing fileSharingSystem(m);
71int userId = fileSharingSystem.join(ownedChunks);
72fileSharingSystem.leave(userId);
73std::vector<int> ownerUserIds = fileSharingSystem.request(userId, chunkId);
74*/
75

Typescript Solution

1type ChunksMap = Map<number, Set<number>>;
2type UserId = number;
3type ChunkId = number;
4
5let maxChunks: number; // Represents the total number of chunks available
6let currentUserID: UserId = 0; // Counter for the current highest userID assigned
7let availableUserIDs: Set<UserId> = new Set(); // Stores reusable user IDs
8let userToChunksMap: ChunksMap = new Map(); // Maps a user to their owned chunks
9
10// Initializes the system with the maximum number of chunks available.
11function initializeFileSharing(m: number): void {
12    maxChunks = m;
13    currentUserID = 0;
14    availableUserIDs = new Set();
15    userToChunksMap = new Map();
16}
17
18// When a user joins, assign them an ID and set their owned chunks.
19function join(ownedChunks: ChunkId[]): UserId {
20    let userID: UserId;
21    if (availableUserIDs.size === 0) {
22        currentUserID += 1; // Increment the counter if no reusable ID is available
23        userID = currentUserID;
24    } else {
25        userID = Array.from(availableUserIDs).shift()!; // Reuse an available ID
26        availableUserIDs.delete(userID); // Remove the ID from the set of available IDs
27    }
28    userToChunksMap.set(userID, new Set(ownedChunks)); // Associate the user with their chunks
29    return userID; // Return the new user ID
30}
31
32// When a user leaves, make their user ID available for reuse and remove their chunks.
33function leave(userID: UserId): void {
34    availableUserIDs.add(userID); // Add the userID to the set of reusable IDs
35    userToChunksMap.delete(userID); // Remove the user's chunk association
36}
37
38// When a user requests a chunk, return a list of user IDs that contain the chunk,
39// and add the chunk to the user's owned chunks if it exists.
40function request(userID: UserId, chunkID: ChunkId): UserId[] {
41    // Check for a valid chunkID
42    if (chunkID < 1 || chunkID > maxChunks) {
43        return []; // Return an empty list if the chunk ID is out of bounds
44    }
45
46    let owners: UserId[] = []; // Create an array for storing owner IDs of the chunk
47    userToChunksMap.forEach((chunks, ownerID) => {
48        if (chunks.has(chunkID)) {
49            owners.push(ownerID); // Add the user ID to the list if they own the chunk
50        }
51    });
52
53    // If at least one owner is found, add the chunk to the requesting user's owned chunks
54    if (owners.length > 0) {
55        userToChunksMap.get(userID)?.add(chunkID) || userToChunksMap.set(userID, new Set([chunkID]));
56    }
57    return owners; // Return the list of owners
58}
59
60// Usage example, replace `m`, `ownedChunks`, and `chunkID` with actual values
61/*
62initializeFileSharing(m);
63const userID = join(ownedChunks);
64leave(userID);
65const ownerUserIDs = request(userID, chunkID);
66*/
67

Time and Space Complexity

Time Complexity

  • __init__ method:

    • The initializer of the FileSharing class is O(1) because it only involves assigning values to variables.
  • join method:

    • The join method has a time complexity of O(log n) if a reused ID exists due to the heappop operation. Otherwise, it's O(1) when incrementing self.cur.
  • leave method:

    • The complexity is O(log n) because it involves heappush, which introduces logarithmic complexity based on the number of users that have been created and left (collected in the self.reused heap).
  • request method:

    • This method has a complexity of O(u + k log u), where u is the number of users and k is the number of users having the chunk. Traversing through all user chunks to find who has the specific chunkID is O(u). In the worst case, it requires adding each user having the chunk into the response list, which takes O(k), followed by a sort operation O(k log k) (since k <= u, we consider O(k log u) for the sort). Adding the chunkID to the requesting user's set is O(1) assuming a hashing set is used.

Space Complexity

  • For all methods overall:
    • The space complexity is O(u * c) where u is the number of users and c is the average number of chunks each user holds. The predominant storage is the self.user_chunks, which stores a set of chunks for each user.

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫