1500. Design a File Sharing System
Problem Description
The problem presents a scenario where a file-sharing system is used to distribute a large file split into m
chunks, each identified by a unique integer ID from 1
to m
. The system must handle user interactions with the file chunks and keep track of which users own which chunks. Users are assigned a unique, reusable ID when they join the system. When they leave, their ID can be reissued to new users. The main operations to implement in the system are:
- Assigning a unique ID to users when they join, and considering the reuse of IDs from users who have left.
- Removing user ID information and associated chunks when a user leaves.
- Processing requests from users for file chunks, returning a list of IDs of the users who own the requested chunk, and updating chunk ownership if a chunk is acquired.
The goal is to implement the FileSharing
class with these functionalities, making sure to address certain constraints about chunk and user IDs, and ensuring efficient operation under possible system use cases as indicated in the follow-up questions.
Intuition
To solve this problem, we create a class FileSharing
with the following characteristics:
- Tracking User IDs: We keep a counter
cur
to track the latest assigned user ID and a min-heapreused
to keep available IDs which can be reassigned once a user leaves. - Chunk Ownership: A
defaultdict(set)
nameduser_chunks
is used to map each user ID to a set of chunks they own. - Joining the System (join): When a user joins the system, we check if there is any reusable ID in the
reused
heap; if so, we assign it; otherwise, we increment thecur
counter to generate a new ID. Then we add the chunks the user owns to their set inuser_chunks
. - Leaving the System (leave): When a user leaves, their user ID is added to the
reused
heap to indicate it's available for future users, and their entry inuser_chunks
is deleted to remove record of their chunk ownership. - Requesting Chunks (request): Whenever a user requests a chunk, we iterate over
user_chunks
to find all users who own the requested chunk and return the list sorted. If the list is not empty, it means the requesting user successfully gets the chunk, so we add the chunk to the requester's set of owned chunks.
This approach ensures all operations — join, leave, and request — are performed efficiently. It will remain efficient unless the pattern of usage leads to extreme scenarios like immense heap sizes due to users joining and leaving without requests, or if the balance between join and leave calls becomes unevenly skewed. Follow-ups suggest tweaks for different scenarios, such as when users are identified by IP addresses or for a possible extension to multiple files, pointing to potential scalability issues or optimizations that might be required as system usage evolves.
Learn more about Data Stream and Heap (Priority Queue) patterns.
Solution Approach
To implement the file sharing system described in the problem statement, the solution uses several Python-specific features and data structures effectively.
Data Structures Used:
-
Min-heap (
reused
): Python'sheapq
module is utilized to create a min-heap which is used to manage the reusable user IDs efficiently. The heap property allows for constant time retrieval of the smallest element, which in this case is the smallest available user ID. This ensures IDs are reused in ascending order, fulfilling the requirement of assigning the smallest positive integer not taken by any other user. -
Default dictionary (
user_chunks
): A default dictionary of sets from thecollections
module is used where each key is the user ID and the value is the set of owned chunks. The set data structure is chosen to store chunk ownership because it allows for quick addition, deletion, and membership checks, and the default dictionary initializes new keys with empty sets without needing explicit checks.
Algorithms and Patterns:
-
Initialization (
__init__
Method): The class is initialized by setting the countercur
to zero, which will track the next unique user ID to be assigned, storing the total number of chunkschunks
, preparing an empty min-heapreused
for storing reusable user IDs, and preparing a default dictionaryuser_chunks
to map user IDs to their owned chunks. -
Joining the System (
join
Method): To handle a new user joining, we first check if there are any reusable IDs in the heap:- If the
reused
heap is not empty, pop the smallest element from the heap to use as the new user ID. - If the
reused
heap is empty, increment thecur
counter to get a new user ID. After acquiring a user ID, we create a new entry inuser_chunks
for the user and initialize their set of chunks withownedChunks
.
- If the
-
Leaving the System (
leave
Method): When a user leaves, their user ID is pushed back into thereused
heap to indicate that it can be reused by a future user. Their entry inuser_chunks
is also removed to clean up their owned chunks. -
Requesting Chunks (
request
Method): When a user requests a particular chunk, we must return a list of user IDs that own that chunk. We loop through theuser_chunks
dictionary to find all users owning the requested chunkID. If the requested chunk is owned by any users, we also add that chunk to the requester's set of owned chunks to indicate they now have it as well.- Check if requested
chunkID
is within valid range, return empty list if not. - Iterate through
user_chunks
to find all users having the requestedchunkID
. - If the returned list,
res
, has any owners in it, addchunkID
to the requester's set of owned chunks. - Before returning, we sort the list
res
to ensure the ascending order of user IDs.
- Check if requested
This approach ensures that no user IDs are wasted, the smallest possible ID is always assigned, and chunks ownership is kept updated dynamically in an efficient manner based on user activity.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's go through a small example to illustrate how the FileSharing
system might operate with users joining, requesting chunks, and leaving the system. Suppose we have a system with m = 3
chunks to distribute. Here's how the solution approach would handle different operations:
-
Initialization: A
FileSharing
instance is created. Thecur
counter is 0,reused
heap is empty, anduser_chunks
is an empty default dictionary. -
User A Joins:
join
is called with A owning no chunks initially.reused
heap is empty, socur
is incremented from 0 to 1. User A is given ID 1.user_chunks
now has {1: set()} indicating user A with an empty set of chunks.
-
User B Joins:
join
is called with B owning chunk 2.reused
is still empty,cur
is incremented to 2, and user B is given ID 2.user_chunks
is updated to {1: set(), 2: {2}} showing B owns chunk 2.
-
User A Requests Chunk 2:
request
is called with A requesting chunk 2.- User B is found as owning chunk 2, so the system returns [2].
- Chunk 2 is added to A's set of chunks, so
user_chunks
now is {1: {2}, 2: {2}}.
-
User C Joins:
join
is called with C owning chunks 1 and 3.- No IDs available in
reused
,cur
goes to 3. User C is given ID 3. user_chunks
is updated to {1: {2}, 2: {2}, 3: {1, 3}}.
-
User B Leaves:
leave
is called for user B.- B's ID (2) is pushed into the
reused
heap indicating it's now available. - B's entry is removed from
user_chunks
, which now is {1: {2}, 3: {1, 3}}.
-
User D Joins:
join
is called with D owning chunk 3.- Reused heap contains 2, so D is assigned this ID, and heap becomes empty again.
user_chunks
is updated to {1: {2}, 2: {3}, 3: {1, 3}} as the user_chunks for user D contains chunk 3.
This example illustrates user activity involving all three system operations. It shows how the FileSharing
system assigns user IDs efficiently, manages chunk ownership, handles user requests, and deals with users leaving while keeping its state coherent and allowing for IDs to be reused.
Solution Implementation
1from collections import defaultdict
2from heapq import heappush, heappop
3from typing import List
4
5class FileSharing:
6 def __init__(self, m: int):
7 self.current_user_id = 0 # Tracks the current maximum user ID
8 self.total_chunks = m # Stores the total number of chunks
9 self.available_user_ids = [] # Min-heap of reusable user IDs
10 self.user_chunks_map = defaultdict(set) # Maps user IDs to the set of owned chunks
11
12 def join(self, owned_chunks: List[int]) -> int:
13 # If there are available user IDs in the heap, reuse the smallest one
14 if self.available_user_ids:
15 user_id = heappop(self.available_user_ids)
16 else:
17 # If not, increment the current user ID and use it
18 self.current_user_id += 1
19 user_id = self.current_user_id
20 # Assign the owned chunks to this user ID
21 self.user_chunks_map[user_id] = set(owned_chunks)
22 return user_id
23
24 def leave(self, user_id: int) -> None:
25 # Add the user ID back to the heap for future reuse
26 heappush(self.available_user_ids, user_id)
27 # Remove this user's data from the mapping
28 del self.user_chunks_map[user_id]
29
30 def request(self, user_id: int, chunk_id: int) -> List[int]:
31 # If the chunk ID is invalid, return an empty list
32 if chunk_id < 1 or chunk_id > self.total_chunks:
33 return []
34
35 users_with_chunk = []
36 # Iterate over the users to find who owns the requested chunk
37 for uid, chunks in self.user_chunks_map.items():
38 if chunk_id in chunks:
39 users_with_chunk.append(uid)
40
41 # If the chunk is found with any user, add the chunk to the requester's list
42 if users_with_chunk:
43 self.user_chunks_map[user_id].add(chunk_id)
44 # Return the list of users, sorted, who have the chunk
45 return sorted(users_with_chunk)
46
47
48# Example of how to instantiate and interact with FileSharing class:
49# obj = FileSharing(m)
50# user_id = obj.join(owned_chunks)
51# obj.leave(user_id)
52# list_of_users = obj.request(user_id, chunk_id)
53
1import java.util.*;
2
3public class FileSharing {
4
5 private int maxChunks; // Represents the total number of chunks available
6 private int currentUserID; // Counter for the current highest userID assigned
7 private TreeSet<Integer> availableUserIDs; // Stores reusable user IDs
8 private TreeMap<Integer, Set<Integer>> userToChunksMap; // Maps a user to their owned chunks
9
10 // Constructor initializes the system with the maximum number of chunks available.
11 public FileSharing(int m) {
12 currentUserID = 0;
13 maxChunks = m;
14 availableUserIDs = new TreeSet<>();
15 userToChunksMap = new TreeMap<>();
16 }
17
18 // When a user joins, assign them an ID and set their owned chunks.
19 public int join(List<Integer> ownedChunks) {
20 int userID;
21 if (availableUserIDs.isEmpty()) {
22 ++currentUserID; // Increment the counter if no reusable ID is available
23 userID = currentUserID;
24 } else {
25 userID = availableUserIDs.pollFirst(); // Reuse an available ID
26 }
27 userToChunksMap.put(userID, new HashSet<>(ownedChunks)); // Associate the user with their chunks
28 return userID; // Return the new user ID
29 }
30
31 // When a user leaves, make their user ID available for reuse and remove their chunks.
32 public void leave(int userID) {
33 availableUserIDs.add(userID); // Add the userID to the set of reusable IDs
34 userToChunksMap.remove(userID); // Remove the user's chunk association
35 }
36
37 // When a user requests a chunk, return a list of user IDs that contain the chunk,
38 // and add the chunk to the user's owned chunks if it exists.
39 public List<Integer> request(int userID, int chunkID) {
40 // Check for a valid chunkID
41 if (chunkID < 1 || chunkID > maxChunks) {
42 return Collections.emptyList(); // Return an empty list if the chunk ID is out of bounds
43 }
44
45 List<Integer> owners = new ArrayList<>(); // Create a list for storing owner IDs of the chunk
46 for (Map.Entry<Integer, Set<Integer>> entry : userToChunksMap.entrySet()) {
47 if (entry.getValue().contains(chunkID)) {
48 owners.add(entry.getKey()); // Add the user ID to the list if they own the chunk
49 }
50 }
51 // If at least one owner is found, add the chunk to the requesting user's owned chunks
52 if (!owners.isEmpty()) {
53 userToChunksMap.computeIfAbsent(userID, k -> new HashSet<>()).add(chunkID);
54 }
55 return owners; // Return the list of owners
56 }
57}
58
59// Usage example of the FileSharing class:
60/*
61FileSharing fileSharingSystem = new FileSharing(m);
62int userID = fileSharingSystem.join(ownedChunks);
63fileSharingSystem.leave(userID);
64List<Integer> ownerUserIDs = fileSharingSystem.request(userID, chunkID);
65*/
66
1#include <vector>
2#include <set>
3#include <map>
4#include <algorithm>
5#include <iterator>
6
7class FileSharing {
8 private:
9 int maxChunks; // Represents the total number of chunks available
10 int currentUserId; // Counter for the current highest userID assigned
11 std::set<int> availableUserIds; // Stores reusable user IDs
12 std::map<int, std::set<int>> userToChunksMap; // Maps a user to their owned chunks
13
14 public:
15 // Constructor initializes the system with the maximum number of chunks available.
16 FileSharing(int m) : maxChunks(m), currentUserId(0) {}
17
18 // When a user joins, assign them an ID and set their owned chunks.
19 int join(const std::vector<int>& ownedChunks) {
20 int userId;
21 if (availableUserIds.empty()) {
22 // Increment the counter if no reusable ID is available
23 ++currentUserId;
24 userId = currentUserId;
25 } else {
26 // Reuse an available ID
27 userId = *availableUserIds.begin();
28 // Erase the used ID from the set
29 availableUserIds.erase(availableUserIds.begin());
30 }
31 // Associate the user with their chunks
32 userToChunksMap[userId] = std::set<int>(ownedChunks.begin(), ownedChunks.end());
33 return userId; // Return the new user ID
34 }
35
36 // When a user leaves, make their user ID available for reuse and remove their chunks.
37 void leave(int userId) {
38 // Add the userID to the set of reusable IDs
39 availableUserIds.insert(userId);
40 // Remove the user's chunk association
41 userToChunksMap.erase(userId);
42 }
43
44 // When a user requests a chunk, return a list of user IDs that contain the chunk,
45 // and add the chunk to the user's owned chunks if it exists.
46 std::vector<int> request(int userId, int chunkId) {
47 // Check for a valid chunkID
48 std::vector<int> owners;
49 if (chunkId < 1 || chunkId > maxChunks) {
50 // Return an empty vector if the chunk ID is out of bounds
51 return owners;
52 }
53
54 for (const auto& entry : userToChunksMap) {
55 if (entry.second.count(chunkId)) {
56 // Add the user ID to the vector if they own the chunk
57 owners.push_back(entry.first);
58 }
59 }
60 // If at least one owner is found, add the chunk to the requesting user's owned chunks
61 if (!owners.empty()) {
62 userToChunksMap[userId].insert(chunkId);
63 }
64 return owners; // Return the vector of owners
65 }
66};
67
68// Usage example of the FileSharing class:
69/*
70FileSharing fileSharingSystem(m);
71int userId = fileSharingSystem.join(ownedChunks);
72fileSharingSystem.leave(userId);
73std::vector<int> ownerUserIds = fileSharingSystem.request(userId, chunkId);
74*/
75
1type ChunksMap = Map<number, Set<number>>;
2type UserId = number;
3type ChunkId = number;
4
5let maxChunks: number; // Represents the total number of chunks available
6let currentUserID: UserId = 0; // Counter for the current highest userID assigned
7let availableUserIDs: Set<UserId> = new Set(); // Stores reusable user IDs
8let userToChunksMap: ChunksMap = new Map(); // Maps a user to their owned chunks
9
10// Initializes the system with the maximum number of chunks available.
11function initializeFileSharing(m: number): void {
12 maxChunks = m;
13 currentUserID = 0;
14 availableUserIDs = new Set();
15 userToChunksMap = new Map();
16}
17
18// When a user joins, assign them an ID and set their owned chunks.
19function join(ownedChunks: ChunkId[]): UserId {
20 let userID: UserId;
21 if (availableUserIDs.size === 0) {
22 currentUserID += 1; // Increment the counter if no reusable ID is available
23 userID = currentUserID;
24 } else {
25 userID = Array.from(availableUserIDs).shift()!; // Reuse an available ID
26 availableUserIDs.delete(userID); // Remove the ID from the set of available IDs
27 }
28 userToChunksMap.set(userID, new Set(ownedChunks)); // Associate the user with their chunks
29 return userID; // Return the new user ID
30}
31
32// When a user leaves, make their user ID available for reuse and remove their chunks.
33function leave(userID: UserId): void {
34 availableUserIDs.add(userID); // Add the userID to the set of reusable IDs
35 userToChunksMap.delete(userID); // Remove the user's chunk association
36}
37
38// When a user requests a chunk, return a list of user IDs that contain the chunk,
39// and add the chunk to the user's owned chunks if it exists.
40function request(userID: UserId, chunkID: ChunkId): UserId[] {
41 // Check for a valid chunkID
42 if (chunkID < 1 || chunkID > maxChunks) {
43 return []; // Return an empty list if the chunk ID is out of bounds
44 }
45
46 let owners: UserId[] = []; // Create an array for storing owner IDs of the chunk
47 userToChunksMap.forEach((chunks, ownerID) => {
48 if (chunks.has(chunkID)) {
49 owners.push(ownerID); // Add the user ID to the list if they own the chunk
50 }
51 });
52
53 // If at least one owner is found, add the chunk to the requesting user's owned chunks
54 if (owners.length > 0) {
55 userToChunksMap.get(userID)?.add(chunkID) || userToChunksMap.set(userID, new Set([chunkID]));
56 }
57 return owners; // Return the list of owners
58}
59
60// Usage example, replace `m`, `ownedChunks`, and `chunkID` with actual values
61/*
62initializeFileSharing(m);
63const userID = join(ownedChunks);
64leave(userID);
65const ownerUserIDs = request(userID, chunkID);
66*/
67
Time and Space Complexity
Time Complexity
-
__init__
method:- The initializer of the
FileSharing
class isO(1)
because it only involves assigning values to variables.
- The initializer of the
-
join
method:- The
join
method has a time complexity ofO(log n)
if a reused ID exists due to the heappop operation. Otherwise, it'sO(1)
when incrementingself.cur
.
- The
-
leave
method:- The complexity is
O(log n)
because it involves heappush, which introduces logarithmic complexity based on the number of users that have been created and left (collected in theself.reused
heap).
- The complexity is
-
request
method:- This method has a complexity of
O(u + k log u)
, whereu
is the number of users andk
is the number of users having the chunk. Traversing through all user chunks to find who has the specificchunkID
isO(u)
. In the worst case, it requires adding each user having the chunk into the response list, which takesO(k)
, followed by a sort operationO(k log k)
(sincek <= u
, we considerO(k log u)
for the sort). Adding the chunkID to the requesting user's set isO(1)
assuming a hashing set is used.
- This method has a complexity of
Space Complexity
- For all methods overall:
- The space complexity is
O(u * c)
whereu
is the number of users andc
is the average number of chunks each user holds. The predominant storage is theself.user_chunks
, which stores a set of chunks for each user.
- The space complexity is
Learn more about how to find time and space complexity quickly using problem constraints.
Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.
Which of the following method should we use to solve this problem?
Recommended Readings
Median of Data Stream Given a stream of numbers find the median number at any given time accurate to 1 decimal place Example add_number 1 add_number 2 add_number 3 get_median 2 0 add_number 4 get_median 2 5 Try it yourself Explanation Intuition Brute force way is to sort the entire list every time we get a new number This would be O N log N for each add_number operation One step up is to notice that the list is sorted before we add any new number to it So every
https algomonster s3 us east 2 amazonaws com cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Got a question? Ask the Monster Assistant anything you don't understand.
Still not clear? Submit the part you don't understand to our editors. Or join our Discord and ask the community.