705. Design HashSet

EasyDesignArrayHash TableLinked ListHash Function

Problem Description

This problem asks you to implement your own HashSet data structure without using any built-in hash table libraries.

A HashSet is a data structure that stores unique values and provides fast operations for adding, removing, and checking if a value exists. You need to create a class called MyHashSet with three methods:

add(key): This method takes an integer key and inserts it into the HashSet. If the key already exists, the HashSet remains unchanged.
remove(key): This method takes an integer key and removes it from the HashSet if it exists. If the key doesn't exist, nothing happens.
contains(key): This method takes an integer key and returns true if the key exists in the HashSet, or false if it doesn't.

The solution uses a simple static array approach. Since the problem constraints limit the keys to a specific range (0 to 1,000,000), we can create a boolean array of size 1000001. Each index in the array represents a possible key value, and the boolean value at that index indicates whether the key exists in the HashSet (True) or not (False).

When adding a key, we simply set data[key] = True
When removing a key, we set data[key] = False
When checking if a key exists, we return the value of data[key]

This approach trades memory for simplicity and speed, providing O(1) time complexity for all operations at the cost of O(n) space where n is the range of possible keys.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

When designing a HashSet without built-in libraries, we need to think about how to efficiently store and retrieve values. The key insight is recognizing that the problem has a constraint on the range of possible keys (typically 0 to 1,000,000 in LeetCode problems).

Since we know the exact range of possible values, we can use direct addressing. Think of it like having a row of mailboxes numbered from 0 to 1,000,000. Each mailbox can either be empty (the number doesn't exist in our set) or have a flag raised (the number exists in our set).

Instead of using complex hashing functions or collision resolution strategies that traditional hash tables employ, we can simply use the key itself as the index into our array. This is possible because:

The keys are integers within a known range
We have enough memory to allocate an array for the entire range
We want O(1) operations for add, remove, and contains

The boolean array approach emerges naturally from this thinking:

Each index position represents a potential key value
True at position i means key i is in the set
False at position i means key i is not in the set

This eliminates the need for any hashing function since the key directly maps to its storage location. While this uses more memory than necessary (we allocate space for all possible keys even if we only use a few), it provides the simplest and fastest implementation with guaranteed O(1) time complexity for all operations.

Learn more about Linked List patterns.

Solution Approach

The implementation uses a static array approach where we allocate a fixed-size boolean array to represent our HashSet.

Data Structure:

Create a boolean array data of size 1000001 to cover all possible key values from 0 to 1,000,000
Initialize all elements to False, indicating an empty HashSet

Implementation Details:

Initialization (__init__):
```
self.data = [False] * 1000001
```
We create a list with 1000001 elements, all initialized to False. This represents an empty HashSet where no keys are present initially.
Adding an element (add):
```
self.data[key] = True
```
To add a key to the HashSet, we simply set the value at index key to True. This marks that the key now exists in our set. If the key already exists (the value is already True), this operation has no effect, maintaining the unique property of a set.
Removing an element (remove):
```
self.data[key] = False
```
To remove a key, we set the value at index key to False. This marks the key as no longer present in the set. If the key doesn't exist (the value is already False), this operation still sets it to False, effectively doing nothing as required.
Checking existence (contains):
```
return self.data[key]
```
To check if a key exists, we simply return the boolean value at index key. If it's True, the key exists; if it's False, it doesn't.

Time and Space Complexity:

Time Complexity: O(1) for all operations (add, remove, contains) since we're directly accessing array indices
Space Complexity: O(n) where n is the range of possible keys (1,000,001 in this case)

This approach trades memory for simplicity and speed, providing constant-time operations without the complexity of handling collisions or implementing hash functions.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a series of operations to see how our HashSet implementation works:

Initial State: We create a new HashSet. Our data array of size 1000001 is initialized with all False values.

data = [False, False, False, False, False, ...]  # all 1000001 positions are False
        index: 0     1     2     3     4    ...

Operation 1: add(3) We want to add key 3 to our HashSet.

We set data[3] = True
The array becomes:

data = [False, False, False, True, False, ...]
        index: 0     1     2    3     4    ...

Operation 2: add(7) We add key 7 to our HashSet.

We set data[7] = True
The array becomes:

data = [False, False, False, True, False, False, False, True, False, ...]
        index: 0     1     2    3     4     5     6     7     8    ...

Operation 3: contains(3) We check if key 3 exists in our HashSet.

We return data[3], which is True
Result: True (key 3 exists)

Operation 4: contains(5) We check if key 5 exists in our HashSet.

We return data[5], which is False
Result: False (key 5 doesn't exist)

Operation 5: add(3) (Adding duplicate) We try to add key 3 again.

We set data[3] = True (it was already True)
The array remains unchanged - this maintains the unique property of a set

data = [False, False, False, True, False, False, False, True, False, ...]
        index: 0     1     2    3     4     5     6     7     8    ...

Operation 6: remove(7) We remove key 7 from our HashSet.

We set data[7] = False
The array becomes:

data = [False, False, False, True, False, False, False, False, False, ...]
        index: 0     1     2    3     4     5     6     7      8    ...

Operation 7: remove(5) (Removing non-existent key) We try to remove key 5, which doesn't exist.

We set data[5] = False (it was already False)
Nothing changes - no error occurs

Operation 8: contains(7) We check if key 7 exists after removal.

We return data[7], which is False
Result: False (key 7 was successfully removed)

Final State: Our HashSet now contains only the key 3. In the entire array, only position 3 has a True value, while all other positions remain False.

This example demonstrates how each operation directly manipulates or reads from specific array indices, achieving O(1) time complexity for all operations. The key itself serves as the index, eliminating the need for any hash function calculations.

Solution Implementation

1class MyHashSet:
2    def __init__(self) -> None:
3        """
4        Initialize the HashSet data structure.
5        Uses a boolean array to track presence of keys.
6        Array size is 10^6 + 1 to handle keys in range [0, 10^6].
7        """
8        # Boolean array where index represents the key
9        # True means key exists, False means it doesn't
10        self.data = [False] * 1000001
11
12    def add(self, key: int) -> None:
13        """
14        Add a key to the HashSet.
15      
16        Args:
17            key: Integer value to be added (0 <= key <= 10^6)
18        """
19        # Set the corresponding index to True to indicate key exists
20        self.data[key] = True
21
22    def remove(self, key: int) -> None:
23        """
24        Remove a key from the HashSet if it exists.
25      
26        Args:
27            key: Integer value to be removed (0 <= key <= 10^6)
28        """
29        # Set the corresponding index to False to indicate key doesn't exist
30        self.data[key] = False
31
32    def contains(self, key: int) -> bool:
33        """
34        Check if a key exists in the HashSet.
35      
36        Args:
37            key: Integer value to check (0 <= key <= 10^6)
38          
39        Returns:
40            bool: True if key exists, False otherwise
41        """
42        # Return the boolean value at the key's index
43        return self.data[key]
44
45
46# Your MyHashSet object will be instantiated and called as such:
47# obj = MyHashSet()
48# obj.add(key)
49# obj.remove(key)
50# param_3 = obj.contains(key)
51

1/**
2 * Implementation of a HashSet data structure using a boolean array.
3 * This approach uses direct addressing where the key itself serves as the index.
4 * Time Complexity: O(1) for all operations (add, remove, contains)
5 * Space Complexity: O(n) where n is the range of possible keys (0 to 1000000)
6 */
7class MyHashSet {
8    // Boolean array to store presence of elements
9    // Index represents the key, value represents whether the key exists
10    private boolean[] data;
11  
12    // Maximum size based on problem constraints (keys range from 0 to 10^6)
13    private static final int MAX_SIZE = 1000001;
14
15    /**
16     * Constructor to initialize the HashSet
17     * Creates a boolean array with all values initially set to false
18     */
19    public MyHashSet() {
20        data = new boolean[MAX_SIZE];
21    }
22
23    /**
24     * Adds a key to the HashSet
25     * @param key The integer key to be added (0 <= key <= 10^6)
26     */
27    public void add(int key) {
28        data[key] = true;
29    }
30
31    /**
32     * Removes a key from the HashSet
33     * @param key The integer key to be removed
34     */
35    public void remove(int key) {
36        data[key] = false;
37    }
38
39    /**
40     * Checks if the HashSet contains a specific key
41     * @param key The integer key to check for presence
42     * @return true if the key exists in the HashSet, false otherwise
43     */
44    public boolean contains(int key) {
45        return data[key];
46    }
47}
48
49/**
50 * Your MyHashSet object will be instantiated and called as such:
51 * MyHashSet obj = new MyHashSet();
52 * obj.add(key);
53 * obj.remove(key);
54 * boolean param_3 = obj.contains(key);
55 */
56

1class MyHashSet {
2private:
3    // Boolean array to track presence of elements (0 to 1000000)
4    bool data[1000001];
5
6public:
7    // Constructor: Initialize all elements to false
8    MyHashSet() {
9        memset(data, false, sizeof(data));
10    }
11  
12    // Add a key to the hash set
13    void add(int key) {
14        data[key] = true;
15    }
16  
17    // Remove a key from the hash set
18    void remove(int key) {
19        data[key] = false;
20    }
21  
22    // Check if the hash set contains a specific key
23    bool contains(int key) {
24        return data[key];
25    }
26};
27
28/**
29 * Your MyHashSet object will be instantiated and called as such:
30 * MyHashSet* obj = new MyHashSet();
31 * obj->add(key);
32 * obj->remove(key);
33 * bool param_3 = obj->contains(key);
34 */
35

1// Boolean array to store presence of keys in the hash set
2// Index represents the key, value represents whether the key exists
3let data: boolean[] = new Array(1000001).fill(false);
4
5/**
6 * Adds a key to the hash set
7 * @param key - The integer key to add (0 <= key <= 10^6)
8 */
9function add(key: number): void {
10    data[key] = true;
11}
12
13/**
14 * Removes a key from the hash set
15 * @param key - The integer key to remove
16 */
17function remove(key: number): void {
18    data[key] = false;
19}
20
21/**
22 * Checks if a key exists in the hash set
23 * @param key - The integer key to check
24 * @returns true if the key exists, false otherwise
25 */
26function contains(key: number): boolean {
27    return data[key];
28}
29

Time and Space Complexity

Time Complexity:

add(key): O(1) - Direct array access and assignment at index key
remove(key): O(1) - Direct array access and assignment at index key
contains(key): O(1) - Direct array access and return at index key

All three operations perform constant-time array indexing operations, confirming the reference answer that the time complexity is O(1).

Space Complexity:

O(1000001) or O(10^6) - The implementation uses a fixed-size boolean array of 1,000,001 elements regardless of how many keys are actually stored
This can be considered O(1) space if we treat the maximum key value as a constant bound
Alternatively, if the maximum possible key value is n, then space complexity is O(n)

The approach trades space for time efficiency by using a direct-addressing table where each possible key maps to a unique index in the array.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Memory Inefficiency for Sparse Data

The most significant pitfall of this static array approach is the massive memory waste when dealing with sparse data. Even if you only store a few keys (e.g., 5 keys), you're still allocating memory for 1,000,001 boolean values. This uses approximately 1MB of memory regardless of how many elements you actually store.

Example scenario:

hashset = MyHashSet()
hashset.add(5)
hashset.add(999999)
# Only 2 elements stored, but using memory for 1,000,001 slots

Solution: Implement a dynamic hashing approach with buckets:

class MyHashSet:
    def __init__(self):
        self.bucket_size = 1000
        self.buckets = [[] for _ in range(self.bucket_size)]
  
    def _hash(self, key):
        return key % self.bucket_size
  
    def add(self, key):
        bucket_index = self._hash(key)
        if key not in self.buckets[bucket_index]:
            self.buckets[bucket_index].append(key)
  
    def remove(self, key):
        bucket_index = self._hash(key)
        if key in self.buckets[bucket_index]:
            self.buckets[bucket_index].remove(key)
  
    def contains(self, key):
        bucket_index = self._hash(key)
        return key in self.buckets[bucket_index]

2. Fixed Range Limitation

The current implementation assumes keys will always be in the range [0, 1,000,000]. If the problem constraints change or you need to handle negative numbers or larger values, the code will fail with an IndexError.

Example of failure:

hashset = MyHashSet()
hashset.add(-1)  # IndexError: list index out of range
hashset.add(1000001)  # IndexError: list index out of range

Solution: Add input validation or use a more flexible data structure:

def add(self, key: int) -> None:
    if 0 <= key <= 1000000:
        self.data[key] = True
    # Alternatively, use a dictionary for unlimited range:
    # self.data[key] = True  # where self.data = {}

3. Boolean Array vs Bit Array

Using a list of booleans in Python is memory-inefficient. Each boolean in a Python list takes up more memory than necessary (typically 28 bytes per boolean object).

Solution: Use a bit array for better memory efficiency:

class MyHashSet:
    def __init__(self):
        # Using bitwise operations for memory efficiency
        self.data = [0] * 31251  # 1000001 bits / 32 bits per integer
  
    def add(self, key):
        index = key // 32
        bit_position = key % 32
        self.data[index] |= (1 << bit_position)
  
    def remove(self, key):
        index = key // 32
        bit_position = key % 32
        self.data[index] &= ~(1 << bit_position)
  
    def contains(self, key):
        index = key // 32
        bit_position = key % 32
        return (self.data[index] & (1 << bit_position)) != 0

This reduces memory usage by approximately 32x compared to the boolean array approach.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan: