Leetcode 705. Design HashSet

Problem Explanation:

A HashSet is a type of data structure that allows you to store unique elements in no particular order. The primary operations that can be performed on a HashSet are add (to insert a value), remove (to delete a value), and contains (to check if a value exists).

The problem here is about designing a HashSet without using any built-in hash table libraries. We need to design the HashSet such that it includes add, remove, and contains operations.

For instance, consider a scenario where a HashSet object is instantiated. We then add the values 1 and 2 to the HashSet. If we check if the HashSet contains the value 1, it should return true. If we check if it contains the value 3, it should return false because we didn't add 3 to the HashSet. If we add 2 again and check for its existence, it should still return true. But if we remove 2 and then check if it exists, it should return false because we have already removed it from the HashSet.

Approach Explanation:

The mechanism behind a hashset is that it uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. In case of collision (when the hash function returns the same output for more than one inputs), it's handled using chaining (store multiple items in the same bucket). In this problem, since the maximum number of values we can have is known and relatively small (0 to 1,000,000), we can use an array (or list) where the indices represent the elements and the values at those indices represent their existence.

Contrary to typical hashset implementations, we will handle collision by simply overwriting the existing value because a HashSet doesn't allow duplicate values.

If we consider adding a value to the HashSet, we will simply change the value at that index in the array to true. If we want to remove a value, we change the value at that index to false. And if we want to check if the HashSet contains a value, we return the value at that index (which is a boolean).

Python Solution:

1
2python
3class MyHashSet:
4
5    def __init__(self):
6        """
7        Initialize your data structure here.
8        """
9        self.set = [False] * 1000001 # Initializing a list with a size of 1000001 and setting all initial values to False
10
11    def add(self, key: int) -> None:
12        self.set[key] = True # Set the value at index 'key' to True
13
14    def remove(self, key: int) -> None:
15        self.set[key] = False # Set the value at index 'key' to False
16
17    def contains(self, key: int) -> bool:
18        """
19        Returns true if this set contains the specified element
20        """
21        return self.set[key] # Return the boolean value at index 'key'

Java Solution:

1
2java
3class MyHashSet {
4    boolean[] set;
5
6    public MyHashSet() {
7      // Initializing an array with a size of 1000001 and setting all initial values to false
8        set = new boolean[1000001];
9    }
10
11    public void add(int key) {
12        // Set the value at index 'key' to true
13        set[key] = true;
14    }
15
16    public void remove(int key) {
17        // Set the value at index 'key' to false
18        set[key] = false;
19    }
20
21    public boolean contains(int key) {
22        // Return the boolean value at index 'key'
23        return set[key];
24    }
25}

Javascript Solution:

1
2javascript
3var MyHashSet = function() {
4    // Initializing an array with a size of 1000001
5    this.set = new Array(1000001).fill(false);
6};
7
8MyHashSet.prototype.add = function(key) {
9    // Set the value at index 'key' to true
10    this.set[key] = true;
11};
12
13MyHashSet.prototype.remove = function(key) {
14    // Set the value at index 'key' to false
15    this.set[key] = false;
16};
17
18MyHashSet.prototype.contains = function(key) {
19    // Return the boolean value at index 'key'
20    return this.set[key];
21};

C++ Solution:

1
2cpp
3class MyHashSet {
4private:  
5    vector<bool> set;
6public:
7    MyHashSet() {
8        // Initializing a vector with a size of 1000001 and setting all initial values to false
9        set.resize(1000001, false);
10    }
11
12    void add(int key) {
13        // Set the value at index 'key' to true
14        set[key] = true;
15    }
16
17    void remove(int key) {
18        // Set the value at index 'key' to false
19        set[key] = false;
20    }
21
22    bool contains(int key) {
23        // Return the boolean value at index 'key'
24        return set[key];
25    }
26};

C# Solution:

1
2csharp
3public class MyHashSet {
4    private bool[] set;
5
6    public MyHashSet() {
7        // Initializing an array with a size of 1000001 and setting all initial values to false
8        set = new bool[1000001];
9    }
10
11    public void Add(int key) {
12        // Set the value at index 'key' to true
13        set[key] = true;
14    }
15
16    public void Remove(int key) {
17        // Set the value at index 'key' to false
18        set[key] = false;
19    }
20
21    public bool Contains(int key) {
22        // Return the boolean value at index 'key'
23        return set[key];
24    }
25}

In summary, the problem can be solved by creating a data structure (list / array / vector) that contains a very large number of elements, and treating the indices of this data structure as the keys. When you want to add an element, set the value at the index corresponding to that element to true. When you want to remove an element, set the value at the index corresponding to that element to false. When you want to check whether an element is present, simply check whether the value at the index corresponding to that element is true.

This is definitely not the standard way to implement a HashSet, but it works perfectly fine for this problem because we know that the key values are going to be between 0 and 1,000,000.

In a real-world scenario where the keys are not limited to a specific range, this approach would not be feasible, because you would need a data structure of an enormous size to handle all possible keys. In a standard HashSet, the keys are hashed to a much smaller range of indices, and collisions are dealt with using chaining or open addressing.

Remember, one of the most important traits of a good software engineer is to discern which approach is suitable for a given situation. Therefore, it is important to know both the 'by the book' method and the 'trick' method, and when to apply each of them.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫