What is Hashing?
Keys and Values
A hash table stores data as key-value pairs. The key identifies what you're looking for. The value is the data you want.
- Customer lookup: Key = customer ID, Value = customer name
- Word count: Key = word, Value = count
- Phone book: Key = name, Value = phone number
You search by key. The hash table returns the value instantly.
The Hash Function
A hash function transforms a key into a number. This number tells you where to find the value in an array.
The hash function takes any key—a number, string, or object—and returns an integer. This integer becomes an array index.
Example: Hash function receives "Alice" and returns 42. The hash table stores Alice's data at array index 42.
Hash Code to Index
Hash codes are large numbers. Arrays have limited size. The modulo operation (%) converts any hash code to a valid array index.
index = hash_code % array_capacity
If the array has 10 slots, % 10 gives indices 0 through 9. Hash code 5,789,604,752 becomes index 2.
1# Python's built-in hash function
2key = "Alice"
3hash_code = hash(key)
4print(hash_code) # Large number like 5789604752029982571
5
6# Use hash code to find array index
7capacity = 10
8index = hash_code % capacity # Index between 0-9
Python provides hash() for built-in types. The % operator converts large numbers to array indices.
Direct Access via Hash
Now you can jump directly to data:
- Hash the key to get a number
- Convert the number to an array index with
% - Look up the value at that index
Searching for "Alice"? Hash "Alice" to get index 2. Check array[2]. Done in O(1) time.
The Magic of O(1) Lookup
No matter how many items you store, lookups take constant time:
- 10 items: 1 step (hash + array access)
- 10,000 items: 1 step
- 10,000,000 items: 1 step
Compare this to linear search that takes 5,000,000 steps on average for 10,000,000 items.
Collisions
One problem: different keys can produce the same index. Such as "Alice" and "Bob" both hash to index 2. This is a collision.
Hash tables handle collisions. We'll cover collision handling in a later article. For now, know that hash tables still achieve O(1) average performance despite collisions.