The Lookup Problem
Finding Data Fast
You have a list of 10,000 customer records. A customer calls asking about order #7842. You need to find that order. How long does it take?
If your data is in an array, you check each element one by one until you find order #7842. This is linear search. On average, you check half the array. With 10,000 records, that's 5,000 comparisons.
Linear Search is Slow
Linear search takes O(n) time. As your data grows, search time grows proportionally:
- 1,000 items: ~500 comparisons average
- 10,000 items: ~5,000 comparisons average
- 1,000,000 items: ~500,000 comparisons average
This gets slow fast. Modern applications handle millions of records. Users expect instant results.
1# Linear search through customer orders
2orders = [1234, 5678, 7842, 9012, ...] # 10,000 orders
3
4target = 7842
5for i in range(len(orders)):
6 if orders[i] == target:
7 print(f"Found at index {i}")
8 break
9# Checked 3 items before finding target
Python checks each element sequentially until finding the target.
The Direct Access Dream
What if you could jump directly to the data you need? Like knowing exactly which file cabinet drawer contains a document, instead of opening every drawer.
Arrays give you this power when you know the index. orders[2] is instant—O(1). But you need to know that order #7842 lives at index 2.
The problem: order numbers don't match array indices. Order #7842 might be at index 2, index 5,000, or anywhere.
Hash Tables: The Solution
Hash tables solve this problem. They transform your search key (like order #7842) into an array index. This makes lookups nearly instant—O(1) on average.
Instead of checking 5,000 items, you check one. The hash table calculates where to look.
The next articles explain how hash tables achieve this magic.