Facebook Pixel

Contention and Scarcity: Striped Locks and Counters

A single global lock eliminates race conditions, but under high concurrency it introduces a different problem: all threads serialize through one bottleneck. Every thread that needs a database connection, a per-user request count, or a shared counter must wait behind every other thread.

A coarse-grained lock is straightforward to reason about but serializes access. Fine-grained locking recovers parallelism by partitioning shared state so that threads contend only when they genuinely need the same piece of data. The techniques in this article — lock striping, read-write locks, atomic counters, and bounded resource pools — each reduce unnecessary waiting while preserving correctness.

The two subgraphs below contrast the bottleneck of a single global lock against the parallelism recovered by striped locks.

The coarse-lock bottleneck

A connection pool manages a shared set of database connections. Threads borrow a connection, use it, and return it. A single lock protecting the entire pool serializes every borrow and return, even when the threads need different connections.

1import threading
2from collections import deque
3
4# Bad: one global lock — every thread waits regardless of which connection it needs
5class ConnectionPool:
6    def __init__(self, size: int) -> None:
7        self._connections: deque["Connection"] = deque(
8            Connection() for _ in range(size)
9        )
10        self._lock = threading.Lock()   # one lock for the entire pool
11
12    def acquire(self) -> "Connection":
13        with self._lock:                # Thread A holds this while Thread B–Z wait
14            if not self._connections:
15                raise RuntimeError("pool exhausted")
16            return self._connections.popleft()
17
18    def release(self, conn: "Connection") -> None:
19        with self._lock:
20            self._connections.append(conn)

Lock striping

Lock striping replaces one global lock with N smaller locks, each protecting a partition of the data. Threads that need different partitions run in parallel; only threads that need the same partition contend. Choosing N is a tuning decision: more stripes reduce contention at the cost of more memory and initialization overhead.

For a per-user rate limiter, the natural stripe key is the user ID. Two threads serving different users never contend; two threads serving the same user still contend on that user's stripe, which is exactly the contention that should be serialized. To map an arbitrary user-ID string onto a fixed lock, the code hashes it and takes the hash mod STRIPES — a hash spreads keys evenly, and mod 256 folds the result into one of 256 buckets. The literal 256 is a typical default: enough stripes to cut contention to near nothing, without paying for thousands of lock objects.

1import threading
2import hashlib
3
4# Good: one lock per stripe — threads for different users never contend
5class RateLimiter:
6    STRIPES = 256
7
8    def __init__(self, limit_per_minute: int) -> None:
9        self._limit = limit_per_minute
10        self._counts: dict[str, int] = {}
11        # 256 independent locks instead of one global lock
12        self._locks = [threading.Lock() for _ in range(self.STRIPES)]
13
14    def _stripe(self, user_id: str) -> threading.Lock:
15        index = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % self.STRIPES
16        return self._locks[index]
17
18    def allow(self, user_id: str) -> bool:
19        with self._stripe(user_id):      # lock is scoped to this user's stripe
20            count = self._counts.get(user_id, 0)
21            if count >= self._limit:
22                return False
23            self._counts[user_id] = count + 1
24            return True

These sketches isolate the contention mechanics — per-stripe locks and per-user atomic counters — and leave the per-minute window out. The Python and Java counters grow for the life of the process (a real limiter resets each window, e.g. an external scheduler calling reset, or keying the count by a minute bucket the way the TypeScript/Redis tab does with minuteBucket() and a 60-second TTL).

Reduced contention in sequence

The sequence diagram below contrasts a single-lock pool against a striped design. With striping, Thread A (serving user 101) and Thread B (serving user 202) proceed without any waiting.

Invest in Yourself
Your new job is waiting. 83% of people that complete the program get a job offer. Unlock unlimited access to all content and features.
Go Pro