Design a Rate Limiter
A rate limiter has compact domain logic — "has this client exceeded their quota?" — the mark of a utility problem where the design work is precision, not breadth. That work centers on three concerns: choosing an algorithm whose state is cheap and deterministic, keeping per-client accounting independent, and making the allow/deny check safe when many threads evaluate the same client simultaneously. The design separates the policy-enforcement shell from the pluggable algorithm inside it.
See It in Action
The demo runs the finished design: one rate limiter giving each client an independent quota. Pick a client, spend its allowance with each request, and watch it refill over time. Spamming one client never touches another's allowance — the per-client independence at the heart of the design.
Clarifying Requirements
The functional requirements we will commit to: the limiter is keyed per client string; each client gets a quota of requests per configurable window; a request is either allowed or denied; and the accounting is computed lazily at request time using integer arithmetic so the behaviour is deterministic and reproducible in tests. Timestamps are supplied explicitly rather than read from a wall clock, making the whole system testable without mocking.
Which counting algorithm enforces that quota is the first design decision, not a requirement. Several schemes can do it and they differ sharply in memory and burst behavior. Before reading on, decide which one limits bursts without paying unbounded memory per client.
Decision checkpoint
A per-client quota of N requests per window can be enforced several ways. Use a fixed-window counter (count requests, reset the count every window), a sliding-window log (timestamp every request, count those in the trailing window), or a token bucket (tokens refill at a steady rate, each request spends one)?
So the design uses a token bucket: each client holds tokens that refill at a steady rate, and a request is allowed only when a token is available to spend. The flowchart traces what that does on every request:
Core Entities
The token bucket is chosen, but one fork remains before the class list. The requirements still name sliding-window log and fixed-window counter as options a real system might switch to, so: should the counting algorithm be baked directly into the limiter, or sit behind an interface the limiter holds a reference to? Decide before reading on.
Decision checkpoint
Token bucket is the algorithm today, but sliding-window log and fixed-window counter are named as future options. Should RateLimiter wire TokenBucket in directly, or stay independent of which algorithm runs?
With that decided, the entity list is three things.
A RateLimitStrategy is the pluggable algorithm interface. It answers one question: "should this request be allowed, given the current timestamp?" The interface lets us swap token bucket for sliding-window log or fixed-window counter without touching the surrounding machinery.
A TokenBucket is a concrete strategy, and the mental model is worth building before the code.
Picture a bucket that holds up to capacity tokens. Every request takes one token out; if the
bucket is empty, the request is denied. Tokens are added back at a steady rate — capacity tokens
over each window of time — filling only up to the brim. So capacity is the size of the burst
you tolerate and window is how long a fully drained bucket takes to refill. With capacity = 10
and window = 60s, a client can fire 10 requests at once, then about one more every 6 seconds as
tokens trickle back.
Refill is lazy. Rather than a background timer adding tokens every few milliseconds — which
would burn CPU on idle clients and want a thread per bucket — the bucket adds nothing until a
request actually arrives. At that moment it asks "how many tokens would have refilled since I
last looked?" and adds them in one step. Same arithmetic, computed only when it matters; the
last_refill timestamp is the "when I last looked" marker.
That count of would-be tokens is refill = floor(elapsed * capacity / window). The refill rate is
capacity / window tokens per unit of time, so over elapsed units the bucket has earned
elapsed * capacity / window of them; floor rounds down because you cannot add a fraction of a
token. With capacity = 10, window = 60 seconds, and elapsed = 18 seconds, that is
floor(18 * 10 / 60) = floor(3) = 3 tokens. That floor is also where the accounting gets subtle.
Decision checkpoint
After adding refill = floor(elapsed * capacity / window) tokens, what should last_refill become? Picture a request that earned 2.4 tokens' worth of elapsed time but floor granted only 2.
So when the refill would fill the bucket (tokens + refill >= capacity), tokens are set to
capacity and last_refill is snapped to the current timestamp, discarding idle overflow time.
Otherwise tokens are added and last_refill advances by floor(refill * window / capacity), so
the fractional remainder carries forward and avoids drift.
The limiter has to enforce the quota separately for each client. That raises one more structural question before the orchestrator takes shape.
Decision checkpoint
The quota is per client. Should the limiter keep one shared bucket for all clients, or one bucket per client?
A RateLimiter is the orchestrator. It owns a dictionary mapping client strings to their per-client bucket. When a request arrives, it looks up or creates the bucket and delegates to the strategy. This is the one object the outside world talks to.
The entities-overview diagram shows how the three objects are connected at runtime:
Designing the Classes
With the entities settled, the next job is to derive each class's state and behavior directly from the requirements — no fields added because they "seem natural," only fields demanded by a specific requirement. The guiding principle is Tell, Don't Ask: each class enforces its own rules internally rather than exposing raw data for callers to manipulate.
RateLimitStrategy
State: none. The interface carries no data — it is a pure behavioral contract.
Behavior: one method, allow(timestamp: int) -> bool. The timestamp is passed in rather than
read from a wall clock so every concrete strategy is independently testable without mocking time.
The return value is a boolean; the caller decides what string to return to the user, keeping the
strategy focused entirely on the allow/deny decision.
TokenBucket
The requirements specify a token bucket with a configurable capacity and refill window. Refill is lazy and uses integer arithmetic. These three sentences determine the state entirely.
State: _capacity (the ceiling — tokens never exceed this), _window (the number of time units
for a full refill), _tokens (the current count, decremented on each allowed request), and
_last_refill (the timestamp of the last refill event). A new bucket starts at capacity tokens
with _last_refill set to the client's first-request timestamp — not to zero — so no free refill
is awarded on arrival.
Behavior: allow(timestamp: int) -> bool. This single method does three things in order: compute
how many tokens to add, update the token count and pointer, then consume one token if any remain.
The critical design decision is the integer refill math:
refill = floor(elapsed * capacity / window)
This avoids floats entirely. elapsed * capacity is computed first (integer multiplication), then
divided by window with floor division. The result is exact and deterministic across all runtimes.
The bucket then branches on whether the refill would fill it. If tokens + refill >= capacity,
tokens are set to capacity and _last_refill is snapped to timestamp — discarding idle
overflow time. This prevents a long idle gap followed by many same-timestamp requests from each
re-triggering a refill and granting more than capacity allows in total; after snapping, every
subsequent same-timestamp call sees elapsed = 0 and adds nothing. If the refill only partially
fills the bucket, tokens are added and _last_refill advances by floor(refill * window / capacity) — exactly the wall-time consumed by the tokens just added — preserving the fractional
remainder for the next call.
RateLimiter
The requirements say the limiter is keyed per client. Each client gets an independent bucket with
the same capacity and window. These requirements determine the state: _capacity and _window
(held so new buckets can be constructed on demand) and _buckets (a dictionary from client string
to RateLimitStrategy).
Behavior: request(client: str, timestamp: int) -> str. The method looks up the client in
_buckets; if the client is new, it creates a fresh TokenBucket with the client's first
timestamp as start. It then delegates to bucket.allow(timestamp) and returns "allow" or
"deny". The method never touches token counts directly — that is Tell, Don't Ask in action.
RateLimiter tells the bucket "decide this request," and the bucket decides.
Note that _buckets is typed as dict[str, RateLimitStrategy] — the orchestrator knows only the
interface, not the concrete class. Swapping TokenBucket for a SlidingWindowLog requires no
change to RateLimiter.
Reading the diagram: the RateLimiter holds a map of RateLimitStrategy instances — each a
TokenBucket at runtime, one per client. The map is keyed on the client string and created lazily
on first request. The public surface the outside world touches is just
rate_limiter(capacity, window, instructions) — a single entry point that replays a command
stream. Everything else is private.
Try It Yourself
Before reading the implementation, build it. Implement a function
rate_limiter(capacity, window, instructions) that replays a sequence of commands and returns
one result string per request.
Parameters:
capacity is the maximum number of tokens in each client's bucket (also the initial fill level).
window is the number of time units required for a full refill. The refill rate is
capacity / window tokens per time unit, computed with integer math:
refill = floor(elapsed * capacity / window). If tokens + refill >= capacity, tokens are set
to capacity and last_refill is snapped to timestamp (discarding idle overflow time, so a
long idle gap followed by many same-timestamp requests yields exactly capacity allows).
Otherwise, last_refill advances by floor(refill * window / capacity), preserving the
fractional remainder.
Commands:
Each instruction is a list of strings. The only command is
["request", client, timestamp]: at integer time timestamp, attempt to spend 1 token for
client. Timestamps are non-decreasing. For each request, append "allow" if a token was
available, or "deny" otherwise.
A new client's bucket starts full at capacity tokens, with last_refill set to the timestamp
of their first request (not to zero), so no free refill is awarded on arrival.
Run your solution against the test cases. The reference implementation follows below.
Code Walkthrough
The design is three small classes. Each section below walks through the implementation in the same order as the class design: interface first, then the concrete strategy, then the orchestrator, then the entry function.
The interface is a single abstract method. In Python, the ABC machinery enforces that every
concrete subclass provides its own allow implementation.
There is no state and no logic here. The value of this class is entirely in the type: RateLimiter
holds dict[str, RateLimitStrategy], so the orchestrator is coupled only to the interface. Any
future algorithm that implements allow slots in without changing a single line in RateLimiter.
1from abc import ABC, abstractmethod
2
3
4class RateLimitStrategy(ABC):
5 @abstractmethod
6 def allow(self, timestamp: int) -> bool: ...
Full Solution
1from abc import ABC, abstractmethod
2
3
4class RateLimitStrategy(ABC):
5 @abstractmethod
6 def allow(self, timestamp: int) -> bool: ...
7
8
9class TokenBucket(RateLimitStrategy):
10 """Token bucket with integer lazy-refill math (no floats).
11
12 refill = floor(elapsed * capacity / window). If that fills the bucket,
13 snap last_refill to now so idle time beyond a full bucket is discarded
14 (otherwise a long idle gap would let many same-timestamp requests each
15 re-trigger a refill). Otherwise advance last_refill only by the
16 wall-time the added tokens consumed, preserving the fractional
17 remainder for the next check.
18 """
19
20 def __init__(self, capacity: int, window: int, start: int) -> None:
21 self._capacity: int = capacity
22 self._window: int = window
23 self._tokens: int = capacity
24 self._last_refill: int = start
25
26 def allow(self, timestamp: int) -> bool:
27 elapsed: int = timestamp - self._last_refill
28 if elapsed > 0 and self._window > 0:
29 refill: int = (elapsed * self._capacity) // self._window
30 if refill > 0:
31 if self._tokens + refill >= self._capacity:
32 self._tokens = self._capacity
33 self._last_refill = timestamp
34 else:
35 self._tokens += refill
36 self._last_refill += (
37 refill * self._window
38 ) // self._capacity
39 if self._tokens > 0:
40 self._tokens -= 1
41 return True
42 return False
43
44
45class RateLimiter:
46 def __init__(self, capacity: int, window: int) -> None:
47 self._capacity: int = capacity
48 self._window: int = window
49 self._buckets: dict[str, RateLimitStrategy] = {}
50
51 def request(self, client: str, timestamp: int) -> str:
52 if client not in self._buckets:
53 self._buckets[client] = TokenBucket(
54 self._capacity, self._window, timestamp
55 )
56 return "allow" if self._buckets[client].allow(timestamp) else "deny"
57
58
59def rate_limiter(
60 capacity: int, window: int, instructions: list[list[str]]
61) -> list[str]:
62 limiter = RateLimiter(capacity, window)
63 output: list[str] = []
64 for instruction in instructions:
65 command, *args = instruction
66 if command == "request":
67 client, ts = args[0], int(args[1])
68 output.append(limiter.request(client, ts))
69 return output
70
71if __name__ == "__main__":
72 capacity = int(input())
73 window = int(input())
74 instructions = [input().split() for _ in range(int(input()))]
75 res = rate_limiter(capacity, window, instructions)
76 for line in res:
77 print(line)
781import java.util.ArrayList;
2import java.util.Arrays;
3import java.util.HashMap;
4import java.util.List;
5import java.util.Map;
6import java.util.Scanner;
7
8class Solution {
9 interface RateLimitStrategy {
10 boolean allow(int timestamp);
11 }
12
13 static class TokenBucket implements RateLimitStrategy {
14 private final int capacity;
15 private final int window;
16 private int tokens;
17 private int lastRefill;
18
19 TokenBucket(int capacity, int window, int start) {
20 this.capacity = capacity;
21 this.window = window;
22 this.tokens = capacity;
23 this.lastRefill = start;
24 }
25
26 @Override
27 public boolean allow(int timestamp) {
28 int elapsed = timestamp - lastRefill;
29 if (elapsed > 0 && window > 0) {
30 int refill = elapsed * capacity / window;
31 if (refill > 0) {
32 if (tokens + refill >= capacity) {
33 tokens = capacity;
34 lastRefill = timestamp;
35 } else {
36 tokens += refill;
37 lastRefill += refill * window / capacity;
38 }
39 }
40 }
41 if (tokens > 0) {
42 tokens--;
43 return true;
44 }
45 return false;
46 }
47 }
48
49 static class RateLimiter {
50 private final int capacity;
51 private final int window;
52 private final Map<String, RateLimitStrategy> buckets = new HashMap<>();
53
54 RateLimiter(int capacity, int window) {
55 this.capacity = capacity;
56 this.window = window;
57 }
58
59 String request(String client, int timestamp) {
60 buckets.computeIfAbsent(
61 client, k -> new TokenBucket(capacity, window, timestamp)
62 );
63 return buckets.get(client).allow(timestamp) ? "allow" : "deny";
64 }
65 }
66
67 // Main entry
68 public static List<String> rateLimiter(int capacity, int window, List<List<String>> instructions) {
69 RateLimiter limiter = new RateLimiter(capacity, window);
70 List<String> output = new ArrayList<>();
71 for (List<String> instruction : instructions) {
72 String command = instruction.get(0);
73 if (command.equals("request")) {
74 String client = instruction.get(1);
75 int ts = Integer.parseInt(instruction.get(2));
76 output.add(limiter.request(client, ts));
77 }
78 }
79 return output;
80 }
81
82 public static List<String> splitWords(String s) {
83 return s.isEmpty() ? List.of() : Arrays.asList(s.split(" "));
84 }
85
86 public static void main(String[] args) {
87 java.util.Scanner scanner = new java.util.Scanner(System.in);
88 int capacity = Integer.parseInt(scanner.nextLine());
89 int window = Integer.parseInt(scanner.nextLine());
90 int instructionsLength = Integer.parseInt(scanner.nextLine());
91 List<List<String>> instructions = new ArrayList<>();
92 for (int i = 0; i < instructionsLength; i++) {
93 instructions.add(splitWords(scanner.nextLine()));
94 }
95 scanner.close();
96 List<String> res = rateLimiter(capacity, window, instructions);
97 for (String line : res) {
98 System.out.println(line);
99 }
100 }
101}
1021"use strict";
2
3class TokenBucket {
4 constructor(capacity, window, start) {
5 this.capacity = capacity;
6 this.window = window;
7 this.tokens = capacity;
8 this.lastRefill = start;
9 }
10
11 allow(timestamp) {
12 const elapsed = timestamp - this.lastRefill;
13 if (elapsed > 0 && this.window > 0) {
14 const refill = Math.floor((elapsed * this.capacity) / this.window);
15 if (refill > 0) {
16 if (this.tokens + refill >= this.capacity) {
17 this.tokens = this.capacity;
18 this.lastRefill = timestamp;
19 } else {
20 this.tokens += refill;
21 this.lastRefill += Math.floor((refill * this.window) / this.capacity);
22 }
23 }
24 }
25 if (this.tokens > 0) {
26 this.tokens--;
27 return true;
28 }
29 return false;
30 }
31}
32
33class RateLimiter {
34 constructor(capacity, window) {
35 this.capacity = capacity;
36 this.window = window;
37 this.buckets = new Map();
38 }
39
40 request(client, timestamp) {
41 if (!this.buckets.has(client)) {
42 this.buckets.set(client, new TokenBucket(this.capacity, this.window, timestamp));
43 }
44 return this.buckets.get(client).allow(timestamp) ? "allow" : "deny";
45 }
46}
47
48function rateLimiter(capacity, window, instructions) {
49 const limiter = new RateLimiter(capacity, window);
50 const output = [];
51 for (const instruction of instructions) {
52 const [command, ...args] = instruction;
53 if (command === "request") {
54 output.push(limiter.request(args[0], parseInt(args[1])));
55 }
56 }
57 return output;
58}
59
60function splitWords(s) {
61 return s === "" ? [] : s.split(" ");
62}
63
64function* main() {
65 const capacity = parseInt(yield);
66 const window = parseInt(yield);
67 const instructionsLength = parseInt(yield);
68 const instructions = [];
69 for (let i = 0; i < instructionsLength; i++) {
70 instructions.push(splitWords(yield));
71 }
72 const res = rateLimiter(capacity, window, instructions);
73 for (const line of res) {
74 console.log(line);
75 }
76}
77
78class EOFError extends Error {}
79{
80 const gen = main();
81 const next = (line) => gen.next(line).done && process.exit();
82 let buf = "";
83 next();
84 process.stdin.setEncoding("utf8");
85 process.stdin.on("data", (data) => {
86 const lines = (buf + data).split("\n");
87 buf = lines.pop();
88 lines.forEach(next);
89 });
90 process.stdin.on("end", () => {
91 buf && next(buf);
92 gen.throw(new EOFError());
93 });
94}
951interface RateLimitStrategy {
2 allow(timestamp: number): boolean;
3}
4
5class TokenBucket implements RateLimitStrategy {
6 private tokens: number;
7 private lastRefill: number;
8
9 constructor(
10 private readonly capacity: number,
11 private readonly window: number,
12 start: number
13 ) {
14 this.tokens = capacity;
15 this.lastRefill = start;
16 }
17
18 allow(timestamp: number): boolean {
19 const elapsed = timestamp - this.lastRefill;
20 if (elapsed > 0 && this.window > 0) {
21 const refill = Math.floor((elapsed * this.capacity) / this.window);
22 if (refill > 0) {
23 if (this.tokens + refill >= this.capacity) {
24 this.tokens = this.capacity;
25 this.lastRefill = timestamp;
26 } else {
27 this.tokens += refill;
28 this.lastRefill += Math.floor((refill * this.window) / this.capacity);
29 }
30 }
31 }
32 if (this.tokens > 0) {
33 this.tokens--;
34 return true;
35 }
36 return false;
37 }
38}
39
40class RateLimiter {
41 private buckets = new Map<string, RateLimitStrategy>();
42
43 constructor(
44 private readonly capacity: number,
45 private readonly window: number
46 ) {}
47
48 request(client: string, timestamp: number): string {
49 if (!this.buckets.has(client)) {
50 this.buckets.set(
51 client,
52 new TokenBucket(this.capacity, this.window, timestamp)
53 );
54 }
55 return this.buckets.get(client)!.allow(timestamp) ? "allow" : "deny";
56 }
57}
58
59function rateLimiter(
60 capacity: number,
61 window: number,
62 instructions: string[][]
63): string[] {
64 const limiter = new RateLimiter(capacity, window);
65 const output: string[] = [];
66 for (const instruction of instructions) {
67 const [command, ...args] = instruction;
68 if (command === "request") {
69 output.push(limiter.request(args[0], parseInt(args[1])));
70 }
71 }
72 return output;
73}
74
75function splitWords(s: string): string[] {
76 return s === "" ? [] : s.split(" ");
77}
78
79function* main() {
80 const capacity = parseInt(yield);
81 const window = parseInt(yield);
82 const instructionsLength = parseInt(yield);
83 const instructions = [];
84 for (let i = 0; i < instructionsLength; i++) {
85 instructions.push(splitWords(yield));
86 }
87 const res = rateLimiter(capacity, window, instructions);
88 for (const line of res) {
89 console.log(line);
90 }
91}
92
93class EOFError extends Error {}
94{
95 const gen = main();
96 const next = (line?: string) => gen.next(line ?? "").done && process.exit();
97 let buf = "";
98 next();
99 process.stdin.setEncoding("utf8");
100 process.stdin.on("data", (data: string) => {
101 const lines = (buf + data).split("\n");
102 buf = lines.pop() ?? "";
103 lines.forEach(next);
104 });
105 process.stdin.on("end", () => {
106 buf && next(buf);
107 gen.throw(new EOFError());
108 });
109}
1101#include <algorithm>
2#include <iostream>
3#include <iterator>
4#include <limits>
5#include <sstream>
6#include <string>
7#include <vector>
8
9#include <string>
10#include <vector>
11#include <unordered_map>
12#include <algorithm>
13
14class RateLimitStrategy {
15public:
16 virtual bool allow(int timestamp) = 0;
17 virtual ~RateLimitStrategy() = default;
18};
19
20class TokenBucket : public RateLimitStrategy {
21private:
22 int capacity;
23 int window;
24 int tokens;
25 int lastRefill;
26public:
27 TokenBucket(int capacity, int window, int start)
28 : capacity(capacity), window(window), tokens(capacity), lastRefill(start) {}
29
30 bool allow(int timestamp) override {
31 int elapsed = timestamp - lastRefill;
32 if (elapsed > 0 && window > 0) {
33 int refill = elapsed * capacity / window;
34 if (refill > 0) {
35 if (tokens + refill >= capacity) {
36 tokens = capacity;
37 lastRefill = timestamp;
38 } else {
39 tokens += refill;
40 lastRefill += refill * window / capacity;
41 }
42 }
43 }
44 if (tokens > 0) {
45 tokens--;
46 return true;
47 }
48 return false;
49 }
50};
51
52class RateLimiterObj {
53private:
54 int capacity;
55 int window;
56 std::unordered_map<std::string, TokenBucket*> buckets;
57public:
58 RateLimiterObj(int capacity, int window)
59 : capacity(capacity), window(window) {}
60
61 ~RateLimiterObj() {
62 for (auto& pair : buckets) delete pair.second;
63 }
64
65 std::string request(const std::string& client, int timestamp) {
66 if (buckets.find(client) == buckets.end()) {
67 buckets[client] = new TokenBucket(capacity, window, timestamp);
68 }
69 return buckets[client]->allow(timestamp) ? "allow" : "deny";
70 }
71};
72
73std::vector<std::string> rate_limiter(int capacity, int window, std::vector<std::vector<std::string>> instructions) {
74 RateLimiterObj limiter(capacity, window);
75 std::vector<std::string> output;
76 for (const auto& instruction : instructions) {
77 const std::string& command = instruction[0];
78 if (command == "request") {
79 const std::string& client = instruction[1];
80 int ts = std::stoi(instruction[2]);
81 output.push_back(limiter.request(client, ts));
82 }
83 }
84 return output;
85}
86
87void ignore_line() {
88 std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
89}
90
91template<typename T>
92std::vector<T> get_words() {
93 std::string line;
94 std::getline(std::cin, line);
95 std::istringstream ss{line};
96 ss >> std::boolalpha;
97 std::vector<T> v;
98 std::copy(std::istream_iterator<T>{ss}, std::istream_iterator<T>{}, std::back_inserter(v));
99 return v;
100}
101
102int main() {
103 int capacity;
104 std::cin >> capacity;
105 ignore_line();
106 int window;
107 std::cin >> window;
108 ignore_line();
109 int instructions_length;
110 std::cin >> instructions_length;
111 ignore_line();
112 std::vector<std::vector<std::string>> instructions;
113 for (int i = 0; i < instructions_length; i++) {
114 instructions.emplace_back(get_words<std::string>());
115 }
116 std::vector<std::string> res = rate_limiter(capacity, window, instructions);
117 for (const std::string& line : res) {
118 std::cout << line << '\n';
119 }
120}
1211class TokenBucket
2 def initialize(capacity, window, start)
3 @capacity = capacity
4 @window = window
5 @tokens = capacity
6 @last_refill = start
7 end
8
9 def allow(timestamp)
10 elapsed = timestamp - @last_refill
11 if elapsed > 0 && @window > 0
12 refill = (elapsed * @capacity) / @window
13 if refill > 0
14 if @tokens + refill >= @capacity
15 @tokens = @capacity
16 @last_refill = timestamp
17 else
18 @tokens += refill
19 @last_refill += (refill * @window) / @capacity
20 end
21 end
22 end
23 if @tokens > 0
24 @tokens -= 1
25 return true
26 end
27 false
28 end
29end
30
31class RateLimiterObj
32 def initialize(capacity, window)
33 @capacity = capacity
34 @window = window
35 @buckets = {}
36 end
37
38 def request(client, timestamp)
39 @buckets[client] ||= TokenBucket.new(@capacity, @window, timestamp)
40 @buckets[client].allow(timestamp) ? "allow" : "deny"
41 end
42end
43
44def rate_limiter(capacity, window, instructions)
45 limiter = RateLimiterObj.new(capacity, window)
46 output = []
47 instructions.each do |instruction|
48 command = instruction[0]
49 if command == "request"
50 client = instruction[1]
51 ts = instruction[2].to_i
52 output << limiter.request(client, ts)
53 end
54 end
55 output
56end
57
58if __FILE__ == $0
59 capacity = Integer(gets, 10)
60 window = Integer(gets, 10)
61 instructions = Array.new(Integer(gets, 10)) { gets.split }
62 res = rate_limiter(capacity, window, instructions)
63 res.each { |line| puts(line) }
64end
65Run and Test
The exercise drives the design through a command stream so it can be checked deterministically. The input supplies an explicit timestamp with every request rather than reading a wall clock, which means the same input always produces the same output — no time.sleep, no mocking.
The sequence diagram below traces a single request call through all three objects:
Trace test 2 to see the contract. Capacity is 3, window is 10 (so 3 tokens refill over 10 time
units, meaning 1 token per 10/3 ≈ 3.33 units in float terms, but in integer math
floor(elapsed * 3 / 10)).
Requests at t=0: the bucket starts full at 3 tokens.
request alice 0 → allow (tokens: 3→2) request alice 0 → allow (tokens: 2→1) request alice 0 → allow (tokens: 1→0) request alice 0 → deny (tokens: 0, elapsed=0, refill=0)
At t=10: elapsed = 10, refill = (10 * 3) // 10 = 3. Since tokens + refill = 0 + 3 >= 3 (capacity), the full-bucket branch runs: tokens = 3, last_refill snapped to 10.
request alice 10 → allow (tokens: 3→2) request alice 10 → allow (tokens: 2→1) request alice 10 → allow (tokens: 1→0) request alice 10 → deny (tokens: 0, elapsed=0, refill=0)
The other tests cover two independent clients (test 3, whose buckets are completely separate), a sub-window partial refill at t=2 in a capacity-2/window-4 configuration (test 4), capacity-1 strict one-at-a-time enforcement (test 5), and a 1-token-per-unit partial refill at t=1 in a capacity-5/window-5 configuration (test 6).
Concurrency and Thread Safety
In production, a rate limiter sits in the request path of a multi-threaded server. Many goroutines
or threads evaluate the same client's bucket at the same time. The read-check-decrement sequence
inside allow is not atomic, which creates a race on the last token.
Suppose Alice's bucket holds exactly 1 token and two threads arrive simultaneously:
Both threads see tokens = 1, both pass the check, and both decrement. The limiter allows two
requests where only one should have been permitted. This is the classic lost-update race.
The fix is a per-client lock guarding the read-check-decrement sequence. The RateLimiter
acquires the bucket's lock before calling allow and releases it after:
Thread 2 now sees the post-decrement state and correctly returns deny.
The critical design choice is where to place the lock. A single global lock on the
RateLimiter is simple but serialises all clients: Alice's request blocks Bob's even though their
buckets are completely independent. The right level is one lock per client — but a per-bucket lock
alone is not enough, because the bucket itself is created on the first request and that creation is
its own race. If two first requests for a new client both find no bucket, both create a full one,
and both allow, the later map write hides one of them and a decrement is lost. The fix has two
levels: make get-or-create atomic, then lock the bucket. In Java, ConcurrentHashMap.computeIfAbsent
does both — it creates the bucket exactly once under the map's bin lock, and each bucket carries a
ReentrantLock for the refill-and-decrement. In Python, hold a limiter-level lock (or a striped lock
keyed on the client, below) just long enough to get-or-create the bucket, then acquire that bucket's
threading.Lock for allow.
For very high client counts — millions of distinct keys — even per-client locks can be heavy if
kept as live objects. Lock striping bounds memory: allocate a fixed pool of N locks (typically
128 or 256) and assign each client to a stripe via hash(client_key) % N. The pool size is
constant regardless of client count, and contention is low because clients spread across stripes.
Two clients can share a lock only by hash collision, and the critical section (a few integer
operations) is short enough that the wait is negligible. This is the same idea used in
ConcurrentHashMap's internal segment design.
The deterministic exercise above sidesteps concurrency entirely — timestamps are caller-supplied
and commands are sequential — which is the right choice for a testable interface. In a real
deployment, timestamps come from time.monotonic() or System.nanoTime() inside the lock, so
the refill math and the consume step are always atomic with respect to each other.
Extensions
Every extension is a new concrete class behind the same interface — RateLimiter never changes:
A sliding-window log records the exact timestamp of every request in a deque. On each check,
entries older than the window are evicted, then the remainder is counted. If the count is below
the limit, the new timestamp is recorded and the request is allowed. This gives exact per-second
semantics with no burst at window boundaries, at the cost of O(limit) memory per client.
Implementing it as a second RateLimitStrategy concrete class requires zero changes to
RateLimiter.
A fixed-window counter divides time into fixed buckets (per second, per minute) and keeps a
counter per bucket. It uses O(1) state and is easy to implement in Redis with INCR and a TTL.
The trade-off is a burst at window boundaries: a client can send limit requests at the end of
one window and limit more at the start of the next, doubling the effective rate for one instant.
Distributed rate limiting replaces the in-process state with Redis so multiple server instances
share the same quota per client. The simplest approach is a Redis fixed-window counter: each
allow call increments a key scoped to the client and the current time window (INCRBY), rejects
the request if the count exceeds the limit, and sets a TTL on the key so it expires at window end.
This is a different algorithm from the in-memory token bucket above — it counts requests in a fixed
window rather than tracking a token count and a last-refill timestamp. A true distributed token
bucket requires an atomic Lua script that reads the stored token count and last-refill timestamp,
computes the refill, decrements, and writes back — all in one server-side operation. The
RateLimitStrategy interface absorbs either choice: a RedisFixedWindowCounter implements the
same allow method, routing to Redis instead of an in-memory counter.
Per-tier limits add a second dimension: different client tiers (free, pro, enterprise) get
different capacities and windows. The RateLimiter looks up the tier at client-creation time and
constructs the bucket with tier-appropriate parameters. The strategy interface is unchanged.
Each extension lands by adding a class or swapping an implementation, never by rewriting the
RateLimiter.
Quiz
Test your understanding of the key design decisions in this rate limiter.
Rate Limiter Design Quiz
Why is RateLimitStrategy defined as an interface rather than a concrete base class wired directly into RateLimiter?
Why does each client get its own independent TokenBucket rather than sharing one bucket across all clients?
The TokenBucket updates last_refill differently depending on whether the refill fills the bucket. When does it snap last_refill to the current timestamp, and when does it advance by floor(refill * window / capacity)?
Two threads both read tokens=1 on Alice's bucket simultaneously. Both pass the tokens > 0 check, both decrement, and both return allow — but only one should have been allowed. What is this race called, and how does the design fix it?
How would a distributed rate limiter differ from this in-process design, and how does the RateLimitStrategy interface accommodate it?