Design a Rate Limiter

A rate limiter has compact domain logic — "has this client exceeded their quota?" — the mark of a utility problem where the design work is precision, not breadth. That work centers on three concerns: choosing an algorithm whose state is cheap and deterministic, keeping per-client accounting independent, and making the allow/deny check safe when many threads evaluate the same client simultaneously. The design separates the policy-enforcement shell from the pluggable algorithm inside it.

See It in Action

The demo runs the finished design: one rate limiter giving each client an independent quota. Pick a client, spend its allowance with each request, and watch it refill over time. Spamming one client never touches another's allowance — the per-client independence at the heart of the design.

Rate Limiter

Clarifying Requirements

The functional requirements we will commit to: the limiter is keyed per client string; each client gets a quota of requests per configurable window; a request is either allowed or denied; and the accounting is computed lazily at request time using integer arithmetic so the behaviour is deterministic and reproducible in tests. Timestamps are supplied explicitly rather than read from a wall clock, making the whole system testable without mocking.

Which counting algorithm enforces that quota is the first design decision, not a requirement. Several schemes can do it and they differ sharply in memory and burst behavior. Before reading on, decide which one limits bursts without paying unbounded memory per client.

Decision checkpoint

A per-client quota of N requests per window can be enforced several ways. Use a fixed-window counter (count requests, reset the count every window), a sliding-window log (timestamp every request, count those in the trailing window), or a token bucket (tokens refill at a steady rate, each request spends one)?

So the design uses a token bucket: each client holds tokens that refill at a steady rate, and a request is allowed only when a token is available to spend. The flowchart traces what that does on every request:

Core Entities

The token bucket is chosen, but one fork remains before the class list. The requirements still name sliding-window log and fixed-window counter as options a real system might switch to, so: should the counting algorithm be baked directly into the limiter, or sit behind an interface the limiter holds a reference to? Decide before reading on.

Decision checkpoint

Token bucket is the algorithm today, but sliding-window log and fixed-window counter are named as future options. Should RateLimiter wire TokenBucket in directly, or stay independent of which algorithm runs?

With that decided, the entity list is three things.

A RateLimitStrategy is the pluggable algorithm interface. It answers one question: "should this request be allowed, given the current timestamp?" The interface lets us swap token bucket for sliding-window log or fixed-window counter without touching the surrounding machinery.

A TokenBucket is a concrete strategy, and the mental model is worth building before the code. Picture a bucket that holds up to capacity tokens. Every request takes one token out; if the bucket is empty, the request is denied. Tokens are added back at a steady rate — capacity tokens over each window of time — filling only up to the brim. So capacity is the size of the burst you tolerate and window is how long a fully drained bucket takes to refill. With capacity = 10 and window = 60s, a client can fire 10 requests at once, then about one more every 6 seconds as tokens trickle back.

Refill is lazy. Rather than a background timer adding tokens every few milliseconds — which would burn CPU on idle clients and want a thread per bucket — the bucket adds nothing until a request actually arrives. At that moment it asks "how many tokens would have refilled since I last looked?" and adds them in one step. Same arithmetic, computed only when it matters; the last_refill timestamp is the "when I last looked" marker.

That count of would-be tokens is refill = floor(elapsed * capacity / window). The refill rate is capacity / window tokens per unit of time, so over elapsed units the bucket has earned elapsed * capacity / window of them; floor rounds down because you cannot add a fraction of a token. With capacity = 10, window = 60 seconds, and elapsed = 18 seconds, that is floor(18 * 10 / 60) = floor(3) = 3 tokens. That floor is also where the accounting gets subtle.

Decision checkpoint

After adding refill = floor(elapsed * capacity / window) tokens, what should last_refill become? Picture a request that earned 2.4 tokens' worth of elapsed time but floor granted only 2.

So when the refill would fill the bucket (tokens + refill >= capacity), tokens are set to capacity and last_refill is snapped to the current timestamp, discarding idle overflow time. Otherwise tokens are added and last_refill advances by floor(refill * window / capacity), so the fractional remainder carries forward and avoids drift.

The limiter has to enforce the quota separately for each client. That raises one more structural question before the orchestrator takes shape.

Decision checkpoint

The quota is per client. Should the limiter keep one shared bucket for all clients, or one bucket per client?

A RateLimiter is the orchestrator. It owns a dictionary mapping client strings to their per-client bucket. When a request arrives, it looks up or creates the bucket and delegates to the strategy. This is the one object the outside world talks to.

The entities-overview diagram shows how the three objects are connected at runtime:

Designing the Classes

With the entities settled, the next job is to derive each class's state and behavior directly from the requirements — no fields added because they "seem natural," only fields demanded by a specific requirement. The guiding principle is Tell, Don't Ask: each class enforces its own rules internally rather than exposing raw data for callers to manipulate.

RateLimitStrategy

State: none. The interface carries no data — it is a pure behavioral contract.

Behavior: one method, allow(timestamp: int) -> bool. The timestamp is passed in rather than read from a wall clock so every concrete strategy is independently testable without mocking time. The return value is a boolean; the caller decides what string to return to the user, keeping the strategy focused entirely on the allow/deny decision.

TokenBucket

The requirements specify a token bucket with a configurable capacity and refill window. Refill is lazy and uses integer arithmetic. These three sentences determine the state entirely.

State: _capacity (the ceiling — tokens never exceed this), _window (the number of time units for a full refill), _tokens (the current count, decremented on each allowed request), and _last_refill (the timestamp of the last refill event). A new bucket starts at capacity tokens with _last_refill set to the client's first-request timestamp — not to zero — so no free refill is awarded on arrival.

Behavior: allow(timestamp: int) -> bool. This single method does three things in order: compute how many tokens to add, update the token count and pointer, then consume one token if any remain. The critical design decision is the integer refill math:

refill = floor(elapsed * capacity / window)

This avoids floats entirely. elapsed * capacity is computed first (integer multiplication), then divided by window with floor division. The result is exact and deterministic across all runtimes. The bucket then branches on whether the refill would fill it. If tokens + refill >= capacity, tokens are set to capacity and _last_refill is snapped to timestamp — discarding idle overflow time. This prevents a long idle gap followed by many same-timestamp requests from each re-triggering a refill and granting more than capacity allows in total; after snapping, every subsequent same-timestamp call sees elapsed = 0 and adds nothing. If the refill only partially fills the bucket, tokens are added and _last_refill advances by floor(refill * window / capacity) — exactly the wall-time consumed by the tokens just added — preserving the fractional remainder for the next call.

RateLimiter

The requirements say the limiter is keyed per client. Each client gets an independent bucket with the same capacity and window. These requirements determine the state: _capacity and _window (held so new buckets can be constructed on demand) and _buckets (a dictionary from client string to RateLimitStrategy).

Behavior: request(client: str, timestamp: int) -> str. The method looks up the client in _buckets; if the client is new, it creates a fresh TokenBucket with the client's first timestamp as start. It then delegates to bucket.allow(timestamp) and returns "allow" or "deny". The method never touches token counts directly — that is Tell, Don't Ask in action. RateLimiter tells the bucket "decide this request," and the bucket decides.

Note that _buckets is typed as dict[str, RateLimitStrategy] — the orchestrator knows only the interface, not the concrete class. Swapping TokenBucket for a SlidingWindowLog requires no change to RateLimiter.

Reading the diagram: the RateLimiter holds a map of RateLimitStrategy instances — each a TokenBucket at runtime, one per client. The map is keyed on the client string and created lazily on first request. The public surface the outside world touches is just rate_limiter(capacity, window, instructions) — a single entry point that replays a command stream. Everything else is private.

Try It Yourself

Before reading the implementation, build it. Implement a function rate_limiter(capacity, window, instructions) that replays a sequence of commands and returns one result string per request.

Parameters:

capacity is the maximum number of tokens in each client's bucket (also the initial fill level). window is the number of time units required for a full refill. The refill rate is capacity / window tokens per time unit, computed with integer math: refill = floor(elapsed * capacity / window). If tokens + refill >= capacity, tokens are set to capacity and last_refill is snapped to timestamp (discarding idle overflow time, so a long idle gap followed by many same-timestamp requests yields exactly capacity allows). Otherwise, last_refill advances by floor(refill * window / capacity), preserving the fractional remainder.

Commands:

Each instruction is a list of strings. The only command is ["request", client, timestamp]: at integer time timestamp, attempt to spend 1 token for client. Timestamps are non-decreasing. For each request, append "allow" if a token was available, or "deny" otherwise.

A new client's bucket starts full at capacity tokens, with last_refill set to the timestamp of their first request (not to zero), so no free refill is awarded on arrival.

Run your solution against the test cases. The reference implementation follows below.

Code Walkthrough

The design is three small classes. Each section below walks through the implementation in the same order as the class design: interface first, then the concrete strategy, then the orchestrator, then the entry function.

The interface is a single abstract method. In Python, the ABC machinery enforces that every concrete subclass provides its own allow implementation.

There is no state and no logic here. The value of this class is entirely in the type: RateLimiter holds dict[str, RateLimitStrategy], so the orchestrator is coupled only to the interface. Any future algorithm that implements allow slots in without changing a single line in RateLimiter.

1from abc import ABC, abstractmethod
2
3
4class RateLimitStrategy(ABC):
5    @abstractmethod
6    def allow(self, timestamp: int) -> bool: ...

Full Solution

1from abc import ABC, abstractmethod
2
3
4class RateLimitStrategy(ABC):
5    @abstractmethod
6    def allow(self, timestamp: int) -> bool: ...
7
8
9class TokenBucket(RateLimitStrategy):
10    """Token bucket with integer lazy-refill math (no floats).
11
12    refill = floor(elapsed * capacity / window). If that fills the bucket,
13    snap last_refill to now so idle time beyond a full bucket is discarded
14    (otherwise a long idle gap would let many same-timestamp requests each
15    re-trigger a refill). Otherwise advance last_refill only by the
16    wall-time the added tokens consumed, preserving the fractional
17    remainder for the next check.
18    """
19
20    def __init__(self, capacity: int, window: int, start: int) -> None:
21        self._capacity: int = capacity
22        self._window: int = window
23        self._tokens: int = capacity
24        self._last_refill: int = start
25
26    def allow(self, timestamp: int) -> bool:
27        elapsed: int = timestamp - self._last_refill
28        if elapsed > 0 and self._window > 0:
29            refill: int = (elapsed * self._capacity) // self._window
30            if refill > 0:
31                if self._tokens + refill >= self._capacity:
32                    self._tokens = self._capacity
33                    self._last_refill = timestamp
34                else:
35                    self._tokens += refill
36                    self._last_refill += (
37                        refill * self._window
38                    ) // self._capacity
39        if self._tokens > 0:
40            self._tokens -= 1
41            return True
42        return False
43
44
45class RateLimiter:
46    def __init__(self, capacity: int, window: int) -> None:
47        self._capacity: int = capacity
48        self._window: int = window
49        self._buckets: dict[str, RateLimitStrategy] = {}
50
51    def request(self, client: str, timestamp: int) -> str:
52        if client not in self._buckets:
53            self._buckets[client] = TokenBucket(
54                self._capacity, self._window, timestamp
55            )
56        return "allow" if self._buckets[client].allow(timestamp) else "deny"
57
58
59def rate_limiter(
60    capacity: int, window: int, instructions: list[list[str]]
61) -> list[str]:
62    limiter = RateLimiter(capacity, window)
63    output: list[str] = []
64    for instruction in instructions:
65        command, *args = instruction
66        if command == "request":
67            client, ts = args[0], int(args[1])
68            output.append(limiter.request(client, ts))
69    return output
70
71if __name__ == "__main__":
72    capacity = int(input())
73    window = int(input())
74    instructions = [input().split() for _ in range(int(input()))]
75    res = rate_limiter(capacity, window, instructions)
76    for line in res:
77        print(line)
78

1import java.util.ArrayList;
2import java.util.Arrays;
3import java.util.HashMap;
4import java.util.List;
5import java.util.Map;
6import java.util.Scanner;
7
8class Solution {
9    interface RateLimitStrategy {
10        boolean allow(int timestamp);
11    }
12
13    static class TokenBucket implements RateLimitStrategy {
14        private final int capacity;
15        private final int window;
16        private int tokens;
17        private int lastRefill;
18
19        TokenBucket(int capacity, int window, int start) {
20            this.capacity = capacity;
21            this.window = window;
22            this.tokens = capacity;
23            this.lastRefill = start;
24        }
25
26        @Override
27        public boolean allow(int timestamp) {
28            int elapsed = timestamp - lastRefill;
29            if (elapsed > 0 && window > 0) {
30                int refill = elapsed * capacity / window;
31                if (refill > 0) {
32                    if (tokens + refill >= capacity) {
33                        tokens = capacity;
34                        lastRefill = timestamp;
35                    } else {
36                        tokens += refill;
37                        lastRefill += refill * window / capacity;
38                    }
39                }
40            }
41            if (tokens > 0) {
42                tokens--;
43                return true;
44            }
45            return false;
46        }
47    }
48
49    static class RateLimiter {
50        private final int capacity;
51        private final int window;
52        private final Map<String, RateLimitStrategy> buckets = new HashMap<>();
53
54        RateLimiter(int capacity, int window) {
55            this.capacity = capacity;
56            this.window = window;
57        }
58
59        String request(String client, int timestamp) {
60            buckets.computeIfAbsent(
61                client, k -> new TokenBucket(capacity, window, timestamp)
62            );
63            return buckets.get(client).allow(timestamp) ? "allow" : "deny";
64        }
65    }
66
67    // Main entry
68    public static List<String> rateLimiter(int capacity, int window, List<List<String>> instructions) {
69        RateLimiter limiter = new RateLimiter(capacity, window);
70        List<String> output = new ArrayList<>();
71        for (List<String> instruction : instructions) {
72            String command = instruction.get(0);
73            if (command.equals("request")) {
74                String client = instruction.get(1);
75                int ts = Integer.parseInt(instruction.get(2));
76                output.add(limiter.request(client, ts));
77            }
78        }
79        return output;
80    }
81
82    public static List<String> splitWords(String s) {
83        return s.isEmpty() ? List.of() : Arrays.asList(s.split(" "));
84    }
85
86    public static void main(String[] args) {
87        java.util.Scanner scanner = new java.util.Scanner(System.in);
88        int capacity = Integer.parseInt(scanner.nextLine());
89        int window = Integer.parseInt(scanner.nextLine());
90        int instructionsLength = Integer.parseInt(scanner.nextLine());
91        List<List<String>> instructions = new ArrayList<>();
92        for (int i = 0; i < instructionsLength; i++) {
93            instructions.add(splitWords(scanner.nextLine()));
94        }
95        scanner.close();
96        List<String> res = rateLimiter(capacity, window, instructions);
97        for (String line : res) {
98            System.out.println(line);
99        }
100    }
101}
102

1"use strict";
2
3class TokenBucket {
4    constructor(capacity, window, start) {
5        this.capacity = capacity;
6        this.window = window;
7        this.tokens = capacity;
8        this.lastRefill = start;
9    }
10
11    allow(timestamp) {
12        const elapsed = timestamp - this.lastRefill;
13        if (elapsed > 0 && this.window > 0) {
14            const refill = Math.floor((elapsed * this.capacity) / this.window);
15            if (refill > 0) {
16                if (this.tokens + refill >= this.capacity) {
17                    this.tokens = this.capacity;
18                    this.lastRefill = timestamp;
19                } else {
20                    this.tokens += refill;
21                    this.lastRefill += Math.floor((refill * this.window) / this.capacity);
22                }
23            }
24        }
25        if (this.tokens > 0) {
26            this.tokens--;
27            return true;
28        }
29        return false;
30    }
31}
32
33class RateLimiter {
34    constructor(capacity, window) {
35        this.capacity = capacity;
36        this.window = window;
37        this.buckets = new Map();
38    }
39
40    request(client, timestamp) {
41        if (!this.buckets.has(client)) {
42            this.buckets.set(client, new TokenBucket(this.capacity, this.window, timestamp));
43        }
44        return this.buckets.get(client).allow(timestamp) ? "allow" : "deny";
45    }
46}
47
48function rateLimiter(capacity, window, instructions) {
49    const limiter = new RateLimiter(capacity, window);
50    const output = [];
51    for (const instruction of instructions) {
52        const [command, ...args] = instruction;
53        if (command === "request") {
54            output.push(limiter.request(args[0], parseInt(args[1])));
55        }
56    }
57    return output;
58}
59
60function splitWords(s) {
61    return s === "" ? [] : s.split(" ");
62}
63
64function* main() {
65    const capacity = parseInt(yield);
66    const window = parseInt(yield);
67    const instructionsLength = parseInt(yield);
68    const instructions = [];
69    for (let i = 0; i < instructionsLength; i++) {
70        instructions.push(splitWords(yield));
71    }
72    const res = rateLimiter(capacity, window, instructions);
73    for (const line of res) {
74        console.log(line);
75    }
76}
77
78class EOFError extends Error {}
79{
80    const gen = main();
81    const next = (line) => gen.next(line).done && process.exit();
82    let buf = "";
83    next();
84    process.stdin.setEncoding("utf8");
85    process.stdin.on("data", (data) => {
86        const lines = (buf + data).split("\n");
87        buf = lines.pop();
88        lines.forEach(next);
89    });
90    process.stdin.on("end", () => {
91        buf && next(buf);
92        gen.throw(new EOFError());
93    });
94}
95

1interface RateLimitStrategy {
2  allow(timestamp: number): boolean;
3}
4
5class TokenBucket implements RateLimitStrategy {
6  private tokens: number;
7  private lastRefill: number;
8
9  constructor(
10    private readonly capacity: number,
11    private readonly window: number,
12    start: number
13  ) {
14    this.tokens = capacity;
15    this.lastRefill = start;
16  }
17
18  allow(timestamp: number): boolean {
19    const elapsed = timestamp - this.lastRefill;
20    if (elapsed > 0 && this.window > 0) {
21      const refill = Math.floor((elapsed * this.capacity) / this.window);
22      if (refill > 0) {
23        if (this.tokens + refill >= this.capacity) {
24          this.tokens = this.capacity;
25          this.lastRefill = timestamp;
26        } else {
27          this.tokens += refill;
28          this.lastRefill += Math.floor((refill * this.window) / this.capacity);
29        }
30      }
31    }
32    if (this.tokens > 0) {
33      this.tokens--;
34      return true;
35    }
36    return false;
37  }
38}
39
40class RateLimiter {
41  private buckets = new Map<string, RateLimitStrategy>();
42
43  constructor(
44    private readonly capacity: number,
45    private readonly window: number
46  ) {}
47
48  request(client: string, timestamp: number): string {
49    if (!this.buckets.has(client)) {
50      this.buckets.set(
51        client,
52        new TokenBucket(this.capacity, this.window, timestamp)
53      );
54    }
55    return this.buckets.get(client)!.allow(timestamp) ? "allow" : "deny";
56  }
57}
58
59function rateLimiter(
60  capacity: number,
61  window: number,
62  instructions: string[][]
63): string[] {
64  const limiter = new RateLimiter(capacity, window);
65  const output: string[] = [];
66  for (const instruction of instructions) {
67    const [command, ...args] = instruction;
68    if (command === "request") {
69      output.push(limiter.request(args[0], parseInt(args[1])));
70    }
71  }
72  return output;
73}
74
75function splitWords(s: string): string[] {
76    return s === "" ? [] : s.split(" ");
77}
78
79function* main() {
80    const capacity = parseInt(yield);
81    const window = parseInt(yield);
82    const instructionsLength = parseInt(yield);
83    const instructions = [];
84    for (let i = 0; i < instructionsLength; i++) {
85        instructions.push(splitWords(yield));
86    }
87    const res = rateLimiter(capacity, window, instructions);
88    for (const line of res) {
89        console.log(line);
90    }
91}
92
93class EOFError extends Error {}
94{
95    const gen = main();
96    const next = (line?: string) => gen.next(line ?? "").done && process.exit();
97    let buf = "";
98    next();
99    process.stdin.setEncoding("utf8");
100    process.stdin.on("data", (data: string) => {
101        const lines = (buf + data).split("\n");
102        buf = lines.pop() ?? "";
103        lines.forEach(next);
104    });
105    process.stdin.on("end", () => {
106        buf && next(buf);
107        gen.throw(new EOFError());
108    });
109}
110

1#include <algorithm>
2#include <iostream>
3#include <iterator>
4#include <limits>
5#include <sstream>
6#include <string>
7#include <vector>
8
9#include <string>
10#include <vector>
11#include <unordered_map>
12#include <algorithm>
13
14class RateLimitStrategy {
15public:
16    virtual bool allow(int timestamp) = 0;
17    virtual ~RateLimitStrategy() = default;
18};
19
20class TokenBucket : public RateLimitStrategy {
21private:
22    int capacity;
23    int window;
24    int tokens;
25    int lastRefill;
26public:
27    TokenBucket(int capacity, int window, int start)
28        : capacity(capacity), window(window), tokens(capacity), lastRefill(start) {}
29
30    bool allow(int timestamp) override {
31        int elapsed = timestamp - lastRefill;
32        if (elapsed > 0 && window > 0) {
33            int refill = elapsed * capacity / window;
34            if (refill > 0) {
35                if (tokens + refill >= capacity) {
36                    tokens = capacity;
37                    lastRefill = timestamp;
38                } else {
39                    tokens += refill;
40                    lastRefill += refill * window / capacity;
41                }
42            }
43        }
44        if (tokens > 0) {
45            tokens--;
46            return true;
47        }
48        return false;
49    }
50};
51
52class RateLimiterObj {
53private:
54    int capacity;
55    int window;
56    std::unordered_map<std::string, TokenBucket*> buckets;
57public:
58    RateLimiterObj(int capacity, int window)
59        : capacity(capacity), window(window) {}
60
61    ~RateLimiterObj() {
62        for (auto& pair : buckets) delete pair.second;
63    }
64
65    std::string request(const std::string& client, int timestamp) {
66        if (buckets.find(client) == buckets.end()) {
67            buckets[client] = new TokenBucket(capacity, window, timestamp);
68        }
69        return buckets[client]->allow(timestamp) ? "allow" : "deny";
70    }
71};
72
73std::vector<std::string> rate_limiter(int capacity, int window, std::vector<std::vector<std::string>> instructions) {
74    RateLimiterObj limiter(capacity, window);
75    std::vector<std::string> output;
76    for (const auto& instruction : instructions) {
77        const std::string& command = instruction[0];
78        if (command == "request") {
79            const std::string& client = instruction[1];
80            int ts = std::stoi(instruction[2]);
81            output.push_back(limiter.request(client, ts));
82        }
83    }
84    return output;
85}
86
87void ignore_line() {
88    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
89}
90
91template<typename T>
92std::vector<T> get_words() {
93    std::string line;
94    std::getline(std::cin, line);
95    std::istringstream ss{line};
96    ss >> std::boolalpha;
97    std::vector<T> v;
98    std::copy(std::istream_iterator<T>{ss}, std::istream_iterator<T>{}, std::back_inserter(v));
99    return v;
100}
101
102int main() {
103    int capacity;
104    std::cin >> capacity;
105    ignore_line();
106    int window;
107    std::cin >> window;
108    ignore_line();
109    int instructions_length;
110    std::cin >> instructions_length;
111    ignore_line();
112    std::vector<std::vector<std::string>> instructions;
113    for (int i = 0; i < instructions_length; i++) {
114        instructions.emplace_back(get_words<std::string>());
115    }
116    std::vector<std::string> res = rate_limiter(capacity, window, instructions);
117    for (const std::string& line : res) {
118        std::cout << line << '\n';
119    }
120}
121

1class TokenBucket
2  def initialize(capacity, window, start)
3    @capacity = capacity
4    @window = window
5    @tokens = capacity
6    @last_refill = start
7  end
8
9  def allow(timestamp)
10    elapsed = timestamp - @last_refill
11    if elapsed > 0 && @window > 0
12      refill = (elapsed * @capacity) / @window
13      if refill > 0
14        if @tokens + refill >= @capacity
15          @tokens = @capacity
16          @last_refill = timestamp
17        else
18          @tokens += refill
19          @last_refill += (refill * @window) / @capacity
20        end
21      end
22    end
23    if @tokens > 0
24      @tokens -= 1
25      return true
26    end
27    false
28  end
29end
30
31class RateLimiterObj
32  def initialize(capacity, window)
33    @capacity = capacity
34    @window = window
35    @buckets = {}
36  end
37
38  def request(client, timestamp)
39    @buckets[client] ||= TokenBucket.new(@capacity, @window, timestamp)
40    @buckets[client].allow(timestamp) ? "allow" : "deny"
41  end
42end
43
44def rate_limiter(capacity, window, instructions)
45  limiter = RateLimiterObj.new(capacity, window)
46  output = []
47  instructions.each do |instruction|
48    command = instruction[0]
49    if command == "request"
50      client = instruction[1]
51      ts = instruction[2].to_i
52      output << limiter.request(client, ts)
53    end
54  end
55  output
56end
57
58if __FILE__ == $0
59  capacity = Integer(gets, 10)
60  window = Integer(gets, 10)
61  instructions = Array.new(Integer(gets, 10)) { gets.split }
62  res = rate_limiter(capacity, window, instructions)
63  res.each { |line| puts(line) }
64end
65

Run and Test

The exercise drives the design through a command stream so it can be checked deterministically. The input supplies an explicit timestamp with every request rather than reading a wall clock, which means the same input always produces the same output — no time.sleep, no mocking.

The sequence diagram below traces a single request call through all three objects:

Trace test 2 to see the contract. Capacity is 3, window is 10 (so 3 tokens refill over 10 time units, meaning 1 token per 10/3 ≈ 3.33 units in float terms, but in integer math floor(elapsed * 3 / 10)).

Requests at t=0: the bucket starts full at 3 tokens.

request alice 0  → allow (tokens: 3→2)
request alice 0  → allow (tokens: 2→1)
request alice 0  → allow (tokens: 1→0)
request alice 0  → deny  (tokens: 0, elapsed=0, refill=0)

At t=10: elapsed = 10, refill = (10 * 3) // 10 = 3. Since tokens + refill = 0 + 3 >= 3 (capacity), the full-bucket branch runs: tokens = 3, last_refill snapped to 10.

request alice 10 → allow (tokens: 3→2)
request alice 10 → allow (tokens: 2→1)
request alice 10 → allow (tokens: 1→0)
request alice 10 → deny  (tokens: 0, elapsed=0, refill=0)

The other tests cover two independent clients (test 3, whose buckets are completely separate), a sub-window partial refill at t=2 in a capacity-2/window-4 configuration (test 4), capacity-1 strict one-at-a-time enforcement (test 5), and a 1-token-per-unit partial refill at t=1 in a capacity-5/window-5 configuration (test 6).

Concurrency and Thread Safety

In production, a rate limiter sits in the request path of a multi-threaded server. Many goroutines or threads evaluate the same client's bucket at the same time. The read-check-decrement sequence inside allow is not atomic, which creates a race on the last token.

Suppose Alice's bucket holds exactly 1 token and two threads arrive simultaneously:

Both threads see tokens = 1, both pass the check, and both decrement. The limiter allows two requests where only one should have been permitted. This is the classic lost-update race.

The fix is a per-client lock guarding the read-check-decrement sequence. The RateLimiter acquires the bucket's lock before calling allow and releases it after:

Thread 2 now sees the post-decrement state and correctly returns deny.

The critical design choice is where to place the lock. A single global lock on the RateLimiter is simple but serialises all clients: Alice's request blocks Bob's even though their buckets are completely independent. The right level is one lock per client — but a per-bucket lock alone is not enough, because the bucket itself is created on the first request and that creation is its own race. If two first requests for a new client both find no bucket, both create a full one, and both allow, the later map write hides one of them and a decrement is lost. The fix has two levels: make get-or-create atomic, then lock the bucket. In Java, ConcurrentHashMap.computeIfAbsent does both — it creates the bucket exactly once under the map's bin lock, and each bucket carries a ReentrantLock for the refill-and-decrement. In Python, hold a limiter-level lock (or a striped lock keyed on the client, below) just long enough to get-or-create the bucket, then acquire that bucket's threading.Lock for allow.

For very high client counts — millions of distinct keys — even per-client locks can be heavy if kept as live objects. Lock striping bounds memory: allocate a fixed pool of N locks (typically 128 or 256) and assign each client to a stripe via hash(client_key) % N. The pool size is constant regardless of client count, and contention is low because clients spread across stripes. Two clients can share a lock only by hash collision, and the critical section (a few integer operations) is short enough that the wait is negligible. This is the same idea used in ConcurrentHashMap's internal segment design.

The deterministic exercise above sidesteps concurrency entirely — timestamps are caller-supplied and commands are sequential — which is the right choice for a testable interface. In a real deployment, timestamps come from time.monotonic() or System.nanoTime() inside the lock, so the refill math and the consume step are always atomic with respect to each other.

Extensions

Every extension is a new concrete class behind the same interface — RateLimiter never changes:

A sliding-window log records the exact timestamp of every request in a deque. On each check, entries older than the window are evicted, then the remainder is counted. If the count is below the limit, the new timestamp is recorded and the request is allowed. This gives exact per-second semantics with no burst at window boundaries, at the cost of O(limit) memory per client. Implementing it as a second RateLimitStrategy concrete class requires zero changes to RateLimiter.

A fixed-window counter divides time into fixed buckets (per second, per minute) and keeps a counter per bucket. It uses O(1) state and is easy to implement in Redis with INCR and a TTL. The trade-off is a burst at window boundaries: a client can send limit requests at the end of one window and limit more at the start of the next, doubling the effective rate for one instant.

Distributed rate limiting replaces the in-process state with Redis so multiple server instances share the same quota per client. The simplest approach is a Redis fixed-window counter: each allow call increments a key scoped to the client and the current time window (INCRBY), rejects the request if the count exceeds the limit, and sets a TTL on the key so it expires at window end. This is a different algorithm from the in-memory token bucket above — it counts requests in a fixed window rather than tracking a token count and a last-refill timestamp. A true distributed token bucket requires an atomic Lua script that reads the stored token count and last-refill timestamp, computes the refill, decrements, and writes back — all in one server-side operation. The RateLimitStrategy interface absorbs either choice: a RedisFixedWindowCounter implements the same allow method, routing to Redis instead of an in-memory counter.

Per-tier limits add a second dimension: different client tiers (free, pro, enterprise) get different capacities and windows. The RateLimiter looks up the tier at client-creation time and constructs the bucket with tier-appropriate parameters. The strategy interface is unchanged.

Each extension lands by adding a class or swapping an implementation, never by rewriting the RateLimiter.

Quiz

Test your understanding of the key design decisions in this rate limiter.

Rate Limiter Design Quiz

Why is RateLimitStrategy defined as an interface rather than a concrete base class wired directly into RateLimiter?

Why does each client get its own independent TokenBucket rather than sharing one bucket across all clients?

The TokenBucket updates last_refill differently depending on whether the refill fills the bucket. When does it snap last_refill to the current timestamp, and when does it advance by floor(refill * window / capacity)?

Two threads both read tokens=1 on Alice's bucket simultaneously. Both pass the tokens > 0 check, both decrement, and both return allow — but only one should have been allowed. What is this race called, and how does the design fix it?

How would a distributed rate limiter differ from this in-process design, and how does the RateLimitStrategy interface accommodate it?

Core Learning

LLD/OOP Interview

Additional Resources

AI Engineering

Design a Rate Limiter

See It in Action

Clarifying Requirements

Decision checkpoint

Core Entities

Decision checkpoint

Decision checkpoint

Decision checkpoint

Designing the Classes

RateLimitStrategy

TokenBucket

RateLimiter

Try It Yourself

Code Walkthrough

Full Solution

Run and Test

Concurrency and Thread Safety

Extensions

Quiz

Rate Limiter Design Quiz