How to Implement Rate Limiting in Redis
Every API needs rate limiting, but not every API needs the same algorithm. The right choice depends on whether you want strict limits, smooth traffic, burst tolerance, or minimal memory usage. Redis can implement all the common approaches, each with different commands and tradeoffs.
This post walks through four rate limiting algorithms: fixed window, sliding window, token bucket, and leaky bucket. By the end you will understand when to use each and how to implement them with Redis.
Which Redis data types we will use
String is used in two ways in this implementation:
- As counters for fixed window rate limiting. The INCR command atomically increments a string value, making it perfect for counting requests. Combined with EXPIRE, you get counters that automatically reset.
- As key pairs for the token bucket algorithm. One string tracks the current token count, another tracks the last refill time. Together they simulate a bucket that refills at a steady rate.
Sorted Set is used in two ways in this implementation:
- For timestamp tracking in sliding window rate limiting. Each request becomes a member with its timestamp as the score, letting you query "how many requests in the last 60 seconds" by counting members in that range.
- As a queue for the leaky bucket algorithm. Requests enter the set with arrival timestamps, and a background worker drains them at a fixed rate, smoothing bursty traffic.
Each algorithm uses different data types because each solves a different problem. Fixed window is simple and memory-efficient. Sliding window is accurate. Token bucket allows bursts. Leaky bucket enforces smoothness. The data type follows from the algorithm's needs.
Fixed window: counting requests per time interval
Fixed window divides time into intervals and counts requests within each interval. It uses a simple Redis string counter that resets at interval boundaries. One key per user per interval, one integer value.
The algorithm works by including the interval start time in the key name. When a request arrives, you increment the counter for the current interval. If the count exceeds your limit, reject the request.
Fixed window is dead simple to implement and uses constant memory. Each user needs only one active key at a time, storing a single integer. You cannot get simpler than INCR.
The downside is boundary bursting. A user could make 100 requests at 11:59:59 and another 100 at 12:00:01. Both fall into different intervals, so both are allowed. The user stays under a 100-per-minute limit but sends 200 requests in two seconds. This matters if your backend cannot handle bursts, but is fine if approximate limits are acceptable.
- Redis
- Python
- TypeScript
- Go
# Key includes the window timestamp (Unix minute)
# This groups all requests in the same minute together
INCR ratelimit:user123:1699900000
> 1 (first request this window)
> 47 (on the 47th request)
# If return value > limit, reject the request
# Auto-delete after 60 seconds
# Prevents key accumulation for old windows
EXPIRE ratelimit:user123:1699900000 60
# Key includes the window timestamp (Unix minute)
# If return value > limit, reject the request
count = client.incr(f'ratelimit:{user_id}:{window_ts}')
# Auto-delete after 60 seconds
client.expire(f'ratelimit:{user_id}:{window_ts}', 60)
// Key includes the window timestamp (Unix minute)
// If return value > limit, reject the request
const count = await client.incr(`ratelimit:${userId}:${windowTs}`);
// Auto-delete after 60 seconds
await client.expire(`ratelimit:${userId}:${windowTs}`, 60);
// Key includes the window timestamp (Unix minute)
// If return value > limit, reject the request
count, _ := client.Incr(ctx, fmt.Sprintf("ratelimit:%s:%d", userID, windowTs)).Result()
// Auto-delete after 60 seconds
client.Expire(ctx, fmt.Sprintf("ratelimit:%s:%d", userID, windowTs), 60*time.Second)
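Putting those pieces together, here is a minimal request-time check in Python with redis-py. The 100-per-minute limit, the key prefix, and the choice to set the expiry only on the first increment are illustrative, not requirements:
import time
import redis

client = redis.Redis()

def allow_request(user_id: str, limit: int = 100, window: int = 60) -> bool:
    # All requests in the same window share one key
    window_ts = int(time.time()) // window * window
    key = f'ratelimit:{user_id}:{window_ts}'

    count = client.incr(key)
    if count == 1:
        # Set the TTL once, when the key is first created,
        # so old window keys clean themselves up
        client.expire(key, window)

    return count <= limit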
Sliding window: accurate limits with timestamps
Sliding window looks at the trailing N seconds from now, not from interval boundaries. It uses a Redis sorted set where each request is a member with its timestamp as the score. Memory scales with the number of requests in the window.
Each incoming request gets recorded with its timestamp. Before checking the limit, you remove all entries older than your window. Then count what remains. If under the limit, add the new request.
Sliding window gives you accurate limits with no boundary exploits. A 100-per-minute limit means at most 100 requests in any 60-second period, not per calendar minute. You can also calculate exactly when the user's oldest request will expire, giving a precise "retry after" value to blocked clients.
The tradeoff is memory usage, which scales with your limit. Each request adds an entry to the sorted set. A 1000-per-minute limit means up to 1000 entries per user. For high-volume limits across many users, this adds up fast. You also need unique member values, so you append a random suffix to handle multiple requests in the same millisecond.
- Redis
- Python
- TypeScript
- Go
# Remove all entries older than the window start
# Current time is 1699900060000ms, window is 60 seconds
# So remove everything before 1699900000000ms
ZREMRANGEBYSCORE ratelimit:user123 0 1699900000000
> 3 (removed 3 expired entries)
# Count remaining entries in the window
# If >= limit, reject the request
ZCARD ratelimit:user123
> 47 (47 requests in current window)
# Record this request with timestamp as score
# Append random suffix for uniqueness
ZADD ratelimit:user123 1699900060000 "1699900060000-xyz789"
> 1 (added successfully)
# Key auto-deletes if no requests for 60 seconds
EXPIRE ratelimit:user123 60
key = f'ratelimit:{user_id}'
# Remove entries older than window start
client.zremrangebyscore(key, 0, window_start)
# Count remaining - if >= limit, reject
count = client.zcard(key)
# Record request with timestamp as score
client.zadd(key, {f'{now}-{request_id}': now})
# Auto-delete if idle
client.expire(key, 60)
const key = `ratelimit:${userId}`;
// Remove entries older than window start
await client.zremrangebyscore(key, 0, windowStart);
// Count remaining - if >= limit, reject
const count = await client.zcard(key);
// Record request with timestamp as score
await client.zadd(key, now, `${now}-${requestId}`);
// Auto-delete if idle
await client.expire(key, 60);
key := fmt.Sprintf("ratelimit:%s", userID)
// Remove entries older than window start
client.ZRemRangeByScore(ctx, key, "0", fmt.Sprintf("%d", windowStart))
// Count remaining - if >= limit, reject
count, _ := client.ZCard(ctx, key).Result()
// Record request with timestamp as score
client.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: fmt.Sprintf("%d-%s", now, requestID)})
// Auto-delete if idle
client.Expire(ctx, key, 60*time.Second)
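For reference, here is how those steps might be combined into a single Python check with redis-py. The pipeline just batches the cleanup and count into one round trip; the limit, key prefix, and random-suffix scheme are illustrative. It also returns the precise retry-after value mentioned above, computed from the oldest remaining entry:
import time
import uuid
import redis

client = redis.Redis()

def allow_request(user_id: str, limit: int = 100, window_ms: int = 60_000):
    key = f'ratelimit:{user_id}'
    now = int(time.time() * 1000)
    window_start = now - window_ms

    # Drop expired entries and count what remains in one round trip
    pipe = client.pipeline()
    pipe.zremrangebyscore(key, 0, window_start)
    pipe.zcard(key)
    _, count = pipe.execute()

    if count >= limit:
        # The oldest remaining entry tells us when the next slot frees up
        oldest = client.zrange(key, 0, 0, withscores=True)
        retry_after_ms = int(oldest[0][1]) + window_ms - now if oldest else window_ms
        return False, retry_after_ms

    # Record this request; the random suffix keeps members unique
    client.zadd(key, {f'{now}-{uuid.uuid4().hex[:8]}': now})
    client.expire(key, window_ms // 1000)
    return True, 0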
Token bucket: allowing controlled bursts
Token bucket allows controlled bursting by maintaining a bucket of tokens that refills over time. Each request consumes one token. It uses two Redis string keys per user: one for the token count and one for the last update timestamp. Memory is constant regardless of request rate.
Imagine a bucket that holds up to N tokens. Tokens are added at a steady rate, say 10 per second. When a request arrives, you calculate how many tokens have accumulated since the last check, add them to the bucket up to the maximum, then try to consume one token. If the bucket is empty, reject the request.
Token bucket uses constant memory (two keys per user) and provides tunable burst behavior. A user who has been idle can make several quick requests using accumulated tokens, then sustain at the refill rate. This feels fair to users and matches how people actually interact with APIs. A bucket with capacity 100 and refill rate 10 per second lets a user burst 100 requests immediately, then continue at 10 per second.
The complexity comes from concurrent requests. Checking and updating requires reading the current state, calculating, and writing back. Concurrent requests might both read the same state and both succeed when only one should. For strict accuracy under load you need to make that read-modify-write sequence atomic, typically with a Lua script or optimistic locking (WATCH with MULTI/EXEC), which adds implementation complexity.
- Redis
- Python
- TypeScript
- Go
# Read current state
GET ratelimit:user123:tokens
> "5" (5 tokens remaining)
GET ratelimit:user123:last_update
> "1699900050000" (last updated 10 seconds ago)
# Calculate new tokens based on elapsed time
# If refill rate is 10/sec and 10 seconds passed, add 100 tokens
# Cap at bucket maximum (say 100), subtract 1 for this request
# New value: min(5 + 100, 100) - 1 = 99
# Write new state
SET ratelimit:user123:tokens 99
SET ratelimit:user123:last_update 1699900060000
# Cleanup after 1 hour of inactivity
EXPIRE ratelimit:user123:tokens 3600
EXPIRE ratelimit:user123:last_update 3600
# Read current state
tokens = client.get(f'ratelimit:{user_id}:tokens')
last_update = client.get(f'ratelimit:{user_id}:last_update')
# Calculate new tokens based on elapsed time
# new_tokens = min(tokens + elapsed * refill_rate, max_tokens) - 1
# Write new state
client.set(f'ratelimit:{user_id}:tokens', new_tokens)
client.set(f'ratelimit:{user_id}:last_update', now)
# Cleanup after 1 hour of inactivity
client.expire(f'ratelimit:{user_id}:tokens', 3600)
client.expire(f'ratelimit:{user_id}:last_update', 3600)
// Read current state
const tokens = await client.get(`ratelimit:${userId}:tokens`);
const lastUpdate = await client.get(`ratelimit:${userId}:last_update`);
// Calculate new tokens based on elapsed time
// newTokens = Math.min(tokens + elapsed * refillRate, maxTokens) - 1
// Write new state
await client.set(`ratelimit:${userId}:tokens`, newTokens);
await client.set(`ratelimit:${userId}:last_update`, now);
// Cleanup after 1 hour of inactivity
await client.expire(`ratelimit:${userId}:tokens`, 3600);
await client.expire(`ratelimit:${userId}:last_update`, 3600);
// Read current state
tokens, _ := client.Get(ctx, fmt.Sprintf("ratelimit:%s:tokens", userID)).Result()
lastUpdate, _ := client.Get(ctx, fmt.Sprintf("ratelimit:%s:last_update", userID)).Result()
// Calculate new tokens based on elapsed time
// newTokens = min(tokens + elapsed * refillRate, maxTokens) - 1
// Write new state
client.Set(ctx, fmt.Sprintf("ratelimit:%s:tokens", userID), newTokens, 0)
client.Set(ctx, fmt.Sprintf("ratelimit:%s:last_update", userID), now, 0)
// Cleanup after 1 hour of inactivity
client.Expire(ctx, fmt.Sprintf("ratelimit:%s:tokens", userID), time.Hour)
client.Expire(ctx, fmt.Sprintf("ratelimit:%s:last_update", userID), time.Hour)
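To close the race described above, one common approach is to move the whole read-refill-write sequence into a Lua script, which Redis executes atomically. Below is a sketch in Python using redis-py's register_script; the capacity, refill rate, and one-hour expiry are illustrative values:
import time
import redis

client = redis.Redis()

# Refill, cap at capacity, and try to consume one token, all in one atomic step
TOKEN_BUCKET_SCRIPT = client.register_script("""
local tokens_key = KEYS[1]
local ts_key = KEYS[2]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])          -- current time in seconds

local tokens = tonumber(redis.call('GET', tokens_key) or capacity)
local last = tonumber(redis.call('GET', ts_key) or now)

-- Add tokens for the elapsed time, capped at the bucket maximum
tokens = math.min(tokens + (now - last) * refill_rate, capacity)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('SET', tokens_key, tostring(tokens), 'EX', 3600)
redis.call('SET', ts_key, tostring(now), 'EX', 3600)
return allowed
""")

def allow_request(user_id: str, capacity: int = 100, refill_rate: int = 10) -> bool:
    return TOKEN_BUCKET_SCRIPT(
        keys=[f'ratelimit:{user_id}:tokens', f'ratelimit:{user_id}:last_update'],
        args=[capacity, refill_rate, time.time()],
    ) == 1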
Leaky bucket: smoothing traffic to a constant rate
Leaky bucket enforces a strict, smooth output rate regardless of input burstiness. Requests enter a queue and drain at a fixed rate. It uses a Redis sorted set as the queue and requires a background worker to process entries.
Think of a bucket with a hole in the bottom. Requests pour in the top and leak out the bottom at a constant rate. If too many requests arrive and the bucket overflows, new requests are rejected.
Leaky bucket guarantees smooth output. Your downstream service sees exactly N requests per second, never more, no matter how bursty the incoming traffic. This protects fragile backends that cannot handle spikes.
The tradeoff is latency and operational complexity. Requests sit in a queue instead of being processed immediately. You also need a background worker running on a timer to drain the bucket. Unlike the other algorithms that are purely request-time checks, leaky bucket requires a separate process to pop and process queued requests at the configured rate.
- Redis
- Python
- TypeScript
- Go
# Check queue size against bucket capacity first
# If >= bucket capacity, reject the request instead of queuing it
ZCARD ratelimit:user123:queue
> 15 (15 requests waiting)
# Otherwise, add the request to the queue with its arrival timestamp
ZADD ratelimit:user123:queue 1699900060000 "request-id-abc123"
> 1 (added to queue)
# Background worker runs on a timer (e.g., every 100ms)
# Pops and processes the oldest entry
ZPOPMIN ratelimit:user123:queue
> ["request-id-xyz789", "1699900055000"]
# Alternative: pop multiple if processing in batches
ZPOPMIN ratelimit:user123:queue 10
key = f'ratelimit:{user_id}:queue'
# Check queue size first - if >= capacity, reject
queue_size = client.zcard(key)
# Otherwise, add request to queue with arrival timestamp
client.zadd(key, {request_id: now})
# Background worker: pop and process oldest entry
entry = client.zpopmin(key)
# Or pop multiple for batch processing
entries = client.zpopmin(key, 10)
const key = `ratelimit:${userId}:queue`;
// Check queue size first - if >= capacity, reject
const queueSize = await client.zcard(key);
// Otherwise, add request to queue with arrival timestamp
await client.zadd(key, now, requestId);
// Background worker: pop and process oldest entry
const entry = await client.zpopmin(key);
// Or pop multiple for batch processing
const entries = await client.zpopmin(key, 10);
key := fmt.Sprintf("ratelimit:%s:queue", userID)
// Check queue size first - if >= capacity, reject
queueSize, _ := client.ZCard(ctx, key).Result()
// Otherwise, add request to queue with arrival timestamp
client.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: requestID})
// Background worker: pop and process oldest entry
entry, _ := client.ZPopMin(ctx, key).Result()
// Or pop multiple for batch processing
entries, _ := client.ZPopMin(ctx, key, 10).Result()
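The request-time side is just the capacity check and ZADD shown above; the drain side needs its own process. Here is a minimal sketch of that worker loop in Python, where the 100 ms interval, key name, and handle_request function are illustrative placeholders:
import time
import redis

client = redis.Redis()

def drain_worker(user_id: str, interval: float = 0.1):
    # Pop and process at most one queued request every `interval` seconds
    key = f'ratelimit:{user_id}:queue'
    while True:
        # zpopmin returns the oldest entry, or an empty list if the queue is empty
        entry = client.zpopmin(key)
        if entry:
            request_id, arrival_ts = entry[0]
            handle_request(request_id)  # hypothetical downstream call
        time.sleep(interval)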
Choosing an algorithm
Fixed window works when you need something simple and can tolerate boundary bursts. Minimal memory, trivial implementation.
Sliding window works when you need accurate limits and want to tell users exactly when they can retry. Higher memory cost but precise control.
Token bucket works when you want to allow controlled bursting for a better user experience. Constant memory, but needs careful handling of concurrent requests.
Leaky bucket works when your downstream system cannot handle any bursts at all. Adds latency and operational complexity, but guarantees smooth output.
All four work well with Redis. Pick the one that matches your traffic goals, not the one that seems most sophisticated.