How to Process Jobs Exactly Once in Redis

Processing a job more than once can cause real problems. Charging a customer twice, sending duplicate emails, or creating duplicate records all erode trust and create support tickets. The challenge is that distributed systems fail in messy ways: workers crash mid-job, networks partition, and processes restart. Ensuring a job runs exactly once requires careful coordination.

Redis provides the primitives needed to build exactly-once job processing. The core insight is that "exactly once" is really "at least once delivery" plus "idempotent processing." You cannot prevent a job from being delivered multiple times in a distributed system, but you can ensure that processing it multiple times has the same effect as processing it once. This guide covers three approaches: deduplication with unique job IDs to reject duplicates at submission time, atomic claim and process to prevent concurrent execution, and idempotency keys to make repeated processing safe.

Which Redis data types we will use

Set tracks which job IDs have been seen or processed. The SADD command returns whether the member was newly added, giving you an atomic check-and-add operation. If SADD returns 0, the job ID already exists and you can skip it. Sets provide O(1) membership checks and naturally deduplicate.

String is used in two ways in this implementation:

  1. As a lock to claim exclusive ownership of a job. The SET command with NX (only if not exists) atomically creates a claim that other workers will see.
  2. As an idempotency record to store the result of a completed operation. Before processing, check if a result already exists. If so, return the cached result instead of reprocessing.

Hash stores job metadata and processing state. Fields can track the current status, the worker that claimed it, timestamps, and the final result. Hashes let you update individual fields atomically without rewriting the entire job record.

List serves as the job queue itself. Workers pop jobs from the list for processing. Redis lists support atomic pop operations that remove and return an item in one step, preventing two workers from receiving the same job.
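
To make the rundown concrete: the set, string, and list commands all appear in the examples below, but the hash does not, so here is a minimal sketch of how job metadata and processing state might be tracked. It assumes a recent redis-py client, and the key and field names (jobs:meta:<id>, status, claimed_by) are illustrative choices, not something this guide prescribes.

import time
import redis

r = redis.Redis(decode_responses=True)

def record_job(job_id, payload):
    # One hash per job; field names here are illustrative
    r.hset(f"jobs:meta:{job_id}", mapping={
        "payload": payload,
        "status": "pending",
        "submitted_at": int(time.time()),
    })

def mark_claimed(job_id, worker_id):
    # Update individual fields atomically without rewriting the whole record
    r.hset(f"jobs:meta:{job_id}", mapping={
        "status": "processing",
        "claimed_by": worker_id,
        "claimed_at": int(time.time()),
    })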

Preventing duplicate jobs with unique IDs

The first layer of defense is rejecting duplicate jobs before they enter the queue. Every job gets a unique ID, and you track which IDs you have seen. When a new job arrives, check if its ID exists in the seen set. If it does, reject the duplicate. If not, add the ID and enqueue the job.

This approach catches duplicates caused by client retries, network issues that cause resubmission, or application bugs that submit the same job twice. The deduplication window depends on how long you keep IDs in the set. For jobs that should never repeat, keep IDs forever. For jobs where duplicates only matter within a time window, expire old IDs periodically.

The tradeoff is memory usage. Every job ID consumes space in the set. For high-volume systems, you may need to expire old IDs or use a probabilistic data structure like a Bloom filter. The examples below show both options: a plain set when IDs should be kept indefinitely, and per-job string keys with an expiration when memory must stay bounded.

# Check if job ID already exists and add it atomically
# Returns 1 if added (new job), 0 if already existed (duplicate)
SADD jobs:seen job-abc-123
> 1 (new job, proceed to enqueue)

# Another attempt to submit the same job
SADD jobs:seen job-abc-123
> 0 (duplicate, reject it)

# Only enqueue if SADD returned 1
RPUSH jobs:pending job-abc-123
> 1

# For time-bounded deduplication, use a string with expiration instead
SET jobs:seen:job-xyz-456 1 NX EX 3600
> OK (new job, expires in 1 hour)

SET jobs:seen:job-xyz-456 1 NX EX 3600
> (nil) (duplicate within the hour)
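
At the application level, the check and the enqueue belong together. The following is a minimal sketch assuming the redis-py client and the key names used above; note that a crash between the SADD and the RPUSH could still lose a job, so stricter setups would wrap the two calls in a Lua script.

def submit_job(r, job_id):
    # SADD returns 1 only when the member is new, giving an atomic check-and-add
    if r.sadd("jobs:seen", job_id) == 0:
        return False  # duplicate, reject it
    r.rpush("jobs:pending", job_id)
    return True

def submit_job_windowed(r, job_id, ttl=3600):
    # Time-bounded variant: the SET NX EX marker expires after ttl seconds
    if not r.set(f"jobs:seen:{job_id}", 1, nx=True, ex=ttl):
        return False  # duplicate within the window
    r.rpush("jobs:pending", job_id)
    return True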

Claiming jobs atomically to prevent double processing

Deduplication at submission time does not prevent the same job from being processed twice if a worker crashes mid-job and the job gets requeued. To handle this, workers must atomically claim a job before processing it. If two workers try to claim the same job, only one succeeds.

The pattern uses BLMOVE (the replacement for the now-deprecated BRPOPLPUSH) to atomically move a job from the pending queue to a processing queue. This single command removes the job from pending and adds it to processing in one step, so no other worker can grab it. After successful processing, remove the job from the processing queue. If a worker crashes, a recovery process can move stale jobs from processing back to pending.

This approach handles worker crashes gracefully. The processing queue acts as a record of in-flight jobs. A background process periodically checks for jobs that have been in processing too long and moves them back to pending for retry. The job will run again, so your processing logic must be idempotent or you need the idempotency key pattern described in the next section.

# Worker atomically moves a job from pending to processing
# BLMOVE blocks until a job is available; popping from the LEFT of pending
# keeps FIFO order with the RPUSH enqueue above
BLMOVE jobs:pending jobs:processing LEFT LEFT 30
> "job-abc-123"

# The job is now in processing, not pending
# Only this worker has it

# After successful processing, remove from processing queue
LREM jobs:processing 1 "job-abc-123"
> 1 (removed)

# Recovery: find jobs stuck in processing (check timestamps separately)
LRANGE jobs:processing 0 -1
> ["job-xyz-old", "job-abc-stuck"]

# Move stuck job back to pending for retry
LREM jobs:processing 1 "job-xyz-old"
RPUSH jobs:pending "job-xyz-old"
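
A worker loop over these commands might look like the sketch below, again assuming redis-py. The process callback, the five-minute staleness threshold, and the jobs:meta hash reused from the earlier sketch are all illustrative choices rather than part of the pattern.

import time

def worker_loop(r, process):
    while True:
        # Atomically move the oldest pending job into processing; block up to 30 seconds
        job_id = r.blmove("jobs:pending", "jobs:processing", 30, src="LEFT", dest="LEFT")
        if job_id is None:
            continue  # timed out with no work, wait again
        process(job_id)
        # Drop the job from the in-flight list only after processing succeeds
        r.lrem("jobs:processing", 1, job_id)

def requeue_stale(r, max_age=300):
    # Recovery pass: move jobs that have been in flight too long back to pending
    now = int(time.time())
    for job_id in r.lrange("jobs:processing", 0, -1):
        claimed_at = r.hget(f"jobs:meta:{job_id}", "claimed_at")
        if claimed_at and now - int(claimed_at) > max_age:
            r.lrem("jobs:processing", 1, job_id)
            r.rpush("jobs:pending", job_id)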

Using idempotency keys for safe retries

Even with atomic claiming, jobs may run multiple times due to crashes, timeouts, or recovery processes. The final layer of defense is making the job processing itself idempotent. Before doing work, check if you have already done it. Store the result of completed work and return the cached result on subsequent attempts.

An idempotency key is a unique identifier for an operation. Before processing, check if a result exists for that key. If it does, return the cached result without reprocessing. If not, do the work and store the result. This pattern is especially important for operations with external side effects like payment processing or sending notifications.

The tradeoff is storage and complexity. You need to store results for as long as retries might occur, and you need to handle the case where a crash happens after the work is done but before the result is stored. For critical operations, wrap the work and the result storage in a transaction or use a two-phase approach where you mark the operation as "in progress" before starting.

# Check if this operation was already completed
GET idempotency:payment:order-123
> (nil) (not processed yet)

# Do the work, then store the result
# SET NX ensures only one worker can store the result
SET idempotency:payment:order-123 '{"status":"success","charge_id":"ch_xxx"}' NX EX 86400
> OK (result stored, expires in 24 hours)

# Subsequent attempts find the existing result
GET idempotency:payment:order-123
> '{"status":"success","charge_id":"ch_xxx"}'
# Return cached result, do not reprocess

# Another worker trying to store a result fails
SET idempotency:payment:order-123 '{"status":"success","charge_id":"ch_yyy"}' NX EX 86400
> (nil) (already exists, this is a duplicate)
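
Wrapped in a helper, the pattern reads as check, work, store. The sketch below assumes redis-py; the work function (charge_customer in the usage comment) is a stand-in for whatever side-effecting operation the job performs, not something defined by this guide.

import json

def run_idempotently(r, key, work, *args):
    # Return the cached result if this operation already completed
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    result = work(*args)  # the side-effecting operation
    # NX means only the first worker to finish stores its result
    if not r.set(key, json.dumps(result), nx=True, ex=86400):
        # A concurrent worker finished first; prefer its stored result
        return json.loads(r.get(key))
    return result

# Example (charge_customer is hypothetical):
# run_idempotently(r, "idempotency:payment:order-123", charge_customer, "order-123")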

Choosing an approach

Use deduplication with unique job IDs when duplicate submissions are your main concern. This is the simplest layer and catches most duplicates at the door. It works well when clients might retry requests or when upstream systems might send the same event multiple times.

Use atomic claiming when you have multiple workers and need to ensure only one processes each job. The BLMOVE pattern provides clean handoff between queues and makes recovery straightforward. Combine this with a background process that monitors for stuck jobs.

Use idempotency keys when your job processing has side effects that cannot be safely repeated. Payment processing, sending emails, and creating external resources all benefit from this pattern. The cached result also improves performance for legitimate retries.

In practice, production systems often use all three approaches together. Deduplicate at submission to reduce load, claim atomically to prevent concurrent processing, and use idempotency keys as the final safety net for operations that must not repeat. Each layer catches failures that slip through the others, giving you true exactly-once semantics even in a distributed system that can fail in countless ways.
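
As a rough composition of the hypothetical helpers sketched above, the three layers slot together like this:

# Layer 1: reject duplicate submissions at the door
submit_job(r, "order-123")

# Layers 2 and 3: each claimed job is processed behind an idempotency key
def handle(job_id):
    # charge_customer stands in for whatever side-effecting work the job performs
    return run_idempotently(r, f"idempotency:payment:{job_id}", charge_customer, job_id)

worker_loop(r, handle)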