Responding to webhooks quickly
Webhook providers expect a fast response. When your endpoint takes too long, the provider assumes something went wrong and retries the delivery. This creates duplicate events, wastes resources on both sides, and can snowball into a backlog that takes hours to clear.
Most providers set aggressive timeout thresholds. Stripe waits only 20 seconds. GitHub gives you 10 seconds. Shopify cuts you off at 5 seconds. If your handler does anything substantial, such as calling external APIs, processing images, or running complex database queries, you will run up against these limits.
This article covers why timeouts matter, how to structure your handlers for speed, and the patterns that let you do slow work without missing deadlines.
Why providers set short timeouts
Webhook delivery systems handle enormous volume. Stripe sends billions of webhooks. Slack processes millions per day. At this scale, every second a connection stays open consumes resources that could serve other customers.
Short timeouts also protect providers from cascading failures. If your endpoint becomes slow, it should not drag down the rest of the provider's webhook infrastructure. By timing out quickly, providers isolate the problem and keep other deliveries flowing.
From your perspective, short timeouts are actually helpful. They force you to build handlers that fail fast and recover gracefully. A webhook system where endpoints can block for minutes would be far more fragile than one where everyone agrees to respond promptly.
The acknowledge-first pattern
The most reliable way to handle webhooks quickly is to separate receipt from processing. Your endpoint does the minimum work needed to accept the webhook, then hands it off to a background worker for actual processing.
When the webhook arrives, your handler verifies the signature, validates basic structure, writes the event to a queue, and returns 200. This takes milliseconds. The provider sees a successful delivery and moves on. Meanwhile, your worker picks up the event and does whatever time-consuming work is required.
from flask import Flask, request
import redis

app = Flask(__name__)
queue = redis.Redis()

@app.route("/webhooks", methods=["POST"])
def handle_webhook():
    payload = request.get_data()

    # Verify signature (fast)
    if not verify_signature(payload, request.headers.get("X-Signature")):
        return "Invalid signature", 401

    # Queue for processing (fast)
    queue.lpush("webhook_queue", payload)

    # Respond immediately
    return "OK", 200
The worker runs separately, pulling events from the queue and processing them at whatever pace it can manage. If processing is slow, the queue grows but deliveries keep succeeding. If the worker crashes, events stay in the queue until it restarts.
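As a rough sketch of that worker, assuming the same Redis list as above and a hypothetical process_event function standing in for the slow work:

import json
import redis

queue = redis.Redis()

def process_event(event):
    # Stand-in for the slow work: API calls, image processing, heavy queries
    ...

while True:
    # blpop blocks until an event arrives, then removes and returns it
    _, payload = queue.blpop("webhook_queue")
    process_event(json.loads(payload))

One caveat: with a plain Redis list, an event that has been popped is gone if the worker dies before finishing it. The next section looks at queues that close that gap.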
This pattern decouples your response time from your processing time. Your endpoint becomes a thin relay that almost never times out.
Choosing a queue for async processing
Any reliable queue works for webhook processing. Redis lists are simple and fast for moderate volume. RabbitMQ and Amazon SQS add features like dead-letter queues and delivery guarantees. For high volume or complex routing, Kafka provides durability and replay capabilities.
The key requirements are durability and at-least-once delivery. If your worker crashes mid-processing, the event should not disappear. Most queues achieve this with acknowledgment: the worker explicitly confirms when processing succeeds, and the queue redelivers unacknowledged messages after a timeout.
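To make that concrete, here is a sketch of the acknowledgment flow with Amazon SQS and boto3; the queue URL is a placeholder and process_event is the same hypothetical handler as before:

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/webhook-queue"  # placeholder

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for message in resp.get("Messages", []):
        process_event(json.loads(message["Body"]))
        # Acknowledge only after processing succeeds; if the worker crashes first,
        # SQS redelivers the message once its visibility timeout expires
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])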
Keep your queue close to your webhook endpoint. Network latency between receiving a webhook and enqueueing it counts against your timeout budget. If your queue is in a different region, those extra milliseconds add up.
What to do in the synchronous path
Some work must happen before you can respond. Signature verification is mandatory. You might need to check that the event type is one you handle. You might want to reject malformed payloads rather than queue garbage for your workers.
Keep this synchronous work minimal. Verify the signature, parse the JSON, check the event type, and enqueue. Skip anything that calls external services, runs database queries, or could block on I/O.
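For reference, the verify_signature call in the handler above can be as simple as an HMAC comparison. The header name and signing scheme vary by provider, so treat this as a sketch:

import hashlib
import hmac

WEBHOOK_SECRET = b"your-signing-secret"  # the shared secret from your provider

def verify_signature(payload, signature):
    # Recompute the HMAC over the raw body and compare in constant time
    expected = hmac.new(WEBHOOK_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature or "")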
If you absolutely need to look something up synchronously, set strict timeouts on those operations. A database query that usually takes 5ms might take 5 seconds when the database is overloaded. Without a timeout, that one slow query causes a webhook timeout, which triggers a retry, which adds more load, which makes the database slower.
# Bad: no timeout, could block forever
result = db.query("SELECT * FROM users WHERE id = %s", user_id)
# Better: fail fast if database is slow
result = db.query("SELECT * FROM users WHERE id = %s", user_id, timeout=0.5)
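How you set that timeout depends on your client library. As a sketch, Postgres connections opened with psycopg2 can cap query time through statement_timeout, and HTTP calls made with requests take a timeout argument; the connection string and URL here are placeholders:

import psycopg2
import requests

# Abort any statement that runs longer than 500 ms on this connection
conn = psycopg2.connect("postgresql://localhost/app", options="-c statement_timeout=500")

# Fail fast on slow external services as well
profile = requests.get("https://api.example.com/users/123", timeout=0.5)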
When a timeout fires, let the webhook fail. The provider will retry, and hopefully your system will be healthier by then.
Monitoring response times
You cannot improve what you do not measure. Track the response time of your webhook endpoints as a key metric. Set alerts for when the 95th percentile approaches your provider's timeout threshold.
Log slow requests with enough context to debug them. Which event types are slow? Which code paths? Are specific external dependencies causing delays? This data guides optimization efforts.
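One lightweight way to collect this is to time every request in the framework itself. A sketch for the Flask app from earlier, assuming a 1 second threshold and a top-level "type" field in the payload (as Stripe-style events have):

import logging
import time

from flask import g, request

SLOW_THRESHOLD_SECONDS = 1.0  # assumed; keep this well below your provider's timeout

@app.before_request
def start_timer():
    g.start_time = time.perf_counter()

@app.after_request
def log_slow_requests(response):
    elapsed = time.perf_counter() - g.start_time
    if elapsed > SLOW_THRESHOLD_SECONDS:
        # Log enough context to debug: path, event type, and duration
        event_type = (request.get_json(silent=True) or {}).get("type", "unknown")
        logging.warning("slow webhook path=%s type=%s elapsed=%.2fs",
                        request.path, event_type, elapsed)
    return response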
Some webhook providers offer dashboards showing your endpoint's response times from their perspective. Check these periodically. Network latency between their servers and yours counts against your timeout budget, and you might not see it in your own metrics.
When you cannot use async processing
Some use cases require synchronous responses. Slack's interactive messages expect your endpoint to return the new message content. Verification challenges require you to echo back a token. These cannot be deferred to a background worker.
For these cases, optimize ruthlessly. Cache everything you can. Use connection pooling to avoid TLS handshake overhead. Keep payloads small. Profile your handler to find bottlenecks.
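As a small example of the first two, a module-level requests.Session reuses connections (and their TLS handshakes) across webhooks, and functools.lru_cache keeps rarely changing lookups out of the hot path; fetch_user_profile and its URL are hypothetical:

from functools import lru_cache

import requests

# One session per process so connections and TLS handshakes are reused
session = requests.Session()

@lru_cache(maxsize=1024)
def fetch_user_profile(user_id):
    # Hypothetical lookup, cached so repeat events for the same user skip the network
    resp = session.get(f"https://api.example.com/users/{user_id}", timeout=0.5)
    resp.raise_for_status()
    return resp.json()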
If you still cannot meet the timeout, consider whether the integration is designed correctly. Some providers offer alternative patterns for slow operations, like responding with a placeholder and updating later via API. Check the documentation for options.
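As one example of that pattern, Slack's interactivity payloads include a response_url that accepts follow-up messages for a limited window, so you can acknowledge with a placeholder and finish the work in the background. A sketch, with do_slow_work as a hypothetical function (check Slack's docs for the exact payload fields):

import json
import threading

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

def finish_later(response_url):
    result = do_slow_work()  # hypothetical slow operation
    # Replace the placeholder once the real answer is ready
    requests.post(response_url, json={"replace_original": True, "text": result})

@app.route("/slack/interactive", methods=["POST"])
def handle_interaction():
    payload = json.loads(request.form["payload"])
    threading.Thread(target=finish_later, args=(payload["response_url"],)).start()
    # Acknowledge immediately with a placeholder the user sees right away
    return jsonify({"text": "Working on it..."})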