Understanding webhook delivery guarantees
When a webhook provider promises to deliver events to your endpoint, what exactly is it promising? The answer varies significantly between providers and has real implications for how you build your consumer. Some guarantee every event reaches you at least once. Others guarantee events arrive at most once. A few attempt to guarantee exactly-once delivery.
Understanding these guarantees helps you build appropriate safeguards into your webhook handlers and set realistic expectations about data consistency.
At-least-once delivery
At-least-once delivery means the provider keeps retrying until it gets a successful acknowledgment (typically a 2xx response) from your endpoint, or until its retry policy is exhausted. If your server processes the event but crashes before returning a 200 response, the provider sees a failure and retries. You receive the same event twice.
This is the most common guarantee because it prioritizes data completeness over simplicity. Losing events is usually worse than handling duplicates. A missed payment notification might mean orders never ship. A duplicate notification just means your code runs twice, which idempotent handlers can manage.
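Many major webhook providers retry failed deliveries and so behave as at-least-once systems, Stripe and Shopify among them. Retry behavior is not universal, though: GitHub, for example, does not automatically redeliver failed deliveries, so check each provider's documentation. When you see retry policies and delivery attempt logs, you are looking at an at-least-once system.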
Building consumers for at-least-once systems requires idempotency. Track which event IDs you have processed and skip duplicates. Design your business logic so that processing the same event twice produces the same result as processing it once. This is extra work, but it is the price of never missing an event.
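A minimal sketch of the deduplication step in Python, using SQLite from the standard library. The table name processed_events, the event["id"] field, and the apply_side_effect callback are assumptions made for illustration, not any provider's actual schema. The event ID is recorded in the same transaction as the work, so a failure rolls both back and a later retry is processed normally.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite store that remembers which event IDs were processed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    return conn


def handle_event(conn, event, apply_side_effect):
    """Process an event once. Return True if processed, False if a duplicate.

    apply_side_effect is a hypothetical callback holding the business logic.
    """
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return False  # ID already recorded: skip the duplicate
        # If this raises, the INSERT above is rolled back, so a retry
        # of the same event will be processed rather than skipped.
        apply_side_effect(event)
    return True
```

The rollback only covers work done in the same database. Side effects outside it, such as sending an email or calling another API, still need to be idempotent on their own.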
At-most-once delivery
At-most-once delivery means the provider makes a single delivery attempt for each event and does not retry on failure. If your endpoint is down, the event is lost. This sounds worse than at-least-once, and for most use cases it is.
However, at-most-once has advantages for specific scenarios. There are no duplicates to handle, which simplifies consumer logic. Latency is predictable since events are never delayed by retry backoff. For high-volume, loss-tolerant use cases like analytics or logging, at-most-once might be acceptable.
Few webhook providers offer at-most-once as their default because customer expectations lean toward reliability. You might encounter it in internal systems or real-time applications where stale data is worse than missing data.
If you consume at-most-once webhooks, you need a reconciliation strategy. Periodically poll the provider's API to catch events your webhook handler missed. Accept that during outages you will have gaps in your data. Build your application to tolerate these gaps gracefully.
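One way to sketch such a reconciliation pass in Python. The fetch_events_since function stands in for whatever list endpoint the provider's API offers, and the "id" and "created" fields are assumed names; real APIs differ in pagination and cursor semantics.

```python
def reconcile(fetch_events_since, seen_ids, handle, cursor):
    """Replay events that the webhook handler never received.

    fetch_events_since(cursor) is a hypothetical wrapper around the
    provider's API that returns events created after the cursor.
    seen_ids is the set of event IDs already handled via webhooks.
    Returns the advanced cursor and the list of recovered event IDs.
    """
    recovered = []
    for event in fetch_events_since(cursor):
        if event["id"] not in seen_ids:
            handle(event)  # same handler the webhook endpoint uses
            seen_ids.add(event["id"])
            recovered.append(event["id"])
        cursor = max(cursor, event["created"])
    return cursor, recovered
```

Running this on a schedule, and once after every known outage, bounds how long a gap can go unnoticed.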
Exactly-once delivery
Exactly-once delivery promises each event arrives exactly one time: no duplicates, no losses. This sounds ideal but is notoriously difficult to achieve in distributed systems. The fundamental problem is that networks are unreliable, and there is no way for the provider to know whether you processed an event unless you tell them.
Consider what happens when your server receives an event and processes it, but the network drops your 200 response. The provider does not know you succeeded. If they retry, you get a duplicate (violating exactly-once). If they do not retry, then in the case where your server actually failed before processing, the event is lost (also violating exactly-once). From the provider's side the two cases look identical, so without reading your server's internal state it cannot make the right choice.
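The dilemma can be made concrete with a toy model in Python. The function and its parameters are invented for illustration, and it assumes for simplicity that any retry is received and processed.

```python
def times_processed(consumer_processed, ack_arrived, provider_retries):
    """Return how many times the consumer ends up processing one event.

    Toy model: a single first attempt, followed by at most one retry
    that is assumed to succeed.
    """
    count = 1 if consumer_processed else 0
    if consumer_processed and ack_arrived:
        return count  # provider saw the 200, nothing more to do
    # No acknowledgment seen. The provider cannot tell "processed, but the
    # ack was lost" apart from "never processed".
    if provider_retries:
        count += 1
    return count


# Policy "always retry": correct when the event was never processed,
# but a duplicate when only the ack was lost.
assert times_processed(False, False, provider_retries=True) == 1
assert times_processed(True, False, provider_retries=True) == 2

# Policy "never retry": correct when only the ack was lost,
# but a lost event when it was never processed.
assert times_processed(True, False, provider_retries=False) == 1
assert times_processed(False, False, provider_retries=False) == 0
```

Neither policy yields a count of exactly one in both unacknowledged cases, which is why the provider alone cannot deliver exactly-once.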
True exactly-once delivery requires coordination between provider and consumer that goes beyond HTTP request-response. Some message queue systems achieve exactly-once semantics within their own boundaries through transactional commits or idempotent producer protocols, but these mechanisms are complex and add latency.
What some providers call "exactly-once" is actually at-least-once delivery combined with idempotency helpers. They provide stable event IDs and recommend deduplication on your end. The combined system behaves like exactly-once from the application's perspective, even though the underlying delivery is at-least-once.
Ordering guarantees
Separate from delivery guarantees are ordering guarantees. When multiple events occur, do they arrive in the order they happened?
Most webhook systems do not guarantee ordering. Events might be processed by different servers, retried at different times, or delayed by varying network conditions. Event A might be sent before event B but arrive after.
For many use cases, ordering does not matter. Each event is self-contained and your handler processes it independently. A customer creation event and an order creation event can be handled in any order.
When ordering matters, you have a few options. Include a sequence number or timestamp in each event and reorder on the consumer side. Buffer events and wait for gaps to fill before processing. Or accept that strict ordering is not possible and design your business logic accordingly.
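A minimal sketch of the buffering option in Python, assuming each event carries a gap-free integer sequence number. The class name and the starting sequence of 1 are assumptions for illustration.

```python
class ReorderBuffer:
    """Hold out-of-order events and release them in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # next sequence number to release
        self.pending = {}          # seq -> event, waiting for a gap to fill

    def push(self, seq, event):
        """Accept one event and return the events that are now ready, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already released or already buffered: a duplicate
        self.pending[seq] = event
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
assert buf.push(2, "order.paid") == []                # gap at 1, hold it
assert buf.push(1, "order.created") == ["order.created", "order.paid"]
```

A production buffer would also need a timeout or size limit, since a sequence number that never arrives would otherwise block everything behind it.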
Some providers offer ordering within a partition. Events for the same customer or resource arrive in order, but events across different resources might not. This is often sufficient since ordering typically only matters within a single entity's event stream.
Choosing your guarantees
If you are building a webhook provider, at-least-once delivery is the safe default. Customers expect reliability, and the infrastructure for retries is straightforward to build. Document your retry policy clearly so consumers know what to expect.
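As a rough sketch of what such a retry policy might look like in Python: exponential backoff with a cap, plus a delivery loop. The function names, the default delays, and the send callback (assumed to return an HTTP status code, or None on a network error) are all illustrative assumptions, not any provider's actual policy.

```python
import random


def retry_delays(max_retries=5, base_seconds=60.0, cap_seconds=3600.0,
                 jitter=0.0, rng=random.random):
    """Return the wait before each retry: exponential backoff with a cap.

    jitter adds up to that fraction of extra random delay per retry.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap_seconds, base_seconds * 2 ** attempt)
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


def deliver_with_retries(send, event, delays, sleep):
    """Attempt delivery until a 2xx response or until retries run out.

    send(event) is a hypothetical callback returning a status code or None.
    sleep is injected so the loop can be tested; time.sleep or a job
    scheduler would take its place in a real system.
    Returns the number of attempts made, or None if every attempt failed.
    """
    attempts = 0
    for delay in [0.0] + list(delays):
        if delay:
            sleep(delay)
        attempts += 1
        status = send(event)
        if status is not None and 200 <= status < 300:
            return attempts
    return None
```

Publishing the equivalent of retry_delays in the documentation tells consumers how long a failed event may keep arriving and when it is finally abandoned.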
Consider offering ordering guarantees per resource if your events naturally partition by customer or entity. This adds complexity but significantly helps consumers who care about event sequences.
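One possible shape for per-resource ordering on the provider side, sketched in Python: route every event for a resource to the same FIFO queue and drain each queue with a single worker. The class and function names are assumptions, and the in-memory queues stand in for whatever durable queue a real system would use.

```python
import zlib
from collections import deque


def partition_for(resource_id, num_partitions):
    """Map a resource ID to a stable partition index.

    crc32 is used because Python's built-in hash() for strings is
    randomized per process and would not be stable across restarts.
    """
    return zlib.crc32(resource_id.encode("utf-8")) % num_partitions


class PartitionedOutbox:
    """Keep one FIFO queue per partition so each resource's events stay in order."""

    def __init__(self, num_partitions):
        self.queues = [deque() for _ in range(num_partitions)]

    def enqueue(self, resource_id, event):
        index = partition_for(resource_id, len(self.queues))
        self.queues[index].append((resource_id, event))

    def drain(self, partition):
        """Yield queued (resource_id, event) pairs for one partition, oldest first."""
        queue = self.queues[partition]
        while queue:
            yield queue.popleft()
```

Ordering holds only if each partition is delivered sequentially, which means a slow or failing endpoint delays every later event in that partition. That trade-off is part of the added complexity.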
If you are consuming webhooks, assume at-least-once unless the provider explicitly states otherwise. Build idempotent handlers, track processed event IDs, and design for duplicates. This defensive approach works regardless of the actual guarantee and protects you from implementation details in the provider's system.
For critical data, complement webhooks with periodic reconciliation. Poll the provider's API to verify your local state matches theirs. Webhooks provide real-time notification, but polling provides a safety net. Together, they give you both speed and reliability.