PagerDuty Webhooks Review
PagerDuty is a critical tool that helps engineering and DevOps teams manage their incident response. Automating responses to incidents is a great use case for webhooks.
An important thing to note about PagerDuty's webhooks is that they're currently on version 3. Support for V1 ended in November 2021, and V1 will reach End-of-Life in October 2022.
V2 is set to reach End-of-Support in October 2022 and End-of-Life by March 2023.
We'll strictly be covering their current version 3 of webhooks.
When we look at webhook solutions, we’re generally looking for 7 things: signature verification, retries, manual retries, exponential backoff, visibility/logs, event types, and multiple endpoint support.
✅ Signature Verification
⬜ Exponential Backoff
⬜ Manual Retries
⬜ Visibility into Logs
✅ Event Types
✅ Multiple Endpoints Support
Signature Verification ✅
Signature Verification is a critical security feature of webhooks. It lets users verify that a webhook was actually sent from the expected source.
V3 webhooks include a signature in a request header. PagerDuty also emphasizes how important it is for users to verify messages, and has written up a nice explanation of how to do it, with example code.
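To make the verification step concrete, here is a minimal sketch of what a consumer-side check might look like. It assumes the signature arrives in an `X-PagerDuty-Signature` header as a comma-separated list of `v1=<hex digest>` values, each an HMAC-SHA256 of the raw request body keyed with the subscription secret. The header name, the format, and the function name `verify_signature` are our assumptions for illustration, so check them against PagerDuty's own write-up before relying on this.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if any v1 signature in the header matches the raw body.

    Assumes the header looks like "v1=<hex>,v1=<hex>" (an assumption, see above).
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    for candidate in signature_header.split(","):
        version, _, value = candidate.strip().partition("=")
        # compare_digest avoids leaking how many characters matched via timing
        if version == "v1" and hmac.compare_digest(
            value.encode("utf-8"), expected.encode("utf-8")
        ):
            return True
    return False
```

The check has to run against the raw bytes of the request body, before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.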
Retries
Webhook messages often fail. Without retries, users may miss many notifications, which results in a bad user experience.
While PagerDuty does support automatic retries, they only retry 4 times over the span of 20 minutes. This is not a sufficient window to allow users to diagnose and fix their endpoints.
Exponential Backoff
Exponential backoff is an algorithm that increases the delay between retries exponentially. This ensures that your system won't get bottlenecked by re-queueing failed webhooks, while also giving users time to fix broken endpoints before they burn through all their retry attempts.
While an exponential backoff algorithm is used to schedule their retries, scheduling all retry attempts over 20 minutes defeats the purpose of implementing exponential backoff (giving users time to fix the problem before their endpoint gets deactivated). If PagerDuty wants to stick to 4 retries, they could at least increase the delay between attempts to extend the time window for debugging.
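A quick back-of-the-envelope sketch shows how much the starting delay matters. The numbers below are purely illustrative: we don't know PagerDuty's actual delays, so the 80-second base is just one hypothetical schedule that happens to fit 4 retries into 20 minutes, and the 30-minute base is a hypothetical alternative.

```python
def backoff_delays(base_seconds: float, factor: float, retries: int) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(retries)]


# Hypothetical schedule that squeezes 4 retries into 20 minutes
tight = backoff_delays(base_seconds=80, factor=2, retries=4)
# Same 4 retries and same doubling, but starting at 30 minutes
wide = backoff_delays(base_seconds=1800, factor=2, retries=4)

print(tight, sum(tight) / 60)   # [80, 160, 320, 640] 20.0 (minutes)
print(wide, sum(wide) / 3600)   # [1800, 3600, 7200, 14400] 7.5 (hours)
```

With the same number of attempts, the wider schedule gives the endpoint owner hours rather than minutes to notice and fix the problem.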
Manual Retries
When a user's endpoint is failing, it's nice to give the developer tasked with debugging it the option to initiate a retry manually instead of having to wait for the next scheduled retry (this could be several hours of waiting if they're close to the end of the retry schedule).
We could not find any mention of triggering retries manually.
Visibility into Logs
Giving users visibility into the delivery logs is critical for troubleshooting and debugging. We found no mention of log or delivery history visibility, or of troubleshooting in general.
Event Types ✅
Event types are identifiers denoting the type of message being sent and are the primary way for webhook consumers to configure what events they are interested in receiving.
There is an extensive list of available event types in the Webhook API reference. There is also a filtering mechanism to specify which event types should be sent to which endpoints.
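On the receiving side, event types are what let a consumer route each message to the right piece of code. The sketch below assumes the payload wraps everything in an `event` object with an `event_type` string and a `data` object carrying the resource `id`. That shape and the two event type names are our reading of the V3 format and should be treated as assumptions; the handler names and the `on` helper are hypothetical.

```python
import json
from typing import Callable

# Maps an event type string to the function that handles it
HANDLERS: dict[str, Callable[[dict], str]] = {}


def on(event_type: str):
    """Decorator that registers a handler for one event type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[event_type] = func
        return func
    return register


@on("incident.triggered")
def handle_triggered(event: dict) -> str:
    return f"open ticket for {event['data']['id']}"


@on("incident.resolved")
def handle_resolved(event: dict) -> str:
    return f"close ticket for {event['data']['id']}"


def dispatch(raw_body: bytes) -> str:
    """Route a webhook body to its handler; unknown types are ignored."""
    event = json.loads(raw_body)["event"]
    handler = HANDLERS.get(event["event_type"])
    if handler is None:
        return "ignored"
    return handler(event)
```

Ignoring unknown event types, rather than raising, keeps the endpoint returning success when new event types are introduced.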
Multiple Endpoint Support ✅
Many webhook solutions only allow one endpoint URL to be specified where all messages will be received. By enabling your users to create multiple endpoints, they’ll be able to choose which endpoints receive which messages. Multiple endpoint support and event types go hand in hand.
PagerDuty's "Create a webhook subscription" endpoint allows users to create new subscriptions with different endpoints by specifying a URL.
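As a rough illustration of how multiple endpoints and event types combine, the sketch below builds two subscription request bodies, each pointing a different set of events at a different URL. The field names (`webhook_subscription`, `delivery_method`, `events`, `filter`) follow our understanding of PagerDuty's API and should be treated as assumptions to check against the API reference. The URLs and the helper `subscription_payload` are hypothetical, and nothing is actually sent.

```python
import json


def subscription_payload(url: str, events: list[str], description: str) -> dict:
    """Build a request body for one webhook subscription (assumed field names)."""
    return {
        "webhook_subscription": {
            "type": "webhook_subscription",
            "description": description,
            "delivery_method": {"type": "http_delivery_method", "url": url},
            "events": events,
            "filter": {"type": "account_reference"},
        }
    }


# Two hypothetical endpoints, each receiving only the events it cares about
ticketing = subscription_payload(
    "https://example.com/hooks/ticketing",
    ["incident.triggered", "incident.resolved"],
    "Open and close tickets",
)
chat = subscription_payload(
    "https://example.com/hooks/chat",
    ["incident.acknowledged"],
    "Post acknowledgements to chat",
)

print(json.dumps(ticketing, indent=2))
```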
Overall, we found PagerDuty's webhook system lacking. They've implemented some common best practices, like retries with exponential backoff, but these seem to be implemented to check a box rather than to improve the user experience. While we appreciated that they went the extra mile to explain signature verification and give examples, we would have liked to see similar content on helping users troubleshoot failing endpoints.
If you're looking to implement world-class webhooks, consider trying Svix, our webhooks-as-a-service product that makes it super easy to build a secure, reliable, and scalable webhook solution through an API.