Common Webhook Signatures Failure Modes

Signing webhooks is the most common way of authenticating them. It's what Stripe, GitHub, Shopify, Svix, and all of our customers do, and it's what Standard Webhooks recommends.

A future post will talk about what makes payload signing the best choice for webhook authentication, but for this post we will focus on common failure modes when signing webhooks. We will cover some of the most common mistakes, and how they affect the security or usability of webhooks.

Common failure modes

One of the reasons security is hard is that, when writing ordinary software, a bug generally results in obviously broken behavior (a crash, wrong results, etc.). This means that with enough testing and fuzzing, the bugs that people are likely to hit will be found.

With security however, things are different. First of all, security issues usually involve unexpected inputs or conditions which means they are much less likely to be hit accidentally. Additionally, security bugs are usually much more subtle, especially when there's cryptography involved. Lastly, with normal bugs, a crash under extremely rare conditions is not catastrophic, but with security issues even one bug is enough to bring the whole system down.

This is why some of the issues below are so common: the mistakes are subtle, and the implications are not immediately obvious.

Using bad cryptographic primitives

The first one is both extremely subtle and extremely common: choosing bad security primitives or constructs. One common example is choosing md5 or sha1 for the signature scheme, both of which are considered insufficiently secure. Another is using a makeshift signature scheme like hash(key + payload) instead of hmac(key, payload). With hash functions such as sha256, the makeshift version is open to length-extension attacks; you can read more about HMACs on Wikipedia to understand why this matters.

Why does it matter? Let's consider a much simpler hashing function as an example: sumhash(), which just takes the sum of all the characters in a payload. Now let's consider the following payload:

payload1 = {"id": "123", "status": "unpaid", "name": "John Doe", ...}

If we run the function we get sumhash(payload1) = N.

Now, let's consider the following payload, where we moved the "un" part of "unpaid" (turning it into "paid") to the end of the "name" field.

payload2 = {"id": "123", "status": "paid", "name": "John Doeun", ...}

Because of how sumhash() works, the value N will be the same, so sumhash(payload1) == sumhash(payload2), which means the signature would be identical. This means that by knowing the signature of one payload we can now successfully craft malicious payloads that will pass signature verification. In this case, we trick receivers into believing our account was paid when in fact it wasn't.
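
To make this concrete, here is a minimal sketch of sumhash() in JavaScript. The payloads are the two from above with the elided fields left out, and summing character codes is just our assumption of what such a toy function would look like.

```javascript
// Toy "hash" from the text: the sum of all character codes in the payload.
// Deliberately insecure; for illustration only.
function sumhash(payload) {
  let sum = 0;
  for (const ch of payload) {
    sum += ch.codePointAt(0);
  }
  return sum;
}

const payload1 = '{"id": "123", "status": "unpaid", "name": "John Doe"}';
const payload2 = '{"id": "123", "status": "paid", "name": "John Doeun"}';

// Both payloads contain exactly the same characters, just in a different
// order, so the sums are identical.
console.log(sumhash(payload1) === sumhash(payload2)); // true
```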

Old hash functions like md5 and sha1 are obviously not this easy to exploit, but exploiting them will lead to similar issues (tricking verifiers).

We may therefore be tempted to use a state-of-the-art hash function like BLAKE2b, which supports keying natively (so it doesn't even need an HMAC construct) and is both fast and secure. The problem is that newer hash functions may not be supported by all the platforms your customers use, which means that they won't be able to verify the signatures at all.

That's why it's recommended to use HMAC-SHA256 which is both widely available and secure.
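
As a rough sketch, this is what the two constructions look like with Node's built-in crypto module. The function names naiveSign and hmacSign are made up for this example.

```javascript
const crypto = require('crypto');

// Makeshift construction the text warns against: hash(key + payload).
function naiveSign(key, payload) {
  return crypto.createHash('sha256').update(key + payload).digest('base64');
}

// Recommended construction: HMAC-SHA256 keyed with the secret.
function hmacSign(key, payload) {
  return crypto.createHmac('sha256', key).update(payload).digest('base64');
}
```

Both return a base64 string of the same length, which is part of why the makeshift version is so tempting: it looks just as good from the outside.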

Symmetric: using the same secret for all endpoints

In order for webhook signatures to be secure, they have to include a secret component. Otherwise anyone would be able to create valid signatures by just following the signing algorithm. Sharing secrets is a pain, as you have to share them securely, and you don't want to overload your customers with too many secrets that they need to store securely on their end and use in their code.

It's common for webhook systems to support multiple endpoints per customer, so that they can fan out the same webhook to a few different destinations.

Because of the above two, it may be tempting to have just one endpoint secret per customer. It's the same customer anyway and they have access to the secret, so it surely doesn't matter.

The problem with this approach is that some of these endpoints may be third parties, and by sharing the same secret you're potentially giving these third parties (or anyone that hacks them) the keys to the kingdom. Even if all the endpoints are first party (owned by the receiver), sharing the secret increases the blast radius of a compromise of any of these destinations. If one is compromised, data can be faked to all of the other ones as well, and you'd also need to rotate the secrets for those other ones.

Not all services are owned by the same people at the same company, and not all services follow the same security standards. E.g. maybe the webhooks are sent to the HR system, the billing system, and the marketing system. Maybe the operational security around the marketing system is more lax (e.g. more people have access) as it's not a core sensitive service. By sharing secrets, the security of the more sensitive systems depends on the less sensitive ones, which is not a good idea.
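
Here is a small sketch of the alternative: one secret per endpoint. The createEndpoint helper and the example URLs are hypothetical, not part of any real API.

```javascript
const crypto = require('crypto');

// Hypothetical endpoint registry: each endpoint gets its own random secret,
// even when several endpoints belong to the same customer.
function createEndpoint(url) {
  return { url, secret: crypto.randomBytes(24) };
}

function sign(secret, payload) {
  return crypto.createHmac('sha256', secret).update(payload).digest('base64');
}

const endpoints = [
  createEndpoint('https://billing.example.com/webhooks'),
  createEndpoint('https://marketing.example.com/webhooks'),
];

// The same payload fanned out to both endpoints carries different signatures,
// so a leaked marketing secret cannot be used to forge billing webhooks.
const payload = '{"type": "invoice.paid"}';
const signatures = endpoints.map((e) => sign(e.secret, payload));
```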

Asymmetric: signing all webhooks with the same key

There are many challenges with symmetric cryptography: secure key exchange, keeping secrets secret, and the challenges we described in previous sections. There's a simple solution for that: asymmetric signatures.

We will discuss the advantages and disadvantages of asymmetric signature schemes in a future post, but let's assume for the purpose of this section that we're using an asymmetric signature scheme.

Well, an asymmetric signature scheme sounds easy! There's a public component that's used for verification and a secret component that's used for signing, so we can just keep the secret part secret as the sender and post the public part on our website. We will then sign all the payloads with the secret key, and people can verify with the public key. Easy.

This is another good example of how security bugs can be subtle. The problem is that all the webhooks sent by your system are signed with the same key. This means that webhooks intended for user A will now also pass verification for user B! User A can therefore trick the service into sending it webhooks with specific values and can send them to B where they'll just pass!

This is much easier than it sounds. Many webhook solutions support sending arbitrary payloads and values for testing purposes, but even the ones that don't can often be tricked into signing malicious payloads that are sufficient for tricking at least some receivers.

The solution is to augment the signature scheme to also sign a server-controlled, unique, static value per user/consumer (it doesn't need to be per endpoint in this case; per consumer is sufficient). For example, as part of the signature scheme you can sign the user ID in addition to the timestamp and payload, generating a unique signature per consumer.
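
A minimal sketch of that idea, using Ed25519 from Node's crypto module. The signed content layout (consumerId.timestamp.body) and the function names are our assumptions for illustration, and the sketch assumes consumer IDs never contain a ".".

```javascript
const crypto = require('crypto');

// One Ed25519 key pair for the whole sending service (the scenario above).
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// The consumer ID is chosen by the server, never taken from the payload.
function signedContent(consumerId, timestamp, body) {
  return Buffer.from(`${consumerId}.${timestamp}.${body}`);
}

// Sender side: sign on behalf of a specific consumer.
function signFor(consumerId, timestamp, body) {
  return crypto.sign(null, signedContent(consumerId, timestamp, body), privateKey);
}

// Receiver side: each receiver verifies against its own consumer ID.
function verifyAs(consumerId, timestamp, body, signature) {
  return crypto.verify(null, signedContent(consumerId, timestamp, body), publicKey, signature);
}
```

With this in place, a webhook signed for user A no longer verifies when user B checks it against their own ID, even though both use the same public key.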

Not protecting against replay attacks

In some scenarios, an attacker may be able to get a valid message with a proper signature that was processed successfully by the webhook consumer. While the attacker cannot change the webhook's payload (assuming the webhooks are signed!), they could still just send the webhook again and again. Depending on your use case (e.g. financial transactions), these replay attacks can have dire consequences.

Protecting against replay attacks is the responsibility of the receiver, but there are a few things the sender should do to make it possible for the receiver to avoid reprocessing the same webhook. The two main things are: (1) including a timestamp, and (2) including a unique message ID.

The unique message ID can then be used on the receiver end to check whether this webhook has already been processed, and if the answer is yes, it can just be dropped. However, storing the IDs of all the messages ever processed in order to ensure messages are not processed twice is undesirable and oftentimes infeasible. This is where the timestamp comes in.

Every message should include the timestamp of when the delivery attempt was made. Webhook consumers can then verify that the timestamp is current (with some tolerance for clock drift, delivery times, etc.). For example, the consumer can check that the timestamp is within 5 minutes of now(). Messages outside of this window will be automatically dropped.

By relying on the fact that attempts older than 5 minutes will be dropped, we limit the replay attack to a ~5 minute window. This means that we only need to retain the unique message IDs for ~5 minutes to check for duplication, which is a much more feasible thing to do.

As an added bonus: having a unique per-message ID also means that consumers can protect against message duplication that happens by accident, and not necessarily by a malicious actor.
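
The receiver-side check could look roughly like this. It's a sketch under a few assumptions: timestamps are Unix seconds, the signature has already been verified, and an in-memory Map stands in for whatever store you'd actually use.

```javascript
const TOLERANCE_SECONDS = 5 * 60;

// In-memory store of recently seen message IDs (id -> message timestamp).
// A real deployment would more likely use a shared cache with a TTL.
const seenIds = new Map();

// Returns true when the message should be processed. Assumes the signature
// over the ID and timestamp has already been verified.
function acceptMessage(id, timestampSeconds, nowSeconds = Math.floor(Date.now() / 1000)) {
  // Drop anything outside the tolerance window.
  if (Math.abs(nowSeconds - timestampSeconds) > TOLERANCE_SECONDS) return false;

  // Forget IDs that are too old to pass the timestamp check anyway.
  for (const [seenId, seenAt] of seenIds) {
    if (nowSeconds - seenAt > TOLERANCE_SECONDS) seenIds.delete(seenId);
  }

  // Drop duplicates seen within the window.
  if (seenIds.has(id)) return false;
  seenIds.set(id, timestampSeconds);
  return true;
}
```

The third parameter exists only to make the function easy to test with a fixed clock; in normal use you'd call acceptMessage(id, timestamp).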

Not signing metadata

There is one gotcha with the above approach: both the timestamp and the unique ID have to be signed as well, otherwise an attacker can just take a valid signed message and change the timestamp and/or the unique ID.

Therefore it's paramount that the metadata is signed in conjunction with the payload.
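
As a sketch, signing the metadata can be as simple as concatenating it with the body before computing the HMAC. The id.timestamp.body layout matches the documentation example later in this post; the function names are ours.

```javascript
const crypto = require('crypto');

// Sign the message ID and timestamp together with the body.
function signMessage(secret, id, timestamp, body) {
  const signedContent = `${id}.${timestamp}.${body}`;
  return crypto.createHmac('sha256', secret).update(signedContent).digest('base64');
}

function verifyMessage(secret, id, timestamp, body, signature) {
  const expected = Buffer.from(signMessage(secret, id, timestamp, body));
  const received = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Changing the timestamp or the ID now invalidates the signature just like changing the body would.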

Using low-entropy secrets

Secrets, as the name implies, have to be secret to be effective. You can think of them like the equivalent of passwords for signature verification. If someone knows the password (signing secret), they can break the security of the webhooks.

Because of how webhook signatures work, if someone gets hold of a signed webhook request, they can try to brute-force the signing secret offline. This is infeasible with sufficiently large random secrets (e.g. 24 bytes), but a secret with insufficient entropy can be cracked this way.

Not prefixing secrets (using just base64)

Another common issue with webhook implementations is that they don't prefix the secret with some known string that indicates that this is a secret. E.g. they would present a secret to a user as C2FVsBQIhrscChlQIMV+b5sSYspob7oD instead of whsec_C2FVsBQIhrscChlQIMV+b5sSYspob7oD.

You should always prefix secrets with a known prefix to make it easier for secret scanners (like the GitHub secret scanner) to detect when secrets have been accidentally committed to source control systems or otherwise leaked.
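
Putting the last two points together, generating a secret could look something like this. It's a sketch that assumes the whsec_ prefix and 24 random bytes mentioned above; the helper names are made up.

```javascript
const crypto = require('crypto');

// 24 random bytes from a CSPRNG, base64-encoded and prefixed with "whsec_".
function generateSecret() {
  return 'whsec_' + crypto.randomBytes(24).toString('base64');
}

// Recover the raw key bytes before using them for HMAC.
function secretBytes(secret) {
  return Buffer.from(secret.slice('whsec_'.length), 'base64');
}
```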

Not supporting zero-downtime secret rotation

Unlike most of the other items on this list, this is not a security issue, but rather an operational one. Many implementations only support signing a payload with one secret at a time. This makes it very hard (or in some cases impossible) to rotate a compromised secret without suffering any downtime.

Solution: support passing multiple alternative webhook signatures at once for each message, with the requirement that at least one should pass verification.
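
A rough sketch of what that might look like on both ends. The space-delimited list of signatures is an assumption made for this example, as are the function names.

```javascript
const crypto = require('crypto');

function sign(secret, content) {
  return crypto.createHmac('sha256', secret).update(content).digest('base64');
}

// Sender: during a rotation, sign with both the old and the new secret and
// send the signatures together (space-delimited here; format is an assumption).
function signWithAll(secrets, content) {
  return secrets.map((s) => sign(s, content)).join(' ');
}

// Receiver: accept the message if any of the signatures matches its secret.
function verifyAny(secret, content, signatureHeader) {
  const expected = Buffer.from(sign(secret, content));
  return signatureHeader.split(' ').some((candidate) => {
    const received = Buffer.from(candidate);
    return received.length === expected.length && crypto.timingSafeEqual(received, expected);
  });
}
```

Receivers that still hold the old secret and receivers that already switched to the new one both keep verifying successfully while the rotation is in progress.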

Assuming a consistent canonical form

Cryptographic signatures, by design, are sensitive to even the smallest variations in the message being signed. Even just adding a space at the end of the payload will cause the signature to be significantly different. This means that the payload being verified, and the payload being signed have to be identical.

The correct way of achieving that is ensuring that the payload is treated as a string or byte stream, and not as JSON, until it's been verified. However, many implementations assume, or even encourage, a certain normalization process for payloads. In practice this can lead to security issues, or at the very least to undue load on the webhook receiver. JSON is a fairly sizable spec, and libraries behave differently on different platforms.

The solution, as mentioned, is to treat the payload as a string or byte stream until the payload is verified.
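
Here is a small sketch of how things go wrong when the body is parsed and re-serialized before verification. The secret and payload are made-up example values.

```javascript
const crypto = require('crypto');

function sign(secret, content) {
  return crypto.createHmac('sha256', secret).update(content).digest('base64');
}

const secret = 'example-secret';
// Raw body exactly as sent over the wire, including its whitespace.
const rawBody = '{"id": "123", "status": "paid"}';
const signature = sign(secret, rawBody);

// Parsing and re-serializing changes the bytes: JSON.stringify drops the spaces.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(sign(secret, rawBody) === signature);      // true
console.log(sign(secret, reserialized) === signature); // false
```

The parsed data is identical in both cases, but only the raw bytes verify.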

Verification issues on the receiver end

Many of the failure modes discussed above require cooperation from the receiver. E.g. the receiver needs to actually verify the payloads, protect against replay attacks by validating the timestamps and the idempotency keys, and the like. It's important for webhook security that customers actually do all of that, and it's up to the sender to make that clear.

Making verification harder by lack of tools and documentation

As demonstrated in the examples above, there are some gotchas to be aware of when verifying webhooks, so the easier you make it to verify your webhooks, the more secure your customers' systems will be.

Oftentimes we see webhooks documentation that looks something like this:

For security reasons you should verify all payloads using the webhook secret, this is how you do it:

const crypto = require('crypto')

// svix_id, svix_timestamp, and body come from the incoming request
const signedContent = `${svix_id}.${svix_timestamp}.${body}`
const secret = 'whsec_5WbX5kEWLlfzsGNjH64I8lOOqUB6e8FH'

// Need to base64 decode the secret
const secretBytes = Buffer.from(secret.split('_')[1], 'base64')
const signature = crypto.createHmac('sha256', secretBytes).update(signedContent).digest('base64')
console.log(signature)

Don't forget to also protect against replay attacks using the timestamp!

The documentation includes one example in JavaScript, a language that not everyone understands. It also puts the burden of implementing the verification in their language of choice on the consumer, and includes only an off-hand remark about needing to protect against replay attacks.

You should at the very least include complete examples in the languages your customers use, though even better would be to include pre-built libraries that they can just use that already do the right things (Standard Webhooks has open source libraries that you can use).

It's a wrap

These are some of the more common mistakes people make when designing, signing, and verifying webhook signatures. They are all solvable; they just require careful consideration. That's partially why we co-created Standard Webhooks: to make it very easy to send and receive webhooks in a secure manner.
