- Tom Hacohen
At Svix, we send a lot of webhooks every month to many different webhook consumers. This means that we see a lot of different HTTP services and servers, and therefore a lot of different quirks. We previously recorded a video about HTTP oddities that covers some of them. Today's post is not about an implementation quirk, but rather about a common misconfiguration when using SSL/TLS (going forward, I'll refer to it as TLS for simplicity).
When establishing a secure connection (e.g. HTTPS), the client verifies that the server is indeed who it says it is by checking the validity of its TLS certificate. This relatively simple step can fail for a variety of reasons, for example: the server having a valid certificate but for the wrong hostname (name mismatch), or the server using a self-signed certificate (which the client can't validate).
The issue we are discussing today is somewhat similar to a self-signed certificate, and it happens when there's an incomplete certificate chain. This is when a server has a valid certificate that's correctly signed by a trusted authority, but the client has no way of knowing that because the server doesn't send some of the required intermediary certificates.
It's important to note that this issue doesn't just affect webhooks; it affects all TLS servers and thus all web servers. A server that exhibits this issue will fail in a variety of other scenarios, so it should be fixed regardless of whether the server consumes webhooks.
What are TLS certificate chains?
In order to understand what TLS certificate chains are, we first need to understand how certificate validation works in TLS.
When a client connects to a secure server, it's presented with a certificate. The certificate includes various bits of information about the server, but most importantly it contains the host names the certificate is valid for, and a signature attesting to its validity.
Anyone can sign a certificate; this is the case, for example, with self-signed certificates. So for the client to trust the signature, it must also trust the signer. This can happen in one of two ways: (1) the client already knows and trusts the signer, or (2) the client receives another certificate, itself signed by someone the client trusts, attesting that the signer is allowed to sign certificates.
This process repeats up the chain until the client reaches either a certificate it can't trust, which causes a validation failure, or a trusted one, which leads to a successful validation. This sequence of certificates is called the certificate chain, and it usually ends with a certificate from a root certificate authority (CA), which should be included in the client's or the operating system's CA trust store and is therefore trusted.
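The walk described above can be sketched in a few lines of Python. This is only a toy model: each certificate is reduced to a subject and an issuer name, and the `build_chain` helper is hypothetical. Real clients also verify signatures, validity dates, and host names.

```python
from typing import Dict, List, Set


def build_chain(leaf: str, presented: Dict[str, str], trusted_roots: Set[str]) -> List[str]:
    """Walk from the leaf certificate towards a trusted root.

    `presented` maps the subject of each certificate sent by the server
    to the subject of its issuer. Returns the subjects visited, ending
    with the trusted root. Raises ValueError when an issuer is neither
    trusted nor among the presented certificates.
    """
    chain = [leaf]
    current = leaf
    while True:
        issuer = presented[current]
        if issuer in trusted_roots:
            chain.append(issuer)
            return chain
        # Unknown issuer, or a loop such as a self-signed certificate.
        if issuer not in presented or issuer in chain:
            raise ValueError(f"unable to get issuer certificate for {current!r}")
        chain.append(issuer)
        current = issuer


presented = {"svix.com": "R3", "R3": "ISRG Root X1"}
print(build_chain("svix.com", presented, {"ISRG Root X1"}))
# ['svix.com', 'R3', 'ISRG Root X1']
```

If the server sent only the `svix.com` entry, the same call would raise `ValueError`, because `R3` is neither presented nor in the trust store.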
Explaining the issue
The issue, as the name implies, happens when a chain is incomplete and the client can't establish a full chain from the server's certificate to one of its trusted root certificate authorities. This means it can't validate the server, and thus the connection fails.
When using the Svix product to send webhooks it will usually manifest with an error that looks something like this:
error making request: error trying to connect:
verify failed:../ssl/statem/statem_clnt.c:1913::unable to get local issuer certificate
This is what it looks like when making a call to https://incomplete-chain.badssl.com/, which is a test server configured to always return an incomplete chain:
% curl https://incomplete-chain.badssl.com/
curl: (60) SSL certificate problem: unable to get local issuer certificate
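The same check can be done from code. Below is a minimal sketch using Python's standard `ssl` module; the `check_chain` helper is a hypothetical name, and the result depends on the trust store and any certificate caching on the machine that runs it.

```python
import socket
import ssl


def check_chain(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome."""
    context = ssl.create_default_context()  # uses the system trust store
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as err:
        # verify_message carries OpenSSL's reason for the failure.
        return f"verification failed: {err.verify_message}"
    except OSError as err:
        # DNS failures, refused connections, timeouts, other TLS errors.
        return f"connection error: {err}"


# Example call (needs network access):
# print(check_chain("incomplete-chain.badssl.com"))
```

On a client that holds no copy of the missing intermediary certificate, the example call would be expected to report the same "unable to get local issuer certificate" reason that cURL prints above.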
One way to check your server's own certificate chain is by using the OpenSSL CLI. For example, this is how you would run it against the Svix website:
openssl s_client -connect svix.com:443 -showcerts
The Svix website uses Let's Encrypt, so you'll see three certificates. The first one is the svix.com certificate (signed by Let's Encrypt's R3 signing certificate):
0 s:CN = svix.com
i:C = US, O = Let's Encrypt, CN = R3
a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
v:NotBefore: Jun 23 11:08:00 2023 GMT; NotAfter: Sep 21 11:07:59 2023 GMT
The second one is Let's Encrypt's intermediary certificate, which is signed by ISRG Root X1, Let's Encrypt's root certificate:
1 s:C = US, O = Let's Encrypt, CN = R3
i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
v:NotBefore: Sep 4 00:00:00 2020 GMT; NotAfter: Sep 15 16:00:00 2025 GMT
The third and last one is ISRG Root X1's certificate (mentioned above). While it should already be trusted by most clients, Let's Encrypt includes this certificate with a signature by DST Root CA X3 for better backwards compatibility.
You can read more about Let's Encrypt's certificate chain here.
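If you want to inspect that output programmatically, the subject (`s:`) and issuer (`i:`) lines can be paired up and compared. The following is a rough sketch that assumes the text layout shown above; `parse_chain` and `broken_links` are hypothetical helpers, not part of OpenSSL.

```python
import re
from typing import List, Tuple


def parse_chain(showcerts_output: str) -> List[Tuple[str, str]]:
    """Extract (subject, issuer) pairs from `openssl s_client -showcerts` text."""
    pairs = []
    subject = None
    for line in showcerts_output.splitlines():
        line = line.strip()
        match = re.match(r"^\d+\s+s:(.*)$", line)
        if match:
            subject = match.group(1).strip()
        elif line.startswith("i:") and subject is not None:
            pairs.append((subject, line[2:].strip()))
            subject = None
    return pairs


def broken_links(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return indexes where a certificate's issuer is not the next subject."""
    return [i for i in range(len(pairs) - 1) if pairs[i][1] != pairs[i + 1][0]]


sample = """
0 s:CN = svix.com
i:C = US, O = Let's Encrypt, CN = R3
1 s:C = US, O = Let's Encrypt, CN = R3
i:C = US, O = Internet Security Research Group, CN = ISRG Root X1
"""
pairs = parse_chain(sample)
print(len(pairs), broken_links(pairs))
# 2 []
```

A result with a single pair would suggest the server sent only its own certificate and no intermediaries, which is the typical shape of an incomplete chain.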
But it works for me!
Quite often, when we explain to our customers what's going on, they respond saying that this URL works for them just fine, both in the browser and using the cURL command above. Why does that happen? How come it works for them?
When describing the chain of trust above, we were a bit vague about the intermediary certificates and who they are. Intermediary certificates are most commonly certificates owned by certificate authorities (like Let's Encrypt) which are signed by one of the root certificates (which are also owned by CAs). The reason this matters is that intermediary certificates usually sign a lot of certificates, so it's likely that a client has already encountered the same intermediary certificate on other websites.
Web browsers usually cache intermediary certificates, so if the browser has already received an intermediary certificate before, it will happily use the cached version even when the chain is incomplete, and the connection will just work.
What about the discrepancy with cURL failing for some and working for others? Some operating systems, like macOS, have a system-wide certificate cache, so it's the same effect in play, just system-wide; other operating systems, like Linux, don't have a system-wide certificate cache, so this will fail there.
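The effect of such a cache can be illustrated with a small toy model. In this sketch, certificates are again reduced to subject and issuer names, and `can_validate` is a hypothetical helper; actual browser and operating system caches are more involved.

```python
from typing import Dict, Set


def can_validate(leaf: str, sent: Dict[str, str], known: Dict[str, str], roots: Set[str]) -> bool:
    """Toy check of whether a chain to a trusted root can be built.

    `sent` is what the server presents and `known` is what the client
    already holds (for example cached intermediaries). Both map a
    certificate subject to its issuer. Signatures and dates are ignored.
    """
    available = {**known, **sent}
    seen = set()
    current = leaf
    while current in available and current not in seen:
        seen.add(current)
        issuer = available[current]
        if issuer in roots:
            return True
        current = issuer
    return False


roots = {"ISRG Root X1"}
incomplete = {"svix.com": "R3"}  # the server omits the R3 certificate

print(can_validate("svix.com", incomplete, {}, roots))
# False
print(can_validate("svix.com", incomplete, {"R3": "ISRG Root X1"}, roots))
# True
```

The server's response is identical in both calls; only the client's prior knowledge differs, which is why the same URL can work in one environment and fail in another.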
How to solve it
The solution is easy: fix the configuration and make sure the server returns the full certificate chain, excluding the root certificate, in its TLS response.
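In practice this often means pointing the server at a bundle file that contains the server certificate followed by the intermediary certificates. The exact setting depends on your server software, so treat the following as a sketch only: `split_pem` and `build_fullchain` are hypothetical helpers, and the code assumes PEM-encoded input where the intermediaries file does not contain the root certificate.

```python
import re
from typing import List

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle: str) -> List[str]:
    """Return every certificate block found in a PEM bundle."""
    return PEM_BLOCK.findall(bundle)


def build_fullchain(leaf_pem: str, intermediates_pem: str) -> str:
    """Concatenate the leaf first, then the intermediaries."""
    blocks = split_pem(leaf_pem) + split_pem(intermediates_pem)
    return "\n".join(blocks) + "\n"


# Placeholder bodies stand in for real base64-encoded certificates.
leaf = "-----BEGIN CERTIFICATE-----\nAAA\n-----END CERTIFICATE-----\n"
inter = "-----BEGIN CERTIFICATE-----\nBBB\n-----END CERTIFICATE-----\n"
full = build_fullchain(leaf, inter)
print(len(split_pem(full)))
# 2
```

Counting the blocks in the file your server is configured to use is a quick sanity check: a count of one usually means the intermediaries are missing.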
You can also verify it's fixed by running the Qualys SSL Test on your server and making sure everything behaves correctly.