Strong static typing, a hill I'm willing to die on...


Svix is the enterprise ready webhooks sending service. With Svix, you can build a secure, reliable, and scalable webhook platform in minutes. Looking to send webhooks? Give it a try!

Edit: a previous version of this post omitted "static" from the title to keep it short (the body included it). I've added it to the title for clarity.

Update: there is now a part 2 about using the type system effectively.

I've been writing software for over 20 years, and with every day that goes by I grow more certain that strong static typing is not just a good idea, but also almost always the right choice.

There are definitely uses for untyped languages (or language variants), for example they are much nicer when using a REPL, or for throwaway scripts in environments that are already hopelessly untyped (e.g. the shell). In almost every other case, however, strong typing is strongly preferred.

There are advantages to not using types, such as faster development speed, but they pale in comparison to all of the advantages of using them. To that I say:

Writing software without types lets you go at full speed. Full speed towards the cliff.

The question around strong static typing is simple: would you rather work a bit more and get invariants checked at compile time (or type-checking time for non-compiled languages), or work a bit less and have them enforced at runtime, or even worse, not enforced even at runtime? (JavaScript, I'm looking at you... 1 + "2" evaluates to "12", so 1 + "2" == 12 is true.)

Getting errors at runtime is a terrible idea. First, it means that you won't always catch them during development. Second, when you do catch them, it will happen in a customer-facing manner. Yes, tests help, but writing tests for every possible mistyped function parameter is impossible given the endless possibilities. Even if you could, having types is much easier than testing for wrong types.

Types lead to fewer bugs

Types also offer annotations to code that benefit both humans and machines. Having types is a way to more strictly define the contract between different pieces of code.

Consider the following three examples. They all do exactly the same thing, just with varying levels of contract definition.

// Params: Name (a string) and age (a number).
function birthdayGreeting1(...params) {
    return `${params[0]} is ${params[1]}!`;
}

// Params: Name (a string) and age (a number).
function birthdayGreeting2(name, age) {
    return `${name} is ${age}!`;
}

function birthdayGreeting3(name: string, age: number): string {
    return `${name} is ${age}!`;
}

The first one doesn't even define the number of parameters, so it's hard to know what it does without reading the docs. I believe most people will agree the first one is an abomination and wouldn't write code like that. Yet naming the parameters is a very similar idea to typing: both are about defining the contract between the caller and the callee.

As for the second and the third, because of the typing, the third will need less documentation. The code is simpler, but admittedly, the advantages are fairly limited. Well, until you actually change this function...

In both the second and the third functions, the author assumes the age is a number. So it is absolutely fine to change the code as below:

// Params: Name (a string) and age (a number).
function birthdayGreeting2(name, age) {
    return `${name} will turn ${age + 1} next year!`;
}

function birthdayGreeting3(name: string, age: number): string {
    return `${name} will turn ${age + 1} next year!`;
}

The problem is that some of the places that use this code pass along user input collected from an HTML input (so always a string), which results in:

> birthdayGreeting2("John", "20")
"John will turn 201 next year!"

The typed version, on the other hand, will correctly fail to compile, because the function expects age to be a number, not a string.
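To make the fix concrete, here is a minimal sketch of the same contract in Rust (the language used for the examples later in this post). The names `birthday_greeting` and `greeting_from_form` are hypothetical, and the idea is only that text input gets parsed once, at the boundary, before it reaches the typed function:

```rust
use std::num::ParseIntError;

/// Rust counterpart of birthdayGreeting3: the signature is the contract.
/// Passing a string for `age` is rejected by the compiler.
pub fn birthday_greeting(name: &str, age: u32) -> String {
    format!("{name} will turn {} next year!", age + 1)
}

/// Hypothetical boundary helper: form input always arrives as text,
/// so it is converted to a number here, and bad input becomes an error
/// instead of a silently wrong greeting.
pub fn greeting_from_form(name: &str, raw_age: &str) -> Result<String, ParseIntError> {
    let age: u32 = raw_age.trim().parse()?;
    Ok(birthday_greeting(name, age))
}
```

Calling `greeting_from_form("John", "20")` goes through the parse step first, so the addition happens on a number rather than on a string.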

Having the contract between a caller and callee is important for a codebase, so that callers can know when callees change. This is especially important for an open source library, where the callers and the callees are not written by the same group of people. Without this contract it's impossible to know how things change when they do.

Types lead to a better development experience

Typing can also be used by IDEs and other development tools to vastly improve the development experience. You get notified as you code if any of your expectations are wrong. This significantly reduces cognitive load. You no longer need to remember the types of all the variables and functions in the context. The compiler will be there with you and tell you when something is wrong.

This also leads to a very nice additional benefit: easier refactoring. You can trust the compiler to let you know whether a change you make (e.g. the change in our example above) will break assumptions made elsewhere in the code or not.

Types also make it much easier to onboard new engineers to a codebase or library:

They can follow the type definitions to understand where things are used.
It's much easier to tinker with things as changes will trigger a compile error.

Let's consider the following changes to our above code:

class Person {
  name: string;
  age: number;
}

function birthdayGreeting2(person) {
    return `${person.name} will turn ${person.age + 1} next year!`;
}

function birthdayGreeting3(person: Person): string {
    return `${person.name} will turn ${person.age + 1} next year!`;
}

function main() {
  const person: Person = { name: "Hello", age: 12 };

  birthdayGreeting2(person);

  birthdayGreeting3(person);
}

It's very easy to see (or use your IDE to find) all the places where Person is used. You can see it's instantiated in main and you can see it's used by birthdayGreeting3. However, in order to know it's used in birthdayGreeting2, you'd need to read the entire codebase.

The flip side of this is also that when looking at birthdayGreeting2, it's hard to know that it expects a Person as a parameter. Some of these things can be solved by exhaustive documentation, but: (1) why bother if you can achieve more with types? (2) documentation goes stale, here the code is the documentation.

It's very similar to how you wouldn't write code like:

// a is a person
function birthdayGreeting2(a) {
    b = a.name;
    c = a.age;
    return `${b} will turn ${c + 1} next year!`;
}

You would want to use useful variable names. Typing is the same, it's just variable names on steroids.

We encode everything in the type system

At Svix we love types. In fact, we try to encode as much information as we possibly can in the type system, so that all of the errors that can be caught at compile time will be caught at compile time; and also to squeeze that extra mileage of developer experience improvements.

For example, Redis is a string-based protocol with no inherent typing. We use Redis for caching (among other things). The problem is that all of our nice typing benefits are lost at the Redis layer, and bugs can happen.

Consider the following piece of code:

pub struct Person {
    pub id: String,
    pub name: String,
    pub age: u16,
}

pub struct Pet {
    pub id: String,
    pub owner: String,
}


let id = "p123";
let person = Person::new(id, "John", 20);
cache.set(format!("person-{id}"), person);
// ...
let pet: Pet = cache.get(format!("preson-{id}"));

There are a couple of bugs in the snippet:

There's a typo in the second key name.
We are trying to load a person into a pet type.

To avoid such issues we do two things at Svix. The first is that we require the key to be of a certain type (not a generic string), and to create this type you need to call a specific function. The second is that we pair each key type with the type of value it can hold.

So the above example would look something like:

pub struct PersonCacheKey(String);

impl PersonCacheKey {
    fn new(id: &str) -> Self { ... }
}

pub struct Person {
    pub id: String,
    pub name: String,
    pub age: u16,
}

pub struct PetCacheKey(String);

pub struct Pet {
    pub id: String,
    pub owner: String,
}


let id = "p123";
let person = Person::new(id, "John", 20);
cache.set(PersonCacheKey::new(id), person);
// ...
// Compilation will fail on the next line
let pet: Pet = cache.get(PersonCacheKey::new(id));

This is already much better, and makes it impossible to get either of the previously mentioned bugs. Though we can do even better!
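How might the pairing of a key to a value look? Here is a minimal sketch, not our actual implementation: the `CacheKey` trait, its `Value` associated type, and the in-memory `Cache` (a `HashMap` standing in for Redis) are all assumptions made for illustration.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical trait: each key type names the single value type it may hold.
pub trait CacheKey {
    type Value: Clone + 'static;
    /// The raw string key that would be sent to the cache backend.
    fn raw(&self) -> String;
}

#[derive(Clone, Debug, PartialEq)]
pub struct Person {
    pub id: String,
    pub name: String,
    pub age: u16,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Pet {
    pub id: String,
    pub owner: String,
}

pub struct PersonCacheKey(String);

impl PersonCacheKey {
    /// The only way to build the key, so the "person-" prefix is spelled once.
    pub fn new(id: &str) -> Self {
        Self(format!("person-{id}"))
    }
}

impl CacheKey for PersonCacheKey {
    type Value = Person;
    fn raw(&self) -> String {
        self.0.clone()
    }
}

/// In-memory stand-in for the real cache (an assumption of this sketch).
#[derive(Default)]
pub struct Cache {
    entries: HashMap<String, Box<dyn Any>>,
}

impl Cache {
    /// The value type is dictated by the key type, not chosen by the caller.
    pub fn set<K: CacheKey>(&mut self, key: K, value: K::Value) {
        self.entries.insert(key.raw(), Box::new(value));
    }

    /// Returns a clone of the stored value, typed according to the key.
    pub fn get<K: CacheKey>(&self, key: K) -> Option<K::Value> {
        self.entries
            .get(&key.raw())?
            .downcast_ref::<K::Value>()
            .cloned()
    }
}
```

With this shape, `cache.get(PersonCacheKey::new(id))` can only ever produce an `Option<Person>`, so annotating the result as `Option<Pet>` is a type mismatch that the compiler rejects.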

Consider the following function:

pub fn do_something(id: String) {
    let person: Person = cache.get(PersonCacheKey::new(&id));
    // ...
}

There are a couple of problems with it. The first is that it's not very clear which id it should be for. Is it a person? A pet? It's very easy to accidentally call it with the wrong one, like in the following example:

let pet = ...;
do_something(pet.id); // <-- should be pet.owner!

The second is that we are losing the discoverability. It's a bit hard to know that a Pet has a relationship to a Person.

So at Svix, we have a special type for each id to ensure that there are no mistakes. The adjusted code looks something like:

pub struct PersonId(String);
pub struct PetId(String);

pub struct Person {
    pub id: PersonId,
    pub name: String,
    pub age: u16,
}

pub struct Pet {
    pub id: PetId,
    pub owner: PersonId,
}

This is indeed much better than our previous example.
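To show what the distinct id types buy us at a call site, here is a small sketch. The function `owner_cache_key` is hypothetical (it stands in for any function that needs a person id), and the id values are made up:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct PersonId(String);

#[derive(Clone, Debug, PartialEq)]
pub struct PetId(String);

pub struct Pet {
    pub id: PetId,
    pub owner: PersonId,
}

/// Hypothetical function that only makes sense for a person.
/// The parameter type says so, which a plain String could not.
pub fn owner_cache_key(id: &PersonId) -> String {
    format!("person-{}", id.0)
}

/// Builds the cache key for a pet's owner.
pub fn key_for_pet_owner(pet: &Pet) -> String {
    // `owner_cache_key(&pet.id)` would not compile here:
    // the function expects `&PersonId` and `pet.id` is a `PetId`.
    owner_cache_key(&pet.owner)
}
```

The mistake from the earlier snippet (passing `pet.id` where `pet.owner` was meant) becomes a compile error, and the `owner: PersonId` field makes the relationship between a Pet and a Person visible in the type itself.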

There is still one issue. If we accept the ids from the API, how do we know that they are valid? All of the pet ids in Svix, for example, are prefixed with pet_ and are then followed by a KSUID like so: pet_25SVqQSCVpGZh5SmuV0A7X0E3rw.

We want to be able to tell our customers when they pass the wrong id in the API, e.g. when they pass a person id where a pet one is expected. One simple solution for this is to validate the id (duh...), but it can be easy to forget to validate it everywhere that it's used.

So we enforce that a PetId can never be created without first being validated. This way we know that all of the code paths that create a PetId first make sure it's valid. This means that when we return a 404 Not Found to a customer because a pet isn't found in the database, we can be sure it was actually a valid id that wasn't found in the database. If it wasn't a valid id, we would have already returned a 422 or 400 when it was passed to the API handlers.
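A validated id type could be sketched roughly as follows. This is an illustration rather than our real code: the `parse` constructor and the `IdError` enum are made-up names, and the checks assume the standard KSUID text form of 27 alphanumeric (base62) characters after the pet_ prefix.

```rust
#[derive(Debug, PartialEq)]
pub enum IdError {
    WrongPrefix,
    BadLength,
    BadCharacter,
}

/// The inner field is private. Once this type lives in its own module,
/// code outside that module can only obtain a PetId through `parse`.
#[derive(Debug, PartialEq)]
pub struct PetId(String);

impl PetId {
    const PREFIX: &'static str = "pet_";
    /// Length of a KSUID in its text form (assumed here).
    const KSUID_LEN: usize = 27;

    /// Validates the raw string and only then wraps it.
    pub fn parse(raw: &str) -> Result<Self, IdError> {
        let body = raw
            .strip_prefix(Self::PREFIX)
            .ok_or(IdError::WrongPrefix)?;
        if body.len() != Self::KSUID_LEN {
            return Err(IdError::BadLength);
        }
        if !body.chars().all(|c| c.is_ascii_alphanumeric()) {
            return Err(IdError::BadCharacter);
        }
        Ok(Self(raw.to_owned()))
    }

    /// Read-only access to the validated string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

An API handler would call `PetId::parse` on the incoming string and turn an `Err` into the 400 or 422 response, so everything past the handler only ever sees ids that passed validation.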

So why doesn't everyone like types?

The main arguments against types are:

Development speed
Learning curve and types complexity
The amount of effort and boilerplate required

First of all, I'd argue that even if all of the above were true, the advantages mentioned above are well worth the trouble. Though I also don't agree with all of the above.

The first one is development speed. Prototyping without types is definitely much faster. You can comment out pieces of the code and won't have a compiler complain to you. You can set the wrong values for some of the fields until you're ready to figure out the right ones, etc.

Though like I said above: "Writing software without types lets you go at full speed. Full speed towards the cliff." The problem is that this is just aggressive and unnecessary technical debt. You'll pay it multiple times over when you need to debug why your code doesn't work (either locally, in the test suite, or in production).

As for the learning curve: Yes, learning more things takes time. Though I'd say that most people don't need to be typing experts. They can just get by with very simple type expressions, and ask if they ever hit a wall. However, if you keep things simple, you'll probably rarely if ever hit one.

Additionally, people are already required to learn how to code, learn frameworks (React, Axum, etc.), and so many other things. I don't think the learning burden is as significant as it's made out to be.

Last, but not least, regarding the learning curve: I strongly believe that the benefit of a gentler learning curve from not having to know types is much smaller than the benefit of using the type system to onboard onto a specific codebase. Especially since learning types is a one-time cost.

The last point is about the amount of effort and boilerplate required to use types in your codebase. I strongly believe that the amount of effort is actually less than the amount of effort required by not writing types.

Not using types requires significant documentation and testing in order to reach even a basic level of sanity. Documentation can go stale, and so can testing; and either way they require more effort than just adding the right types. Reading code with types is also easier, because you get the types inline instead of in the function documentation where it's in an inconsistent format with a lot of added noise.

Yes, typing can be a pain in languages that don't support inference. Java, for example, can be tedious:

Edit: It seems that Java has type inference nowadays. Thanks pron for the correction on HN.

Person person1 = newPerson();
Person person2 = newPerson();
Person child = makeChild(person1, person2);

while languages that have inference (like Rust) are much nicer:

let person1 = new_person();
let person2 = new_person();
let child = make_child(person1, person2);

So having the right tools definitely helps.

Speaking of tools, in order to reap the benefits of your typing, you probably need to use a code editor (or IDE) that supports modern code completion that is language aware.

Closing words

I can see both sides of the argument on many topics, such as vim vs. emacs, tabs vs. spaces, and even much more controversial ones. Though in this case, the costs are so low compared to the benefits that I just don't understand why anyone would ever choose not to use types.

I'd love to know what I'm missing, but until then: Strong typing is a hill I'm willing to die on.

