Proving that well-typed programs will not get type errors during runtime
I came across this theorem recently, and thought that it’s an interesting result. The proof itself isn’t particularly flashy, but I think the result is relevant to every prorammer’s daily life and that’s pretty cool. My hope is to explain it enough that a programmer with basic logic experience can understand the proof. I’ll explain notation along the way.
The Theorem
Theorem:
Notation
To break this down:
- is an environment. Formally that means it’s a finite set of pairs of the form {x:T} where x is a (lambda) variable and T is a type. Informally, you can think of this like a scope/environment in a program.
- is sometimes called a “turnstile”. would be read as “M is derived from ”.
- can be read as “M proves ”. Alternatively, you could also read it as “M is a term of type ” (The fact that you can read it either way is itself extremely intresting and an important correspondence)
- , called multi-step beta reduction, is the transitive and reflexive closure of beta reduction. To break that down more
- For transitive closure, imagine you had a relation R that meant “can fly to”. So New York R Los Angeles would mean you can fly directly from New York to Los Angeles. The transitive closure of this operation would mean something like “can fly to by taking one or more flights”. So for example, Los Angeles -> New York -> Amsterdam would be a valid statement.
- Beta reduction, written as (note the single arrowhead instead of the two above), satisfies the relation .
- is a concept from lambda calculus. If you’re not familiar, basically is a function that takes a single variable x as input, and returns P as output. This function is then applied to Q. P and Q can be arbitrary values.
- is substitution. This means for every instance of a variable “x” in P (which might be a program or a function), replace it with Q
Interpretation
So basically this is saying that given a certain environment (), we know that some term M has type ; and we also know that M eventually reduces to N - given those two conditions, we can also conclude that in the same environment, N must also have type .
In a practical setting, this means that if you write a program where a function takes an integer
argument, and you mutate that int several times, like adding and subtracting from it, or modifying it with another helper function, you’re still going to end up with an int, and none of those intermediate operations should throw type mismatch errors. In a language like python, which doesn’t type-check ahead of runtime, you can run into this exact issue. But in a typed, statically compiled language like Java, the compiler will error and force you to fix this mistake before it can even finish compiling.
The Proof
We use induction on .
Base case
M is the result of applying a function to a term Q. Since M has type , then must have type (where tau is an arbitrary type) and Q must have type .
Generally, “x must have type (in environment )” can be written as .
So then we have and .
We can rewrite again as . We’re allowed to do this because of the rules of the sequent calculus. Unfortunately, you’ll have to trust me on that because explaining the sequent calculus as well is too much material for this post.
- For an informal treatment, you can remember that means that M is derived from , and means that M is derived from and x. So you could say , which is equal to . In our notation, this could be written
Now we have and . The first statement means that if we have an environment , and a variable x with type in that environment, we can then derive P which has type . Luckily, we now have Q with type . So we can substitute Q for x - - and we now have . This is the result we wanted to show for N.
General case
Assuming it holds for {M trans reduces N}, show that it holds for {M’ reduces to M, trans reduces to N}
We showed in the base case that if M’ has type then M must also have type . Given our assumption - if and M has type then N has type as well - then N must have the same type as M’.
The End
So that’s it. Unfortunately, there are probably some explanations missing, particularly surrounding our notation and its underlying rules. For more information, you can google for “sequent calculus”