Understanding Church numerals
Table of Contents
1. Introduction
Church numerals are a representation of the natural numbers in the context of lambda calculus. Although encoding numbers as functions may not be the most practical method, it has some interesting properties, specially when computing arithmetical operations.
Church numerals, as well as λ-calculus itself, was invented by the mathematician and computer scientist Alonzo Church.
This article was mainly motivated by Exercise 2.6 of Structure and Interpretation of Computer Programs. Furthermore, the original explanation about lambda calculus was moved from my article on the Y combinator.
2. Lambda calculus
Lambda calculus (λ-calculus) is a mathematical system for expressing computation based on function abstraction and application. As mentioned above, it was invented by Alonzo Church in the 1930s.
The simplicity of λ-calculus, along with its potential, makes it very interesting from both a theoretical and practical point of view. This is the reason why it’s the predecesor of functional programming in general, and many programming languages in particular (Lisp, Haskell, etc.).
2.1. Rules of lambda calculus
The lambda calculus syntax is so simple that it can be defined in the following three rules:
- A variable \(x\) is a character or string that represents a parameter.
- A lambda abstraction \((\lambda x. M)\) defines an anonymous function, with a parameter \(x\) (between the λ and the dot), that returns the body \(M\).
- An application \((M\ N)\), where the function \(M\) is called with \(N\) as its argument.
In a function application, \(M\) and \(N\) are lambda terms. A lambda terms is simply a valid expression in the lambda calculus system. To determine if an expression is valid, the following three rules can be used:
- A variable \(x\) is itself a valid lambda term.
- If \(M\) is a lambda term, and \(x\) is a variable, then \((\lambda x. M)\) is also a lambda term.
- If \(M\) and \(N\) are lambda terms, then \((M\ N)\) is also a lambda term.
The recursive nature of λ-calculus is present in the definition of the syntax itself, where:
- Variables provide a way of naming an unknown lambda term.
- Lambda abstractions provide a way of extracting part of a lambda term.
- Applications provide a way of filling a variable, which was extracted/abstracted, with new a lambda term (the argument of the application).
- Lambda terms themselves are any of the above.
At first glance, a system that doesn’t directly deal with specific data (such as numbers) may seem pointless, but it has helped shape many modern languages, and understanding how to use these simple rules in these complex systems can be extremely helpful.
2.2. Conversion and reduction operations
Since λ-calculus doesn’t deal with specific values such as numbers, the value of the system lies in the way lambda terms can be transformed.
Before diving into the transformation methods, it’s important to mention the substitution notation that is often used in this context. An expression like \(b[p := a]\), denotes the operation of substituting the parameter \(p\) with the argument \(a\) in the body \(b\). Similarly, the syntax \(M[x]\) is used to denote “a lambda expression \(M\) which contains a bound variable \(x\)”.
There are three kinds of transformations, which are used to convert lambda expressions to equivalent ones.
- Alpha conversion (α-conversion)
Consists on renaming the bound variables in an expression. It is mainly used to avoid name collisions.
\[ (\lambda x. M[x]) \to (\lambda y. M[y]) \]
- Beta reduction (β-reduction)
Consists on reducing an application by substituting the bound variable in the body of the abstraction with the argument of the application. Essentially, applying a function call.
\[ ((\lambda x. M)\ N) \to M[x := N] \]
In this case, the reduction consists on substituting \(x\) in the body \(M\) with \(N\).
The inverse of this process is called beta abstraction, which instead of simplifying an expression, adds an extra function call. This can be useful in languages such as Lisp for delaying the evaluation of the arguments.
\[ M[N] \to ((\lambda x. M)\ N) \]
- Eta reduction (η-reduction)
Expresses the idea of extensionality, which applied to this context establishes that two functions are the same if and only if they give the same result for all arguments.
It is used to simplify redundant abstractions; for example, given that \(x\) does not appear free in \(f\):
\[ (\lambda x. f\ x) \to f \]
The inverse of this process is called eta expansion, which essentially wraps a function into a lambda abstraction:
\[ f \to (\lambda x. f\ x) \]
These transformations can be used to transform computations into simpler and more efficient ones.
2.3. Notation
Finally, to fully understand λ-notation, there are some conventions that should be kept in mind when it comes to notation.
- Outermost parentheses are dropped: \((M N)\) can be noted as \(M N\).
- Applications are assumed to be left associative: \(M N P\) is equivalent to \(((M N) P)\), instead of \((M (N P))\).
- The body of an abstraction extends as far right as possible: \((\lambda x. M N P)\) means \((\lambda x. (M N P))\), instead of \(((\lambda x. M) N P)\) or \(((\lambda x. M N) P)\).
- A sequence of abstractions can be contracted: \((\lambda x. \lambda y. \lambda z. N)\) can be expressed as \((\lambda x\ y\ z. N)\). This is related to the notion of currying.
- When all variables are single-letter, the space in applications may be omitted: \((xyz)\) instead of \((x\ y\ z)\).
For the sake of readability, this article will intentionally avoid using some of these conventions (e.g. dropping outermost parentheses), specially for the expressions embedded in text.
3. Church numerals
Since λ-calculus does not support the decimal representation of numbers, they have to be encoded as functions. Usually, the value of the number number depends on the number of calls that are performed from the body of the function. The general method of encoding data as functions is called Church encoding, and the representation of a natural number in this context is called a Church numeral.
The following table shows the first four numbers encoded as church numerals in lambda notation and conventional function notation (after uncurrying them into two-argument functions).
Decimal | λ-notation | Function notation |
---|---|---|
0 | \(\lambda f. \lambda x. x\) | \(\text{zero}(f, x) = x\) |
1 | \(\lambda f. \lambda x. f\ x\) | \(\text{one}(f, x) = f(x)\) |
2 | \(\lambda f. \lambda x. f\ (f\ x)\) | \(\text{two}(f, x) = f(f(x))\) |
3 | \(\lambda f. \lambda x. f\ (f\ (f\ x))\) | \(\text{three}(f, x) = f(f(f(x)))\) |
… | … | … |
Note how the decimal value matches the number of nested applications of \(f\) inside the body of the innermost lambda. Therefore, a Church numeral with a value of 3 is simply a function that does something three times. This notion of a number as a series of repeated operations will be referenced below.
4. Increasing a number
As mentioned in the introduction, this article was motivated by Exercise 2.6 of
SICP. The exercise provides the reader with the previous definition of zero
along with an add-1
procedure, which increases its input; from those
definitions, the reader is expected to calculate one
.
Although the value of one
was shown in the previous section for illustrating
Church numerals, it’s worth going over the expansion of add-1
in detail.
4.1. Definition of add-1
This is the definition of add-1
in λ-notation1.
\[ \lambda n. \lambda f. \lambda x. f (n\ f\ x) \]
This is the equivalent definition of add-1
in Lisp.
(define add-1 (lambda (n) (lambda (f) (lambda (x) (f ((n f) x))))))
4.2. Beta reduction
A call to add-1
with an argument of zero
can be examined through β-reduction
to understand why the Church numeral for one
is returned. Each step of the
reduction will be highlighted and explained using Lisp syntax.
First, the call to add-1
is expanded, replacing each symbol with its bound
expression.
Then, the outermost call is evaluated, replacing the n
parameter in the body of
add-1
with the value of zero
. Specifically, to avoid naming collisions, n
is
replaced with the α-conversion of zero
, where the f
and x
symbols are
substituted with g
and y
, respectively.
Once the outer call has been evaluated, the expression can be simplified by
evaluating the call to zero
, that is, the call to the lambda whose parameter is
y
, and whose argument is f
. Since the parameter g
does not appear in the body,
is is essentially discarded, resulting in (lambda (y) y)
, the identity function.
Finally, the call to the identity lambda can be evaluated, that is, the call to
the lambda whose parameter is z
and whose argument is x
. A call to this function
simply evaluates to the argument, in this case, x
.
This final expression matches the definition of one
from the previous section: a
function that receives a function f
, and returns another function that applies f
to its argument x
.
Naturally, the same β-reduction process can be expressed in λ-notation.
\begin{align*} \text{one} &= (\lambda n. \lambda f. \lambda x. f\ (n\ f\ x)) (\lambda f. \lambda x. x) \\ &= \lambda f. \lambda x. f\ ((\lambda g. \lambda y. y)\ f\ x) \\ &= \lambda f. \lambda x. f\ ((\lambda y. y)\ x) \\ &= \lambda f. \lambda x. f\ x \end{align*}4.3. Calculating two
The same process can be done to β-reduce a call to add-1
with an argument of
one
. From the previous definitions:
The following β-reduction process can be performed:
\begin{align} \text{two} &= (\lambda n. \lambda f. \lambda x. f\ (n\ f\ x)) (\lambda f. \lambda x. f\ x) \label{eq:definition} \\ &= \lambda f. \lambda x. f\ ((\lambda g. \lambda y. g\ y)\ f\ x) \label{eq:reduction} \\ &= \lambda f. \lambda x. f\ ((\lambda y. f\ y)\ x) \label{eq:reduction2} \\ &= \lambda f. \lambda x. f\ (f\ x) \label{eq:solution} \end{align}
The initial expression \(\eqref{eq:definition}\) is simply the call to add-1
,
replacing the symbols with the values defined above. Then, the call is
β-reduced in \(\eqref{eq:reduction}\) by replacing the parameter \(n\) with the
argument one
, which was α-converted to avoid parameter collisions, just
like in the example from the previous section. The call to the innermost lambda
is β-reduced in \(\eqref{eq:reduction2}\) produce \((\lambda y. f\ y)\), that
is, a function that applies \(f\) to its argument. Finally, the call to that
lambda, whose argument is \(x\), is also reduced into \((f\ x)\); this result
\(\eqref{eq:solution}\) matches the previous definition of two
.
5. Arbitrary addition
Although the previous section explained how to increase a Church numeral, the
add-1
function was already provided. The second part of the exercise asks for a
definition of the addition procedure, avoiding repeated calls to add-1
.
It is helpful to analyze the inputs of this new addition function, along with the expected output. In Lisp, these would be defined as the following three expressions.
(lambda (f) (lambda (x) (f (f x)))) ; Input `a': Two (lambda (f) (lambda (x) (f (f (f x))))) ; Input `b': Three (lambda (f) (lambda (x) (f (f (f (f (f x))))))) ; Result: Five
5.1. The abstract objective
From the previous expressions, it can be concluded that the new addition
function should essentially replace x
(in either expression, since addition is
commutative) with the entire body of the other expression’s innermost
lambda. This abstract process is represented in the following figure, which
shows the expected behavior when adding the Church numeral representation of 2
and 3.
For simplicity, this approach will be used for the rest of the section, where
the body of the first argument a
is “placed” in the second argument b
.
5.2. Building the procedure
Since the parameters of add
are known, its initial structure can be defined. The
following figure shows the partial definition, which will evolve trough this
section.
Since the function will manipulate the bodies of the innermost lambdas, it must be able to access them. Specifically, it needs to:
- Extract the innermost body from
a
, that is,(f (f x))
. - Replace
x
in the innermost body ofb
.
It might be easier to understand this process by first examining the second
step, the replacement of x
.
5.2.1. Replacing x
in b
Since the numbers themselves are functions, replacing x
in the inner body of b
should be as simple as calling its inner lambda with the desired value as an
argument. To access and call the inner lambda, the outer lambda, whose parameter
is f
, needs to be called first, but it obviously needs some value as its
argument.
As noted above, Church numerals express “repeated operations”, and their f
argument represents the function that performs the operation itself. Since the
result of adding two Church numerals must be another Church numeral, which will
also express some “repeated operation”, there is no way for add
to know the
operation that the result is expected to repeat; therefore, the result must also
be a function with an f
parameter, which lets the caller specify the function to
be repeated.
Since the result of add
now contains a lambda with an f
parameter, the
(currently unknown) value of f
can be used to call b
, accessing the inner
lambda.
Although there is now a way of calling the inner lambda of b
for altering x
,
it’s still not clear how to obtain the argument: the body of a
.
5.2.2. Obtaining the body of a
The inner body of a
can be obtained through a similar process. Just like when
calling b
, the value of f
can be used for calling the outer lambda; however,
in this case the inner lambda needs to be called for retrieving the body.
A similar problem emerges: what value should be used as the x
argument of the
inner lambda of a
? Going back to the “repeated operations”, x
represents the
input of the operation, in other words, the value to which the operation should
be applied. Just like with the function itself, add
cannot know this value when
computing its result, so it must use another generic lambda that receives x
.
With this, the previous two steps of extraction and replacement can be
performed. The following figure shows the final definition of add
in Lisp.
Note that the previous Lisp definition “replaces” the body of a
into b
, but the
conventional definition in λ-notation is:
\[ \text{add} = \lambda ab. \lambda f. \lambda x. a\ f\ (b\ f\ x) \]
Which would move the body of b
into a
. Again, this is not an issue because
addition is commutative.
6. Calling Church numerals
To further illustrate why Church numerals are functions that represent repeated
operations, a call to one of them will be made below. First, a simple definition
of increase
, which adds one to its argument:
(define increase (lambda (val) (+ val 1))) (increase 5) ; Returns 6
This increase
function can be used as the f
argument when calling a Church numeral,
and the result will be another function which applies the operation N times.
(define church-three (lambda (f) (lambda (x) (f (f (f x)))))) (define increase-by-three (church-three increase))
In the previous example, increase-by-three
is a function which receives an
argument x
and applies increase
to it 3 times. Therefore, these two expressions
are equivalent:
(increase-by-three 10) ; Returns 13 (increase (increase (increase 10))) ; Returns 13
Naturally, this is extremely versatile, as it can be done with any functions as an argument:
(define square (lambda (val) (* val val))) ;; The following expression is equivalent to: ;; ;; (square (square (square 3))) ;; ;; Which returns: ;; ;; ((3^2)^2)^2 = 3^(2*2*2) = 3^8 = 6561 ;; ((church-three square) 3)