Understanding Church numerals

1. Introduction
2. Lambda calculus
3. Church numerals
4. Increasing a number
5. Arbitrary addition
- 5.1. The abstract objective
- 5.2. Building the procedure
  - 5.2.1. Replacing x in b
  - 5.2.2. Obtaining the body of a
6. Calling Church numerals

1. Introduction

Church numerals are a representation of the natural numbers in the context of lambda calculus. Although encoding numbers as functions may not be the most practical method, it has some interesting properties, specially when computing arithmetical operations.

Church numerals, as well as λ-calculus itself, was invented by the mathematician and computer scientist Alonzo Church.

This article was mainly motivated by Exercise 2.6 of Structure and Interpretation of Computer Programs. Furthermore, the original explanation about lambda calculus was moved from my article on the Y combinator.

2. Lambda calculus

Lambda calculus (λ-calculus) is a mathematical system for expressing computation based on function abstraction and application. As mentioned above, it was invented by Alonzo Church in the 1930s.

The simplicity of λ-calculus, along with its potential, makes it very interesting from both a theoretical and practical point of view. This is the reason why it’s the predecesor of functional programming in general, and many programming languages in particular (Lisp, Haskell, etc.).

2.1. Rules of lambda calculus

The lambda calculus syntax is so simple that it can be defined in the following three rules:

A variable \(x\) is a character or string that represents a parameter.
A lambda abstraction \((\lambda x. M)\) defines an anonymous function, with a parameter \(x\) (between the λ and the dot), that returns the body \(M\).
An application \((M\ N)\), where the function \(M\) is called with \(N\) as its argument.

In a function application, \(M\) and \(N\) are lambda terms. A lambda terms is simply a valid expression in the lambda calculus system. To determine if an expression is valid, the following three rules can be used:

A variable \(x\) is itself a valid lambda term.
If \(M\) is a lambda term, and \(x\) is a variable, then \((\lambda x. M)\) is also a lambda term.
If \(M\) and \(N\) are lambda terms, then \((M\ N)\) is also a lambda term.

The recursive nature of λ-calculus is present in the definition of the syntax itself, where:

Variables provide a way of naming an unknown lambda term.
Lambda abstractions provide a way of extracting part of a lambda term.
Applications provide a way of filling a variable, which was extracted/abstracted, with new a lambda term (the argument of the application).
Lambda terms themselves are any of the above.

At first glance, a system that doesn’t directly deal with specific data (such as numbers) may seem pointless, but it has helped shape many modern languages, and understanding how to use these simple rules in these complex systems can be extremely helpful.

2.2. Conversion and reduction operations

Since λ-calculus doesn’t deal with specific values such as numbers, the value of the system lies in the way lambda terms can be transformed.

Before diving into the transformation methods, it’s important to mention the substitution notation that is often used in this context. An expression like \(b[p := a]\), denotes the operation of substituting the parameter \(p\) with the argument \(a\) in the body \(b\). Similarly, the syntax \(M[x]\) is used to denote “a lambda expression \(M\) which contains a bound variable \(x\)”.

There are three kinds of transformations, which are used to convert lambda expressions to equivalent ones.

Alpha conversion (α-conversion)

Consists on renaming the bound variables in an expression. It is mainly used to avoid name collisions.

\[ (\lambda x. M[x]) \to (\lambda y. M[y]) \]

Beta reduction (β-reduction)

Consists on reducing an application by substituting the bound variable in the body of the abstraction with the argument of the application. Essentially, applying a function call.

\[ ((\lambda x. M)\ N) \to M[x := N] \]

In this case, the reduction consists on substituting \(x\) in the body \(M\) with \(N\).

The inverse of this process is called beta abstraction, which instead of simplifying an expression, adds an extra function call. This can be useful in languages such as Lisp for delaying the evaluation of the arguments.

\[ M[N] \to ((\lambda x. M)\ N) \]

Eta reduction (η-reduction)

Expresses the idea of extensionality, which applied to this context establishes that two functions are the same if and only if they give the same result for all arguments.

It is used to simplify redundant abstractions; for example, given that \(x\) does not appear free in \(f\):

\[ (\lambda x. f\ x) \to f \]

The inverse of this process is called eta expansion, which essentially wraps a function into a lambda abstraction:

\[ f \to (\lambda x. f\ x) \]

These transformations can be used to transform computations into simpler and more efficient ones.

2.3. Notation

Finally, to fully understand λ-notation, there are some conventions that should be kept in mind when it comes to notation.

Outermost parentheses are dropped: \((M N)\) can be noted as \(M N\).
Applications are assumed to be left associative: \(M N P\) is equivalent to \(((M N) P)\), instead of \((M (N P))\).
The body of an abstraction extends as far right as possible: \((\lambda x. M N P)\) means \((\lambda x. (M N P))\), instead of \(((\lambda x. M) N P)\) or \(((\lambda x. M N) P)\).
A sequence of abstractions can be contracted: \((\lambda x. \lambda y. \lambda z. N)\) can be expressed as \((\lambda x\ y\ z. N)\). This is related to the notion of currying.
When all variables are single-letter, the space in applications may be omitted: \((xyz)\) instead of \((x\ y\ z)\).

For the sake of readability, this article will intentionally avoid using some of these conventions (e.g. dropping outermost parentheses), specially for the expressions embedded in text.

3. Church numerals

Since λ-calculus does not support the decimal representation of numbers, they have to be encoded as functions. Usually, the value of the number number depends on the number of calls that are performed from the body of the function. The general method of encoding data as functions is called Church encoding, and the representation of a natural number in this context is called a Church numeral.

The following table shows the first four numbers encoded as church numerals in lambda notation and conventional function notation (after uncurrying them into two-argument functions).

Decimal	λ-notation	Function notation
0	\(\lambda f. \lambda x. x\)	\(\text{zero}(f, x) = x\)
1	\(\lambda f. \lambda x. f\ x\)	\(\text{one}(f, x) = f(x)\)
2	\(\lambda f. \lambda x. f\ (f\ x)\)	\(\text{two}(f, x) = f(f(x))\)
3	\(\lambda f. \lambda x. f\ (f\ (f\ x))\)	\(\text{three}(f, x) = f(f(f(x)))\)
…	…	…

Note how the decimal value matches the number of nested applications of \(f\) inside the body of the innermost lambda. Therefore, a Church numeral with a value of 3 is simply a function that does something three times. This notion of a number as a series of repeated operations will be referenced below.

4. Increasing a number

As mentioned in the introduction, this article was motivated by Exercise 2.6 of SICP. The exercise provides the reader with the previous definition of zero along with an add-1 procedure, which increases its input; from those definitions, the reader is expected to calculate one.

Although the value of one was shown in the previous section for illustrating Church numerals, it’s worth going over the expansion of add-1 in detail.

4.1. Definition of `add-1`

This is the definition of add-1 in λ-notation¹.

\[ \lambda n. \lambda f. \lambda x. f (n\ f\ x) \]

This is the equivalent definition of add-1 in Lisp.

(define add-1
  (lambda (n)
    (lambda (f)
      (lambda (x)
        (f ((n f) x))))))

4.2. Beta reduction

A call to add-1 with an argument of zero can be examined through β-reduction to understand why the Church numeral for one is returned. Each step of the reduction will be highlighted and explained using Lisp syntax.

First, the call to add-1 is expanded, replacing each symbol with its bound expression.

Then, the outermost call is evaluated, replacing the n parameter in the body of add-1 with the value of zero. Specifically, to avoid naming collisions, n is replaced with the α-conversion of zero, where the f and x symbols are substituted with g and y, respectively.

Once the outer call has been evaluated, the expression can be simplified by evaluating the call to zero, that is, the call to the lambda whose parameter is y, and whose argument is f. Since the parameter g does not appear in the body, is is essentially discarded, resulting in (lambda (y) y), the identity function.

Finally, the call to the identity lambda can be evaluated, that is, the call to the lambda whose parameter is z and whose argument is x. A call to this function simply evaluates to the argument, in this case, x.

This final expression matches the definition of one from the previous section: a function that receives a function f, and returns another function that applies f to its argument x.

Naturally, the same β-reduction process can be expressed in λ-notation.

\begin{align*} \text{one} &= (\lambda n. \lambda f. \lambda x. f\ (n\ f\ x)) (\lambda f. \lambda x. x) \\ &= \lambda f. \lambda x. f\ ((\lambda g. \lambda y. y)\ f\ x) \\ &= \lambda f. \lambda x. f\ ((\lambda y. y)\ x) \\ &= \lambda f. \lambda x. f\ x \end{align*}

4.3. Calculating `two`

The same process can be done to β-reduce a call to add-1 with an argument of one. From the previous definitions:

\begin{align*} \text{add-1} &= \lambda n. \lambda f. \lambda x. f\ (n\ f\ x) \\ \text{one} &= \lambda f. \lambda x. f\ x \\ \end{align*}

The following β-reduction process can be performed:

\begin{align} \text{two} &= (\lambda n. \lambda f. \lambda x. f\ (n\ f\ x)) (\lambda f. \lambda x. f\ x) \label{eq:definition} \\ &= \lambda f. \lambda x. f\ ((\lambda g. \lambda y. g\ y)\ f\ x) \label{eq:reduction} \\ &= \lambda f. \lambda x. f\ ((\lambda y. f\ y)\ x) \label{eq:reduction2} \\ &= \lambda f. \lambda x. f\ (f\ x) \label{eq:solution} \end{align}

The initial expression \(\eqref{eq:definition}\) is simply the call to add-1, replacing the symbols with the values defined above. Then, the call is β-reduced in \(\eqref{eq:reduction}\) by replacing the parameter \(n\) with the argument one, which was α-converted to avoid parameter collisions, just like in the example from the previous section. The call to the innermost lambda is β-reduced in \(\eqref{eq:reduction2}\) produce \((\lambda y. f\ y)\), that is, a function that applies \(f\) to its argument. Finally, the call to that lambda, whose argument is \(x\), is also reduced into \((f\ x)\); this result \(\eqref{eq:solution}\) matches the previous definition of two.

5. Arbitrary addition

Although the previous section explained how to increase a Church numeral, the add-1 function was already provided. The second part of the exercise asks for a definition of the addition procedure, avoiding repeated calls to add-1.

It is helpful to analyze the inputs of this new addition function, along with the expected output. In Lisp, these would be defined as the following three expressions.

(lambda (f) (lambda (x) (f (f x))))              ; Input `a': Two
(lambda (f) (lambda (x) (f (f (f x)))))          ; Input `b': Three
(lambda (f) (lambda (x) (f (f (f (f (f x)))))))  ; Result: Five

5.1. The abstract objective

From the previous expressions, it can be concluded that the new addition function should essentially replace x (in either expression, since addition is commutative) with the entire body of the other expression’s innermost lambda. This abstract process is represented in the following figure, which shows the expected behavior when adding the Church numeral representation of 2 and 3.

For simplicity, this approach will be used for the rest of the section, where the body of the first argument a is “placed” in the second argument b.

5.2. Building the procedure

Since the parameters of add are known, its initial structure can be defined. The following figure shows the partial definition, which will evolve trough this section.

Since the function will manipulate the bodies of the innermost lambdas, it must be able to access them. Specifically, it needs to:

Extract the innermost body from a, that is, (f (f x)).
Replace x in the innermost body of b.

It might be easier to understand this process by first examining the second step, the replacement of x.

5.2.1. Replacing `x` in `b`

Since the numbers themselves are functions, replacing x in the inner body of b should be as simple as calling its inner lambda with the desired value as an argument. To access and call the inner lambda, the outer lambda, whose parameter is f, needs to be called first, but it obviously needs some value as its argument.

As noted above, Church numerals express “repeated operations”, and their f argument represents the function that performs the operation itself. Since the result of adding two Church numerals must be another Church numeral, which will also express some “repeated operation”, there is no way for add to know the operation that the result is expected to repeat; therefore, the result must also be a function with an f parameter, which lets the caller specify the function to be repeated.

Since the result of add now contains a lambda with an f parameter, the (currently unknown) value of f can be used to call b, accessing the inner lambda.

Although there is now a way of calling the inner lambda of b for altering x, it’s still not clear how to obtain the argument: the body of a.

5.2.2. Obtaining the body of `a`

The inner body of a can be obtained through a similar process. Just like when calling b, the value of f can be used for calling the outer lambda; however, in this case the inner lambda needs to be called for retrieving the body.

A similar problem emerges: what value should be used as the x argument of the inner lambda of a? Going back to the “repeated operations”, x represents the input of the operation, in other words, the value to which the operation should be applied. Just like with the function itself, add cannot know this value when computing its result, so it must use another generic lambda that receives x.

With this, the previous two steps of extraction and replacement can be performed. The following figure shows the final definition of add in Lisp.

Note that the previous Lisp definition “replaces” the body of a into b, but the conventional definition in λ-notation is:

\[ \text{add} = \lambda ab. \lambda f. \lambda x. a\ f\ (b\ f\ x) \]

Which would move the body of b into a. Again, this is not an issue because addition is commutative.

6. Calling Church numerals

To further illustrate why Church numerals are functions that represent repeated operations, a call to one of them will be made below. First, a simple definition of increase, which adds one to its argument:

(define increase
  (lambda (val)
    (+ val 1)))

(increase 5)  ; Returns 6

This increase function can be used as the f argument when calling a Church numeral, and the result will be another function which applies the operation N times.

(define church-three
  (lambda (f)
    (lambda (x)
      (f (f (f x))))))

(define increase-by-three (church-three increase))

In the previous example, increase-by-three is a function which receives an argument x and applies increase to it 3 times. Therefore, these two expressions are equivalent:

(increase-by-three 10)               ; Returns 13
(increase (increase (increase 10)))  ; Returns 13

Naturally, this is extremely versatile, as it can be done with any functions as an argument:

(define square
  (lambda (val)
    (* val val)))

;; The following expression is equivalent to:
;;
;;   (square (square (square 3)))
;;
;; Which returns:
;;
;;   ((3^2)^2)^2 = 3^(2*2*2) = 3^8 = 6561
;;
((church-three square) 3)

Footnotes:

The expression can be simplified through currying, that is, by transforming multiple lambdas that receive one argument into a single lambda that receives multiple arguments (in this case, both \(f\) and \(x\)). The simplified definition would be \(\lambda n. \lambda f\ x. f (n f x)\).