Understanding the Y combinator

1. Quick introduction to lambda calculus
2. SICP, Lisp and JavaScript
3. Simple example: Factorial
4. Simple recursion with anonymous functions
5. Fixed points
6. Fixed-point combinators
7. Y combinator
8. Applications of the Y combinator
9. Implementation in Scheme
10. Final note

1. Quick introduction to lambda calculus

Lambda calculus (λ-calculus) is a mathematical system for computation based on function abstraction and application, using variable binding and substitution.¹

1.1. Rules

The beauty of lambda calculus is its simplicity. These are the only 3 rules:

\(x\): A variable represents a parameter.
\((\lambda x. M)\): A lambda abstraction is an anonymous function, with a parameter \(x\) (between the λ and the dot), that returns the body \(M\).
\((M\ N)\): An application of function \(M\) to an argument \(N\).

In the last rule, both \(M\) and \(N\) are lambda terms.

1.2. Lambda terms

A lambda term is just a valid expression in the lambda calculus system. Well, what makes an expression valid? The following 3 rules are used to determine if a lambda expression is valid:

A variable \(x\) is itself a valid lambda term.
If \(M\) is a lambda term, and \(x\) is a variable, then \((\lambda x. M)\) is also a lambda term (An abstraction).
If \(M\) and \(N\) are lambda terms, then \((M\ N)\) is also a lambda term (An application).

You can begin to feel the recursive magic of lambda calculus, before even getting into an example.

1.3. Reduction operations

Lambda calculus also has 2 main reduction operations:

\((\lambda x. M[x])\) → \((\lambda y. M[y])\): Alpha conversion (α-conversion), renaming the bound variables in the expression. Used to avoid name collisions.
\(((\lambda x. M) N)\) → \((M[x := N])\): Beta reduction (β-reduction), replacing the bound variables (\(x\)) with the argument expression (\(N\)) in the body (\(M\)) of the abstraction.

There is also eta reduction (η-reduction), which expresses the idea of extensionality,² which applied to this context establishes that two functions are the same if and only if they give the same result for all arguments.

Later we will also mention the opposite of the beta reduction, the beta abstraction, which instead of simplifying an expression, it adds an extra function call that might be useful (e.g. in Lisps) for delaying the evaluation of the arguments.

1.4. Notation

For understanding lambda notation, you will also have to keep in mind these conventions:

Outermost parentheses are dropped: \(M N\) instead of \((M N)\).
Applications are assumed to be left associative: \(M N P\) instead of \(((M N) P)\).
The body of an abstraction extends as far right as possible: \((\lambda x. M N)\) means \((\lambda x. (M N))\) and not \(((\lambda x. M) N)\).
A sequence of abstractions is contracted: \((\lambda x. \lambda y. \lambda z. N)\) is abbreviated as \((\lambda x\ y\ z. N)\).
When all variables are single-letter, the space in applications may be omitted: \((M N P)\) instead of \((M\ N\ P)\).

Some of these look a bit confusing to me, specially when embedding expressions in text, so I will try to make each expression as readable as possible.

2. SICP, Lisp and JavaScript

Structure and Interpretation of Computer Programs is an amazing book by Harold Abelson and Gerald Jay Sussman. The book teaches the fundamental principles of computer programming, including recursion, abstraction, modularity, and much more. I recommend the book to anyone who is interested in programming, I am sure they will learn something.

This book has two editions. The first one uses Scheme (a dialect of Lisp), for its examples and explanations, while the second uses JavaScript. I have not programmed much in JavaScript at the time of writing this, but I will try to provide all examples in both lambda notation, Scheme and JavaScript.

At one point, in section 1.3.3, they talk about finding fixed points of functions. This was the first time I heard about this, and it’s what mainly motivated me to write this article. If you are interested on the general Lisp approach, and not on the Y combinator itself, I recommend you check it out.

One of the most valuable things that SICP has taught me is that sometimes it’s extremely useful to treat functions as black boxes that are able to transform some inputs into some outputs. They call this wishful thinking, and it has been useful not only when using functions, but also when designing them.

3. Simple example: Factorial

This is the function for calculating the factorial of a number \(n\), using the lambda calculus notation:

\[ \text{fact} = \lambda n. \Big(\Big(\text{iszero}\ n\Big) 1 \Big(\text{mult}\ n \ \big(\text{fact}\ (\text{prec}\ n)\big)\Big)\Big) \]

In Scheme:

(define fact
  (lambda (n)
    (if (equal? n 0)
        1
        (mult n (fact (prec n))))))

Or in JavaScript:

var fact = (n) => (n == 0)
    ? 1
    : mult(n, (fact(prec(n))));

We are defining fact as a function that takes a parameter n. This function returns 1 if n is zero, and otherwise multiplies n by the factorial of the number preceding n.

In this case, we can simply ignore how iszero, mult, prec and even fact work internally, we just have to trust that they do what we expect. Another useful way of thinking about lambda calculus and Lisp in general is as a language for expressing processes.

In any case, we don’t have those name-defining commodities in lambda calculus. A function can’t call itself by name, so we will have to find an alternative way.

4. Simple recursion with anonymous functions

Before trying to understand the Y combinator, let’s have a look at an example of how an anonymous function might call itself without the need for symbols.

\[ (\lambda x. x\ x)(\lambda x. x\ x) \]

Or in Scheme:

((lambda (x) (x x))
 (lambda (x) (x x)))

Note: Depending on the Lisp, you might need to use (funcall x x) instead of (x x), since variables and functions don’t share the same namespace. You can search about the differences between Lisp-1 and Lisp-2.

Or in JavaScript:

((x) => x(x))((x) => x(x))

We can see that the two parenthesized expressions are identical, and that the first is applied to the second one. Let’s try to simplify it by β-reduction. The first parenthesized expression, is applied to the second one. We replace each occurrence of \(x\) in the body of the first expression with the whole second parenthesized expression.

We are right back where we started. This function would call itself indefinitely, and a similar form will be used for the Y combinator below.

5. Fixed points

Before getting into the fixed-point combinators, we need to define what a fixed point is.

A fixed point of function \(f\) is a value that is mapped to itself by the function.³ In other words, \(x\) is a fixed point of \(f\) if \(f(x) = x\). For this to be possible, \(x\) has to belong to both the domain of \(f\) (set of values that it can take), and the codomain of \(f\) (set of values that it can return).

For example, if \(f(x) = x!\), 1 and 2 are fixed points, since \(f(1) = 1\) and \(f(2) = 2\).

The image shows the graph of a function \(f\), with 3 fixed points. When plotting with \(y = f(x)\), these 3 points were also on the line \(x = y\).

For example, for some functions \(f\), we can locate a fixed point by beginning with an initial guess and applying \(f\) repeatedly.

\[ f(x),\quad f(f(x)),\quad f(f(f(x))),\quad \dots, \]

We would do that until the value doesn’t change very much, and we are satisfied with the result.

6. Fixed-point combinators

A fixed-point combinator is a higher-order function (i.e. a function that takes a function as argument) that returns some fixed point of its argument function.⁴

So, if a function fix is a fixed-point combinator, a function f has one or more fixed points, then fix(f) is one of these fixed points:

\[ f(\text{fix}\ f) = \text{fix}\ f \]

In lambda calculus, every function has a fixed point.

7. Y combinator

An example of a fixed-point combinator is the Y combinator. This is the definition of \(Y\).

\[ Y = \lambda f. \big(\lambda x. f (x\ x)\big) \big(\lambda x. f (x\ x)\big) \]

Or in Scheme:

(define Y
  (lambda (f)
    ((lambda (x) (f (x x)))
     (lambda (x) (f (x x))))))

Note: This version is not accurate, see Implementation in Scheme below.

Or in JavaScript:

var Y = (f) =>
    ((x) => f(x(x)))(
     (x) => f(x(x)));

Since it’s a fixed-point combinator, calling \(Y\) with a function as its argument would be reduced to \(Y\ f = f(Y\ f)\). This is a very interesting and useful concept, and it’s where this image comes from.

Let’s try to understand what it does, and why it’s a fixed-point combinator. We are saying that \(Y\) is a function that takes one parameter \(f\). The body consists of the same lambda term applied to itself: \((\lambda x. f(x\ x))\). You may realize why we explained how to do recursion with anonymous functions earlier. A similar principle applies here, but we are also calling the \(f\) function.

Let’s simplify it with β-reduction step by step:

\begin{align*} Y\ g &= \lambda f. \big(\lambda x. f (x\ x)\big) \big(\lambda x. f (x\ x)\big) g && \text{By definition of } Y \\ &= \big(\lambda x. g (x\ x)\big) \big(\lambda x. g (x\ x)\big) && \text{By beta reduction: Replacing } f \text{ of } Y \text{ with } g \\ &= g \Big(\big(\lambda x. g (x\ x)\big) \big(\lambda x. g (x\ x)\big)\Big) && \text{By beta reduction: Replacing } x \text{ of the first function with } \big(\lambda x. g (x\ x)\big) \\ &= g (Y\ g) && \text{By equality} \end{align*}

Note how the reduction on the third step is applying \(g\) to the same expression in the second step, which we know is equal to \(Y\ g\). That’s how we can verify that \(Y\ g = g(Y\ g)\).

An alternative (and slightly simpler) version of the Y combinator is the following:

\[ X = \lambda f. (\lambda x. x\ x) (\lambda x. f(x\ x)) \]

Notice how the first call to \(f\) was not necessary, since this expression also β-evaluates to the Y combinator.

8. Applications of the Y combinator

You might be wondering what makes the Y combinator so special. As we said, lambda calculus doesn’t have any kind of “global symbols”, therefore a function can’t reference itself by name. Let’s go back to the factorial example in Scheme.

(define fact
  (lambda (n)
    (if (equal? n 0)
        1
        (* n (fact (- n 1))))))

This recursive form is possible because the fact function can reference itself by name. More specifically, because a lambda body is evaluated whenever a call is made, and by that time the fact symbol is already bound to the lambda, and therefore the body can reference it. In OCaml, for example, the rec keyword is needed when defining a recursive function to denote that it will reference itself, and not an external function with the same name.

The Y combinator allows us to call a function recursively in a language that doesn’t implement recursion. Let’s have a look at an alternative form of fact.

(define fact-generator
  (lambda (self)        ; Outer lambda (fact-generator)
    (lambda (n)         ; Inner lambda (returned by fact-generator)
      (if (equal? n 0)
          1
          (* n (self (- n 1)))))))

Let’s carefully look at what we just defined. We are defining fact-generator as the outer lambda, a function with one argument self. This function does not return the factorial of a number, but instead returns another another lambda function that will receive a number n, and return its factorial.

This inner lambda, the one that will be returned when calling fact-generator, is essentially the same as our previous fact function, but this time it’s able to use recursion without referencing itself by name by accessing the self parameter of the outer lambda.

The important detail is that fact-generator is supposed to return a factorial function, but also expects to receive a factorial function as its self parameter, which will be used whenever the inner lambda wants to make a “recursive” call. How could we accomplish this? At first sight, we could try something like this.

;; Wrong.
(define fact
  (fact-generator fact))

The code above is incorrect because of how Scheme evaluates the arguments. When evaluating the define expression, it will first try to evaluate the call to fact-generator, but before that it must evaluate its argument, the fact symbol. Since at this point fact isn’t defined (we are trying to do just that), Scheme will show an “Unbound variable” error.

Since the fact-generator function expects a function for calculating the factorial, but also returns one, we are looking for something like:

(define fact
  (fact-generator (fact-generator (fact-generator ...))))

Does that look familiar? Indeed, this is just what the Y combinator allows us to do.

9. Implementation in Scheme

Our Scheme version of the Y combinator was not really correct. Evaluation in Scheme (and most Lisps) is strict, meaning that each argument is evaluated before applying the function. This is not a problem in lambda calculus.

9.1. Delaying evaluation with beta abstraction

Defining Y like we did before would result in infinite recursive calls when trying to apply the (x x) expressions. This will be more obvious when analyzing the combinator below, so let’s look at a possible solution first. We can fix our Y combinator by beta abstracting⁵ those two applications.

JAO’s blog about the Y-combinator in Scheme (2014)

If you have a function F in Scheme, you can define a totally equivalent function G by (define G (lambda (args) (F args))). We say that G is a beta abstraction of F, or that F is a beta reduction of G.

The usual reason you would beta abstract a function in Scheme is in order to delay the evaluation of its body, just what the doctor ordered.

We can add this beta abstraction to the (x x) applications, effectively acting as a “proxy”.

(define Y
  (lambda (f)
    ((lambda (x) (f (lambda (n) ((x x) n))))
     (lambda (x) (f (lambda (n) ((x x) n)))))))

Now, since the (x x) expression is inside the body of this “proxy” lambda, it will not be evaluated until the proxy is called. This is not easy to understand, so so let’s try to visualize the evaluation process of a call to our Y function.

9.2. Evaluation process of our Y combinator

The following diagram represents the evaluation of a call to our Y function. Specifically, applying the first inner lambda to its copy. Note that this isn’t the actual process that a Scheme interpreter would follow, instead I have decided to make some changes to make it easier to understand.

Let me explain briefly each step of the diagram.

The black lambda receives a copy of itself (gray lambda) as the argument x. We have seen this above with the \((\lambda x. x\ x)(\lambda x. x\ x)\) expression.
With some wishful thinking™, since we know that x is a copy of the current expression, we can assume that the result of the green (x x) expression is whatever the current black expression returns. For now, it’s better to assume that the value returned when calling x is fact, since this will be explained in more detail below.
The blue (lambda (n) (fact n)) expression acts as a proxy, receiving some arguments, in this case n, and calling fact with them. We name this simple lambda fact-proxy, since calling it is essentially the same as calling fact.
We know that f is the function that has been passed to Y, in this case fact-generator. We substitute it for readability, along with substituting fact-proxy with just fact.
Finally, the expression (fact-generator fact) gets passed to another lambda, or returned by Y, depending on whether or not we are the first call in the recursive cycle.

At this point, although there are some parts that might not be very clear, we can believe that (Y fact-generator) is the same as (fact-generator (fact-generator ...)), which returns a recursive fact function, even though fact has not been defined yet.

Now that we can see that the black expression effectively returns fact, you can verify that the second point was true. The (x x) expression is calling x with a copy of itself (the gray expression) as argument. That argument will be used for looping recursively, as mentioned in point one. Therefore, the call to x returns fact, and that’s why we were able to replace it on point two.

By wishful thinking, we know the (x x) call returns fact, but since evaluation in Scheme is strict, it will try to evaluate the call to x, which also contains another call to x, and so on. That’s what the fact-proxy is for.

With this example, you can also realize that the following expression makes more sense now.

\begin{align*} Y f &= f (Y f) \\ &= f (f (f (\dots))) \end{align*}

Which is just what we wanted to achieve earlier.

(define fact
  (fact-generator (fact-generator (fact-generator ...))))

9.3. Complete Scheme example

Finally, we can try our full example.

(define Y
  (lambda (f)
    ((lambda (x) (f (lambda (n) ((x x) n))))
     (lambda (x) (f (lambda (n) ((x x) n)))))))

(define fact-generator
  (lambda (self)
    (lambda (n)
      (if (equal? n 0)
          1
          (* n (self (- n 1)))))))

(define fact
  (Y fact-generator))

(fact 5)

10. Final note

If you reached this far, I hope you have learned something. Everything in this article is based on what I found while trying to learn about the Y combinator, so if you feel like some explanations could be improved, feel free to contribute.

Footnotes:

See the Wikipedia page for lambda calculus.

See the Wikipedia page for extensionality.

See the Wikipedia page for fixed point.

⁴

See the Wikipedia page for fixed-point combinator.

⁵

See jao.io article about the Y combinator in Scheme, and about how to design good websites.

Understanding the Y combinator

Table of Contents

1. Quick introduction to lambda calculus

1.1. Rules

1.2. Lambda terms

1.3. Reduction operations

1.4. Notation

2. SICP, Lisp and JavaScript

3. Simple example: Factorial

4. Simple recursion with anonymous functions

5. Fixed points

6. Fixed-point combinators

7. Y combinator

8. Applications of the Y combinator

9. Implementation in Scheme

9.1. Delaying evaluation with beta abstraction

9.2. Evaluation process of our Y combinator

9.3. Complete Scheme example

10. Final note

Footnotes: