# Understanding the Y combinator

## Table of Contents

## 1. Quick introduction to lambda calculus

**Lambda calculus** (λ-calculus) is a mathematical system for computation
based on **function abstraction** and **application**, using **variable binding** and
substitution.^{1}

### 1.1. Rules

The beauty of lambda calculus is its simplicity. These are the only 3 rules:

- \(x\): A
**variable**represents a parameter. - \((\lambda x. M)\): A
**lambda abstraction**is an anonymous function, with a parameter \(x\) (between the λ and the dot), that returns the body \(M\). - \((M\ N)\): An
**application**of function \(M\) to an argument \(N\).

In the last rule, both \(M\) and \(N\) are lambda terms.

### 1.2. Lambda terms

A **lambda term** is just a valid expression in the lambda calculus system. Well,
what makes an expression valid? The following 3 rules are used to determine if a
lambda expression is valid:

- A
**variable**\(x\) is itself a valid lambda term. - If \(M\) is a lambda term, and \(x\) is a variable, then \((\lambda x. M)\) is also
a lambda term (An
**abstraction**). - If \(M\) and \(N\) are lambda terms, then \((M\ N)\) is also a lambda term (An
**application**).

You can begin to feel the recursive magic of lambda calculus, before even getting into an example.

### 1.3. Reduction operations

Lambda calculus also has 2 main reduction operations:

- \((\lambda x. M[x])\) → \((\lambda y. M[y])\):
**Alpha conversion**(α-conversion), renaming the bound variables in the expression. Used to avoid name collisions. - \(((\lambda x. M) N)\) → \((M[x := N])\):
**Beta reduction**(β-reduction), replacing the bound variables (\(x\)) with the argument expression (\(N\)) in the body (\(M\)) of the abstraction.

There is also **eta reduction** (η-reduction), which expresses the idea of
*extensionality*,^{2} which applied to this context establishes that two
functions are the same if and only if they give the same result for all
arguments.

Later we will also mention the opposite of the beta reduction, the **beta
abstraction**, which instead of simplifying an expression, it adds an extra
function call that might be useful (e.g. in Lisps) for delaying the evaluation
of the arguments.

### 1.4. Notation

For understanding lambda notation, you will also have to keep in mind these conventions:

- Outermost parentheses are dropped: \(M N\) instead of \((M N)\).
- Applications are assumed to be left associative: \(M N P\) instead of \(((M N) P)\).
- The body of an abstraction extends as far right as possible: \((\lambda x. M N)\) means \((\lambda x. (M N))\) and not \(((\lambda x. M) N)\).
- A sequence of abstractions is contracted: \((\lambda x. \lambda y. \lambda z. N)\) is abbreviated as \((\lambda x\ y\ z. N)\).
- When all variables are single-letter, the space in applications may be omitted: \((M N P)\) instead of \((M\ N\ P)\).

Some of these look a bit confusing to me, specially when embedding expressions in text, so I will try to make each expression as readable as possible.

## 2. SICP, Lisp and JavaScript

Structure and Interpretation of Computer Programs is an amazing book by Harold Abelson and Gerald Jay Sussman. The book teaches the fundamental principles of computer programming, including recursion, abstraction, modularity, and much more. I recommend the book to anyone who is interested in programming, I am sure they will learn something.

This book has two editions. The first one uses Scheme (a dialect of Lisp), for its examples and explanations, while the second uses JavaScript. I have not programmed much in JavaScript at the time of writing this, but I will try to provide all examples in both lambda notation, Scheme and JavaScript.

At one point, in section 1.3.3, they talk about finding fixed points of functions. This was the first time I heard about this, and it’s what mainly motivated me to write this article. If you are interested on the general Lisp approach, and not on the Y combinator itself, I recommend you check it out.

One of the most valuable things that SICP has taught me is that sometimes it’s
extremely useful to treat functions as black boxes that are able to transform
some inputs into some outputs. They call this *wishful thinking*, and it has been
useful not only when using functions, but also when designing them.

## 3. Simple example: Factorial

This is the function for calculating the factorial of a number \(n\), using the lambda calculus notation:

\[ \text{fact} = \lambda n. \Big(\Big(\text{iszero}\ n\Big) 1 \Big(\text{mult}\ n \ \big(\text{fact}\ (\text{prec}\ n)\big)\Big)\Big) \]

In Scheme:

(define fact (lambda (n) (if (equal? n 0) 1 (mult n (fact (prec n))))))

Or in JavaScript:

var fact = (n) => (n == 0) ? 1 : mult(n, (fact(prec(n))));

We are defining `fact`

as a function that takes a parameter `n`

. This function
returns 1 if `n`

is zero, and otherwise multiplies `n`

by the factorial of the
number preceding `n`

.

In this case, we can simply ignore how `iszero`

, `mult`

, `prec`

and even `fact`

work
*internally*, we just have to trust that they do what we expect. Another useful
way of thinking about lambda calculus and Lisp in general is as a language for
expressing processes.

In any case, we don’t have those name-defining commodities in lambda calculus. A function can’t call itself by name, so we will have to find an alternative way.

## 4. Simple recursion with anonymous functions

Before trying to understand the Y combinator, let’s have a look at an example of how an anonymous function might call itself without the need for symbols.

\[ (\lambda x. x\ x)(\lambda x. x\ x) \]

Or in Scheme:

((lambda (x) (x x)) (lambda (x) (x x)))

Note:Depending on the Lisp, you might need to use`(funcall x x)`

instead of`(x x)`

, since variables and functions don’t share the same namespace. You can search about the differences between Lisp-1 and Lisp-2.

Or in JavaScript:

((x) => x(x))((x) => x(x))

We can see that the two parenthesized expressions are identical, and that the first is applied to the second one. Let’s try to simplify it by β-reduction. The first parenthesized expression, is applied to the second one. We replace each occurrence of \(x\) in the body of the first expression with the whole second parenthesized expression.

We are right back where we started. This function would call itself indefinitely, and a similar form will be used for the Y combinator bellow.

## 5. Fixed points

Before getting into the fixed-point combinators, we need to define what a fixed point is.

A fixed point of function \(f\) is a value that is mapped to itself by the
function.^{3} In other words, \(x\) is a fixed point of \(f\) if \(f(x) = x\). For
this to be possible, \(x\) has to belong to both the *domain* of \(f\) (set of values
that it can take), and the *codomain* of \(f\) (set of values that it can return).

For example, if \(f(x) = x!\), 1 and 2 are fixed points, since \(f(1) = 1\) and \(f(2) = 2\).

The image shows the graph of a function \(f\), with 3 fixed points. When plotting with \(y = f(x)\), these 3 points were also on the line \(x = y\).

For example, for some functions \(f\), we can locate a fixed point by beginning with an initial guess and applying \(f\) repeatedly.

\[ f(x),\quad f(f(x)),\quad f(f(f(x))),\quad \dots, \]

We would do that until the value doesn’t change very much, and we are satisfied with the result.

## 6. Fixed-point combinators

A **fixed-point combinator** is a higher-order function (i.e. a function that takes
a function as argument) that returns some fixed point of its argument
function.^{4}

So, if a function `fix`

is a fixed-point combinator, a function `f`

has one or
more fixed points, then `fix(f)`

is one of these fixed points:

\[ f(\text{fix}\ f) = \text{fix}\ f \]

In lambda calculus, every function has a fixed point.

## 7. Y combinator

An example of a fixed-point combinator is the Y combinator. This is the definition of \(Y\).

\[ Y = \lambda f. \big(\lambda x. f (x\ x)\big) \big(\lambda x. f (x\ x)\big) \]

Or in Scheme:

(define Y (lambda (f) ((lambda (x) (f (x x))) (lambda (x) (f (x x))))))

Note:This version is not accurate, see Implementation in Scheme bellow.

Or in JavaScript:

var Y = (f) => ((x) => f(x(x)))( (x) => f(x(x)));

Since it’s a fixed-point combinator, calling \(Y\) with a function as its argument would be reduced to \(Y\ f = f(Y\ f)\). This is a very interesting and useful concept, and it’s where this image comes from.

Let’s try to understand what it does, and why it’s a fixed-point combinator. We are saying that \(Y\) is a function that takes one parameter \(f\). The body consists of the same lambda term applied to itself: \((\lambda x. f(x\ x))\). You may realize why we explained how to do recursion with anonymous functions earlier. A similar principle applies here, but we are also calling the \(f\) function.

Let’s simplify it with β-reduction step by step:

\begin{align*} Y\ g &= \lambda f. \big(\lambda x. f (x\ x)\big) \big(\lambda x. f (x\ x)\big) g && \text{By definition of } Y \\ &= \big(\lambda x. g (x\ x)\big) \big(\lambda x. g (x\ x)\big) && \text{By beta reduction: Replacing } f \text{ of } Y \text{ with } g \\ &= g \Big(\big(\lambda x. g (x\ x)\big) \big(\lambda x. g (x\ x)\big)\Big) && \text{By beta reduction: Replacing } x \text{ of the first function with } \big(\lambda x. g (x\ x)\big) \\ &= g (Y\ g) && \text{By equality} \end{align*}Note how the reduction on the third step is applying \(g\) to the same expression in the second step, which we know is equal to \(Y\ g\). That’s how we can verify that \(Y\ g = g(Y\ g)\).

An alternative (and slightly simpler) version of the Y combinator is the following:

\[ X = \lambda f. (\lambda x. x\ x) (\lambda x. f(x\ x)) \]

Notice how the first call to \(f\) was not necessary, since this expression also β-evaluates to the Y combinator.

## 8. Applications of the Y combinator

You might be wondering what makes the Y combinator so special. As we said, lambda calculus doesn’t have any kind of “global symbols”, therefore a function can’t reference itself by name. Let’s go back to the factorial example in Scheme.

(define fact (lambda (n) (if (equal? n 0) 1 (* n (fact (- n 1))))))

This recursive form is possible because the `fact`

function can reference itself
by name. More specifically, because a lambda body is evaluated whenever a *call*
is made, and by that time the `fact`

symbol is already bound to the lambda, and
therefore the body can reference it. In OCaml, for example, the `rec`

keyword is
needed when defining a recursive function to denote that it will reference
itself, and not an external function with the same name.

The Y combinator allows us to call a function recursively in a language that
*doesn’t implement recursion*. Let’s have a look at an alternative form of `fact`

.

(define fact-generator (lambda (self) ; Outer lambda (fact-generator) (lambda (n) ; Inner lambda (returned by fact-generator) (if (equal? n 0) 1 (* n (self (- n 1)))))))

Let’s carefully look at what we just defined. We are defining `fact-generator`

as
the outer lambda, a function with one argument `self`

. This function *does not*
return the factorial of a number, but instead returns another *another lambda
function* that will receive a number `n`

, and return its factorial.

This inner lambda, the one that will be returned when calling `fact-generator`

, is
essentially the same as our previous `fact`

function, but this time it’s able to
use recursion *without* referencing itself by name by accessing the `self`

parameter
of the outer lambda.

The important detail is that `fact-generator`

is supposed to *return* a factorial
function, but also expects to *receive* a factorial function as its `self`

parameter, which will be used whenever the inner lambda wants to make a
“recursive” call. How could we accomplish this? At first sight, we could try
something like this.

;; Wrong. (define fact (fact-generator fact))

The code above is incorrect because of how Scheme evaluates the arguments. When
evaluating the `define`

expression, it will first try to evaluate the call to
`fact-generator`

, but before that it must evaluate its argument, the `fact`

symbol. Since at this point `fact`

isn’t defined (we are trying to do just that),
Scheme will show an “Unbound variable” error.

Since the `fact-generator`

function expects a function for calculating the
factorial, but also returns one, we are looking for something like:

(define fact (fact-generator (fact-generator (fact-generator ...))))

Does that look familiar? Indeed, this is just what the Y combinator allows us to do.

## 9. Implementation in Scheme

Our Scheme version of the Y combinator was not really correct. Evaluation in
Scheme (and most Lisps) is *strict*, meaning that each argument is evaluated
*before* applying the function. This is not a problem in lambda calculus.

### 9.1. Delaying evaluation with beta abstraction

Defining `Y`

like we did before would result in infinite recursive calls when
trying to apply the `(x x)`

expressions. This will be more obvious when analyzing
the combinator bellow, so let’s look at a possible solution first. We can fix
our Y combinator by **beta abstracting**^{5} those two applications.

JAO’s blog about the Y-combinator in Scheme (2014)If you have a function

`F`

in Scheme, you can define a totally equivalent function`G`

by`(define G (lambda (args) (F args)))`

. We say that`G`

is abeta abstractionof`F`

, or that`F`

is abeta reductionof`G`

.The usual reason you would beta abstract a function in Scheme is in order to delay the evaluation of its body, just what the doctor ordered.

We can add this beta abstraction to the `(x x)`

applications, effectively acting
as a “proxy”.

(define Y (lambda (f) ((lambda (x) (f (lambda (n) ((x x) n)))) (lambda (x) (f (lambda (n) ((x x) n)))))))

Now, since the `(x x)`

expression is inside the body of this “proxy” lambda, it
will not be evaluated until the proxy is called. This is not easy to understand,
so so let’s try to visualize the evaluation process of a call to our `Y`

function.

### 9.2. Evaluation process of our Y combinator

The following diagram represents the evaluation of a call to our `Y`

function. Specifically, applying the first inner lambda to its copy. Note that
this isn’t the actual process that a Scheme interpreter would follow, instead I
have decided to make some changes to make it easier to understand.

Let me explain briefly each step of the diagram.

- The black lambda receives a copy of itself (gray lambda) as the argument
`x`

. We have seen this above with the \((\lambda x. x\ x)(\lambda x. x\ x)\) expression. - With some
*wishful thinkingâ„˘*, since we know that`x`

is a copy of the current expression, we can assume that the result of the green`(x x)`

expression is whatever the current black expression returns. For now, it’s better to assume that the value returned when calling`x`

is`fact`

, since this will be explained in more detail below. - The blue
`(lambda (n) (fact n))`

expression acts as a proxy, receiving some arguments, in this case`n`

, and calling`fact`

with them. We name this simple lambda`fact-proxy`

, since calling it is essentially the same as calling`fact`

. - We know that
`f`

is the function that has been passed to`Y`

, in this case`fact-generator`

. We substitute it for readability, along with substituting`fact-proxy`

with just`fact`

. - Finally, the expression
`(fact-generator fact)`

gets passed to another lambda, or returned by`Y`

, depending on whether or not we are the first call in the recursive cycle.

At this point, although there are some parts that might not be very clear, we
can believe that `(Y fact-generator)`

is the same as
`(fact-generator (fact-generator ...))`

, which returns a recursive `fact`

function, even though `fact`

has not been defined yet.

Now that we can see that the black expression effectively returns `fact`

, you can
verify that the second point was true. The `(x x)`

expression is calling `x`

with a
copy of itself (the gray expression) as argument. That argument will be used for
looping recursively, as mentioned in point one. Therefore, the call to `x`

returns
`fact`

, and that’s why we were able to replace it on point two.

By wishful thinking, we know the `(x x)`

call returns `fact`

, but since evaluation
in Scheme is strict, it will try to evaluate the call to `x`

, which also contains
another call to `x`

, and so on. That’s what the `fact-proxy`

is for.

With this example, you can also realize that the following expression makes more sense now.

\begin{align*} Y f &= f (Y f) \\ &= f (f (f (\dots))) \end{align*}Which is just what we wanted to achieve earlier.

(define fact (fact-generator (fact-generator (fact-generator ...))))

### 9.3. Complete Scheme example

Finally, we can try our full example.

(define Y (lambda (f) ((lambda (x) (f (lambda (n) ((x x) n)))) (lambda (x) (f (lambda (n) ((x x) n))))))) (define fact-generator (lambda (self) (lambda (n) (if (equal? n 0) 1 (* n (self (- n 1))))))) (define fact (Y fact-generator)) (fact 5)

## 10. Final note

If you reached this far, I hope you have learned something. Everything in this article is based on what I found while trying to learn about the Y combinator, so if you feel like some explanations could be improved, feel free to contribute.

## Footnotes:

^{1}

See the Wikipedia page for lambda calculus.

^{2}

See the Wikipedia page for extensionality.

^{3}

See the Wikipedia page for fixed point.

^{4}

See the Wikipedia page for fixed-point combinator.

^{5}

See jao.io article about the Y combinator in Scheme, and about how to design good websites.