The pros and cons of Cons

1. Introduction
- 1.1. Cons cells
- 1.2. Lists and syntactic sugar
2. The initial approach
- 2.1. A basic Expr structure
- 2.2. Combining expressions with linked lists
3. The conventional cons cell approach
4. Comparing the two methods

1. Introduction

This article is about the advantages and disadvantages of using cons cells when implementing a Lisp-like programming language.

I have been working on a simple Lisp interpreter for some months now, and I have learned many things from my mistakes. I recently changed how lists are internally stored in my interpreter, and I wanted to explain why I decided to use my old approach in the beginning, and why I decided to eventually change into a more conventional cons cell approach.

Throughout this article I will use the Lisp syntax from my interpreter, but the examples can be run in most Lisp dialects with some minor modifications. Just keep in mind that, unlike in Lisp₂ dialects (e.g. Common Lisp), there isn’t a separate namespace for functions, so you won’t see funcall.

First of all, I would like to credit Daniel Kochmański for his help some months ago, and for linking me this email by Erik Naggum, which you might find useful.

1.1. Cons cells

If you are familiar with Lisp languages, you probably know what cons cells are, but I will still give a brief explanation.

The term “cons” is a bit ambiguous in Lisp, because it’s commonly used when talking about a data type, but it’s also the name of a procedure that creates an object of that type. Although basically all Lisp dialects have a cons function, the name used to refer to the data type changes among them: Scheme calls them pairs, Emacs Lisp calls them cons cells, and Common Lisp simply calls them cons. In this article, I will use the term “cons cell” when referring to the data type to avoid confusion.

A cons cell is a Lisp object that simply points to another two to Lisp objects. For historical reasons, the first pointer is called the CAR, and the second pointer is called the CDR¹. These CAR and CDR terms are not only used when talking about the pointers themselves, but also when talking about the values that they point to.

Figure 1: Box diagram of a cons cell.

These kinds of diagrams are called box-and-pointer, box-and-arrow, or simply box diagrams, and they are very useful for understanding how complex data structures are stored internally.

The procedures for accessing the CAR and the CDR of a cons cell are called car and cdr. There are also wrappers for combining them, so (caddr x) would return the car of the cdr of the cdr of x.

(car (cons 10 20))
  ⇒ 10

(cdr (cons 10 20))
  ⇒ 20

Finally, I would like to mention dotted pair or cons pair notation. When a dot is encountered inside a list (i.e. between parentheses), the interpreter assumes that the element before the dot is the CAR of the current pair, and the one after the dot is the CDR.

(car '(10 . 20))
  ⇒ 10

(cdr '(10 . 20))
  ⇒ 20

1.2. Lists and syntactic sugar

When people mention lists in the context of Lisp, they are usually talking about something that looks like:

(a b 1 2)

These are called proper lists, but it’s important to understand that (usually) the lists are internally stored as chained cons cells, and that the previous notation is just syntactic sugar for:

(a . (b . (1 . (2 . nil))))

As you can see, each element of our list is stored in the CAR of a cons cell, and the CDR points to another cons cell which contains the next element. The CDR of the innermost cons cell is the symbol nil, used to terminate lists². Although not all Lisp dialects use the symbol nil, there is always a unique value used to terminate proper lists. In box diagrams, a crossed box is used to represent this terminator.

Figure 2: Box diagram of a proper list.

Let me emphasize that there is no “list type”, it’s just a data structure that is built out of cons cells.

2. The initial approach

Now that we know how lists are conventionally represented in Lisps, let’s have a look at an alternative implementation.

One of the books that inspired me to write a Lisp interpreter was Build Your Own Lisp by Daniel Holden. The author doesn’t use the cons cell approach described above, and although I didn’t follow the book, at first glance it didn’t seem like a bad idea. This article tries to precisely show its advantages and limitations when compared to the traditional approach explained above.

2.1. A basic `Expr` structure

First, let’s have a look at how a very basic Lisp expression would look like from C, using a tagged union.

enum EExprType {
    EXPR_INTEGER,
    EXPR_FLOAT,
    EXPR_SYMBOL,
};

typedef struct Expr {
    enum EExprType type;

    union {
        int n;
        float f;
        char* s;
    } val;
} Expr;

I will not spend too much time explaining why tagged unions are useful, since it is not the scope of this article, but in case you are not familiar with them, just know that they generally have a lower memory impact, and that we could access the appropriate value of an expression by first checking its type member.

int f(Expr* expr) {
    switch (expr->type) {
        case EXPR_INTEGER: return expr->val.n;
        case EXPR_FLOAT:   return (int)expr->val.f;
        case EXPR_SYMBOL:  return strlen(expr->val.s);
        default:           abort(); /* ??? */
    }
}

2.2. Combining expressions with linked lists

Now let’s have a look at how we could combine simple expressions into more complex structures using linked lists. The premise of a linked list is that each object in the list contains a pointer to the next one, therefore allowing the programmer to link objects that are not adjacent in memory, which is a limitation when using simple arrays.

Figure 3: Memory layout of a generic linked list of 3 elements.

We could use this linked list method for joining an arbitrary number of expressions together. Keep in mind that the lists themselves (just like the cons cells in conventional implementations) are expressions, so we would need to add a new expression type whose value is a pointer to the start of a linked list of expressions. The following code shows how the new Expr structure would look like after adding the necessary members.

enum EExprType {
    /* ... */
    EXPR_LIST,
};

typedef struct Expr {
    enum EExprType type;

    union {
        /* ... */
        struct Expr* children;
    } val;

    struct Expr* next;
} Expr;

Notice how children is a member of the union, but next is a member of the Expr structure, so the size of each expression just increased by sizeof(Expr*), not by 2 * sizeof(Expr*).

The following code shows how we would manually create the list (a b 1 2) from C, assuming that there is some expr_new function that allocates an expression with the specified type. For readability, we will also assume that we can safely store string literals in our expressions³.

extern Expr* expr_new(enum EExprType type);

Expr* get_list(void) {
    /* Create the list itself */
    Expr* list = expr_new(EXPR_LIST);
    list->next = NULL;

    /* Write each element */
    Expr* elt          = expr_new(EXPR_SYMBOL);
    list->val.children = elt;
    elt->val.s         = "a";

    elt->next  = expr_new(EXPR_SYMBOL);
    elt        = elt->next;
    elt->val.s = "b";

    elt->next  = expr_new(EXPR_INTEGER);
    elt        = elt->next;
    elt->val.n = 1;

    elt->next  = expr_new(EXPR_INTEGER);
    elt        = elt->next;
    elt->val.n = 2;

    /* Terminate the linked list */
    elt->next = NULL;

    return list;
}

The following diagram shows how the list would be stored in memory with our new structure.

Figure 4: Layout of a list of 4 expressions, using the linked list approach.

As you can see, each expression has a next member, so they will always be implicitly in a list, even if an expression isolated like the first one. This has some important consequences that will be explained below.

The following code shows how we would iterate each expression of a list; in this case for adding some integers together.

int sum(const Expr* list) {
    assert(list->type == EXPR_LIST);

    int total = 0;
    for (Expr* e = list->val.children; e != NULL; e = e->next)
        if (e->type == EXPR_INTEGER)
            total += e->val.n;

    return total;
}

Before getting into the advantages and disadvantages of the implementation, I would like to note that the author of the book I mentioned probably chose this approach to deliberately explain how linked lists work, since the book is also meant for people learning C.

I would also like to mention that Clojure, a Lisp dialect⁴, uses a similar linked list approach through sequences, as described in this article. And indeed, this can be seen in the Java source code where the Cons class is defined.

3. The conventional cons cell approach

Now that we had a look at how to combine expressions with a linked list, let’s go back to the cons cell approach that is used in most Lisps. Instead of adding “list” expressions to our implementation, we will simply add a “pair” type that contains the CAR and CDR pointers, and we will chain them to build lists.

The following code shows how we could extend our basic Expr structure from above to include pairs.

enum EExprType {
    /* ... */
    EXPR_PAIR,
};

typedef struct ExprPair {
    struct Expr* car;
    struct Expr* cdr;
} ExprPair;

typedef struct Expr {
    enum EExprType type;

    union {
        /* ... */
        struct ExprPair pair;
    } val;
} Expr;

There are some details I would like to note from the previous code. First, notice how the Expr structure doesn’t have a next member anymore, so expressions are not in an “implicit” list. Second, notice how the pair union member is an ExprPair structure, not a pointer to that structure; this is because the pair structure is very small, so there is no need to allocate it separately.

We (potentially⁵) increased the size of the union by sizeof(expr.val.pair.cdr), but also decreased the size of the structure by sizeof(expr.next), so the size of an individual expression was not negatively impacted by this change.

The following code is equivalent to the one shown in the previous section for creating the list (a b 1 2). We assume that there is some NIL expression that is used to uniquely identify a list terminator, but we could set the last CDR pointer to NULL instead.

extern Expr* expr_new(enum EExprType type);
extern Expr* NIL;

Expr* get_list(void) {
    Expr* first              = expr_new(EXPR_PAIR);
    Expr* cur                = first;
    cur->val.pair.car        = expr_new(EXPR_SYMBOL); /* CADR */
    cur->val.pair.car->val.s = "a";

    cur->val.pair.cdr        = expr_new(EXPR_PAIR);
    cur                      = cur->val.pair.cdr;
    cur->val.pair.car        = expr_new(EXPR_SYMBOL); /* CADR */
    cur->val.pair.car->val.s = "b";

    cur->val.pair.cdr        = expr_new(EXPR_PAIR);
    cur                      = cur->val.pair.cdr;
    cur->val.pair.car        = expr_new(EXPR_INTEGER); /* CADDR */
    cur->val.pair.car->val.n = 1;

    cur->val.pair.cdr        = expr_new(EXPR_PAIR);
    cur                      = cur->val.pair.cdr;
    cur->val.pair.car        = expr_new(EXPR_INTEGER); /* CADDDR */
    cur->val.pair.car->val.n = 2;

    /* Terminate the chain of cons cells */
    cur->val.pair.cdr = NIL;

    return first;
}

The following diagram shows how the list would be stored in memory with our new structure.

Figure 5: Layout of a list of 4 expressions, using the cons cell approach.

The following code shows how we would iterate over a (proper⁶) list, accessing each element. Since the previous code is not very readable, I added some CAR and CDR macros for (hopefully) making the code a bit cleaner.

#define CAR(EXPR_PTR) ((EXPR_PTR)->val.pair.car)
#define CDR(EXPR_PTR) ((EXPR_PTR)->val.pair.cdr)

extern bool is_proper_list(const Expr* expr);
extern bool is_nil(const Expr* expr);

int sum(const Expr* list) {
    assert(is_proper_list(list));

    int total = 0;
    for (; !is_nil(list); list = CDR(list))
        if (CAR(list) == EXPR_INTEGER)
            total += CAR(list)->val.n;

    return total;
}

Just like with expr_new, notice how we use two functions that we haven’t defined: is_proper_list and is_nil. It’s not necessary to know how these functions are implemented, or even what nil really is, as long as we have a reliable way of checking for this unique value. This is called wishful thinking, and is a key concept from Structure and Interpretation of Computer Programs, a great book that I truly recommend (not just for Lisp programmers).

4. Comparing the two methods

After considering how the two approaches could be implemented, we can finally compare them. I will start with the disadvantages of the conventional cons cell approach when compared with the simple linked list.

4.1. Disadvantage: Memory impact

If we compare the memory layouts for the linked list and the cons cell implementations, we can immediately see that the latter uses more memory for storing the same list.

Specifically, since the linked list approach needed an expression for the list itself, we would need n+1 expressions for building a list of n elements. When using the cons cell approach, each element of the (proper) list is placed inside a cons pair, so we would need 2n expressions, assuming that we don’t need to allocate a nil expression for every list, since that is not usually the case.

Considering this fact, it’s easy to conclude that the linked list implementation is more memory-efficient, and that there is no point in using cons cells. In that regard, I would like to say that even if the implementation uses cons cells for building conventional lists, this doesn’t prevent us from adding another more optimized “vector” or “array” data structure.

4.2. Advantage: Improper lists

When describing Lisp lists, I only mentioned proper lists, but in most Lisp dialects, there are also improper lists. One subtype of improper lists are dotted lists: lists whose last CDR is not nil. Such a list might be defined with the dotted pair notation described above, or by using cons.

(cdr '(a b . c))
  ⇒ (b . c)

(cadr '(a b . c))  ; (car (cdr ...))
  ⇒ b

(cddr '(a b . c))  ; (cdr (cdr ...))
  ⇒ c

(cons 'a (cons 'b (cons 3 4)))
  ⇒ (a b 3 . 4)

This data structure is perfectly possible with the cons cell approach, but impossible when using a linked list, since the possible values of the next member are either the address of another expression or NULL.

I would also like to mention circular lists, another type of improper list. A circular list can be defined as “a chain of cons cells that has no termination because some cell in the chain is the CDR of a later cons cell”. To be completely fair, we could still build circular lists with the linked list approach, since the next member can point to a previous element of the list.

4.3. Advantage: Context independence, reusing references

As mentioned above, when using linked lists, each expression is restricted to their context, since they always have a next pointer.

In the book by Daniel Holden, this isn’t a problem because most functions returned copies, instead of references. Naturally, this is appropriate for his book, since it would be confusing for new C programmers to implement a more complex memory-management system like garbage collection.

Returning copies is not practical when making a real implementation, however. Not only does it waste memory and decrease performance (with unnecessary calls to the allocator), but it also makes some standard Lisp functions like car and cdr counter-intuitive, since they are supposed to return references, instead of copies.

To further illustrate why the next pointer is a problem, let’s look at a possible implementation of the car primitive. The following function, written for the linked list implementation, receives an expression of type EXPR_LIST and is supposed to return its first element.

Expr* expr_clone(const Expr* e) {
    Expr* result = expr_new(e->type);
    result->val  = e->val;
    result->next = NULL;
    return result;
}

Expr* car(Expr* list) {
    if (is_nil(list))
        return NIL;

    assert(list->type == EXPR_LIST);
    return expr_clone(list->val.children);
}

We can’t directly return list->val.children because the first element still has the rest of the list “attached” through its next pointer, and we can’t set list->val.children->next to NULL to “isolate” the first element either, because we would be overwriting the list (a b c) into becoming (a), and we could even be leaking memory. Even if we wanted to directly return the reference, we can’t do it without breaking the structure of the list we received, so we are forced to return a clone.

However, with the cons cell approach, this is primitive would be much simpler to implement, since a list references its contents, but the contents themselves are completely independent of their context.

Expr* car(Expr* expr) {
    if (is_nil(expr))
        return NIL;

    assert(expr->type == EXPR_PAIR);
    return expr->val.pair.car;
}

Keep in mind that, since we are returning a reference, if something overwrites the value of the expression returned by car, it will also affect the contents of the cons cell where it’s referenced. This behavior is expected in Lisp, and we could always write another function that returns a copy for this purpose.

Although this advantage might not look too significant, it allows us to work with references, which can potentially save more memory than when using linked lists, as explained in the disadvantage above.

4.4. Advantage: Consistent `nil`

In Lisp, the symbol nil is often used to represent the empty list. The following quote is taken from the Emacs manual:

In Emacs Lisp, the symbol nil has two meanings. First, it means the empty list. Second, it means false and is the value returned when a true-or-false-test tests false. nil can be written as an empty list, (), or as nil. As far as the Lisp interpreter is concerned, () and nil are the same. Humans, however, tend to use nil for false and () for the empty list.

When I implemented nil in my interpreter, back when I was using linked lists, this description seemed very confusing. With my implementation at the time, the symbol nil and the empty list () were two completely different expressions.

/* Symbol: nil */
Expr* sym  = expr_new(EXPR_SYMBOL);
sym->val.s = "nil";

/* Empty list: () */
Expr* list         = expr_new(EXPR_LIST);
list->val.children = NULL;

At the time, the symbol nil was just like any other variable from the global environment, bound to an empty list. However, in most Lisps, the following are equivalent:

nil
  ⇒ nil

'nil
  ⇒ nil

()
  ⇒ nil

'()
  ⇒ nil

This happens because, as explained above, lists are syntactic sugar, so an empty list is converted to the symbol nil by the parser. This makes many internal functions more consistent (and efficient), since you don’t have to check for multiple expression types.

/* Old, inconsistent */
bool is_nil(const Expr* e) {
    return (e->type == EXPR_LIST && e->val.children == NULL) ||
           (e->type == EXPR_SYMBOL && e->val.s != NULL &&
            !strcmp(e->val.s, "nil"));
}

When using the new approach, nil is a normal symbol that evaluates to itself (it’s simply used as a special indicator), and the empty list () is just syntactic sugar for it.

4.5. Advantage: No list wrappers

When using linked lists, many wrappers had to be written for internal functions that operated on individual expressions.

For example, assume there is an expr_equal function that checks whether two expressions have the same value. When an EXPR_LIST is encountered, it would have to compare each element recursively, ideally using an expr_list_equal wrapper.

bool expr_equal(const Expr* a, const Expr* b) {
    /* ... */

    switch (a->type) {
        case EXPR_LIST:
            return expr_list_equal(a, b);

        /* ... */
    }
}

bool expr_list_equal(const Expr* a, const Expr* b) {
    /* Similar to strcmp(3) */
    while (expr_equal(a, b)) {
        if (a == NULL)
            return true;

        a = a->next;
        b = b->next;
    }

    return false;
}

This pattern repeated with many internal functions, and many expr_* functions ended up having a expr_list_* equivalent.

When using cons cells, comparing two lists (whether they are proper or not) is as simple as comparing the CAR and CDR recursively. This is an example of how cons cells are their own independent expression, and how most operations can be applied without depending on their context.

bool expr_equal(const Expr* a, const Expr* b) {
    /* ... */

    switch (a->type) {
        case EXPR_PAIR:
            return expr_equal(CAR(a), CAR(b))
                && expr_equal(CDR(a), CDR(b));

        /* ... */
    }
}

4.6. Advantage: Lisp-like C code

Whether we chose to implement lists with one method or the other, the Lisp programmer should still have access to the usual cons, car and cdr procedures. By using the cons cell approach, our C code will use a similar structure to Lisp procedures, even if we decide to define iterative functions, rather than recursive ones.

As an example, look at how the following Lisp procedure is implemented. It is a recursive procedure that implements an iterative process⁷ using a tail call.

;; Does the function F return non-nil for all elements of LST?
(defun every (f lst)
  (cond ((null? lst) tru)
        ((not (f (car lst))) nil)
        (tru (every f (cdr lst)))))

Now compare it to the following iterative implementation as a C primitive.

extern Expr* funcall(Expr* func, Expr* arg);

bool every(Expr* f, Expr* lst) {
    for (; !is_nil(lst); lst = CDR(lst))
        if (is_nil(funcall(f, CAR(lst))))
            return false;

    return true;
}

This is a small advantage, but I was happy to notice it when I made the change, so I decided to include it.

Footnotes:

Since Lisp was originally implemented on the IBM 704 computer, CAR stood for Contents of the Address part of the Register, and CDR stood for Contents of the Decrement part of the Register. See John McCarthy, History of Lisp (1979).

Generally, nil is a pretty special symbol: It is considered both a symbol and a list (although it’s not considered a cons cell according to Common Lisp’s consp or Scheme’s pair?), it usually evaluates to itself, and in many dialects both (car nil) and (cdr nil) evaluate to nil. It is also often used to denote false in boolean operations.

This is naturally a big assumption, one that doesn’t even match our previous definition of Expr, since string literals are read-only, and we are storing them in a non-constant char*. We would probably need to use some function that allocates the string on the heap, like strdup(3).

⁴

Although Richard Stallman doesn’t agree with this statement, precisely because there aren’t proper cons cells in Clojure. See How I do my computing, retreived in February 2025.

⁵

I say “potentially” because, depending on how complex our Expr structure was, the size of the union might have been already greater than the size of ExprPair, so this change would have actually decreased the size of an Expr structure (since we removed the next pointer). One example where the union could be storing bigger members is if we kept track of the length of symbols/strings along with the actual char* data.

⁶

As explained above, proper lists are lists of cons cells whose last CDR is nil. This is important because we want to stop as soon as we encounter nil, but an improper list does not contain a null CDR so we would need a different loop condition. Note that nil is also a proper list, and therefore a valid input for this sum function.

⁷

See Section 1.2.1 of Structure and Interpretation of Computer Programs for more information on how the implementation of an iterative process can be recursive.