The pros and cons of Cons
Table of Contents
1. Introduction
This article is about the advantages and disadvantages of using cons cells when implementing a Lisp-like programming language.
I have been working on a simple Lisp interpreter for some months now, and I have learned many things from my mistakes. I recently changed how lists are internally stored in my interpreter, and I wanted to explain why I decided to use my old approach in the beginning, and why I decided to eventually change into a more conventional cons cell approach.
Throughout this article I will use the Lisp syntax from my interpreter, but the
examples can be run in most Lisp dialects with some minor modifications. Just
keep in mind that, unlike in Lisp2 dialects (e.g. Common Lisp), there isn’t a
separate namespace for functions, so you won’t see funcall
.
1.1. Cons cells
If you are familiar with Lisp languages, you probably know what cons cells are, but I will still give a brief explanation.
The term “cons” is a bit ambiguous in Lisp, because it’s commonly used when
talking about a data type, but it’s also the name of a procedure that creates an
object of that type. Although basically all Lisp dialects have a cons
function,
the name used to refer to the data type changes among them: Scheme calls them
pairs, Emacs Lisp calls them cons cells, and Common Lisp simply calls them
cons. In this article, I will use the term “cons cell” when referring to the
data type to avoid confusion.
A cons cell is a Lisp object that simply points to another two to Lisp objects. For historical reasons, the first pointer is called the CAR, and the second pointer is called the CDR1. These CAR and CDR terms are not only used when talking about the pointers themselves, but also when talking about the values that they point to.
Figure 1: Box diagram of a cons cell.
These kinds of diagrams are called box-and-pointer, box-and-arrow, or simply box diagrams, and they are very useful for understanding how complex data structures are stored internally.
The procedures for accessing the CAR and the CDR of a cons cell are called car
and cdr
. There are also wrappers for combining them, so (caddr x)
would return
the car
of the cdr
of the cdr
of x
.
(car (cons 10 20)) ⇒ 10 (cdr (cons 10 20)) ⇒ 20
Finally, I would like to mention dotted pair or cons pair notation. When a dot is encountered inside a list (i.e. between parentheses), the interpreter assumes that the element before the dot is the CAR of the current pair, and the one after the dot is the CDR.
(car '(10 . 20)) ⇒ 10 (cdr '(10 . 20)) ⇒ 20
1.2. Lists and syntactic sugar
When people mention lists in the context of Lisp, they are usually talking about something that looks like:
(a b 1 2)
These are called proper lists, but it’s important to understand that (usually) the lists are internally stored as chained cons cells, and that the previous notation is just syntactic sugar for:
(a . (b . (1 . (2 . nil))))
As you can see, each element of our list is stored in the CAR of a cons cell,
and the CDR points to another cons cell which contains the next element. The CDR
of the innermost cons cell is the symbol nil
, used to terminate
lists2. Although not all Lisp dialects use the
symbol nil
, there is always a unique value used to terminate proper lists. In
box diagrams, a crossed box is used to represent this terminator.
Figure 2: Box diagram of a proper list.
Let me emphasize that there is no “list type”, it’s just a data structure that is built out of cons cells.
2. The initial approach
Now that we know how lists are conventionally represented in Lisps, let’s have a look at an alternative implementation.
One of the books that inspired me to write a Lisp interpreter was Build Your Own Lisp by Daniel Holden. The author doesn’t use the cons cell approach described above, and although I didn’t follow the book, at first glance it didn’t seem like a bad idea. This article tries to precisely show its advantages and limitations when compared to the traditional approach explained above.
2.1. A basic Expr
structure
First, let’s have a look at how a very basic Lisp expression would look like from C, using a tagged union.
enum EExprType { EXPR_INTEGER, EXPR_FLOAT, EXPR_SYMBOL, }; typedef struct Expr { enum EExprType type; union { int n; float f; char* s; } val; } Expr;
I will not spend too much time explaining why tagged unions are useful, since it
is not the scope of this article, but in case you are not familiar with them,
just know that they generally have a lower memory impact, and that we could
access the appropriate value of an expression by first checking its type
member.
int f(Expr* expr) { switch (expr->type) { case EXPR_INTEGER: return expr->val.n; case EXPR_FLOAT: return (int)expr->val.f; case EXPR_SYMBOL: return strlen(expr->val.s); default: abort(); /* ??? */ } }
2.2. Combining expressions with linked lists
Now let’s have a look at how we could combine simple expressions into more complex structures using linked lists. The premise of a linked list is that each object in the list contains a pointer to the next one, therefore allowing the programmer to link objects that are not adjacent in memory, which is a limitation when using simple arrays.
Figure 3: Memory layout of a generic linked list of 3 elements.
We could use this linked list method for joining an arbitrary number of
expressions together. Keep in mind that the lists themselves (just like the cons
cells in conventional implementations) are expressions, so we would need to add
a new expression type whose value is a pointer to the start of a linked list of
expressions. The following code shows how the new Expr
structure would look like
after adding the necessary members.
enum EExprType { /* ... */ EXPR_LIST, }; typedef struct Expr { enum EExprType type; union { /* ... */ struct Expr* children; } val; struct Expr* next; } Expr;
Notice how children
is a member of the union, but next
is a member of the Expr
structure, so the size of each expression just increased by sizeof(Expr*)
, not
by 2 * sizeof(Expr*)
.
The following code shows how we would manually create the list (a b 1 2)
from C,
assuming that there is some expr_new
function that allocates an expression with
the specified type. For readability, we will also assume that we can safely
store string literals in our expressions3.
extern Expr* expr_new(enum EExprType type); Expr* get_list(void) { /* Create the list itself */ Expr* list = expr_new(EXPR_LIST); list->next = NULL; /* Write each element */ Expr* elt = expr_new(EXPR_SYMBOL); list->val.children = elt; elt->val.s = "a"; elt->next = expr_new(EXPR_SYMBOL); elt = elt->next; elt->val.s = "b"; elt->next = expr_new(EXPR_INTEGER); elt = elt->next; elt->val.n = 1; elt->next = expr_new(EXPR_INTEGER); elt = elt->next; elt->val.n = 2; /* Terminate the linked list */ elt->next = NULL; return list; }
The following diagram shows how the list would be stored in memory with our new structure.
Figure 4: Layout of a list of 4 expressions, using the linked list approach.
As you can see, each expression has a next
member, so they will always be
implicitly in a list, even if an expression isolated like the first one. This
has some important consequences that will be explained below.
The following code shows how we would iterate each expression of a list; in this case for adding some integers together.
int sum(const Expr* list) { assert(list->type == EXPR_LIST); int total = 0; for (Expr* e = list->val.children; e != NULL; e = e->next) if (e->type == EXPR_INTEGER) total += e->val.n; return total; }
Before getting into the advantages and disadvantages of the implementation, I would like to note that the author of the book I mentioned probably chose this approach to deliberately explain how linked lists work, since the book is also meant for people learning C.
I would also like to mention that Clojure, a Lisp dialect4, uses a similar linked list approach through sequences, as described in this article. And indeed, this can be seen in the Java source code where the Cons class is defined.
3. The conventional cons cell approach
Now that we had a look at how to combine expressions with a linked list, let’s go back to the cons cell approach that is used in most Lisps. Instead of adding “list” expressions to our implementation, we will simply add a “pair” type that contains the CAR and CDR pointers, and we will chain them to build lists.
The following code shows how we could extend our basic Expr
structure from above
to include pairs.
enum EExprType { /* ... */ EXPR_PAIR, }; typedef struct ExprPair { struct Expr* car; struct Expr* cdr; } ExprPair; typedef struct Expr { enum EExprType type; union { /* ... */ struct ExprPair pair; } val; } Expr;
There are some details I would like to note from the previous code. First,
notice how the Expr
structure doesn’t have a next
member anymore, so expressions
are not in an “implicit” list. Second, notice how the pair
union member is an
ExprPair
structure, not a pointer to that structure; this is because the pair
structure is very small, so there is no need to allocate it separately.
We (potentially5) increased the size of the
union by sizeof(expr.val.pair.cdr)
, but also decreased the size of the structure
by sizeof(expr.next)
, so the size of an individual expression was not negatively
impacted by this change.
The following code is equivalent to the one shown in the previous section for
creating the list (a b 1 2)
. We assume that there is some NIL
expression that is
used to uniquely identify a list terminator, but we could set the last CDR
pointer to NULL
instead.
extern Expr* expr_new(enum EExprType type); extern Expr* NIL; Expr* get_list(void) { Expr* first = expr_new(EXPR_PAIR); Expr* cur = first; cur->val.pair.car = expr_new(EXPR_SYMBOL); /* CADR */ cur->val.pair.car->val.s = "a"; cur->val.pair.cdr = expr_new(EXPR_PAIR); cur = cur->val.pair.cdr; cur->val.pair.car = expr_new(EXPR_SYMBOL); /* CADR */ cur->val.pair.car->val.s = "b"; cur->val.pair.cdr = expr_new(EXPR_PAIR); cur = cur->val.pair.cdr; cur->val.pair.car = expr_new(EXPR_INTEGER); /* CADDR */ cur->val.pair.car->val.n = 1; cur->val.pair.cdr = expr_new(EXPR_PAIR); cur = cur->val.pair.cdr; cur->val.pair.car = expr_new(EXPR_INTEGER); /* CADDDR */ cur->val.pair.car->val.n = 2; /* Terminate the chain of cons cells */ cur->val.pair.cdr = NIL; return first; }
The following diagram shows how the list would be stored in memory with our new structure.
Figure 5: Layout of a list of 4 expressions, using the cons cell approach.
The following code shows how we would iterate over a (proper6) list, accessing each element. Since the previous code is
not very readable, I added some CAR
and CDR
macros for (hopefully) making the
code a bit cleaner.
#define CAR(EXPR_PTR) ((EXPR_PTR)->val.pair.car) #define CDR(EXPR_PTR) ((EXPR_PTR)->val.pair.cdr) extern bool is_proper_list(const Expr* expr); extern bool is_nil(const Expr* expr); int sum(const Expr* list) { assert(is_proper_list(list)); int total = 0; for (; !is_nil(list); list = CDR(list)) if (CAR(list) == EXPR_INTEGER) total += CAR(list)->val.n; return total; }
Just like with expr_new
, notice how we use two functions that we haven’t
defined: is_proper_list
and is_nil
. It’s not necessary to know how these
functions are implemented, or even what nil
really is, as long as we have a
reliable way of checking for this unique value. This is called wishful thinking,
and is a key concept from Structure and Interpretation of Computer Programs, a
great book that I truly recommend (not just for Lisp programmers).
4. Comparing the two methods
After considering how the two approaches could be implemented, we can finally compare them. I will start with the disadvantages of the conventional cons cell approach when compared with the simple linked list.
4.1. Disadvantage: Memory impact
If we compare the memory layouts for the linked list and the cons cell implementations, we can immediately see that the latter uses more memory for storing the same list.
Specifically, since the linked list approach needed an expression for the list
itself, we would need n+1
expressions for building a list of n elements. When
using the cons cell approach, each element of the (proper) list is placed inside
a cons pair, so we would need 2n
expressions, assuming that we don’t need to
allocate a nil
expression for every list, since that is not usually the case.
Considering this fact, it’s easy to conclude that the linked list implementation is more memory-efficient, and that there is no point in using cons cells. In that regard, I would like to say that even if the implementation uses cons cells for building conventional lists, this doesn’t prevent us from adding another more optimized “vector” or “array” data structure.
4.2. Advantage: Improper lists
When describing Lisp lists, I only mentioned proper lists, but in most Lisp
dialects, there are also improper lists. One subtype of improper lists are
dotted lists: lists whose last CDR is not nil
. Such a list might be defined with
the dotted pair notation described above, or by using cons
.
(cdr '(a b . c)) ⇒ (b . c) (cadr '(a b . c)) ; (car (cdr ...)) ⇒ b (cddr '(a b . c)) ; (cdr (cdr ...)) ⇒ c (cons 'a (cons 'b (cons 3 4))) ⇒ (a b 3 . 4)
This data structure is perfectly possible with the cons cell approach, but
impossible when using a linked list, since the possible values of the next
member are either the address of another expression or NULL
.
I would also like to mention circular lists, another type of improper list. A
circular list can be defined as “a chain of cons cells that has no termination
because some cell in the chain is the CDR of a later cons cell”. To be
completely fair, we could still build circular lists with the linked list
approach, since the next
member can point to a previous element of the list.
4.3. Advantage: Context independence, reusing references
As mentioned above, when using linked lists, each expression is restricted to
their context, since they always have a next
pointer.
In the book by Daniel Holden, this isn’t a problem because most functions returned copies, instead of references. Naturally, this is appropriate for his book, since it would be confusing for new C programmers to implement a more complex memory-management system like garbage collection.
Returning copies is not practical when making a real implementation,
however. Not only does it waste memory and decrease performance (with
unnecessary calls to the allocator), but it also makes some standard Lisp
functions like car
and cdr
counter-intuitive, since they are supposed to return
references, instead of copies.
To further illustrate why the next
pointer is a problem, let’s look at a
possible implementation of the car
primitive. The following function, written
for the linked list implementation, receives an expression of type EXPR_LIST
and
is supposed to return its first element.
Expr* expr_clone(const Expr* e) { Expr* result = expr_new(e->type); result->val = e->val; result->next = NULL; return result; } Expr* car(Expr* list) { if (is_nil(list)) return NIL; assert(list->type == EXPR_LIST); return expr_clone(list->val.children); }
We can’t directly return list->val.children
because the first element still has
the rest of the list “attached” through its next
pointer, and we can’t set
list->val.children->next
to NULL
to “isolate” the first element either, because
we would be overwriting the list (a b c)
into becoming (a)
, and we could even be
leaking memory. Even if we wanted to directly return the reference, we can’t do
it without breaking the structure of the list we received, so we are forced to
return a clone.
However, with the cons cell approach, this is primitive would be much simpler to implement, since a list references its contents, but the contents themselves are completely independent of their context.
Expr* car(Expr* expr) { if (is_nil(expr)) return NIL; assert(expr->type == EXPR_PAIR); return expr->val.pair.car; }
Keep in mind that, since we are returning a reference, if something overwrites
the value of the expression returned by car
, it will also affect the contents of
the cons cell where it’s referenced. This behavior is expected in Lisp, and we
could always write another function that returns a copy for this purpose.
Although this advantage might not look too significant, it allows us to work with references, which can potentially save more memory than when using linked lists, as explained in the disadvantage above.
4.4. Advantage: Consistent nil
In Lisp, the symbol nil
is often used to represent the empty list. The following
quote is taken from the Emacs manual:
In Emacs Lisp, the symbol
nil
has two meanings. First, it means the empty list. Second, it means false and is the value returned when a true-or-false-test tests false.nil
can be written as an empty list,()
, or asnil
. As far as the Lisp interpreter is concerned,()
andnil
are the same. Humans, however, tend to usenil
for false and()
for the empty list.
When I implemented nil
in my interpreter, back when I was using linked lists,
this description seemed very confusing. With my implementation at the time, the
symbol nil
and the empty list ()
were two completely different expressions.
/* Symbol: nil */ Expr* sym = expr_new(EXPR_SYMBOL); sym->val.s = "nil"; /* Empty list: () */ Expr* list = expr_new(EXPR_LIST); list->val.children = NULL;
At the time, the symbol nil
was just like any other variable from the global
environment, bound to an empty list. However, in most Lisps, the following are
equivalent:
nil ⇒ nil 'nil ⇒ nil () ⇒ nil '() ⇒ nil
This happens because, as explained above, lists are syntactic sugar, so an empty
list is converted to the symbol nil
by the parser. This makes many internal
functions more consistent (and efficient), since you don’t have to check for
multiple expression types.
/* Old, inconsistent */ bool is_nil(const Expr* e) { return (e->type == EXPR_LIST && e->val.children == NULL) || (e->type == EXPR_SYMBOL && e->val.s != NULL && !strcmp(e->val.s, "nil")); }
When using the new approach, nil
is a normal symbol that evaluates to itself
(it’s simply used as a special indicator), and the empty list ()
is just
syntactic sugar for it.
4.5. Advantage: No list wrappers
When using linked lists, many wrappers had to be written for internal functions that operated on individual expressions.
For example, assume there is an expr_equal
function that checks whether two
expressions have the same value. When an EXPR_LIST
is encountered, it would have
to compare each element recursively, ideally using an expr_list_equal
wrapper.
bool expr_equal(const Expr* a, const Expr* b) { /* ... */ switch (a->type) { case EXPR_LIST: return expr_list_equal(a, b); /* ... */ } } bool expr_list_equal(const Expr* a, const Expr* b) { /* Similar to strcmp(3) */ while (expr_equal(a, b)) { if (a == NULL) return true; a = a->next; b = b->next; } return false; }
This pattern repeated with many internal functions, and many expr_*
functions
ended up having a expr_list_*
equivalent.
When using cons cells, comparing two lists (whether they are proper or not) is as simple as comparing the CAR and CDR recursively. This is an example of how cons cells are their own independent expression, and how most operations can be applied without depending on their context.
bool expr_equal(const Expr* a, const Expr* b) { /* ... */ switch (a->type) { case EXPR_PAIR: return expr_equal(CAR(a), CAR(b)) && expr_equal(CDR(a), CDR(b)); /* ... */ } }
4.6. Advantage: Lisp-like C code
Whether we chose to implement lists with one method or the other, the Lisp
programmer should still have access to the usual cons
, car
and cdr
procedures. By using the cons cell approach, our C code will use a similar
structure to Lisp procedures, even if we decide to define iterative functions,
rather than recursive ones.
As an example, look at how the following Lisp procedure is implemented. It is a recursive procedure that implements an iterative process7 using a tail call.
;; Does the function F return non-nil for all elements of LST? (defun every (f lst) (cond ((null? lst) tru) ((not (f (car lst))) nil) (tru (every f (cdr lst)))))
Now compare it to the following iterative implementation as a C primitive.
extern Expr* funcall(Expr* func, Expr* arg); bool every(Expr* f, Expr* lst) { for (; !is_nil(lst); lst = CDR(lst)) if (is_nil(funcall(f, CAR(lst)))) return false; return true; }
This is a small advantage, but I was happy to notice it when I made the change, so I decided to include it.
Footnotes:
Since Lisp was originally implemented on the IBM 704 computer, CAR stood for Contents of the Address part of the Register, and CDR stood for Contents of the Decrement part of the Register. See John McCarthy, History of Lisp (1979).
Generally, nil
is a pretty special symbol: It is considered both a
symbol and a list (although it’s not considered a cons cell according to Common
Lisp’s consp
or Scheme’s pair?
), it usually evaluates to itself, and in many
dialects both (car nil)
and (cdr nil)
evaluate to nil
. It is also often used to
denote false in boolean operations.
This is naturally a big assumption,
one that doesn’t even match our previous definition of Expr
, since string
literals are read-only, and we are storing them in a non-constant char*
. We
would probably need to use some function that allocates the string on the heap,
like strdup(3)
.
Although Richard Stallman doesn’t agree with this statement, precisely because there aren’t proper cons cells in Clojure. See How I do my computing, retreived in February 2025.
I say “potentially” because, depending on how complex our
Expr
structure was, the size of the union might have been already greater than
the size of ExprPair
, so this change would have actually decreased the size of
an Expr
structure (since we removed the next
pointer). One example where the
union could be storing bigger members is if we kept track of the length of
symbols/strings along with the actual char*
data.
As explained
above, proper lists are lists of cons cells whose last CDR is nil
. This is
important because we want to stop as soon as we encounter nil
, but an improper
list does not contain a null CDR so we would need a different loop
condition. Note that nil
is also a proper list, and therefore a valid input for
this sum
function.
See Section 1.2.1 of Structure and Interpretation of Computer Programs for more information on how the implementation of an iterative process can be recursive.