Writing a simple pool allocator in C

1. Introduction
2. Writing the initial implementation
3. Reallocation problems
4. Second implementation: Expanding the pool
5. Adding valgrind support

1. Introduction

I found out about pool allocators some time ago, and I really liked its simplicity and high performance, so I decided to write my own. This article was initially inspired by Dmitry Soshnikov’s article. It’s also worth mentioning that this article was discussed in Hacker News.

Similarly to malloc, a pool allocator allows the user to allocate memory at run time. The pool allocator, however, is much faster than malloc¹, at the cost of having a fixed pool size. It allows the user to allocate and free memory blocks (referred to as chunks, from now on) in O(1) constant time. This implementation also uses very little memory: when creating the pool, a very small Pool structure is allocated, along with the pool itself. Free chunks are used to store information, so the memory impact is minimal.

Lastly, I would like to mention that this article is based on my ANSI C library for pool allocation, libpool. In the future, I might make small changes to the code on that repository and they won’t be reflected here, so feel free to check it out as well.

2. Writing the initial implementation

The following implementation corresponds to version v1.0.0 of my libpool library, with some notable “visual” differences (e.g. without ANSI restrictions). Some improvements, like being able to resize/reallocate an existing pool, will also be expanded below.

2.1. The `Chunk` type

In our simple implementation, we will declare a Chunk type with a fixed size. In a more general implementation, we would use a chunk size specified by the programmer at runtime, but that method adds a lot of casts, and I think it would make this article more confusing.

First, let’s have a look at the definition of Chunk.

#define CHUNK_SZ 64

typedef union Chunk Chunk;
union Chunk {
    Chunk* next;
    char arr[CHUNK_SZ];
};

Notice how we are typedef’ing a union, not a struct.

If you are not familiar with C union’s, they are syntactically similar to struct’s, but they can be used to define variables that may hold (at different times) objects of different types and sizes; as opposed to structures, which let the programmer group different data types as a single record. Unions essentially let the programmer access the data in a single area of storage as if it were storing one of many data types at a given time. When you access a member of a union variable (e.g. my_chunk.next or my_chunk.arr), the compiler is responsible for accessing the union variable as if it were the specified member type (in that example, either a pointer or an array of CHUNK_SZ characters, never both).

In this case, using a union is convenient because it lets us “interpret” the Chunk as a pointer to another Chunk depending on the context, while also specifying the real size of the Chunk (i.e. CHUNK_SZ). Also note that the size of a union is the size of the biggest possible member (in a 64-bit computer, sizeof(Chunk*) is 8 bytes, and since the arr member is 64 bytes, that will be the size of the Chunk union).

However, it’s still not clear why we would need to interpret the Chunk as a pointer depending on the context. First, we need to understand that there will be two (implicit) types of chunks:

Free chunks: They are not in use by the program. They will use the next pointer.
Non-free chunks: They have been allocated, so we assume they contain arbitrary user data. Specifically, in this implementation, each chunk can contain (at most) CHUNK_SZ bytes of data.

As mentioned, these types are “implicit”. This means that there isn’t any flags variable that lets us know whether a specified chunk is free; instead, we know that a chunk is free because it will be inside a linked list of free chunks. When we initialize the pool, since all chunks are free, we will use the Chunk.next pointer to link each chunk to its adjacent one, like this:

Figure 1: Four free chunks right after initializing the pool.

The gray area represents the rest of the union that is not used when accessing Chunk.next, and the incomplete red arrow represents a NULL pointer. The creation of this linked list, along with its advantages, will be explained when creating the pool below.

2.2. The `Pool` structure

We will also declare a Pool structure, which simply contains:

A pointer to the first element of the linked list of free chunks, which will be used when the user allocates a chunk.
A pointer to the first element of the chunk array, which will be used when closing the whole pool.

typedef struct Pool Pool;
struct Pool {
    Chunk* free_chunk;
    Chunk* chunk_arr;
};

Again, other members such as the chunks size might be necessary depending on the implementation; in this case it’s known at compile time.

2.3. Creating the pool

We will define a function for creating a Pool with an arbitrary number of chunks.

Pool* pool_new(size_t pool_sz) {
    Pool* pool = malloc(sizeof(Pool));
    if (pool == NULL)
        return NULL;

    pool->chunk_arr = pool->free_chunk = malloc(pool_sz * sizeof(Chunk));
    if (pool->chunk_arr == NULL) {
        free(pool);
        return NULL;
    }

    for (size_t i = 0; i < pool_sz - 1; i++)
        pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
    pool->chunk_arr[pool_sz - 1].next = NULL;

    return pool;
}

Here’s a brief explanation of each step:

We allocate the Pool structure that will be returned, using malloc².
We allocate the pool itself, that is, the array of Chunk unions. We initialize both chunk_arr and free_chunk pointers to the same address, since all chunks will be free by default.
We build the linked list of free chunks. We set the .next member of the Chunk union to the address of the adjacent chunk³, except for the last free chunk, which will point to NULL.

This is how the pool looks after being returned from pool_new:

Figure 2: Layout of a Pool structure after initialization.

And this is how the pool looks after the user has allocated two chunks. This process will be explained below, but perhaps you are starting to realize the advantages of this method:

Figure 3: Layout of a Pool structure after two allocations.

Since this implementation doesn’t support pool resizing, the only O(n) algorithm occurs when creating the pool itself, since we need to iterate each chunk to build the linked list described above. The chunk allocation process, on the other hand, has O(1) complexity, since we always have a free chunk waiting for us at Pool.free_chunk. Freeing a chunk is also done in constant time, since we just have to prepend an element to this linked list.

2.4. Allocating chunks

Now that the pool has a pointer to a linked list of free chunks, when the user requests an allocation for a chunk, we just have to:

Ensure that we haven’t reached the end of the list, that is, ensure the Pool.free_chunk pointer is not NULL.
The first element of this “free chunks” list will be returned. Before that, remove it from the list by setting the start of the list (i.e. Pool.free_chunk) to what used to be the second element (i.e. Pool.free_chunk.next).

void* pool_alloc(Pool* pool) {
    if (pool == NULL || pool->free_chunk == NULL)
        return NULL;

    Chunk* result    = pool->free_chunk;
    pool->free_chunk = pool->free_chunk->next;
    return result;
}

Now the user can safely overwrite the contents of the pointer returned by pool_alloc, and it will be essentially setting the arr member of the Chunk union. This is fine, since that chunk is no longer part of our “free chunks” list.

Just to emphasize once again, we are not iterating anything, so this process is constant. Allocating chunks of arbitrary size on constant time obviously has great advantages, specially when we have to allocate and free many times per second (e.g. many entities in each tick of a simulation).

2.5. Freeing chunks

Freeing chunks is pretty straight-forward, and if you understood the previous sections, I recommend you try to write your own function.

The freeing process simply consists of adding (prepending) a chunk back into the linked list of free chunks. As you can probably tell, this is also a constant process.

void pool_free(Pool* pool, void* ptr) {
    if (pool == NULL || ptr == NULL)
        return;

    Chunk* freed     = ptr;
    freed->next      = pool->free_chunk;
    pool->free_chunk = freed;
}

For example, following the previous figure, this would be the layout after the user frees the first block of memory.

Figure 4: Layout of a Pool structure after freeing a chunk.

Notice how, unlike with arena allocators, we don’t have to free in the same order that we allocated.

2.6. Closing the pool

Finally, once the user is done with the pool itself, it should be able to free it to the system. This is also pretty intuitive in this implementation, but it will get a bit trickier below.

void pool_destroy(Pool* pool) {
    if (pool == NULL)
        return;

    free(pool->chunk_arr);
    free(pool);
}

3. Reallocation problems

When using a pool allocator, at some point you will probably want to be able to resize an existing pool, for example when you run out of chunks. This might not seem too hard, but there are a few caveats.

At first sight, we could reallocate a pool with a few simple steps:

Reallocate the old chunk array (i.e. my_pool.chunk_arr).
Link the new chunks together, just like we did when creating the pool.
Prepend the new chunks to the list of free chunks, just like we did when freeing a previously-allocated chunk.

For example, following the previous figure, if we reallocated the pool to add two more chunks, we would (at first sight) get the following layout.

Figure 5: Layout of a Pool structure after resizing it, with two new chunks.

However, there is an important detail that is easy to miss. When we reallocate the pool (i.e. the array of chunks), the base address of the array might change, so the address of each chunk will also change. This is a problem because:

The old pointers used to build the linked list of free chunks still point to the old array, so they become invalid. There are a few possible fixes for this, like recalculating the offsets⁴ from the old base address, storing offsets instead of pointers, etc.
The pointers we returned when the user allocated chunks also point to the old array, so they are also invalid. If the user tries to access or free these pointers, a segmentation fault might occur.

This is how the layout will probably look like after the reallocation. Incomplete connections crossed-out with a single line represent invalid (but non-null) pointers to the old array, which is now invalid.

Figure 6: Resizing problems: old pointers may become invalid.

4. Second implementation: Expanding the pool

Instead of modifying the existing chunk array, we can allocate a new array with the number of chunks we want to add and prepend them to the linked list of free chunks an existing pool. Although this only let’s the pool grow (and not shrink), I think it’s what most implementations will need. Some important details about this approach will be explained below.

The following figure shows how two different Chunk arrays could be allocated separately. The green area denotes the initial chunk array allocated inside pool_new, while the blue area denotes a different chunk array allocated when expanding the pool. The two arrays don’t necessarily have to be adjacent in memory, which is why there is no need for reallocations.

Figure 7: Different chunk arrays after expanding a pool.

Notice how we have to keep track of the start of each array, since we will need to free them separately. In the previous figure, we use two chunk_arr0 and chunk_arr1 members to denote this, but since we would like to allow the user to expand the pool an arbitrary number of times, we should be able to keep track of an indefinite number of pointers (to the start of the chunk arrays) at runtime.

4.1. Keeping track of the array starts

For keeping track of these pointers, we will create another linked list of “array starts”. We declare an ArrayStart structure which will contain the address of the next element in the linked list (or NULL), along with the pointer to the start of each array.

typedef struct ArrayStart ArrayStart;
struct ArrayStart {
    Chunk* arr;
    ArrayStart* next;
};

Now, instead of storing a single Chunk* in the Pool structure, we store a pointer to the linked list of array starts.

struct Pool {
    Chunk* free_chunk;
    ArrayStart* array_starts; /* Updated */
};

This takes a bit more space in memory, but it’s worth it. Even if we don’t expand the pool, only the size 2 more pointers would be needed: one that points to the ArrayStart structure itself, and the (unused) .next member.

4.2. Changes to `pool_new` and `pool_destroy`

The pool_new and pool_free functions need to be modified according to our new ArrayStart structure.

When creating the pool, instead of storing the base address of the chunk array in pool->chunk_arr, we will have to allocate an ArrayStart structure and write it there.

Pool* pool_new(size_t pool_sz) {
    Pool* pool = malloc(sizeof(Pool));
    if (pool == NULL)
        return NULL;

    Chunk* arr = pool->free_chunk = malloc(pool_sz * sizeof(Chunk));
    if (arr == NULL) {
        free(pool);
        return NULL;
    }

    for (size_t i = 0; i < pool_sz - 1; i++)
        arr[i].next = &arr[i + 1];
    arr[pool_sz - 1].next = NULL;

    /* Added */
    pool->array_starts = malloc(sizeof(ArrayStart));
    if (pool->array_starts == NULL) {
        free(arr);
        free(pool);
        return NULL;
    }

    pool->array_starts->next = NULL;
    pool->array_starts->arr  = arr;

    return pool;
}

When closing the pool, we will also need to traverse this array_starts linked list, freeing each chunk array and each ArrayStart structure in the list.

void pool_destroy(Pool* pool) {
    if (pool == NULL)
        return;

    ArrayStart* array_start = pool->array_starts;
    while (array_start != NULL) {
        ArrayStart* next = array_start->next;
        free(array_start->arr);
        free(array_start);
        array_start = next;
    }

    free(pool);
}

4.3. Expanding without modifying the array

Now that we have a way of storing the start addresses of an indefinite number of chunk arrays, we can implement the expansion method shown in the previous figure. The expansion process is the following:

Allocate the array of extra chunks that we are trying to add to the pool.
Link the new chunks together, just like we did when creating the pool.
Prepend the array of extra chunks to the “free chunks” list, just like we did when freeing chunks.
Allocate a new ArrayStart structure, and store the start of the new chunk array in it.
Prepend this new ArrayStart structure to the linked list of “array starts”, stored inside the Pool structure.

And this is the code corresponding to the previous steps.

bool pool_expand(Pool* pool, size_t extra_sz) {
    if (pool == NULL || extra_sz == 0)
        return false;

    /* Step 1 */
    Chunk* extra_arr = malloc(extra_sz * sizeof(Chunk));
    if (extra_arr == NULL)
        return false;

    /* Step 2 */
    for (size_t i = 0; i < extra_sz - 1; i++)
        extra_arr[i].next = &extra_arr[i + 1];

    /* Step 3 */
    extra_arr[extra_sz - 1].next = pool->free_chunk;
    pool->free_chunk             = extra_arr;

    /* Step 4 */
    ArrayStart* array_start = malloc(sizeof(ArrayStart));
    if (array_start == NULL) {
        free(extra_arr);
        return false;
    }

    /* Step 5 */
    array_start->arr   = extra_arr;
    array_start->next  = pool->array_starts;
    pool->array_starts = array_start;

    return true;
}

Just like in the previous figure, the green and blue regions represent arrays allocated independently, but their respective ArrayStart structures are also included in the diagram.

Figure 8: Layout of a pool after expanding, showing the linked list of array starts.

Naturally, this second implementation is able to allocate and free chunks with the same O(1) efficiency. The memory impact is slightly bigger, but it’s probably worth it if you ever want to resize a pool.

5. Adding valgrind support

Valgrind is a very useful tool for debugging and profiling programs. Among other things, it lets you detect memory leaks and invalid memory accesses. Personally, I have been using it for some years for detecting memory-related bugs in programs that use the standard library (i.e. malloc, free, etc.), but I didn’t know that the valgrind framework also had support for custom allocators.

The Memcheck manual contains a lot of useful information for us, specially the Memory Pools section. This Red Hat article also contains some useful information about how the valgrind framework can be used with other kinds of allocators.

First, we will need to include valgrind’s headers. You might need to install some valgrind-devel package, depending on your distribution.

#include <valgrind/valgrind.h>
#include <valgrind/memcheck.h>

We will need to register an anchor address (i.e. the address of our Pool structure) with VALGRIND_CREATE_MEMPOOL once, in pool_new. This anchor address is needed to associate each chunk with its specific pool whenever it’s allocated or freed. We will also have to use VALGRIND_DESTROY_MEMPOOL once we are done with the pool, in pool_destroy.

The VALGRIND_MAKE_MEM_NOACCESS macro will be used when a memory region should be inaccessible by the user (e.g. for the Pool and ArrayStart structures); while the VALGRIND_MAKE_MEM_DEFINED macro will be used whenever we need to read or write to a NOACCESS region (e.g. when accessing the Pool.array_starts member in pool_expand). There is also a VALGRIND_MAKE_MEM_UNDEFINED macro, but we won’t need to call it manually.

Finally, we will have to use VALGRIND_MEMPOOL_ALLOC when allocating a chunk from the pool, and VALGRIND_MEMPOOL_FREE when freeing it. These functions will call VALGRIND_MAKE_MEM_UNDEFINED⁵ and VALGRIND_MAKE_MEM_NOACCESS respectively, so we won’t have to worry about that.

With all that in mind, this is how our new code would look like:

Pool* pool_new(size_t pool_sz, size_t chunk_sz) {
    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(arr, pool_sz * chunk_sz);
    VALGRIND_MAKE_MEM_NOACCESS(pool->array_starts, sizeof(ArrayStart));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));
    VALGRIND_CREATE_MEMPOOL(pool, 0, 0);

    return pool;
}

bool pool_expand(Pool* pool, size_t extra_sz) {
    if (pool == NULL || extra_sz == 0)
        return false;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(extra_arr, extra_sz * pool->chunk_sz);
    VALGRIND_MAKE_MEM_NOACCESS(array_start, sizeof(ArrayStart));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));

    return true;
}

void pool_destroy(Pool* pool) {
    if (pool == NULL)
        return;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    ArrayStart* array_start = pool->array_starts;
    while (array_start != NULL) {
        VALGRIND_MAKE_MEM_DEFINED(array_start, sizeof(ArrayStart));
        /* ... */
    }

    VALGRIND_DESTROY_MEMPOOL(pool);
    free(pool);
}

void* pool_alloc(Pool* pool) {
    if (pool == NULL)
        return NULL;
    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    if (pool->free_chunk == NULL)
        return NULL;
    VALGRIND_MAKE_MEM_DEFINED(pool->free_chunk, sizeof(void**));

    /* ... */

    VALGRIND_MEMPOOL_ALLOC(pool, result, pool->chunk_sz);
    VALGRIND_MAKE_MEM_NOACCESS(pool->free_chunk, sizeof(void**));
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));
    return result;
}

void pool_free(Pool* pool, void* ptr) {
    if (pool == NULL || ptr == NULL)
        return;

    VALGRIND_MAKE_MEM_DEFINED(pool, sizeof(Pool));

    /* ... */

    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof(Pool));
    VALGRIND_MEMPOOL_FREE(pool, ptr);
}

Although it makes the code less readable, adding valgrind support is definitely worth it.

As I mentioned, this is the first time I use valgrind with a non-standard allocator, so feel free to contribute if you have any suggestions or improvements.

Footnotes:

This depends on many factors: the malloc implementation, the hardware, the CPU load, etc. Using the benchmark.sh script from commit 9b101a7 of my libpool repo showed that malloc took ~2.25 seconds to allocate 1 million blocks of 10 kilobytes each, while libpool took ~0.75 seconds (3 times faster). See also the README of the repository for the benchmark graphs. Credits to kragen for his informative replies in HN.

We could allocate the Pool and the chunk array with a single call, but I think this would make it less readable. Furthermore, since the Pool structure is small, we could even return it directly on the stack, instead allocating it and returning a pointer.

In this case, we store the “absolute address” of the adjacent chunk, but we could use less space by storing an offset relative to the start of the chunk array. This is not a big deal, though, since we will probably want to use the pool allocator with chunks bigger than a pointer in the first place.

⁴

An example of this method, which I wrote before I noticed the second problem, can be seen on commit bb0c8a2 of my libpool repository. That code doesn’t use Chunk unions, so the casts make it less readable.

⁵

If the pool (i.e. the anchor address) was registered with the is_zeroed argument set, valgrind will mark the chunk as DEFINED, instead of UNDEFINED.