The design of a simple bootloader

1. Introduction
2. Startup and BIOS
3. Stage 1
4. Stage 2
5. Building and debugging the disk image

1. Introduction

This article explains the code and overall design of 8dcc’s bootloader, which was originally designed for the NAOS kernel. This tiny bootloader, written from scratch, aims to be compliant with the Multiboot 1 protocol. This way, since the kernel is also compliant with this protocol, the kernel and the bootloader should work with any other component that follows the Multiboot 1 standard.

A bootloader is a computer program responsible for booting the operating system. The main responsibilities of the bootloader are:

Obtain information about the system. This information usually needs to be provided to the kernel somehow.
Switch to the specific environment that is expected by the kernel. This normally includes switching to Protected Mode (i.e. 32-bit mode) and enabling the A20 line.
Load the kernel into memory, and jump to it.

This bootloader is divided into two stages, Stage 1 and Stage 2. This division is mainly needed because of size restrictions of the Stage 1 binary, as explained below.

2. Startup and BIOS

When the machine is turned on, the CPU immediately starts execution at the BIOS program, which usually stored in some Read-Only Memory (ROM). This program performs basic hardware initialization, before transferring control to a bootable device.

To find a valid bootable device, the BIOS looks for bytes 0x55 and 0xAA in offsets 510 and 511 of each possible device. The order in which the BIOS searches for bootable devices (called the boot sequence) is stored in the CMOS. If the BIOS doesn’t find a valid bootable device, it will show an error and halt.

Once the BIOS has found a valid bootable device¹, it will load its first 512 bytes into the physical address 0x7C00, and jump there, executing the instructions it just loaded. In our case, the first 510 bytes of the device can be used for the Stage 1 bootloader, which will then load the Stage 2 binary.

3. Stage 1

As explained above, the BIOS only loads the first 512 bytes of the bootable device, so there isn’t a lot of space for the bootloader to do everything that it needs to do. For this reason, the bootloader is divided in two stages; the main purpose of Stage 1 is to load the Stage 2 binary, usually located in the same device where the Stage 1 is, and jump to it.

The Stage 2 binary is usually stored in the device using some standard file system, like FAT32 or ext4, so the Stage 1 code should be able to at least locate the file, and read the sectors where the file contents are stored. Furthermore, since the Stage 1 code must be stored in the first 512 bytes, and many file systems also use this region for storing data structures, the actual space left for the code (including things like strings for error messages) could be closer to 450 bytes.

In the NAOS bootloader, the Stage 2 binary is stored in the root directory of a FAT12 file system as stage2.bin². The data structure used for storing the volume information on FAT file systems is called the Extended BIOS Parameter Block (EBPB), and its size and elements change depending on the FAT version. For FAT12 and FAT16, this structure is the DOS 4.0 EBPB, which is 51 bytes wide and should be placed at offset 0xB of the disk.

The first 2 bytes of the disk are a short jump to the actual entry point of the bootloader, right after the EBPB.

3.1. Register initialization

The very first thing that the bootloader does is setting up the Data Segment (DS), Extra Segment (ES) and Stack Segment (SS) registers to 0x0000. Since the MOV instruction doesn’t support assigning immediate values to segment registers, the AX register is first cleared and then used to set those segment registers.

Next, the Stack Pointer (SP) is set to 0x7C00, the physical address where the BIOS loaded the Stage 1 binary; since the stack grows downwards, the memory region right below this address can be used for the stack³. At first glance, this address might seem incorrect, since the first PUSH would overwrite the first 2 bytes of our Stage 1 binary; this is incorrect, because the PUSH instruction decreases the Stack Pointer before writing the pushed value to the address at SS:SP (see Intel SDM, Vol. 1, p. 1235).

Next, a far jump is performed to ensure that the Code Segment (CS) and Instruction Pointer (IP) are set to 0000:7C00⁴, rather than 07C0:0000, which is used by some BIOSes.

3.2. Reading disk information with the BIOS

Although the EBPB is defined in that same Stage 1 binary with some basic information, it’s better to ask the BIOS for the actual disk information. This can be done by using the disk BIOS interrupt (0x13) with the AH register set to 0x08.

    ; Set Carry Flag (CF), set AH to "Read Drive Parameters", and
    ; call the "Disk" BIOS interrupt.
    stc
    mov     ah, 0x8
    int     0x13

    ; Jump if the Carry Flag was cleared by the BIOS.
    jnc     .success

.error:
    ; ...

.success:
    ; Read relevant values, mainly from DH and CX.

Specifically, the bootloader writes the Sectors per track and Head count values returned by the BIOS into the EBPB.

3.3. Loading the Stage 2 binary

In order for the Stage 1 to load the Stage 2 binary, it needs to find it first. Specifically, it needs to find the directory entry of the Stage 2 binary by traversing the FAT12 root directory, and then obtain the first cluster index where the actual contents of the Stage 2 file are stored.

Then, after knowing that first cluster number, it traverses the linked list of cluster indexes that is stored in the File Allocation Table (FAT), reading each cluster into memory.

If the reader is interested in more information about the FAT file system, and how this part should be implemented, see my Understanding the FAT file system article. However, it’s worth noting that the actual operation for reading from the disk is performed using the disk BIOS interrupt (0x13) with the AH register set to 0x02.

3.4. Jumping to the Stage 2 code

Once all the clusters of the Stage 2 binary have been read, the Stage 1 binary jumps to the address where it was loaded, using a far jump. Since the Stage 2 binary was loaded into the address at ES:BX, the bootloader should be able to just jump there.

; NOTE: Invalid.
jmp     es:bx

However, there isn’t a JMP instruction that allows the programmer to do a far jump to a segment and offset contained in registers. However, it allows the programmer to specify a pointer to a 32-bit memory location where the segment and offset are specified.

my_addr: resw 2

mov     word [my_addr + 0], bx
mov     word [my_addr + 2], es
jmp     far [my_addr]

However, this is not the best method, since the opcodes for these instructions take up many bytes, and 4 extra bytes are needed for the buffer. Alternatively, one can use two PUSH instructions and a far RET to accomplish the same thing, without using an intermediate buffer, and with shorter instructions.

push    es
push    bx
retf            ; Alternatively: RET FAR

The far jump method used a total of 16 bytes, while the far return method used only 3. This wouldn’t make much difference in a normal binary, but these extra 13 bytes can become really useful as the Stage 1 binary grows.

Note that, as mentioned, the jump is made to the first byte of the Stage 2 binary, not to the entry point of an ELF file, so the Stage 2 binary must be built with this in mind.

4. Stage 2

The Stage 2 binary is a flat binary (i.e. it is not an ELF file) located in the root directory of the FAT12 file system of the Stage 1. One of the main goals of Stage 1, because of its binary size limitations, is to search for this Stage 2 binary, load it into memory, and jump to it.

Therefore, the Stage 1 code should know where to load the Stage 2 binary, and the Stage 2 code should know the address where it’s going to be loaded. This consensus is achieved through two STAGE2_ADDR macros, defined in two different files, but that must match. The first one is defined in bootloader/src/include/boot_config.asm (used by Stage 1) and the other in bootloader/linker/boot_config.ld (used when linking Stage 2).

Once the Stage 2 binary is loaded, it can perform all of the bootloader initialization without worrying about size limitations. First, the Stage 2 shows an information message using the BIOS I/O functions, and then it tries to enable the A20 line.

4.1. Enabling the A20 line

The A20 line, which is disabled by default, limits the addressable memory to 1 MiB, and should be enabled by the bootloader before transferring control to the kernel, or simply for switching to protected mode. In order to understand what the A20 line is, and how it can be enabled, it’s important to know a bit of processor history, starting with how segmentation works in 16-bit real mode.

The Intel 8086 processor had 20 address lines, numbered A0 to A19; with these, the processor could access 2²⁰ bytes, or 1 MiB. Internal address registers of this processor were 16 bits wide. To access a 20-bit address space, an external memory reference was made up of a 16-bit offset address added to a 16-bit segment number⁵, shifted 4 bits to the left so as to produce a 20-bit physical address.

The following code shows how the real address would be calculated from a segment and an offset.

; Set data segment (DS) through intermediate register (AX).
mov     ax, 0x13A5
mov     ds, ax

; Write offset to the source index (SI), since not all registers can
; be used for addressing.
mov     si, 0x3327

;   13A5   (Segment: DS)
; +  3327  (Offset: SI)
; -------
;   16D77  (Address)
mov     ax, [ds:si]

Another important detail about this old processors is that, since they only had 20 address lines, addresses over 1 MiB caused the actual address to wrap around. For example, F800:8000, which should translate physical address 0x00100000, actually translates to address 0x00000000, since the 21st bit is discarded.

As processors evolved, starting from the Intel 80286, they were able to address more than 1 MiB of memory. However, for backwards compatibility, they were still supposed to emulate the behavior of a 8086 processor when booting up, which meant that they had to force this wrap-around behavior, since some programs depended on this. To control this wrap-around behavior, a logic gate was inserted in the A20 line between the processor and system bus, which got named Gate-A20.

This logic gate was supposed to be controlled from software, originally through the Intel 8042 keyboard controller. Since then, other more efficient methods are available, but they might not all work, so it’s best to try as many of them as possible. Without getting into much detail, these are the methods used in the bootloader, starting with the most likely to work:

Check if the A20 line was already enabled. This is done by comparing a known value at some address with the value located 1 MiB higher; if they match, it’s safe to assume that it wrapped around, so the A20 line is disabled.
Try to enable it through the BIOS. This is done through BIOS interrupt 0x15.
Try to enable it through the original keyboard method.

If the bootloader can’t enable the A20 line, it shows an error message and stops.

4.2. Loading the GDT

Before switching to protected mode, the Global Descriptor Table (GDT) has to be loaded.

4.3. Switching to protected mode

Before transferring access to the kernel, the bootloader has to switch to protected mode.

5. Building and debugging the disk image

The build process of the disk image has a few steps that are worth mentioning here. The target of the build process is to obtain a file that can be flashed into a device, making it bootable with our bootloader.

First, the assembly sources are assembled using nasm into a 32-bit ELF object file. This object file is then linked into a flat binary using an i686 cross-compiler, using an appropriate linker script. Furthermore, the ELF object file produced by NASM can be linked into an ELF binary, which can be used for debugging, as shown below. The following diagram explains the build process of an assembly file.

Once the Stage 1 and Stage 2 binaries are built, they can be inserted into the filesystem of the final image. First, an empty image is created with dd, and then the filesystem itself is created with mkfs.fat. Then, the mcopy (from the mtools package) is used to copy the files into the image that was just created.

To make the device bootable, however, we need to make sure that the Stage 1 image is placed in the first sector of the image. Some of the FAT12 filesystem information, such as the Bios Parameter Block (BPB), is also located on this sector, though, so we will only copy certain chunks of the Stage 1 binary. To do this, the copy-fat12-boot.sh script is used, which just calls dd with some special flags. For more information on the FAT12 file system, check out my Understanding the FAT file system article.

For debugging the image, as noted above, the ELF binaries can be used. They can both be loaded into GDB with the add-symbol-file commands, or used with many tools like addr2line. For more information on debugging, check out the bootloader repository’s, linked above.

Footnotes:

Actually, the BIOS starts by loading the first 512 bytes, and then checks for the boot signature.

Since FAT12 uses the 8.3 filename convention, the actual stored name, the one that the Stage 1 should look for, is STAGE2 BIN.

Keep in mind that the free memory region before the Stage 1 binary usually goes from physical address 0x0500 to 0x7BFF, and going below that 0x0500 address would overwrite the BIOS Data Area (BDA). See the OSDev wiki for more information.

⁴

This address is meant to illustrate the difference between the two main possible values set by the BIOS, but the bootloader jumps to the adjacent instruction, which would be at an offset like 0x7C46.

⁵

For more information on 16-bit segmentation, see this article by Gary Burt.