Introduction to network programming in C

1. Introduction
2. Network-related concepts
3. Getting address information
- 3.1. Usage for getaddrinfo
- 3.2. Example code for getaddrinfo
4. Communicating through TCP

1. Introduction

This article is meant to be a quick guide/reference for C programmers who are interested in network programming on Unix-like systems. The code in this article has been tested on Linux 6.11.6.

At the time of writing this article, I am somewhat new to network programming, so if you have any suggestions, please feel free to contribute to this page.

If you are interested in reading more about network programming, I recommend Beej’s Guide to Network Programming.

2. Network-related concepts

2.1. Transport protocol

The transport layer protocol provides end-to-end communication between a source and a destination computer, and it is responsible for features such as reliability, flow control and multiplexing. The transport layer is the 4th layer in the OSI model.

There are two¹ main transport protocols: Transmission Control Protocol (TCP)² and User Datagram Protocol (UDP)³.

TCP is connection-oriented, meaning that the sender and receiver need to establish a connection before communicating. The server must be listening for connection requests from clients before a connection is established.

UDP is a connectionless protocol, meaning that messages are sent without establishing a connection, and that UDP doesn’t keep track of what it has sent. UDP is suitable for purposes where error checking and correction are either not necessary or are performed in the application.

2.2. IP address

An Internet Protocol address is a numerical label that is assigned to a device connected to a computer network that uses the Internet Protocol for communication.

Internet Protocol version 4 (IPv4) was the first standalone specification for the IP address, and it defined the addresses as 32-bit numbers such as 192.168.2.123. As the internet grew, Internet Protocol version 6 (IPv6) started being used, which uses 128-bit addresses such as 2001:db8::8a2e:370:7334⁴.
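As a rough illustration of the two address sizes, the following sketch parses textual addresses with inet_pton(3). The helper names is_ipv4 and expand_ipv6 are hypothetical and were made up for this example; only inet_pton and the in_addr/in6_addr structures come from the system headers.

```c
#include <arpa/inet.h>  /* inet_pton() */
#include <netinet/in.h> /* struct in_addr, struct in6_addr */
#include <stdbool.h>    /* bool */
#include <stdio.h>      /* snprintf() */
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical helper: true if `text' is a valid dotted-decimal IPv4 address */
bool is_ipv4(const char* text) {
    struct in_addr addr; /* 4 bytes (32 bits) */
    return inet_pton(AF_INET, text, &addr) == 1;
}

/*
 * Hypothetical helper: writes the fully expanded form of the IPv6 address
 * `text' into `out', which needs room for 40 bytes. Returns false if the
 * address is invalid or the buffer is too small.
 */
bool expand_ipv6(const char* text, char* out, size_t out_sz) {
    struct in6_addr addr; /* 16 bytes (128 bits) */
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return false;

    size_t pos = 0;
    for (int i = 0; i < 16; i += 2) {
        /* Each group is two bytes, printed as four hexadecimal digits */
        const int n = snprintf(out + pos, out_sz - pos, "%s%02x%02x",
                               (i == 0) ? "" : ":",
                               (unsigned)addr.s6_addr[i],
                               (unsigned)addr.s6_addr[i + 1]);
        if (n < 0 || (size_t)n >= out_sz - pos)
            return false;
        pos += (size_t)n;
    }

    return true;
}
```

Calling expand_ipv6 with the address used above should produce the long form that is mentioned in the footnote.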

2.3. Port number

In networking, a port is a number assigned to identify a connection endpoint; and usually, to identify a specific service in that endpoint. A port is always associated with a network address (such as an IP address) and the type of transport protocol used (such as TCP or UDP).

For example, when connecting to another computer through SSH, the client connects to port 22 on the server. The client uses that port number because it knows that, by convention, the SSH server will be listening for connections on that specific port.

2.4. Socket

A socket, more specifically a socket descriptor, is essentially a connection identifier used by the operating system for sending and receiving data. The term socket is commonly used in the networking context, but it’s also used for inter-process communication (IPC).

Sockets are created with the socket(2) function, which will be discussed below. The returned socket descriptor is essentially a file descriptor, just like the ones returned by open(2).

A socket address is used to externally identify sockets in other computers. The socket address contains the transport protocol, the IP address, and the port number.

3. Getting address information

Before establishing a connection, we need to create a socket. As I mentioned above, the operating system uses socket descriptors for identifying connections and transmitting data. Sockets are created with the socket(2) function.

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

We could call socket with values such as PF_INET⁵, SOCK_STREAM and IPPROTO_IP. However, there is a cleaner way of obtaining the information that is used when making most of these networking calls: using the getaddrinfo(3) function.

3.1. Usage for `getaddrinfo`

The getaddrinfo function fills a linked list of addrinfo structures based on its arguments.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(const char* node,
                const char* service,
                const struct addrinfo* hints,
                struct addrinfo** res);

Here is a brief description of each parameter:

The node parameter is used to specify the target host. This is usually an IPv4 or IPv6 address⁶, but it can also be a network hostname, which will be looked up and resolved. It can also be NULL, as we will see when doing a passive open below.
The service parameter is a string used to specify the target service. The string usually contains the target port as a decimal number, but it can also be a service name (such as “ftp” or “http”) which will be translated to the port number according to the services(5) file.
The hints parameter is an addrinfo structure containing some hints about the type of information we want to receive. Note that unused members of this hints structure must be set to zero, so a call to memset is convenient after the definition.
The res parameter is a pointer to another addrinfo pointer, and the function will use it to build a linked list of addrinfo structures. The pointer that res points to should be freed by the caller with the freeaddrinfo function.

The getaddrinfo function returns 0 on success, or non-zero on error. The error codes returned by this function can be converted to a human-readable string with gai_strerror. The linked list filled by getaddrinfo (through the last argument) must be freed by the caller using freeaddrinfo.
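Since the result is a linked list, a caller normally walks it through the ai_next member. Here is a minimal sketch of that traversal; the count_addresses name is hypothetical, and the use of AF_UNSPEC (accept both IPv4 and IPv6 results) is an assumption made for this example.

```c
#include <netdb.h>      /* getaddrinfo(), freeaddrinfo(), gai_strerror() */
#include <stdio.h>      /* fprintf() */
#include <string.h>     /* memset() */
#include <sys/socket.h> /* AF_UNSPEC, SOCK_STREAM */
#include <sys/types.h>

/*
 * Hypothetical helper: returns the number of addrinfo structures that
 * getaddrinfo produces for the given host and service, or -1 on error.
 */
int count_addresses(const char* host, const char* service) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;   /* IPv4 or IPv6 (assumption) */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    struct addrinfo* results;
    const int status = getaddrinfo(host, service, &hints, &results);
    if (status != 0) {
        fprintf(stderr, "Error: %s\n", gai_strerror(status));
        return -1;
    }

    /* Walk the list until the terminating NULL pointer */
    int count = 0;
    for (const struct addrinfo* cur = results; cur != NULL; cur = cur->ai_next)
        count++;

    freeaddrinfo(results);
    return count;
}
```

A hostname that resolves to several addresses would yield more than one node, which is why real code usually tries each node in order instead of only looking at the first one.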

Different members of the addrinfo will be used throughout this article, so here is the structure definition from <netdb.h>:

#include <netdb.h>

struct addrinfo {
    int ai_flags;             /* Input flags */
    int ai_family;            /* Protocol family for socket */
    int ai_socktype;          /* Socket type */
    int ai_protocol;          /* Protocol for socket */
    socklen_t ai_addrlen;     /* Length of socket address */
    struct sockaddr* ai_addr; /* Socket address for socket */
    char* ai_canonname;       /* Canonical name for service location */
    struct addrinfo* ai_next; /* Pointer to next in list */
};

The sockaddr structure is defined in <sys/socket.h>, and contains useful information about the socket address. However, since its members are a bit abstract, a pointer to this sockaddr structure is usually cast to a pointer to a sockaddr_in or sockaddr_in6 structure (depending on whether it’s an IPv4 or IPv6 address, respectively), both defined in <netinet/in.h>⁷.

#include <netinet/in.h>

struct sockaddr_in {
    sa_family_t     sin_family;     /* AF_INET */
    in_port_t       sin_port;       /* Port number */
    struct in_addr  sin_addr;       /* IPv4 address */
};

struct sockaddr_in6 {
    sa_family_t     sin6_family;    /* AF_INET6 */
    in_port_t       sin6_port;      /* Port number */
    uint32_t        sin6_flowinfo;  /* IPv6 flow info */
    struct in6_addr sin6_addr;      /* IPv6 address */
    uint32_t        sin6_scope_id;  /* Set of interfaces for a scope */
};

struct in_addr {
    in_addr_t s_addr;
};

struct in6_addr {
    uint8_t   s6_addr[16];
};

typedef uint32_t in_addr_t;
typedef uint16_t in_port_t;
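To make the cast more concrete, here is a sketch that turns a generic sockaddr pointer (for example, the ai_addr member of an addrinfo) into printable text. The format_sockaddr name and the "address port" output format are assumptions of this example. Note that the port members are stored in network byte order, so they are converted with ntohs before printing.

```c
#include <arpa/inet.h>  /* inet_ntop(), ntohs() */
#include <netinet/in.h> /* struct sockaddr_in, struct sockaddr_in6 */
#include <stdbool.h>    /* bool */
#include <stdio.h>      /* snprintf() */
#include <sys/socket.h> /* struct sockaddr, AF_INET, AF_INET6 */

/*
 * Hypothetical helper: writes "<address> <port>" into `out'. Returns false
 * for unknown address families or if something does not fit.
 */
bool format_sockaddr(const struct sockaddr* sa, char* out, size_t out_sz) {
    char ip[INET6_ADDRSTRLEN]; /* Large enough for IPv4 and IPv6 text */
    unsigned port;

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in* in4 = (const struct sockaddr_in*)sa;
        if (inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof(ip)) == NULL)
            return false;
        port = ntohs(in4->sin_port);
    } else if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6* in6 = (const struct sockaddr_in6*)sa;
        if (inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof(ip)) == NULL)
            return false;
        port = ntohs(in6->sin6_port);
    } else {
        return false;
    }

    const int written = snprintf(out, out_sz, "%s %u", ip, port);
    return written > 0 && (size_t)written < out_sz;
}
```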

3.2. Example code for `getaddrinfo`

The following example shows a call to getaddrinfo, although more specific examples will be shown below. Remember to check the value returned by getaddrinfo, and to free the linked list of addrinfo structures with freeaddrinfo after you are done using it.

struct addrinfo hints;
memset(&hints, 0, sizeof(hints));
hints.ai_family   = AF_INET;     /* IPv4 */
hints.ai_socktype = SOCK_STREAM; /* TCP */

struct addrinfo* server_info;
const int status = getaddrinfo(ip, port, &hints, &server_info);
if (status != 0) {
    fprintf(stderr, "Error: %s\n", gai_strerror(status));
    abort();
}

/* ... */

freeaddrinfo(server_info);

We can then use the members of the filled server_info to create the socket. Remember to check the value returned by socket, and to close the socket descriptor after you are done using it.

const int sockfd = socket(server_info->ai_family,
                          server_info->ai_socktype,
                          server_info->ai_protocol);
if (sockfd < 0) {
    fprintf(stderr, "Could not create socket: %s\n", strerror(errno));
    abort();
}

/* ... */

close(sockfd);

4. Communicating through TCP

To communicate data through TCP, we need to either listen and accept incoming connections (a passive open), or establish a connection to another computer on a listening port (an active open).

4.1. Connecting with a passive open

These are the general steps for establishing a connection through a passive open:

Obtain a socket descriptor, used for listening.
Bind a local port to the socket descriptor.
Start to listen on that socket descriptor.
Wait for connections, and accept them.

4.1.1. Getting our address information

We know how to obtain information about an external address (using getaddrinfo), but we will also need to obtain information about ourselves before creating the socket. We need to make two small changes when making the call:

Set hints.ai_flags to AI_PASSIVE.
Pass NULL as the first (node) parameter of getaddrinfo.

From the getaddrinfo(3) man page:

If the AI_PASSIVE flag is specified in hints.ai_flags, and node is NULL, then the returned socket addresses will be suitable for bind(2)ing a socket that will accept(2) connections.

It’s important to note that the second argument when calling getaddrinfo will determine the port that we will use when listening, and therefore the port that the peer will have to use when connecting to us (i.e. when doing an active open). Note that all ports below 1024 are reserved⁸ for the system, so you should use a number in the range [1024..65535] (inclusive), and it should not be in use by another program.
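If the port comes from user input, it might be worth validating it before passing it to getaddrinfo. The following is only a sketch: the is_unprivileged_port name is hypothetical, and it takes "below 1024" literally, so 1024 itself is accepted. It does not check whether another program is already using the port.

```c
#include <errno.h>   /* errno */
#include <stdbool.h> /* bool */
#include <stdlib.h>  /* strtol() */

/*
 * Hypothetical helper: true if `text' is a decimal port number outside of
 * the reserved range, that is, between 1024 and 65535 (inclusive).
 */
bool is_unprivileged_port(const char* text) {
    char* end;
    errno = 0;
    const long value = strtol(text, &end, 10);

    /* Reject overflow, empty strings and trailing garbage */
    if (errno != 0 || end == text || *end != '\0')
        return false;

    return value >= 1024 && value <= 65535;
}
```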

This is the new code for obtaining our address information. In this case, the addrinfo structure filled by getaddrinfo will refer to the port 4321 of our machine.

struct addrinfo hints;
memset(&hints, 0, sizeof(hints));
hints.ai_family   = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags    = AI_PASSIVE; /* New */

struct addrinfo* self_info;
const int status = getaddrinfo(NULL, "4321", &hints, &self_info); /* Updated */
if (status != 0) {
    fprintf(stderr, "Could not obtain our address info: %s\n",
            gai_strerror(status));
    abort();
}

4.1.2. Creating the passive socket

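The socket(2) function returns a socket descriptor for the specified domain (e.g. IPv4 or IPv6), socket type (e.g. SOCK_STREAM for TCP or SOCK_DGRAM for UDP) and protocol (e.g. IPPROTO_IP, which is zero and lets the system pick the default protocol for that socket type).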

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

On error, -1 is returned and errno is set. If the returned socket is valid, it must be closed by the caller using close(2).

Now that self_info contains information about the current machine, we can call socket just like we did before.

const int sockfd_listen = socket(self_info->ai_family,
                                 self_info->ai_socktype,
                                 self_info->ai_protocol);
if (sockfd_listen < 0) {
    fprintf(stderr, "Could not create socket: %s\n", strerror(errno));
    abort();
}

That sockfd_listen variable will be used for the process of accepting connections, not for transmitting data after the connection is established. This is normally referred to as a passive socket.

4.1.3. Binding the socket address

Next, we need to bind the socket address (IP address, port and protocol) to the socket descriptor we just created. This can be done with the bind(2) function.

#include <sys/types.h>
#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr* addr, socklen_t addrlen);

The bind function returns zero on success, or -1 on error, setting errno appropriately. We could create our own sockaddr structure, but getaddrinfo already filled one for us, so we should use that.

const int status = bind(sockfd_listen,
                        self_info->ai_addr,
                        self_info->ai_addrlen);
if (status != 0) {
    fprintf(stderr, "Could not bind to socket descriptor: %s\n",
            strerror(errno));
    abort();
}

4.1.4. Listening for connections

After binding the socket address, we can start listening for connections. We do this with the listen(2) function.

#include <sys/types.h>
#include <sys/socket.h>

int listen(int sockfd, int backlog);

The first parameter is the passive socket we created earlier, and the second parameter is the maximum length to which the queue of pending connections for sockfd may grow⁹. The listen function returns zero on success, or -1 on error, setting errno appropriately.

const int status = listen(sockfd_listen, 10);
if (status != 0) {
    fprintf(stderr, "Could not listen for connections: %s\n", strerror(errno));
    abort();
}

Now the system is listening for connections on the port we specified when calling getaddrinfo (in this case 4321), and it will queue incoming connections until we accept them.
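The steps so far (address lookup, socket, bind and listen) could be combined into a single helper. This is a sketch, not the only way of doing it: the open_listener name is hypothetical, and unlike the snippets above it returns -1 instead of aborting, and it tries every node of the list until one works.

```c
#include <netdb.h>      /* getaddrinfo(), freeaddrinfo(), gai_strerror() */
#include <stdio.h>      /* fprintf() */
#include <string.h>     /* memset() */
#include <sys/socket.h> /* socket(), bind(), listen() */
#include <sys/types.h>
#include <unistd.h>     /* close() */

/*
 * Hypothetical helper: returns a passive socket that is listening on the
 * specified local port, or -1 on failure.
 */
int open_listener(const char* port, int backlog) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;

    struct addrinfo* self_info;
    const int status = getaddrinfo(NULL, port, &hints, &self_info);
    if (status != 0) {
        fprintf(stderr, "Could not obtain our address info: %s\n",
                gai_strerror(status));
        return -1;
    }

    int sockfd = -1;
    for (const struct addrinfo* cur = self_info; cur != NULL;
         cur = cur->ai_next) {
        sockfd = socket(cur->ai_family, cur->ai_socktype, cur->ai_protocol);
        if (sockfd < 0)
            continue;

        if (bind(sockfd, cur->ai_addr, cur->ai_addrlen) == 0 &&
            listen(sockfd, backlog) == 0)
            break; /* This node worked */

        /* This node failed, release the descriptor and try the next one */
        close(sockfd);
        sockfd = -1;
    }

    freeaddrinfo(self_info);
    return sockfd;
}
```

With this helper, the previous snippets would be reduced to something like open_listener("4321", 10), and the caller would still be responsible for closing the returned descriptor.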

4.1.5. Accepting connections

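Once there is an incoming connection in the queue, we can accept it using the accept(2) function. If the queue is empty, accept blocks until a connection arrives (unless the socket was marked as non-blocking).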

#include <sys/types.h>
#include <sys/socket.h>

int accept(int sockfd, struct sockaddr* addr, socklen_t* addrlen);

The first parameter of accept is the passive socket we created with socket(2) above. The other two parameters are used to retrieve information about the computer that is connecting to us, but they can be set to NULL if we don’t care about this information.

The accept function returns a new socket descriptor used for sending and receiving data in the accepted connection. On error, it returns -1 and sets errno.

const int sockfd_connection = accept(sockfd_listen, NULL, NULL);
if (sockfd_connection < 0) {
    fprintf(stderr, "Could not accept incoming connection: %s\n",
            strerror(errno));
    abort();
}

After the connection is accepted, we can send and receive data from the peer using the returned socket descriptor.
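When the peer information is needed, the last two parameters of accept can point to a sockaddr_storage structure, which is large enough for both IPv4 and IPv6 addresses. The following sketch assumes a hypothetical accept_with_peer helper that also converts the address to text.

```c
#include <arpa/inet.h>  /* inet_ntop(), ntohs() */
#include <netinet/in.h> /* struct sockaddr_in, struct sockaddr_in6 */
#include <stddef.h>     /* size_t */
#include <sys/socket.h> /* accept(), struct sockaddr_storage */
#include <sys/types.h>

/*
 * Hypothetical helper: accepts one connection and stores the address and
 * port of the peer. The `peer_ip' buffer should have room for at least
 * INET6_ADDRSTRLEN bytes. Returns the new socket descriptor, or -1.
 */
int accept_with_peer(int sockfd_listen, char* peer_ip, size_t peer_ip_sz,
                     unsigned* peer_port) {
    struct sockaddr_storage peer;

    /* Input: size of our buffer. Output: size of the stored address. */
    socklen_t peer_len = sizeof(peer);

    const int sockfd_connection =
        accept(sockfd_listen, (struct sockaddr*)&peer, &peer_len);
    if (sockfd_connection < 0)
        return -1;

    const char* text = NULL;
    *peer_port       = 0;
    if (peer.ss_family == AF_INET) {
        const struct sockaddr_in* in4 = (const struct sockaddr_in*)&peer;
        text = inet_ntop(AF_INET, &in4->sin_addr, peer_ip, peer_ip_sz);
        *peer_port = ntohs(in4->sin_port);
    } else if (peer.ss_family == AF_INET6) {
        const struct sockaddr_in6* in6 = (const struct sockaddr_in6*)&peer;
        text = inet_ntop(AF_INET6, &in6->sin6_addr, peer_ip, peer_ip_sz);
        *peer_port = ntohs(in6->sin6_port);
    }

    /* Unknown family or conversion error: leave an empty string */
    if (text == NULL && peer_ip_sz > 0)
        peer_ip[0] = '\0';

    return sockfd_connection;
}
```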

4.1.6. Cleaning up

After we are done sending and/or receiving data from that connection, we need to close it.

close(sockfd_connection);

And after we are done with all connections, we can stop listening by closing the first socket descriptor. Don’t forget to also free the linked list of addrinfo structures by calling freeaddrinfo.

close(sockfd_listen);
freeaddrinfo(self_info);

4.2. Connecting with an active open

These are the general steps for establishing a connection through an active open:

Obtain a socket descriptor with the server information.
Connect to the server.

4.2.1. Getting the server information

This is essentially the same code that was shown before, but now we point to a specific IP address and port.

struct addrinfo hints;
memset(&hints, 0, sizeof(hints));
hints.ai_family   = AF_INET;     /* IPv4 */
hints.ai_socktype = SOCK_STREAM; /* TCP */

struct addrinfo* server_info;
const int status = getaddrinfo("192.168.2.123", "4321", &hints, &server_info);
if (status != 0) {
    fprintf(stderr, "Could not obtain address info: %s\n", gai_strerror(status));
    abort();
}

4.2.2. Creating the socket

Now we have to create a socket, just like we did for the passive open. In this case, however, we will only need a single socket for connecting and communicating.

const int sockfd = socket(server_info->ai_family,
                          server_info->ai_socktype,
                          server_info->ai_protocol);
if (sockfd < 0) {
    fprintf(stderr, "Could not create socket: %s\n", strerror(errno));
    abort();
}

As you can probably tell, the nice part of using getaddrinfo is that we can obtain most of the important information from there, so both calls to socket are made using the same addrinfo members.

4.2.3. Connecting to the server

After creating the socket with the information about the server, we have to connect to it. For that, we use the connect(2) function.

#include <sys/types.h>
#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr* addr, socklen_t addrlen);

The connect function expects the socket descriptor we just created, and a sockaddr structure with the information about the server. Fortunately, getaddrinfo also filled the ai_addr and ai_addrlen members for this. The connect function returns zero on success, or -1 on error, setting errno appropriately.

const int status = connect(sockfd,
                           server_info->ai_addr,
                           server_info->ai_addrlen);
if (status != 0) {
    fprintf(stderr, "Connection error: %s\n", strerror(errno));
    abort();
}

Once we are connected, we can send and receive data from the server.
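The active open steps could also be wrapped in a single helper. In this sketch, the connect_to name is hypothetical, AF_UNSPEC is used instead of AF_INET so both IPv4 and IPv6 results are tried (an assumption of this example), and the function returns -1 instead of aborting.

```c
#include <netdb.h>      /* getaddrinfo(), freeaddrinfo(), gai_strerror() */
#include <stdio.h>      /* fprintf() */
#include <string.h>     /* memset() */
#include <sys/socket.h> /* socket(), connect() */
#include <sys/types.h>
#include <unistd.h>     /* close() */

/*
 * Hypothetical helper: returns a socket descriptor connected to the
 * specified host and port, or -1 on failure.
 */
int connect_to(const char* host, const char* port) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;   /* IPv4 or IPv6 (assumption) */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    struct addrinfo* server_info;
    const int status = getaddrinfo(host, port, &hints, &server_info);
    if (status != 0) {
        fprintf(stderr, "Could not obtain address info: %s\n",
                gai_strerror(status));
        return -1;
    }

    int sockfd = -1;
    for (const struct addrinfo* cur = server_info; cur != NULL;
         cur = cur->ai_next) {
        sockfd = socket(cur->ai_family, cur->ai_socktype, cur->ai_protocol);
        if (sockfd < 0)
            continue;

        if (connect(sockfd, cur->ai_addr, cur->ai_addrlen) == 0)
            break; /* Connected */

        /* This address failed, release the descriptor and try the next */
        close(sockfd);
        sockfd = -1;
    }

    freeaddrinfo(server_info);
    return sockfd;
}
```

For instance, connect_to("192.168.2.123", "4321") would replace the three previous snippets, and the returned descriptor still has to be closed by the caller.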

4.2.4. Cleaning up

After we are done sending and/or receiving data from that connection, we need to close it. Don’t forget to also free the linked list of addrinfo structures by calling freeaddrinfo.

close(sockfd);
freeaddrinfo(server_info);

4.3. Sending and receiving data through sockets

Once a connection has been established, we can send and receive data through its socket descriptor. Most functions that operate on file descriptors, like read(2) and write(2), can work with socket descriptors too. However, it’s better to use recv(2) and send(2), even if we don’t specify any flags.

#include <sys/types.h>
#include <sys/socket.h>

ssize_t recv(int sockfd, void* buf, size_t len, int flags);
ssize_t send(int sockfd, const void* buf, size_t len, int flags);

The recv function is used to receive data into the specified buffer of size len. The send function is used to send a buffer of the specified size len.

The recv function returns:

The number of bytes received, when the call was successful.
Zero, when a stream socket peer has performed an orderly shutdown (the “end-of-file” indicator).
Negative one (-1), if an error occurred. The errno variable is also set.

The send function returns:

The number of bytes sent, when the call was successful.
Negative one (-1), if an error occurred. The errno variable is also set.

Note how both functions return a signed size type (ssize_t), defined in the <sys/types.h> header.

4.3.1. Example code

When receiving data, we should check for errors and end-of-file indicators.

#include <errno.h>      /* errno */
#include <stdio.h>      /* fprintf() */
#include <string.h>     /* strerror() */
#include <stdlib.h>     /* abort() */

#include <sys/types.h>  /* ssize_t */
#include <sys/socket.h> /* recv() */

#define BUF_SZ 100 /* Arbitrary size */

char buf[BUF_SZ];
for (;;) {
    const ssize_t received = recv(sockfd_connection, buf, sizeof(buf), 0);

    /* Error */
    if (received < 0) {
        fprintf(stderr, "Receive error: %s\n", strerror(errno));
        abort();
    }

    /* End of file */
    if (received == 0)
        break;

    /* TODO: Handle data in `buf' */
}
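Keep in mind that, on a stream socket, a single recv call may return fewer bytes than requested. When the size of a message is known in advance, a loop like the following could be used. This is just a sketch: the recv_exact name is hypothetical, and retrying on EINTR is a design choice of this example.

```c
#include <errno.h>      /* errno, EINTR */
#include <stdbool.h>    /* bool */
#include <stddef.h>     /* size_t */
#include <sys/socket.h> /* recv() */
#include <sys/types.h>  /* ssize_t */

/*
 * Hypothetical helper: receives exactly `len' bytes into `buf'. Returns
 * false on error, or if the peer closed the connection too early.
 */
bool recv_exact(int sockfd, void* buf, size_t len) {
    size_t total = 0;

    while (total < len) {
        const ssize_t received =
            recv(sockfd, (char*)buf + total, len - total, 0);

        if (received < 0) {
            if (errno == EINTR)
                continue; /* Interrupted by a signal, try again */
            return false;
        }

        /* End of file before receiving the whole message */
        if (received == 0)
            return false;

        total += (size_t)received;
    }

    return true;
}
```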

When sending data, we should check if the function really sent all the bytes. For example, the following send_data function keeps trying to send data until all the buffer is sent:

#include <errno.h>   /* errno */
#include <stdbool.h> /* bool */
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* fprintf() */
#include <string.h>  /* strerror() */
#include <stdlib.h>  /* abort() */

#include <sys/types.h>  /* ssize_t */
#include <sys/socket.h> /* send() */

bool send_data(int sockfd, const void* data, size_t data_sz) {
    size_t total_sent = 0;

    while (data_sz > 0) {
        /* Resume right after the bytes that were already sent */
        const ssize_t sent =
            send(sockfd, &((const char*)data)[total_sent], data_sz, 0);
        if (sent < 0)
            return false;

        total_sent += sent;
        data_sz -= sent;
    }

    return true;
}

/* Calling the function */
if (!send_data(sockfd, buf, buf_pos)) {
    fprintf(stderr, "Send error: %s\n", strerror(errno));
    abort();
}

Footnotes:

¹

Note that these are not the only existing transport protocols. Some other examples include the Datagram Congestion Control Protocol (DCCP) and the Stream Control Transmission Protocol (SCTP).

²

See RFC 793.

³

See RFC 768.

⁴

A single run of consecutive all-zero groups can be replaced with a double colon (::), and the leading zeros of each group can be omitted. Therefore, the “expanded” version of that IPv6 address is 2001:0db8:0000:0000:0000:8a2e:0370:7334.

⁵

The PF prefix stands for Protocol Family, whereas AF stands for Address Family. In practice, AF_INET and PF_INET have the same value.

⁶

The accepted IPv4 and IPv6 formats are those supported by inet_aton(3) and inet_pton(3), respectively.

⁷

More specifically, the sockaddr structure from <sys/socket.h> contains only a sa_family_t member (sa_family) and a char array (sa_data). Based on the sa_family member, we can decide which sockaddr_in* structure we should use, since they provide a nicer interface.

⁸

See also Registered port (Wikipedia) and List of TCP and UDP port numbers (Wikipedia).

⁹

A value of 5 or 10 for the backlog argument is fine. The system silently truncates the argument to the value in /proc/sys/net/core/somaxconn. Since Linux 5.4, the default in this file is 4096; in earlier kernels, the default value is 128.