A Web Server from Scratch

Perhaps the most visible Internet protocol today is the Hypertext Transfer Protocol (HTTP), used by web browsers to request resources from web servers. I figured that improving my understanding of this protocol would help me when writing and troubleshooting web apps, configuring web proxies, and helping family and friends with their Internet woes. To this end, I wanted to take a deeper dive into the HTTP specification, and implementing a web server from scratch seemed like a good way to do that. I ended up using this web server as part of a Cocoa app that serves static websites for local development. The source is also available.

The Specifications

The Hypertext Transfer Protocol version 1.1 is described in a series of RFCs published by the Internet Engineering Task Force, RFC 7230 through RFC 7235.

This project depends mainly on the first two: RFC 7230 and RFC 7231.

Concurrency

A requirement not just for most web servers, but for many types of server programs, is to be able to service a number of client connections at the same time. We could categorize common concurrency strategies into two broad groups: spinning off a process or thread to service each connection or handling actionable input from each connection as it arrives using an event loop.

Creating a separate process or thread of execution to handle each connection is conceptually simple and can scale well to use the large number of processor cores that are available in modern servers. There is some overhead in creating and maintaining a large number of processes or threads, though some operating systems try to minimize this. The overhead can also be mitigated by keeping a pool of processes or threads handy.

Handling all client connections in a single event loop can offer high performance and low memory usage. Scaling to multiple cores is often accomplished by simply running several copies of the server on the same machine. Programming in a way that makes good use of the event loop can be conceptually challenging or verbose, though in some cases this can be mitigated by the language or libraries used.

I think Grand Central Dispatch (also known as libdispatch) offers a nice hybrid between the two approaches. It provides tools for creating dispatch queues which hold tasks to be executed. It maintains a pool of threads to execute these tasks and scales the thread pool automatically to suit conditions and hardware on the host. Dispatch sources can be set up to automatically enqueue tasks to handle client requests. This combination provides event-loop performance that scales well on modern hardware and is fairly easy to reason about. This is the concurrency approach I chose for this project.
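To make the hybrid idea concrete, here is a minimal sketch in Python rather than the project's Objective-C. It is only an analogy: the standard selectors module stands in for dispatch sources and a ThreadPoolExecutor stands in for the thread pool behind dispatch queues. The names serve_ready_events, read_upper, and run_once_demo are made up for illustration and do not come from the project.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def serve_ready_events(selector, pool, timeout=1.0):
    """Wait for readable sockets and submit each one's handler to the pool.

    Returns the futures for the submitted tasks so the caller can wait on them.
    """
    futures = []
    for key, _events in selector.select(timeout):
        handler = key.data  # the callable registered alongside the socket
        futures.append(pool.submit(handler, key.fileobj))
    return futures


def read_upper(sock):
    """Hypothetical task: read what the client sent and return it upper-cased."""
    return sock.recv(1024).upper()


def run_once_demo():
    """Run a single pass of the loop over one simulated client connection."""
    # A connected socket pair stands in for a real client connection.
    client, server_side = socket.socketpair()
    with selectors.DefaultSelector() as selector, ThreadPoolExecutor(max_workers=4) as pool:
        selector.register(server_side, selectors.EVENT_READ, read_upper)
        client.sendall(b"hello")
        results = [future.result() for future in serve_ready_events(selector, pool)]
    client.close()
    server_side.close()
    return results
```

A real loop would call serve_ready_events repeatedly. Because the selector is level-triggered, it would also need to avoid submitting a second task for a socket whose first task has not yet read its data, which is the kind of bookkeeping libdispatch takes care of.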

Listening for connections

Let's dive into the code. The class THFListener wraps a socket that will listen for new connections from clients. It is not designed to be specific to serving web requests, but should be usable for any network server. On initialization, it uses classic POSIX network API functions to create and configure the socket, bind it to an address and port, and start it listening.
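The usual sequence of POSIX calls looks roughly like the sketch below. It is written in Python (whose socket module wraps the same calls) and is not the THFListener source; the function name make_listener, the SO_REUSEADDR option, and the default backlog are assumptions chosen for illustration.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=128):
    """Create, configure, bind, and start a listening TCP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Allow quick restarts without waiting for old connections to expire.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))   # port 0 asks the OS for any free port
        sock.listen(backlog)
        sock.setblocking(False)   # accept() will not stall an event loop
    except OSError:
        sock.close()
        raise
    return sock
```

Calling make_listener() with the defaults binds to the loopback interface on a port chosen by the operating system, which can then be read back with getsockname().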

During initialization, THFListener also sets up a dispatch queue and source. Each time a client tries to connect, THFListener will accept the connection, configure the new socket, and cause a given block to be executed with the new socket file descriptor as an argument. For our web server, this block will wire up the new socket to code that handles HTTP requests.
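The accept step might be sketched as follows, again in Python and not taken from the project. The callback on_connection plays the role of the block. It receives a socket object rather than a raw file descriptor, which is a simplification. The names accept_pending and accept_demo are hypothetical.

```python
import select
import socket


def accept_pending(listener, on_connection):
    """Accept every connection currently waiting on a non-blocking listener.

    on_connection is called once per client with the new socket.
    Returns the number of connections accepted.
    """
    accepted = 0
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            return accepted       # nothing left in the accept queue
        conn.setblocking(False)   # configure the new socket
        on_connection(conn)
        accepted += 1


def accept_demo():
    """Open a listener, connect one client, and accept it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    listener.setblocking(False)
    client = socket.create_connection(listener.getsockname())
    select.select([listener], [], [], 2.0)  # wait until the connection is queued
    connections = []
    count = accept_pending(listener, connections.append)
    for sock in [client, listener, *connections]:
        sock.close()
    return count
```

In an event-driven server, accept_pending would run each time the listening socket becomes readable, much as the dispatch source enqueues a task for each incoming connection.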

On deallocation, THFListener cancels the dispatch source. The dispatch source, in turn, closes the listening socket when it is canceled.

Reading and writing the client connection

The socket provided by THFListener for each new connection can be read to receive data from the client and written to send data to the client. THFSocket wraps this socket and provides the interface our HTTP implementation will use to communicate with the client. Like THFListener, this class is not meant to be specific to HTTP but should be usable for any network server.

On initialization, THFSocket sets up a dispatch queue and dispatch source for the socket. This dispatch source can be used to enqueue tasks to execute when data is read from the client or when data is finished being sent to the client. The dispatch source is configured to close the socket and notify THFSocket's delegate (our HTTP protocol code in this case) if the client disconnects or times out.

THFSocket has a buffer property. Initially, this property is not set. After wiring up the socket to the HTTP code, the block that handles new connections will initialize this property to an empty buffer. This will cause THFSocket to start reading data from the client.

When some data has been read from the client, THFSocket appends it to the buffer and notifies the delegate. Upon being notified of incoming data, the delegate may remove some bytes from the buffer. THFSocket will continue trying to read data from the client as long as there is room in the buffer, notifying the delegate every time it gets some new data.
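The interplay between the buffer and the delegate can be modeled without any networking. The Python sketch below is an assumption about the general shape, not the THFSocket code: BoundedBuffer, feed, consume, and room are invented names, and the delegate is reduced to a plain callback.

```python
class BoundedBuffer:
    """A byte buffer with a fixed capacity that notifies a callback on new data."""

    def __init__(self, capacity, on_data):
        self.capacity = capacity
        self.data = bytearray()
        self.on_data = on_data    # called with the buffer after each append

    def room(self):
        """Number of bytes that can still be appended."""
        return self.capacity - len(self.data)

    def feed(self, chunk):
        """Append as much of chunk as fits, notify the callback, return bytes taken."""
        taken = chunk[: self.room()]
        if taken:
            self.data += taken
            self.on_data(self)    # the callback may consume bytes here
        return len(taken)

    def consume(self, count):
        """Remove and return the first count bytes."""
        head = bytes(self.data[:count])
        del self.data[:count]
        return head
```

A reader built on this would stop asking the socket for data whenever room() returns zero and resume once the callback has consumed something.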

THFSocket also provides a method for writing data to the client. It simply sends the data and handles any errors. If there are errors during reading or writing (typically caused by the client dropping the connection), the socket is closed and the delegate is notified.

HTTP overview

The essential back-and-forth of an HTTP session is that the client sends a request to the server and the server sends a response to the client. The client may stream several requests to the server before the server sends any response, or it may send a large request which the server will receive a little at a time. Both requests and responses have headers and each may also have a body. The body of a response is often an HTML document. A request might have a body when, for example, a form is being posted.

For each new client connection, the block executed by THFListener wraps the client socket in an instance of THFSocket. It also makes an instance of THFHTTPProtocol and wires it up as THFSocket's delegate. The THFHTTPProtocol instance holds a strong reference to the THFSocket instance to keep it allocated until the connection is wrapped up.

During initialization, THFHTTPProtocol starts a timeout timer. The timer holds a strong reference to the THFHTTPProtocol instance, creating a retain loop. This loop keeps both the timer and the protocol (as well as the socket) allocated until the loop is broken. THFHTTPProtocol breaks the retain loop and deallocates the timer (and itself and the socket) when the socket is closed. If a complete request is not received within the timeout, the socket is closed. This helps mitigate denial of service attacks like Slowloris.

As THFSocket notifies THFHTTPProtocol of incoming data from the client, the THFSocket buffer is checked to see if it contains a complete request header. According to the spec, a request header is separated from its body (if any) by a pair of carriage-return-and-line-feed sequences, so this is what THFHTTPProtocol looks for. If the THFSocket buffer fills up before a complete request header is received, an error response is sent to the client and the connection is closed. This is another mitigation against denial of service attacks which would fill up the server's memory with "shaggy dog stories" about the requests they would like to make.
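The check described here comes down to searching for the four-byte terminator and giving up once the buffer is full. A rough Python sketch follows; the names header_length and check_buffer and the three string results are illustrative choices, not the project's API.

```python
HEADER_END = b"\r\n\r\n"


def header_length(buffer):
    """Return the header length including the blank line, or None if incomplete."""
    index = bytes(buffer).find(HEADER_END)
    return None if index < 0 else index + len(HEADER_END)


def check_buffer(buffer, capacity):
    """Classify the buffer as 'ready', 'waiting', or 'too_large'."""
    if header_length(buffer) is not None:
        return "ready"
    if len(buffer) >= capacity:
        return "too_large"   # buffer is full and still holds no complete header
    return "waiting"
```

A caller would parse the request on "ready", keep reading on "waiting", and send an error response before closing the connection on "too_large".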

Upon finding a complete request header in the THFSocket buffer, THFHTTPProtocol removes the request header data from the buffer and tries to initialize a THFHTTPRequest instance with it. If THFHTTPRequest cannot decode the request, an error response is sent to the client.

THFHTTPProtocol checks for some unsupported features in the request header, sending an error response if any are found. It also checks if the request header indicates that the client expects a "Continue" response and provides it if so. Some clients request a "Continue" response before sending the body of a request. This can save the client from sending a request body that the server will not accept because of an issue in the request header. If the server is not happy with the request header, it can send an error response instead of "Continue". Interim responses like "Continue" (the 1xx status codes) are, I think, the only case where a request can receive more than one response: after the request body is received, the server is expected to provide a final response. The HTTP spec is full of plot twists.
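The "Continue" handshake hinges on the Expect request header. As a hedged Python illustration (not the project's code), the function below returns the interim response bytes when the client has asked for them. The name interim_response and the convention of lower-cased dictionary keys are assumptions. The HTTP/1.0 check reflects the rule in RFC 7231 that the expectation is ignored for HTTP/1.0 requests.

```python
def interim_response(headers, request_version="HTTP/1.1"):
    """Return the bytes of a 100 response if the client asked for one, else b''.

    headers is a dict whose keys are lower-cased field names.
    """
    if request_version == "HTTP/1.0":
        return b""   # the expectation is ignored for HTTP/1.0 requests
    expectation = headers.get("expect", "").strip().lower()
    if expectation == "100-continue":
        return b"HTTP/1.1 100 Continue\r\n\r\n"
    return b""
```

The server would write these bytes to the client only after it has decided the request header is acceptable, and would still owe a final response once the body has arrived.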

If the request header indicates that a request body should be expected, THFHTTPProtocol makes an instance of THFHTTPBody to hold it, then continues to read the number of bytes indicated in the header. The HTTP spec describes an optional method of "chunked encoding" that allows a body of indeterminate size to be sent in pieces, but I did not implement it here.

Once THFHTTPProtocol has both the request header and body (if any), it notifies its delegate. In the case of the static server example we're looking at, the delegate happens to be a window controller showing details about the directory it is serving. The delegate looks for the file specified in the request, tries to figure out what kind of file it is, and sends it to the client along with the file type information (in the content-type header).
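A static file lookup of this kind can be sketched in a few lines of Python. This is not the window controller's code: resolve_file is a hypothetical name, the fallback type application/octet-stream is an assumption, and the check that refuses paths escaping the served directory is an addition the article does not describe.

```python
import mimetypes
from pathlib import Path
from urllib.parse import unquote, urlsplit


def resolve_file(root, target):
    """Map a request target to (path, content_type), or None if it is not servable."""
    root = Path(root).resolve()
    relative = unquote(urlsplit(target).path).lstrip("/")
    try:
        candidate = (root / relative).resolve()
    except (OSError, ValueError):
        return None
    # Refuse anything that escapes the served directory (for example via "..").
    if candidate != root and root not in candidate.parents:
        return None
    if not candidate.is_file():
        return None
    content_type, _encoding = mimetypes.guess_type(candidate.name)
    return candidate, content_type or "application/octet-stream"
```

A None result would typically be turned into a "404 Not Found" response, while a successful lookup supplies both the file to send and the value for the content-type header.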

In principle, the delegate could perform additional processing on certain file types. For example, if a file were requested whose name ended in ".php", it could first run the file through php-cgi and send the resulting output to the client. This would allow local development of simple PHP programs.

Inside an HTTP request header

As I mentioned, an HTTP request has a header and an optional body, and the header is terminated by a pair of CRLF sequences. The request header itself is made up of a request line and zero or more header fields. The request line and the header fields are separated from one another by single CRLF sequences.

Each header field has a name and value. The name is separated from the value by a single colon character. The name is not case sensitive and may not contain any of what the spec refers to as "bad whitespace". The value may have what the spec calls "optional whitespace" before and after the value, but this is not considered part of the value.
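These rules translate into a short parsing routine. The following Python sketch is illustrative only (parse_header_field is an invented name): it splits on the first colon, rejects whitespace in the name, lower-cases the name so lookups are case-insensitive, and trims spaces and tabs around the value.

```python
def parse_header_field(line):
    """Split one header line into (lower-cased name, value)."""
    name, colon, value = line.partition(":")
    if not colon:
        raise ValueError("header field has no colon")
    if not name or any(ch in " \t" for ch in name):
        raise ValueError("whitespace is not allowed in a field name")
    # Optional whitespace around the value is limited to spaces and tabs.
    return name.lower(), value.strip(" \t")
```

Splitting on only the first colon matters because values such as "example.com:8080" may contain colons of their own.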

THFHTTPRequest provides a class method for finding the length of a request header in a buffer (or finding that the buffer does not contain a complete request header). THFHTTPProtocol uses this method to check if a THFSocket buffer contains a complete request header and to remove it if so.

Upon initialization with raw request header data, THFHTTPRequest decodes the raw data as an ASCII string and splits it into lines on CRLF sequences. The first line is the request line and should have three parts separated by space characters: the request method, the requested URI, and the protocol version.

The protocol version should look like HTTP/x.y where x is the major version number and y is the minor version number. THFHTTPRequest checks only that the version string is well-formatted; THFHTTPProtocol checks that the version itself is supported before sending a response.
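Parsing the request line might look like the Python sketch below, which is an illustration and not the THFHTTPRequest implementation. It follows the order given in RFC 7230 (method, request target, then version) and only checks that the version is well-formed, leaving the question of whether it is supported to the caller. The names parse_request_line and VERSION_PATTERN are assumptions.

```python
import re

# RFC 7230 defines the version as "HTTP/" followed by DIGIT "." DIGIT.
VERSION_PATTERN = re.compile(r"HTTP/(\d)\.(\d)")


def parse_request_line(line):
    """Split a request line into (method, target, (major, minor))."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("request line must have exactly three parts")
    method, target, version = parts
    match = VERSION_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError("malformed protocol version")
    return method, target, (int(match.group(1)), int(match.group(2)))
```

For the line "GET /index.html HTTP/1.1" this returns ("GET", "/index.html", (1, 1)), and a caller can then compare the version tuple against the versions it supports.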

THFHTTPProtocol's delegate can check that the method is supported and that the URI is found. In this case, the method must be GET and the URI must refer to a file present in the served directory.

THFHTTPRequest decodes the request headers into a dictionary. THFHTTPProtocol checks a couple of these (like content-length) but mostly they're for the THFHTTPProtocol delegate to use as it sees fit. In this case it doesn't do much with them, but a more sophisticated server might use them for content negotiation, authentication, cache-control, and other weird and wonderful things.

The request body

A simple static server has little need to accept request bodies, but I built out basic functionality to receive and store a request body anyway. I stopped short of implementing chunked encoding, but the structure of the program should suit it if necessary.

While it is reasonable to limit the size of the request header to something that may reasonably be kept in memory, a request body may be of arbitrary size and may take a while to receive. For this reason, THFHTTPBody creates a temporary file during initialization. As body data is received, the append method is called on the THFHTTPBody instance to write the data to the temporary file. At any time, the length method may be called to get the length of the received data.
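A file-backed body can be approximated in Python as shown below. This is a sketch under stated assumptions and not the THFHTTPBody source: the class name RequestBody, the file name prefix, and the explicit close method (standing in for cleanup on deallocation) are all made up for illustration.

```python
import os
import tempfile


class RequestBody:
    """Stores request body bytes in a temporary file instead of in memory."""

    def __init__(self, directory=None, prefix="request-body-"):
        fd, self.path = tempfile.mkstemp(prefix=prefix, dir=directory)
        self._file = os.fdopen(fd, "wb")
        self._length = 0

    def append(self, data):
        """Write data to the temporary file and track the running length."""
        self._file.write(data)
        self._file.flush()   # make the bytes visible to readers of self.path
        self._length += len(data)

    def length(self):
        """Number of body bytes received so far."""
        return self._length

    def close(self):
        """Close and unlink the temporary file (safe to call more than once)."""
        if not self._file.closed:
            self._file.close()
        if os.path.exists(self.path):
            os.unlink(self.path)
```

The protocol code would compare length() against the content-length header to decide when the body is complete.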

When a THFHTTPBody instance is deallocated, it closes and unlinks the temporary file. However, the file may still be left dangling if the program were to crash while THFHTTPBody instances were still allocated. So, THFHTTPBody also has a class initializer that cleans up any temporary files it finds that were created by a previous run of the program.

THFHTTPBody has a read-only path property which can be used to access the file that holds the request body data.

Inside an HTTP response

Like a request, an HTTP response has a header as well as an optional body, and the header is terminated by a double CRLF sequence. Where a request header begins with a request line, a response header begins with a status line; the remaining lines (if any) are response header fields.

The status line in turn has three parts, much like the request line of a request: the protocol version, the status code, and a status description. The protocol version is formatted the same as in a request. The status code is a three-digit code typically selected from a list provided in the spec (though additional codes are defined in other RFCs or used informally). The status description is a free-form string.

Header fields have the same rules and formatting in both requests and responses.

THFHTTPResponse has properties for a status code, dictionary of header fields, and a response body. Its most important method is data, which encodes the response into a buffer ready to be sent to the client by THFSocket.

THFHTTPResponse provides a convenience initializer that can set up a response for a given NSError instance and status code. It constructs a response body containing the error's localized description and localized failure reason. THFHTTPProtocol uses this as a handy way to pass helpful error messages back to the client when it is not able to process a request.

The semantics of a few status codes prohibit the inclusion of a response body, so THFHTTPResponse checks for these before encoding the response. If a body is to be included, it adds or updates a content-length response header with the size of the body.

THFHTTPResponse then flattens the header field dictionary, joining the field names and values. Next, it assembles the status line and joins it together with the header field lines before encoding the whole string into ASCII bytes. If a body is to be sent, it is appended to the response header buffer before being returned.
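Putting the pieces of the encoding step together, a Python sketch might look like this. It is not the THFHTTPResponse code: encode_response, body_allowed, and the small REASONS table are illustrative. Lower-casing field names on output and dropping content-length from every bodyless response are simplifications chosen here.

```python
REASONS = {200: "OK", 204: "No Content", 304: "Not Modified",
           400: "Bad Request", 404: "Not Found"}


def body_allowed(status):
    """1xx, 204, and 304 responses never carry a body."""
    return not (100 <= status < 200 or status in (204, 304))


def encode_response(status, headers=None, body=b"", version="HTTP/1.1"):
    """Encode a status line, header fields, and optional body into bytes."""
    fields = {name.lower(): value for name, value in (headers or {}).items()}
    if body_allowed(status):
        fields["content-length"] = str(len(body))   # add or update
    else:
        body = b""
        fields.pop("content-length", None)
    lines = [f"{version} {status} {REASONS.get(status, 'Unknown')}"]
    lines += [f"{name}: {value}" for name, value in fields.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Because field names are case-insensitive, normalizing them to lower case makes it easy to replace any content-length value the caller supplied with the real size of the body.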

Wrapping up

In this article we looked at some different concurrency strategies and the basics of the Hypertext Transfer Protocol, and saw how libdispatch could be used to implement a web server with a hybrid concurrency strategy.

I hope that you've found this interesting and that it has helped shed some light on the Hypertext Transfer Protocol in particular as well as on strategies for concurrency and implementing network servers generally.

If this is the kind of thing you're into, you may enjoy my other work.

Aaron D. Parks
Parks Digital LLC
4784 Pine Hill Drive, Potterville, Michigan
support@parksdigital.com