⚡ HTTP/0.9 to HTTP/3: Why the Web Needed an Upgrade

Unpacking the history of web protocol evolution

In the previous post on computer networks, we explored network and transport layer protocols. But for most developers, it’s the application layer (Layers 5-7 of the OSI model) that has the most practical relevance. At the application layer, we have protocols (language contracts) defining how applications talk to each other, even when they are running on machines oceans apart.

The Hypertext Transfer Protocol (HTTP) is a stateless, application-level protocol that has become the quiet workhorse powering everything we do on the internet. It is the language that your browser uses to talk with websites. Let’s understand this invisible thread tying the internet together.

HTTP/0.9 (1991)

Tim Berners-Lee, an English computer scientist, developed the first version of HTTP, now called HTTP/0.9. It ran on top of TCP/IP and was extremely minimal by design. What made it so minimal?

  • Requests and responses had no headers (metadata about the message)
  • Requests were single line and only GET method was allowed
  • Only raw HTML files could be fetched from a server (no images, no buttons, no CSS)
  • No status codes would be returned. If something went wrong, the server might return an error message as raw HTML, but there was no formal way to tell the client what happened

In HTTP/0.9, a request looked as simple as GET /index.html. Despite being very basic in design, it laid down the foundations for everything that followed.
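To make that concrete, here’s a minimal Python sketch of an HTTP/0.9 exchange over a raw TCP socket. The host and port are hypothetical, and it assumes a server that still speaks HTTP/0.9 (modern servers don’t):

import socket

# Connect to a hypothetical server that still speaks HTTP/0.9
sock = socket.create_connection(("localhost", 8080))
sock.sendall(b"GET /index.html\r\n")  # a single line: no headers, no version

# The server replies with raw HTML and simply closes the connection -
# no status line, no headers
html = b""
while chunk := sock.recv(4096):
    html += chunk
sock.close()
print(html.decode())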

HTTP/1.0 (1996)

HTTP/1.0 built on the primitive design of HTTP/0.9 and brought the following upgrades:

  • POST and HEAD methods were added:
    • POST allowed the client to send form data and create new resources
    • HEAD (similar to GET) could be used to retrieve just the response headers without the actual resource
  • Headers (metadata) could now be added to each request and response. This made the web much more flexible. For example:
    • Thanks to the Content-Type header, servers could now send documents other than plain HTML, such as plain text with text/plain, images with image/png or image/jpeg, binary data with application/octet-stream, and more
    • The Expires header (which tells the client when the returned resource will go stale) and Last-Modified header (which tells the client when a resource was last modified) allowed clients to do basic caching. For example, a client could now send a HEAD request to get the headers of a resource and check Last-Modified to know if the resource has changed since the last fetch
    • Servers could now provide the number of bytes in the response body with the Content-Length header. This was important for reading responses correctly, especially for binary data like images
  • Status codes were sent at the beginning of the response, making it easier for clients to adapt their behavior based on whether requests succeeded or failed
  • HTTP/1.0 added version info to the request line, e.g., GET /index.html HTTP/1.0, to maintain backward compatibility and pave the way for future upgrades
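
Putting these pieces together, a typical HTTP/1.0 exchange looked something like this (the resource and header values here are illustrative):

GET /logo.png HTTP/1.0

HTTP/1.0 200 OK
Content-Type: image/png
Content-Length: 3248
Last-Modified: Wed, 01 May 1996 10:00:00 GMT

{3248 bytes of image data}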

Though it advanced HTTP/0.9 quite a lot, HTTP/1.0 still had its limitations:

  • No Persistent Connections: Every request required a new TCP connection to the server. Thus, if an HTML file referenced 5 images, the browser would open 5 separate TCP connections to the server to fetch them all. This wasn’t efficient, especially as websites grew heavier and more complex
  • No Pipelining or Multiplexing: Only 1 request could be handled per TCP connection at a time
  • No Built-in Compression: All payloads were sent raw, which slowed things down when transferring large responses

HTTP/1.1 (1997)

With the web exploding in popularity, HTTP/1.1 was introduced to handle the growing demand. It became the backbone of the internet for nearly two decades and is still widely used today.

So, what made HTTP/1.1 such a big deal?

Persistent Connections

HTTP/1.1 made persistent connections the default (this can be overridden with the Connection: close header in the request). Unlike HTTP/1.0, the browser no longer had to open a brand-new TCP connection for every request - a single connection could be reused for multiple requests. Thus, all the resources referenced on a webpage could be downloaded over a single connection, resulting in fewer round trips, less overhead from repeatedly setting up and tearing down connections, and faster page loads. Persistent connections made the web feel noticeably snappier.

Chunked Transfer Encoding

Before HTTP/1.1, the server had to know the exact length of the response before sending it (so it could set the value of Content-Length in the response header). While this worked fine for static pages, it wasn’t efficient for dynamic content or streaming data where the final size isn’t known upfront. For example: streaming logs, live data from a DB, or any scenario where the response is built on the fly.

HTTP/1.1 introduced chunked transfer encoding, which lets the server start sending data immediately (as chunks) without knowing the final size ahead of time.

How it works:

  1. The server sets the Transfer-Encoding: chunked header in the response
  2. The response body is sent as a series of chunks. Each chunk is sent like:
     {chunk size in hex}\r\n
     {chunk data}\r\n
    
  3. When all chunks have been sent, the server sends a chunk size of 0 followed by an empty line to indicate the end of the response
     0\r\n
     \r\n
    

Modern applications use chunked encoding behind the scenes for things like server-sent events (SSE), streaming APIs, and real-time dashboards.
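
As a quick illustration, here’s a small Python sketch (not a full HTTP server, just the framing logic) that encodes a body the way chunked transfer encoding frames it on the wire:

def encode_chunked(chunks):
    # Frame each chunk as "{size in hex}\r\n{data}\r\n",
    # then terminate with a zero-length chunk
    body = b""
    for chunk in chunks:
        body += f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n"
    return body + b"0\r\n\r\n"

# Prints b'5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n'
print(encode_chunked([b"Hello", b", world"]))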

Better Caching Mechanisms

HTTP/1.1 gave both servers and clients more control over how content is stored and reused, with headers like:

  • Cache-Control:
    • In requests, this is used by browsers to control how aggressively they want to bypass cached data. For example: when you hard refresh a page, the browser may send Cache-Control: no-cache to force a fresh version
    • In responses, the server sets the Cache-Control header to specify the caching behavior for the sent resource. For example: Cache-Control: max-age=1800 tells the browser that it can cache and use this resource for 30 minutes without checking back with the server
  • ETag: An Entity Tag is a unique identifier assigned by a server to a resource. The client can cache the resource with this ETag (returned in response to the client). Later, it can ask the server if the resource has changed using If-None-Match: {ETag-value} header. If the server’s current ETag for the resource matches the client’s, it responds with a 304 Not Modified status code, indicating that the client’s version is still valid
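
Here’s what that ETag revalidation flow might look like with Python’s standard library (the URL and ETag value are hypothetical):

import urllib.error
import urllib.request

req = urllib.request.Request(
    "https://example.com/logo.png",
    headers={"If-None-Match": '"abc123"'},  # hypothetical cached ETag
)
try:
    with urllib.request.urlopen(req) as resp:
        # 200 OK: the resource changed; cache the new body and ETag
        body, etag = resp.read(), resp.headers.get("ETag")
except urllib.error.HTTPError as e:
    if e.code == 304:
        print("304 Not Modified - reuse the cached copy")
    else:
        raise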

Better Error Handling

More status codes were introduced in HTTP/1.1, allowing servers to give more meaningful feedback to clients. This made error handling and client-server communication more precise.

For example: 206 Partial Content allows serving range requests (important for resumable downloads), 101 Switching Protocols signals that the server is switching protocols as the client requested (used in WebSockets), 410 Gone tells the client that the requested resource is permanently gone, and more.

Host Header

This small feature helped the internet scale massively. Before HTTP/1.1, a typical request looked like GET /index.html HTTP/1.0. Imagine you want to run multiple websites on a single machine: how would the server distinguish which website a request is for? This meant that one IP address could only serve one website.

With the Host header, HTTP/1.1 allowed the client to specify which domain a request is for, like this:

GET /index.html HTTP/1.1
Host: awesomeHTTP.com

This simple feature allowed many websites to live on the same server/IP (virtual hosting), preventing IP exhaustion, and made it possible for shared hosting providers (like GoDaddy) to offer cheap hosting plans.
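
A toy Python sketch of how a server might dispatch on the Host header (the domains and paths are made up):

# One server, one IP, many sites - pick the site by the Host header
SITES = {
    "awesomehttp.com": "/var/www/awesomehttp",
    "catpics.com": "/var/www/catpics",
}

def document_root(headers: dict) -> str:
    host = headers.get("Host", "").split(":")[0].lower()  # strip any port
    return SITES.get(host, "/var/www/default")

print(document_root({"Host": "catpics.com"}))  # /var/www/catpics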

Additional Methods

HTTP/1.1 expanded the set of HTTP methods beyond the basic trio from HTTP/1.0 (GET, POST, HEAD) to offer more control over resources:

  • OPTIONS: Asks the server which HTTP methods are supported for a specific resource
  • PUT: Replaces or creates a resource
  • DELETE: Removes a resource
  • TRACE: Echoes back the sent request; useful for diagnostics
  • CONNECT: Converts the connection into a tunnel (e.g., for TLS)
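
For instance, here’s how you could probe a server’s supported methods with Python’s http.client (the host is a placeholder, and not every server populates the Allow header):

import http.client

# Ask the server which methods it supports for a resource
conn = http.client.HTTPSConnection("example.com")
conn.request("OPTIONS", "/index.html")
resp = conn.getresponse()
print(resp.status, resp.getheader("Allow"))  # e.g. 200 GET, HEAD, OPTIONS
conn.close()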


Limitations

HTTP/1.1 served us well for over two decades, but as websites got heavier and more interactive, its cracks started to show:

  • Even though HTTP/1.1 introduced persistent connections, a single TCP connection still serves one request at a time: the client has to wait for the previous response before sending the next request. (HTTP/1.1 did define pipelining, but responses must come back in order, so browsers largely disabled it.) If one request is slow, it blocks all the subsequent requests - head-of-line blocking at the application layer
  • HTTP/1.1 sends headers as plain text without any built-in compression. On top of that, the same headers (cookies, User-Agent, etc.) get exchanged over and over on the same connection, which is wasteful
  • To work around the one-request-at-a-time limit, browsers open multiple parallel TCP connections per origin (typically six), and TCP connections are relatively expensive to open and manage

Many of these limitations directly inspired the development of HTTP/2 and HTTP/3.

HTTP/2 (2015)

By the mid-2000s, it was apparent that HTTP/1.1 couldn’t sustain the growing weight and complexity of the internet. In response, Google introduced SPDY in 2009, an experimental protocol designed to make web browsing faster by tackling some of HTTP/1.1’s shortcomings. It served as a proof of concept and helped shape what eventually became HTTP/2.

HTTP/2 didn’t change the semantics of HTTP, i.e., methods like GET and POST and status codes like 200 OK still worked exactly the same. Its contributions were mostly under the hood, focusing on making the protocol faster and more efficient in how messages are framed and sent over the wire.

Binary Framing Layer & Multiplexing

Instead of sending data in plain text, HTTP/2 opted for a binary format, which is easier for machines to parse and process - it’s their native language, after all. At the core of this is the binary framing layer, a set of rules for how to chop up the data, label it, send it, and rebuild it on the other end. Let’s look at how it works.

In HTTP/2, all communication is done through frames. Each HTTP message (whether it’s a request or response) is split into multiple frames. Every frame consists of:

  • A 9-byte frame header
  • A variable-length frame payload

All the frames of a message are associated with the same Stream ID. This is best understood with an example: suppose a client sends 2 requests to a server, and each request gets chopped up into frames:

Request 1 (Stream ID = 1): frame #1, frame #2, frame #3 (all have Stream ID = 1)
Request 2 (Stream ID = 2): frame #4, frame #5 (all have Stream ID = 2)

Now, instead of sending all of Request 1 and then all of Request 2, the client can interleave frames on a single TCP connection like this:

Frame #1 → Frame #4 → Frame #2 → Frame #5 → Frame #3

Thus, the frames of both requests get sent together, and the second request gets fully delivered even before the first one. Unlike HTTP/1.1, the second request didn’t have to wait for the first one to be responded to.

This ability to send multiple streams concurrently over the same connection, without blocking, is called multiplexing. Multiplexing helps HTTP/2 overcome the head-of-line blocking issue at the application layer.

Note: Multiplexing in HTTP/2 doesn’t do anything about the head-of-line blocking issue at the transport layer level. The receiver might still be stuck waiting for a lost or corrupted TCP segment before it can reassemble the frames and pass them up to the application.

Thus, Stream ID enables the client to interleave frames on a single TCP connection and also allows the server to assemble full messages from different frames it receives.

Each HTTP/2 frame has a 9-byte header, which includes:

  • Length: Size of the frame’s payload
  • Type: What kind of frame it is (HEADERS, DATA, etc.)
  • Flags: Control bits like END_STREAM or END_HEADERS
  • Stream ID: The stream this frame belongs to
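
To see those 9 bytes in practice, here’s a small Python sketch that unpacks a frame header according to this layout (3-byte length, 1-byte type, 1-byte flags, then a reserved bit and a 31-bit Stream ID):

def parse_frame_header(buf: bytes):
    length = int.from_bytes(buf[0:3], "big")       # payload size
    frame_type = buf[3]                            # e.g., 0x0 = DATA, 0x1 = HEADERS
    flags = buf[4]                                 # e.g., 0x1 = END_STREAM
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id

# A 5-byte DATA frame on stream 1 with END_STREAM set -> (5, 0, 1, 1)
print(parse_frame_header(b"\x00\x00\x05\x00\x01\x00\x00\x00\x01"))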

Commonly used frame types:

  • HEADERS frame (different from a frame’s header) carries the header of the request or the response
  • DATA frame carries the body of the request or the response

A typical message starts with one or more HEADERS frames followed by one or more DATA frames. The sender sets:

  • END_HEADERS in the last HEADERS frame
  • END_STREAM in the last frame (whether HEADERS or DATA) to indicate that it’s the last frame from the sender

Now, what’s a stream? It’s a virtual bidirectional channel within a TCP connection that carries a single request-response pair. Here’s how it works in practice:

  1. Imagine you’re loading a webpage. Your browser sends a request for index.html. All frames of this request get assigned the same Stream ID. The last frame includes the END_STREAM flag to indicate the request is complete. Now the stream is half-closed
  2. The server replies using the same Stream ID, with its own HEADERS and DATA frames. It also sets END_STREAM in the last frame of its response. Now the stream is fully closed

Thus, the request and response shared the same stream. A stream goes through several states:

  • Idle: Not used yet
  • Open: A request is being sent or received
  • Half-closed: One side (e.g., the client) is done sending, but the other isn’t
  • Closed: Both sides are done; no more frames can be sent on this stream

Stream Prioritization

HTTP/2 enables multiple streams to be active together. Imagine a web page:

  • HTML (index.html)
  • CSS (style.css)
  • JavaScript (app.js)
  • 2 images (img1.jpg, img2.jpg)

Even though all those requests can be active at the same time with HTTP/2 multiplexing, we want:

  • HTML first (so the browser can parse the layout)
  • CSS and JS next (to render and interact)
  • Images last (they’re not critical for page structure)

Stream prioritization is a mechanism that lets the client tell the server which streams are more important, so the server can allocate its resources accordingly and serve the high-priority streams first.

For example: The client can tell the server “Please send this HTML stream first, it’s more important than the rest”. The client can express this prioritization in two ways:

  • By attaching a priority section to the stream’s initial HEADERS frame
  • Or by sending a separate PRIORITY frame in the stream

This information (attached or sent separately) contains the following fields:

  • Stream dependency ID: The stream this one depends on
  • Weight (a value from 1 to 256): Relative importance among sibling streams; higher weights get more bandwidth
  • Exclusive flag: Tells the server to make this stream the sole dependent of its parent, prioritizing it over all its siblings

Imagine a client assigns the following stream priorities:

Stream 3 = HTML (no dependency)
Stream 5 = CSS (depends on 3, weight 200, exclusive=1)
Stream 7 = Image 1 (depends on 3, weight 40)
Stream 9 = Image 2 (depends on 3, weight 10)

This forms the following dependency tree:

Stream 3
└── Stream 5 (exclusive)
    ├── Stream 7
    └── Stream 9

Here’s how the server interprets this:

  1. The server will process stream 3 first, as everything else depends on it
  2. Then stream 5 gets processed, as exclusive = 1 makes it take all the bandwidth
  3. Then streams 7 and 9 are processed in parallel, but stream 7 gets 80% of the server’s bandwidth while stream 9 gets 20%, in proportion to their weights (see the sketch below)
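
Here’s that weight arithmetic as a tiny Python sketch of how a server might split bandwidth among sibling streams (the stream IDs and weights are from the example above):

def bandwidth_shares(siblings: dict) -> dict:
    # siblings maps stream ID -> weight (1-256); each sibling gets
    # weight / total_weight of the parent's bandwidth
    total = sum(siblings.values())
    return {stream_id: weight / total for stream_id, weight in siblings.items()}

print(bandwidth_shares({7: 40, 9: 10}))  # {7: 0.8, 9: 0.2}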

Thus, stream prioritization allows clients to control how content is delivered.

Header Compression

In HTTP/1.1, headers are sent as plain text every time, even if they’re identical across messages. This creates unnecessary overhead, especially for common headers like Accept, User-Agent, etc. HTTP/2 addresses this with HPACK, a header compression format designed to reduce bandwidth usage and speed up message transmission.

Here’s how HPACK works. It uses 2 shared tables, with the client and server each maintaining their own synchronized copy:

  • Static Table: Holds commonly used header fields, e.g., :method: GET, :method: POST, :scheme: https, etc
  • Dynamic Table: Starts empty and gets populated with the new headers encountered during the communication

To reuse a header, the sender doesn’t need to send its full key-value pair. Instead, it can just reference the header’s index (from the shared tables) in its message, drastically reducing the size of the header block. For example, consider two requests. 1st request:

:method: GET
:path: /
user-agent: MyBrowser/1.0

2nd request:

:method: GET
:path: /search
user-agent: MyBrowser/1.0

For the second request, the client only has to send :path: /search along with index references for :method: GET and user-agent: MyBrowser/1.0. The server looks up the referenced indexes in its copy of the header tables and reconstructs the full headers, allowing for smaller and faster transmissions.
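
Here’s a toy Python sketch of that indexing idea. Real HPACK also Huffman-codes literals and grows the dynamic table incrementally; the static indices for :method: GET and :path: / below match RFC 7541, while the dynamic-table entry is illustrative:

# Toy HPACK-style lookup; static indices 2 and 4 match RFC 7541,
# the dynamic entry (index 62) is illustrative
STATIC_TABLE = {2: (":method", "GET"), 4: (":path", "/")}
dynamic_table = {62: ("user-agent", "MyBrowser/1.0")}

def decode(encoded):
    # Encoded entries are either an int (table index)
    # or a literal (name, value) pair
    headers = []
    for entry in encoded:
        if isinstance(entry, int):
            headers.append(STATIC_TABLE.get(entry) or dynamic_table[entry])
        else:
            headers.append(entry)
    return headers

# 2nd request: only :path travels as a literal, the rest by index
print(decode([2, (":path", "/search"), 62]))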

Server Push

HTTP/2 allows the server to proactively send resources to the client before the client explicitly asks for them. Without Server Push:

  1. Client asks for index.html
  2. Client parses index.html and encounters <link href="style.css">
  3. Client makes another request for style.css

With Server Push, the server predicts that style.css will be needed:

  1. The client sends a request GET /index.html on stream 1
  2. The server responds with:
    • The usual HEADERS and DATA frames on stream 1 for index.html
    • A PUSH_PROMISE frame (also on stream 1) saying: “Hey, I’m going to push style.css on stream 2”
  3. The server then initiates stream 2 to send the HEADERS and DATA frames for style.css

Note:

  • The PUSH_PROMISE frame is sent on a client-initiated stream, while a new server-initiated stream carries the pushed resource
  • Clients can reject pushed resources, either per stream or globally via a SETTINGS frame
  • The server can’t force the client to use the pushed content; the client may discard it or cancel the push with an RST_STREAM frame, especially if it already has the asset cached

While Server Push sounds like a great way to reduce latency on paper, it is rarely used today due to its complexity and poor results in real-world usage. The complexity comes from the fact that it’s hard to predict what the client truly needs, and bandwidth is wasted if the client already has the resource cached. Due to its underwhelming performance, major browsers like Chrome have since dropped support for it entirely.

Limitations

Even though it brought significant improvements over HTTP/1.1, HTTP/2 has its limitations:

  1. Still relies on TCP, which means it still suffers from head-of-line blocking at the transport layer
  2. In high latency or lossy networks, the benefits of multiplexing and header compression can be negated by TCP’s retransmission behavior
  3. Because it uses binary framing, HTTP/2 traffic isn’t human-readable. Unlike HTTP/1.1, you can’t simply inspect it with basic tools like telnet. Debugging now requires more specialized tools like Wireshark or browser developer tools
  4. HTTP/2 uses HPACK, which maintains synchronized header tables on both client and server. While it reduces bandwidth usage, this shared state must be carefully managed. Desynchronization (due to lost packets or bugs) can cause errors that are hard to detect and recover from

HTTP/3 (2022)

HTTP/3 solved some of the biggest bottlenecks in HTTP/2 by moving away from TCP entirely and building on top of QUIC (see the last post for how QUIC works). Just like HTTP/2, HTTP/3 left the HTTP semantics untouched. By using QUIC as the transport layer protocol:

  • HTTP/3 no longer suffers from head-of-line blocking at the transport layer level
  • QUIC allows faster connection setup between the client and the server
  • QUIC doesn’t use (IP address, port) to define a connection. Instead, each connection is identified by a Connection ID (CID). This means that if your device changes its network (switching from Wi-Fi to mobile data), which changes its IP address, the old connection between your device and any server it is connected to remains usable. The server won’t care that the IP address has changed, as it now relies on the CID to identify your device. Thus, QUIC connections can survive a network change
  • QUIC runs over UDP and is implemented in the user space and not the OS kernel space. This facilitates rapid development and deployment of updates and new features. For example, it gives browser vendors more freedom to iterate and deploy improvements faster
  • QUIC makes HTTP/3 more resilient to high latency and lossy networks
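
To make the CID idea concrete, here’s a toy Python sketch (not real QUIC; all values are made up) of a server that keys sessions by Connection ID instead of by address:

# Toy sketch: sessions are keyed by Connection ID, not (IP, port),
# so a client that switches networks keeps its session
connections = {}  # connection ID -> session state

def handle_packet(cid: bytes, src_addr: tuple, payload: bytes) -> dict:
    session = connections.setdefault(cid, {"addr": src_addr})
    if session["addr"] != src_addr:
        # The client moved (e.g., Wi-Fi -> mobile data); just update
        # the address and keep the connection alive
        session["addr"] = src_addr
    return session

handle_packet(b"\xab\xcd", ("203.0.113.7", 4433), b"hello")   # on Wi-Fi
handle_packet(b"\xab\xcd", ("198.51.100.9", 4433), b"again")  # on mobile data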

Global traffic share by version (Source):

  • HTTP/1.1: 29.9%
  • HTTP/2: 49.6%
  • HTTP/3: 20.5%

That’s all for the HTTP story - from humble text-based roots to multiplexed, compressed, QUIC-powered transport. And all of it, just to load cat pictures faster. Thanks for reading!