In the previous post on computer networks, we explored network and transport layer protocols. But for most developers, it’s the application layer (Layers 5-7 of the OSI framework) that has the most practical relevance. At the application layer, we have protocols (language contracts) defining how applications talk to each other, even when they are running on machines oceans apart.
The Hypertext Transfer Protocol (HTTP) is a stateless, application-level protocol that has become the quiet workhorse powering everything we do on the internet. It is the language that your browser uses to talk with websites. Let’s understand this invisible thread tying the internet together.
HTTP/0.9 (1991)
Tim Berners-Lee, an English computer scientist, developed the first version of HTTP, now called HTTP/0.9. It ran over TCP/IP and was extremely minimal by design. What made it so minimal?
- Requests and responses had no headers (metadata about the message)
- Requests were a single line, and only the `GET` method was allowed
- Only raw HTML files could be fetched from a server (no images, no buttons, no CSS)
- No status codes were returned. If something went wrong, the server might return an error message as raw HTML, but there was no formal way to tell the client what happened
In HTTP/0.9, a request was as simple as `GET /index.html`. Despite its bare-bones design, it laid the foundations for everything that followed.
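To make this concrete, here’s a sketch of an HTTP/0.9 exchange in Python over a raw TCP socket. Almost no modern server still speaks HTTP/0.9, so treat the host as a placeholder rather than something guaranteed to answer this way.

```python
import socket

# HTTP/0.9: one-line request, no headers, no version, no status codes.
with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(b"GET /index.html\r\n")  # the entire request
    # The response is just raw HTML; the server signals the end
    # of the document by closing the connection.
    response = b""
    while chunk := sock.recv(4096):
        response += chunk
    print(response.decode(errors="replace"))
```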
HTTP/1.0 (1996)
HTTP/1.0 built on the primitive design of HTTP/0.9 and brought the following upgrades:
- `POST` and `HEAD` methods were added:
  - `POST` allowed the client to send form data and create new resources
  - `HEAD` (similar to `GET`) could be used to retrieve just the response headers without the actual resource
- Headers (metadata) could now be added to each request and response. This made the web much more flexible. For example:
  - Thanks to the `Content-Type` header, servers could now send documents other than plain HTML, such as plain text with `text/plain`, images with `image/png` or `image/jpeg`, binary data with `application/octet-stream`, and more
  - The `Expires` header (which tells the client when the returned resource will go stale) and the `Last-Modified` header (which tells the client when a resource was last modified) allowed clients to do basic caching. For example, a client could now send a `HEAD` request to get the headers of a resource and check `Last-Modified` to know if the resource had changed since the last fetch
  - Servers could now provide the number of bytes in the response body with the `Content-Length` header. This was important for reading responses correctly, especially for binary data like images
- Status codes were sent at the beginning of the response, making it easier for clients to adapt their behavior based on whether requests succeeded or failed
- `HTTP/1.0` added version info to the request line, e.g., `GET /index.html HTTP/1.0`, to maintain backward compatibility and pave the way for future upgrades
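To see these upgrades on the wire, here’s a hand-rolled HTTP/1.0 exchange in Python (the host and path are placeholders). Note the version in the request line, the header section, and the status line that leads the response.

```python
import socket

request = (
    "GET /index.html HTTP/1.0\r\n"    # request line now carries the version
    "User-Agent: toy-client/0.1\r\n"  # headers are allowed on requests
    "\r\n"                            # blank line ends the header section
)
with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(request.encode())
    reply = b""
    while chunk := sock.recv(4096):
        reply += chunk

head, _, body = reply.partition(b"\r\n\r\n")
# The first line is now a status line, e.g. "HTTP/1.0 200 OK",
# followed by headers like Content-Type and Content-Length.
print(head.decode())
```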
Though it advanced HTTP/0.9 quite a lot, HTTP/1.0 still had its limitations:
- No Persistent Connections: Every request required a new TCP connection to the server. Thus, if an HTML file referenced 5 images, the browser would open 5 separate TCP connections to fetch them all. This wasn’t efficient, especially as websites grew heavier and more complex
- No Pipelining or Multiplexing: Only 1 request could be handled per TCP connection at a time
- No Built-in Compression: All payloads were sent raw, which slowed things down when transferring large responses
HTTP/1.1 (1997)
With the web exploding in popularity, HTTP/1.1 was introduced to handle the growing demand. It became the backbone of the internet for nearly two decades and is still widely used today.
So, what made HTTP/1.1 such a big deal?
Persistent Connections
HTTP/1.1 made persistent connections the default (this can be overridden with the `Connection: close` request header). Unlike HTTP/1.0, the browser no longer had to open a brand new TCP connection for every request - a single connection could be reused for multiple requests. Thus, all the resources referenced on a webpage could be downloaded over a single connection, resulting in fewer round trips, less overhead from repeatedly setting up and tearing down connections, and faster page loads. Persistent connections made the web feel noticeably snappier.
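As a quick sketch, here’s what connection reuse looks like with Python’s standard library, which keeps the underlying TCP connection open between requests; the host and the second path are placeholders.

```python
import http.client

conn = http.client.HTTPConnection("example.com")

conn.request("GET", "/")        # request 1
resp1 = conn.getresponse()
resp1.read()                    # drain the body so the connection can be reused

conn.request("GET", "/about")   # request 2 rides on the same TCP connection
resp2 = conn.getresponse()
print(resp1.status, resp2.status)

conn.close()                    # tear down the connection when done
```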
Chunked Transfer Encoding
Before HTTP/1.1, the server had to know the exact length of the response before sending it (so it could set the `Content-Length` response header). While this worked fine for static pages, it wasn’t efficient for dynamic content or streaming data where the final size isn’t known upfront - for example, streaming logs, live data from a DB, or any scenario where the response is built on the fly.
HTTP/1.1 introduced chunked transfer encoding, which lets the server start sending data immediately (as chunks) without knowing the final size ahead of time.
How it works:
- The server sets the `Transfer-Encoding: chunked` header in the response
- The response body is sent as a series of chunks, each framed like:

  ```
  {chunk size in hex}\r\n
  {chunk data}\r\n
  ```

- When all chunks have been sent, the server sends a chunk of size 0 followed by an empty line to indicate the end of the response:

  ```
  0\r\n
  \r\n
  ```
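Here’s a minimal Python sketch of the framing itself; real servers and frameworks do this for you behind the scenes.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for data in parts:
        # Each chunk: size in hex, CRLF, the data, CRLF.
        yield f"{len(data):x}\r\n".encode() + data + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk: size 0, then an empty line

# A response body produced on the fly, piece by piece:
body = b"".join(encode_chunked([b"Hello, ", b"chunked ", b"world!"]))
print(body)
# b'7\r\nHello, \r\n8\r\nchunked \r\n6\r\nworld!\r\n0\r\n\r\n'
```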
Modern applications use chunked encoding behind the scenes for things like server-sent events (SSE), streaming APIs, and real-time dashboards.
Better Caching Mechanisms
HTTP/1.1 gave both servers and clients more control over how content is stored and reused, with headers like:
- `Cache-Control`:
  - In requests, this is used by browsers to control how aggressively they want to bypass cached data. For example, when you hard refresh a page, the browser may send `Cache-Control: no-cache` to force a fresh version
  - In responses, the server sets the `Cache-Control` header to specify the caching behavior for the sent resource. For example, `Cache-Control: max-age=1800` tells the browser that it can cache and use this resource for 30 minutes without checking back with the server
- `ETag`: An Entity Tag is a unique identifier assigned by a server to a resource. The client can cache the resource along with the ETag returned in the response. Later, it can ask the server whether the resource has changed using the `If-None-Match: {ETag-value}` header. If the server’s current ETag for the resource matches the client’s, it responds with a `304 Not Modified` status code, indicating that the client’s version is still valid
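Here’s a rough sketch of ETag revalidation using Python’s standard library. The URL is a placeholder, and whether a `304` actually comes back depends on the server’s caching setup (and on the server sending an ETag at all).

```python
import urllib.request
import urllib.error

url = "https://example.com/data.json"  # placeholder resource

resp = urllib.request.urlopen(url)
etag = resp.headers.get("ETag")        # cache the body alongside this tag
cached_body = resp.read()

# Later: revalidate instead of re-downloading the whole resource
# (assumes the server actually returned an ETag above).
req = urllib.request.Request(url, headers={"If-None-Match": etag})
try:
    fresh = urllib.request.urlopen(req)
    cached_body = fresh.read()         # 200: resource changed, refresh the cache
except urllib.error.HTTPError as e:
    if e.code == 304:
        pass                           # 304 Not Modified: cached copy is still valid
    else:
        raise
```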
Better Error Handling
More status codes were introduced in HTTP/1.1, allowing servers to give more meaningful feedback to clients. This made error handling and client-server communication more precise.
For example: `206 Partial Content` allows serving range requests (important for resumable downloads), `101 Switching Protocols` signals that the server is switching protocols as the client requested (used in WebSockets), `410 Gone` tells the client that the requested resource is permanently gone, and more.
Host Header
This small feature helped the internet scale massively. Before HTTP/1.1, a typical request looked like `GET /index.html HTTP/1.0`. Imagine you want to run multiple websites on a single machine: how would the server distinguish which website the request is for? This meant that one IP could only serve one website.
With the `Host` header, HTTP/1.1 allowed the client to specify the target domain in the request, like this:

```
GET /index.html HTTP/1.1
Host: awesomeHTTP.com
```
This simple feature allowed many websites to live on the same server/IP (virtual hosting), preventing IP exhaustion, and made it possible for shared hosting providers (like GoDaddy) to offer cheap hosting plans.
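As an illustration, here’s a toy Python server that routes requests purely by the `Host` header; the domains are made up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# One process, one IP, many "sites" -- virtual hosting in miniature.
SITES = {
    "awesomehttp.com": b"<h1>Welcome to awesomeHTTP</h1>",
    "cats.example":    b"<h1>Cat pictures</h1>",
}

class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Strip an optional port and normalize case before the lookup.
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host, b"<h1>Unknown site</h1>")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8080), VirtualHostHandler).serve_forever()
```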
Additional Methods
HTTP/1.1 expanded the set of HTTP methods beyond the basic trio from HTTP/1.0 (`GET`, `POST`, `HEAD`) to offer more control over resources:
| Method | Purpose |
|---|---|
| `OPTIONS` | Asks the server which HTTP methods are supported for a specific resource |
| `PUT` | Replace or create a resource |
| `DELETE` | Remove a resource |
| `TRACE` | Echo back the sent request; useful for diagnostics |
| `CONNECT` | Convert the connection into a tunnel (e.g., for TLS) |
Limitations
HTTP/1.1 served us well for over two decades, but as websites got heavier and more interactive, its cracks started to show:
- Even though `HTTP/1.1` introduced persistent connections, it still handles one request at a time per TCP connection. Thus, the client has to wait for the previous request to be served before sending the next one. This means that a slow request blocks all subsequent requests (head-of-line blocking at the application layer)
- `HTTP/1.1` sends headers as plain text without any built-in compression. On top of that, the same headers often get exchanged over and over on the same connection, which is wasteful
- `HTTP/1.1` uses TCP connections, which are relatively expensive to open and manage, wasting bandwidth unnecessarily
Many of these limitations directly inspired the development of HTTP/2 and HTTP/3.
HTTP/2 (2015)
By the mid-2000s, it was apparent that HTTP/1.1 couldn’t sustain the growing weight and complexity of the internet. In response, Google introduced SPDY in 2009, an experimental protocol designed to make web browsing faster by tackling some shortcomings of HTTP/1.1. It served as a proof of concept and helped shape what eventually became HTTP/2.
HTTP/2 didn’t change the semantics of HTTP, i.e., methods like `GET` and `POST` and status codes like `200 OK` still worked exactly the same. Its contributions were mostly under the hood, focusing on making the protocol faster and more efficient at the transport level.
Binary Framing Layer & Multiplexing
Instead of sending data in plain text, HTTP/2 opted for a binary format, which is easier for machines to parse and process - it’s their native language, after all. At the core of this is the binary framing layer, a set of rules for how to chop up the data, label it, send it, and rebuild it on the other end. Let’s look at how it works.
In HTTP/2, all communication is done through frames. Each HTTP message (whether it’s a request or a response) is split into multiple frames. Every frame consists of:
- A 9-byte frame header
- A variable-length frame payload
All the frames of a message are associated with the same Stream ID. Stream ID is best understood with an example:
Suppose a client sends 2 requests to a server. Each request gets chopped up into frames:
Request 1 (Stream ID = 1): frame #1, frame #2, frame #3 (all have Stream ID = 1)
Request 2 (Stream ID = 2): frame #4, frame #5 (all have Stream ID = 2)
Now, instead of sending all of Request 1 and then all of Request 2, the client can interleave frames on a single TCP connection like this:
Frame #1 → Frame #4 → Frame #2 → Frame #5 → Frame #3
Thus, the frames of both requests get sent together, and the response to the second request can even arrive before the first. Unlike HTTP/1.1, the second request didn’t have to wait for the first to be answered.
This ability to send multiple streams concurrently over the same connection, without blocking, is called multiplexing. Multiplexing helps HTTP/2 overcome the head-of-line blocking issue at the application layer.
Note: Multiplexing in HTTP/2 doesn’t do anything about the head-of-line blocking issue at the transport layer. The server might still be waiting for a lost or corrupted TCP segment before it can assemble the frames and pass them to the application.
Thus, Stream ID enables the client to interleave frames on a single TCP connection and also allows the server to assemble full messages from different frames it receives.
Each HTTP/2 frame has a 9-byte header, which includes:
- Length: Size of the frame’s payload
- Type: What kind of frame it is (`HEADERS`, `DATA`, etc.)
- Flags: Control bits like `END_STREAM` or `END_HEADERS`
- Stream ID: The stream this frame belongs to
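For illustration, here’s how those 9 bytes can be decoded in Python. This is just a sketch; real clients lean on an HTTP/2 library rather than parsing frames by hand.

```python
def parse_frame_header(data: bytes):
    """Decode the fixed 9-byte HTTP/2 frame header."""
    length = int.from_bytes(data[0:3], "big")  # 24-bit payload length
    frame_type = data[3]                       # e.g. 0x0 = DATA, 0x1 = HEADERS
    flags = data[4]                            # e.g. 0x1 = END_STREAM
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id

header = bytes([0x00, 0x00, 0x05,           # length = 5
                0x00,                        # type = DATA
                0x01,                        # flags = END_STREAM
                0x00, 0x00, 0x00, 0x03])     # stream ID = 3
print(parse_frame_header(header))  # (5, 0, 1, 3)
```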
Commonly used frame types:
- A `HEADERS` frame (different from a frame’s header) carries the headers of the request or the response
- A `DATA` frame carries the body of the request or the response

A typical message starts with one or more `HEADERS` frames followed by one or more `DATA` frames. The sender sets:
- `END_HEADERS` in the last `HEADERS` frame
- `END_STREAM` in the last frame (whether `HEADERS` or `DATA`) to indicate that it’s the last frame from the sender
Now, what’s a stream? It’s a virtual bidirectional channel within a TCP connection that carries a single request-response pair. Here’s how it works in practice:
- Imagine you’re loading a webpage. Your browser sends a request for `index.html`. All frames of this request get assigned the same Stream ID. The last frame includes the `END_STREAM` flag to indicate the request is complete. Now the stream is half-closed
- The server replies using the same Stream ID, with its own `HEADERS` and `DATA` frames. It also sets `END_STREAM` in the last frame of its response. Now the stream is fully closed
Thus, the request and response shared the same stream. A stream goes through several states:
- Idle: Not used yet
- Open: A request is being sent or received
- Half-closed: One side (here, the client) is done sending, but the other isn’t
- Closed: Both sides are done; no more frames can be sent on this stream
Stream Prioritization
HTTP/2 enables multiple streams to be active together. Imagine a web page with:
- HTML (`index.html`)
- CSS (`style.css`)
- JavaScript (`app.js`)
- 2 images (`img1.jpg`, `img2.jpg`)

Even though all those requests can be active at the same time with HTTP/2 multiplexing, we want:
- HTML first (so the browser can parse the layout)
- CSS and JS next (to render and interact)
- Images last (they’re not critical for page structure)
Stream prioritization is a mechanism that lets the client tell the server which streams are more important, so the server can allocate its resources more intelligently and serve the important streams first.
For example, the client can tell the server: “Please send this HTML stream first, it’s more important than the rest.” The client can express this prioritization in two ways:
- By attaching a priority section to the stream’s initial `HEADERS` frame
- By sending a separate `PRIORITY` frame on the stream
This information (attached or sent separately) contains the following fields:
- Stream dependency ID: Which stream this one depends on
- Weight (value between 1 and 256): Relative importance among sibling streams; higher weights get more bandwidth
- Exclusive flag: Makes this stream the sole dependent of its parent; any existing siblings become its children
Imagine a client assigns the following stream priorities:
- Stream 3 = HTML (no dependency)
- Stream 5 = CSS (depends on 3, weight 200, exclusive = 1)
- Stream 7 = Image 1 (depends on 3, weight 40)
- Stream 9 = Image 2 (depends on 3, weight 10)
This forms the following dependency tree:
```
Stream 3
└── Stream 5 (exclusive)
    ├── Stream 7
    └── Stream 9
```
Here’s how the server interprets this:
- The server processes stream 3 first, as everything else depends on it
- Then stream 5, as exclusive = 1 makes it take all the bandwidth
- Then streams 7 and 9 are processed in parallel, but stream 7 gets 80% of the server’s bandwidth while stream 9 gets 20%
Thus, stream prioritization allows clients to control how content is delivered.
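As a back-of-the-envelope check, here’s the weight math from the example above in Python:

```python
# Streams 7 and 9 are siblings (under stream 5), with weights 40 and 10.
siblings = {7: 40, 9: 10}
total = sum(siblings.values())
shares = {stream: weight / total for stream, weight in siblings.items()}
print(shares)  # {7: 0.8, 9: 0.2} -> stream 7 gets 80%, stream 9 gets 20%
```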
Header Compression
In HTTP/1.1, headers are sent as plain text every time, even if they’re identical across messages. This creates unnecessary overhead, especially for common headers like `Accept`, `User-Agent`, etc. HTTP/2 addresses this with HPACK, a header compression format designed to reduce bandwidth usage and speed up message transmission.
Here’s how HPACK works. It uses two tables, kept in sync by both the client and the server:
- Static Table: Holds commonly used header entries, e.g., `:method: GET`, `:method: POST`, `:scheme: https`, etc.
- Dynamic Table: Starts empty and gets populated with the new headers encountered during the communication
To reuse a header, the sender doesn’t need to send its full key-value pair. Instead, it can just reference the index of the header (from the shared header tables) in its message, drastically reducing the size of the header block. For example, a first request:

```
:method: GET
:path: /
user-agent: MyBrowser/1.0
```

And a second request:

```
:method: GET
:path: /search
user-agent: MyBrowser/1.0
```
For the second request, the client only has to send `:path: /search` along with index references to `:method: GET` and `user-agent: MyBrowser/1.0`. The server looks up the referenced indexes in its copy of the header tables and reconstructs the full headers, allowing for smaller and faster transmissions.
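Here’s a toy Python model of the indexing idea. The real HPACK wire format is more involved (it also Huffman-codes literal strings), so this only shows the table-reference mechanism.

```python
static_table = [(":method", "GET"), (":path", "/")]  # tiny stand-in for the real static table
dynamic_table = []  # populated as new headers are seen

def encode(headers):
    """Replace known headers with table indexes; send unknown ones as literals."""
    table = static_table + dynamic_table
    out = []
    for header in headers:
        if header in table:
            out.append(table.index(header))  # just a small integer on the wire
        else:
            out.append(header)               # literal the first time...
            dynamic_table.append(header)     # ...then indexed from here on
    return out

print(encode([(":method", "GET"), (":path", "/"), ("user-agent", "MyBrowser/1.0")]))
# [0, 1, ('user-agent', 'MyBrowser/1.0')]
print(encode([(":method", "GET"), (":path", "/search"), ("user-agent", "MyBrowser/1.0")]))
# [0, (':path', '/search'), 2]  -- the repeated headers shrank to indexes
```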
Server Push
HTTP/2 allows the server to proactively send resources to the client before the client explicitly asks for them. Without Server Push:
- Client asks for `index.html`
- Client parses `index.html` and encounters `<link href="style.css">`
- Client makes another request for `style.css`
With Server Push, the server predicts that style.css will be needed:
- The client sends a request `GET /index.html` on stream 1
- The server responds with:
  - The usual `HEADERS` and `DATA` frames on stream 1 for `index.html`
  - A `PUSH_PROMISE` frame (also on stream 1) saying: “Hey, I’m going to push `style.css` on stream 2”
- The server then initiates stream 2 to send the `HEADERS` and `DATA` frames for `style.css`
Note:
- The `PUSH_PROMISE` frame is sent on a client-initiated stream, while a new server-initiated stream carries the pushed resource
- Clients can reject pushed resources, either per stream or globally via a `SETTINGS` frame
- The server can’t force the client to use the pushed content; the client may discard it, especially if it already has the asset cached
While Server Push sounds like a great way to reduce latency on paper, it is rarely used today due to its complexity and poor results in real-world usage. The complexity comes from the fact that it’s hard to predict what the client truly needs, and bandwidth is wasted if the client already has the resource cached. Due to its underwhelming performance, major browsers like Chrome have removed support for Server Push entirely.
Limitations
Even though it brought significant improvements over HTTP/1.1, HTTP/2 has its limitations:
- Still relies on TCP, which means it still suffers from head-of-line blocking at the transport layer
- In high latency or lossy networks, the benefits of multiplexing and header compression can be negated by TCP’s retransmission behavior
- Because it uses binary framing, `HTTP/2` traffic isn’t human-readable. Unlike `HTTP/1.1`, you can’t simply inspect it with basic tools like `telnet`. Debugging now requires more specialized tools like Wireshark or browser developer tools
- `HTTP/2` uses HPACK, which maintains synchronized header tables on both the client and the server. While this reduces bandwidth usage, the shared state must be carefully managed. Desynchronization (due to lost packets or bugs) can cause errors that are hard to detect and recover from
HTTP/3 (2022)
HTTP/3 solved some of the biggest bottlenecks in HTTP/2 by moving away from TCP entirely and building on top of QUIC (see the last post for how QUIC works). Just like HTTP/2, HTTP/3 didn’t change any HTTP semantics. By using QUIC as the transport layer protocol:
- `HTTP/3` no longer suffers from head-of-line blocking at the transport layer
- `QUIC` allows faster connection setup between the client and the server
- `QUIC` doesn’t use (IP address, port) to define a connection. Instead, each connection is identified by a Connection ID (CID). This means that if your device changes its network (switching from Wi-Fi to mobile data), which changes its IP address, the old connection between your device and any server it is connected to remains usable. The server won’t care that the IP address has changed, as it relies on the CID to identify your device. Thus, `QUIC` connections can survive a network change
- `QUIC` runs over UDP and is implemented in user space rather than the OS kernel. This facilitates rapid development and deployment of updates and new features; for example, it gives browser vendors more freedom to iterate and ship improvements faster
- `QUIC` makes `HTTP/3` more resilient to high-latency and lossy networks
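As a toy sketch of the migration idea: the server keys connections by CID instead of (IP address, port), so a packet arriving from a new address with a known CID simply continues the old connection. Real QUIC is far more involved (multiple rotating CIDs, path validation), which this ignores.

```python
connections = {}  # CID -> connection state

def handle_packet(cid: bytes, src_addr: tuple, payload: bytes):
    conn = connections.setdefault(cid, {"addr": src_addr, "data": b""})
    if conn["addr"] != src_addr:
        # The client's address changed (e.g. Wi-Fi -> mobile data), but the
        # CID matches, so the connection carries on from the new address.
        conn["addr"] = src_addr
    conn["data"] += payload

handle_packet(b"\x01\x02", ("192.0.2.10", 4433), b"hello ")
handle_packet(b"\x01\x02", ("198.51.100.7", 50000), b"world")  # new IP, same CID
print(connections[b"\x01\x02"]["data"])  # b'hello world'
```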
Usage Trends (2024)
| Version | Global traffic share |
|---|---|
| HTTP/1.1 | 29.9% |
| HTTP/2 | 49.6% |
| HTTP/3 | 20.5% |
That’s all for the HTTP story - from humble text-based roots to multiplexed, compressed, QUIC-powered delivery. And all of it, just to load cat pictures faster. Thanks for reading!