Jean-Paul Rustom

Posted on Mar 9, 2024

WebSockets Explained Under 10 Minutes (With Visuals)

#webdev #backenddevelopment #javascript #mobile

Use-cases

WebSockets are used in systems that need to display data in real time and with low latency, such as chat applications, stock price tickers, or game leaderboards.

History of HTTP 1.0 & HTTP 1.1

Now before moving forward, let’s step back and talk a little bit about history.

HTTP 1.0 ( 1996 )

In HTTP 1.0, each separate request would have its own TCP connection.

We would open a TCP connection, send a request, and close that connection as soon as the response was received.
For example, to load four images, we would open and close four separate TCP connections, which would hurt performance.

HTTP 1.1 ( 1997 )

In HTTP 1.1, things changed.

Connections became persistent by default. The Connection: ‘Keep-Alive’ header, which HTTP 1.0 clients had used to ask for this behavior, was no longer required.

We can initiate a TCP connection, keep it open, and exchange multiple requests and responses over this single TCP connection.

Because the WebSocket handshake relies on a persistent connection and on the HTTP 1.1 Upgrade mechanism, it requires HTTP version 1.1 or higher.

Polling

Historically, creating web apps that needed bidirectional communication required an abuse of HTTP: the client had to poll the server repeatedly for updates.

But every poll is a full HTTP request with its own headers, even when there is nothing new to report, so polling is expensive and can overload the server.

A simpler solution would be to use a single TCP connection for traffic in both directions.

This is what the WebSocket Protocol provides.

Bi-directional Protocol

WebSocket is a stateful, bidirectional protocol that runs over TCP and starts with an HTTP handshake.

It uses a single TCP connection for traffic in both directions.

The connection between client and server stays open until either the client or the server closes it.

WebSocket defines two URI schemes: wss for secure (TLS) connections, on port 443 by default, and ws for unencrypted connections, on port 80 by default.



wss://jaypmedia.com/socket
ws://jaypmedia.com/socket
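
As a small illustration of the two schemes, here is a minimal Python sketch that splits a WebSocket URI into its parts and fills in the default port. The function name describe_ws_uri is made up for this example; the default ports (80 for ws, 443 for wss) come from the WebSocket RFC.

```python
from urllib.parse import urlsplit

# Default ports per scheme, as defined by the WebSocket RFC (RFC 6455).
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def describe_ws_uri(uri):
    """Return (is_secure, host, port, path) for a ws:// or wss:// URI.

    Hypothetical helper written for this article.
    """
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    # Fall back to the scheme's default port when none is given.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme == "wss", parts.hostname, port, parts.path or "/"


print(describe_ws_uri("wss://jaypmedia.com/socket"))
print(describe_ws_uri("ws://jaypmedia.com/socket"))
```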

WebSocket vs HTTP

Also, WebSocket is not HTTP.
Its framing is more involved than a plain request and response, but the connection is persistent and each message carries much less overhead than an HTTP request.
It is an independent TCP-based protocol.
Its only relationship to HTTP is that its handshake is interpreted by HTTP servers as an HTTP Upgrade request.

Now don’t get the wrong idea: HTTP is great, but its request-response model doesn’t cover bidirectional communication.

Handshake

The WebSocket handshake is the bridge from HTTP to WebSockets.

Client

The WebSocket protocol begins its connection to the server as a simple HTTP request.

To start the connection, the client sends an HTTP GET request that includes at least the following headers:



GET /chat HTTP/1.1
Host: server.jaypmedia.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

If any of these are not included in the HTTP headers, the server should respond with an HTTP error code 400 Bad Request.
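
To make the server-side check concrete, here is a minimal Python sketch of how a server might validate the opening handshake. It is not taken from any particular library: the function name validate_handshake and the convention that header names arrive lower-cased in a dict are assumptions of this example. It also returns 426 Upgrade Required for an unsupported version, which is the status the RFC suggests for that case.

```python
import base64


def validate_handshake(request_line, headers):
    """Return 101 if the opening handshake looks valid, otherwise an error code.

    Hypothetical helper. `headers` is assumed to be a dict whose keys are
    lower-cased header names.
    """
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET" or not parts[2].startswith("HTTP/"):
        return 400
    try:
        http_version = tuple(int(n) for n in parts[2][5:].split("."))
    except ValueError:
        return 400
    if http_version < (1, 1):  # the handshake needs HTTP 1.1 or higher
        return 400

    if "host" not in headers:
        return 400
    if headers.get("upgrade", "").lower() != "websocket":
        return 400
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return 400

    # The key must be a base64-encoded 16-byte nonce.
    try:
        nonce = base64.b64decode(headers.get("sec-websocket-key", ""), validate=True)
    except ValueError:
        return 400
    if len(nonce) != 16:
        return 400

    version = headers.get("sec-websocket-version")
    if version is None:
        return 400
    if version != "13":
        return 426  # Upgrade Required: version not supported
    return 101


request_headers = {
    "host": "server.jaypmedia.com",
    "upgrade": "websocket",
    "connection": "Upgrade",
    "sec-websocket-key": "dGhlIHNhbXBsZSBub25jZQ==",
    "sec-websocket-version": "13",
}
print(validate_handshake("GET /chat HTTP/1.1", request_headers))
```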

The Upgrade header was introduced in HTTP/1.1 to let the client ask the server to switch to a different protocol on the same connection. Connection: Upgrade tells the server, and any proxy in between, that the Upgrade header applies to this connection only.
The Sec-WebSocket-Key header is used during the WebSocket handshake to ensure that the client and server are both speaking the WebSocket protocol.
Its value is a randomly generated 16-byte nonce, base64-encoded.
The Sec-WebSocket-Version header indicates the version of the WebSocket protocol that the client supports.
If the client is a web browser, it will supply the Origin header.
If the server does not wish to accept connections from this origin, it can choose to reject the connection.
A server can, for example, accept connections only from a list of allowed origins.

If you are using a browser that supports WebSocket, the whole handshake and the generation of the relevant headers will be handled automatically by using the JavaScript API.
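
Outside the browser, a client has to produce these headers itself. Here is a minimal Python sketch of the client side of the handshake; build_handshake_request is a name made up for this example, and the host and path are the ones from the request shown earlier.

```python
import base64
import os


def build_handshake_request(host, path, origin=None):
    """Return (key, request_text) for a WebSocket opening handshake.

    Hypothetical helper, not part of any library.
    """
    # Sec-WebSocket-Key: a random 16-byte nonce, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if origin is not None:
        # Browsers add this header themselves; other clients may omit it.
        lines.append(f"Origin: {origin}")
    # An empty line terminates the HTTP header block.
    return key, "\r\n".join(lines) + "\r\n\r\n"


key, request = build_handshake_request("server.jaypmedia.com", "/chat")
print(request)
```

The client keeps the key around because it needs it later to check the server's answer.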

Server

The handshake from the server looks like this:



        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

According to the WebSocket spec, the Sec-WebSocket-Accept header field indicates whether the server is willing to accept the connection.
To get its value, the server concatenates the value of Sec-WebSocket-Key received from the client with a fixed globally unique identifier (GUID) defined by the RFC: 258EAFA5-E914-47DA-95CA-C5AB0DC85B11.
The resulting string is then hashed with SHA-1, and the hash is base64-encoded.

This "magic string" GUID was chosen because it is very unlikely to be used by servers that do not understand WebSockets.
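
The computation is short enough to sketch in a few lines of Python. The function name compute_accept is made up for this example; the GUID and the SHA-1 plus base64 steps are the ones the RFC specifies. Fed the sample key from the client request above, it yields the Sec-WebSocket-Accept value shown in the server response.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket RFC (RFC 6455).
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```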

The server responds with a 101 status code.

Any status code other than 101 means that the WebSocket handshake was not completed.
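
On the client side, checking the server's answer could look like the following Python sketch. The function name handshake_completed is an assumption of this example, and the parsing is deliberately simplified (no folded headers, no extensions or subprotocols).

```python
import base64
import hashlib

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # defined by the RFC


def handshake_completed(response_text, sent_key):
    """Return True if the server's response completes the handshake.

    Hypothetical helper; `sent_key` is the Sec-WebSocket-Key the client sent.
    """
    head = response_text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    status = lines[0].split(" ", 2)
    if len(status) < 2 or status[1] != "101":
        return False  # anything other than 101: handshake not completed

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    digest = hashlib.sha1((sent_key + WEBSOCKET_GUID).encode("ascii")).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and "upgrade" in headers.get("connection", "").lower()
        and headers.get("sec-websocket-accept") == expected
    )


response = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=\r\n"
    "\r\n"
)
print(handshake_completed(response, "dGhlIHNhbXBsZSBub25jZQ=="))  # True
```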

Once the client and server have both sent their handshakes, and if the handshake was successful, then the data transfer part starts.

WebSocket Frames

Now let’s go a little deeper, shall we?

After a successful handshake, clients and servers transfer data back and forth. On the wire, this data is sent as a sequence of frames.

There are control frames and data frames.

Control frames communicate state about the WebSocket connection. Examples are the close frame, which is used for closing connections, and the ping and pong frames.

Data frames on the other hand, as their name implies, carry regular application data.

Unlike control frames, data frames can be fragmented, meaning one message can be split across several frames.
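
To see what this looks like at the byte level, here is a minimal Python sketch that decodes the first bytes of a frame. The function name parse_frame_header and the returned dict are choices made for this example; the bit layout (FIN bit, 4-bit opcode, mask bit, 7-bit length with 16-bit and 64-bit extensions) follows the RFC.

```python
import struct

# Opcodes defined by the RFC; values 0x8 and above are control frames.
OPCODES = {0x0: "continuation", 0x1: "text", 0x2: "binary",
           0x8: "close", 0x9: "ping", 0xA: "pong"}


def parse_frame_header(data):
    """Decode the start of a WebSocket frame (hypothetical helper)."""
    first, second = data[0], data[1]
    opcode = first & 0x0F
    length = second & 0x7F
    offset = 2
    if length == 126:    # next 2 bytes hold the real length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:  # next 8 bytes hold the real length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    return {
        "fin": bool(first & 0x80),      # last frame of the message?
        "opcode": OPCODES.get(opcode, "reserved"),
        "kind": "control" if opcode & 0x8 else "data",
        "masked": bool(second & 0x80),  # a 4-byte masking key follows if set
        "length": length,
        "header_size": offset,          # size before any masking key
    }


# An unfragmented, unmasked text frame carrying "Hello".
print(parse_frame_header(bytes([0x81, 0x05]) + b"Hello"))
# The same message fragmented into two frames: "Hel" then "lo".
print(parse_frame_header(bytes([0x01, 0x03]) + b"Hel"))
print(parse_frame_header(bytes([0x80, 0x02]) + b"lo"))
```

In the fragmented case, the first frame has the text opcode with the FIN bit cleared, and the last frame is a continuation frame with the FIN bit set.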

For security reasons and other concerns explained by the RFC, a client MUST mask all frames that it sends to the server, whether or not TLS is used.

These concerns relate to intermediary proxies that do not understand WebSockets: masking makes the bytes on the wire unpredictable, which protects such proxies against cache-poisoning attacks.
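
Masking itself is a simple XOR with a 4-byte key that the client picks at random for every frame. Here is a minimal Python sketch, limited to short text frames; apply_mask and build_masked_text_frame are names made up for this example. With the fixed key used below, the output matches the masked "Hello" example given in the RFC.

```python
import os


def apply_mask(payload, masking_key):
    """XOR each payload byte with the masking key, cycling over its 4 bytes.

    Applying the same function twice restores the original payload.
    """
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def build_masked_text_frame(text, masking_key=None):
    """Build a single masked text frame (hypothetical helper).

    Simplified: only payloads up to 125 bytes, no fragmentation.
    """
    payload = text.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch only handles payloads up to 125 bytes")
    if masking_key is None:
        masking_key = os.urandom(4)  # a fresh random key for every frame
    # 0x81 = FIN bit + text opcode; 0x80 sets the mask bit next to the length.
    header = bytes([0x81, 0x80 | len(payload)])
    return header + masking_key + apply_mask(payload, masking_key)


# Fixed key here only so that the output is reproducible.
frame = build_masked_text_frame("Hello", bytes.fromhex("37fa213d"))
print(frame.hex())  # 818537fa213d7f9f4d5158
```

The server reads the key from the frame and runs the same XOR to recover the payload.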