WebSockets are an amazing technology that allows us to have two-way communication with our clients. The default example for this is always a simple chat application where two browsers are able to communicate in near real time, but WebSockets are so much more than just a protocol for chat. WebSockets allow us to build interactive, event-driven experiences for our users. However, before we can go pushing data out to our users' browsers we need to know who they are. This is where we need to talk about authentication.
Over the years we have established well-understood standards for authenticating REST-based SPAs using access tokens. These mechanisms work well by allowing us to send data about the user and proof of their identity with every request. Unfortunately, one of the trade-offs with WebSockets is that they do away with sending cookies and headers with each message: headers are only exchanged during the opening handshake. This means lower overhead, but it also means that we now need a new mechanism for authenticating users.
When talking about authentication and WebSockets it's important to remember that the communication paradigm has gone from one-way to two-way. In the REST-based SPA world the authentication could be checked when the client requested a resource; it only really mattered that the user's authentication could be established at that one point in time. When moving to the asynchronous WebSocket model the client isn't requesting a resource but is instead subscribing for updates. Now the server needs to store the fact that the connection is authenticated, and how long that authentication is valid for, in a data store that persists across connections. Because the server is able to communicate with the client on its own terms, it needs to check this store to establish which connection to use for a given identity. It's important to check the expiry of this authentication to avoid exposing user data by pushing it to an abandoned browser session.
There are two ways that the server can use WebSocket messages to get this authentication information into its data store.
The first strategy for authentication is to have the client send an explicit authentication message. In this strategy the message that causes the connection to be considered authenticated has a special type that indicates that the client is proving its identity. The payload of this message includes the JWT provided by the trusted identity provider. The server is then able to process this message, validate the token and associate that connection with the identity of the client.
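To make the idea of an authentication store concrete, here is a minimal sketch of one. Everything in it is an assumption for illustration: the `ConnectionAuthStore` class, its method names and the in-memory `Map` are hypothetical, and a real deployment would more likely back this with a shared store such as a database or cache.

```typescript
// Hypothetical record kept for each authenticated connection.
interface AuthRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store, keyed by a connection identifier.
class ConnectionAuthStore {
  private byConnection = new Map<string, AuthRecord>();

  // Record that a connection has proven its identity until expiresAt.
  set(connectionId: string, userId: string, expiresAt: number): void {
    this.byConnection.set(connectionId, { userId, expiresAt });
  }

  // Forget a connection, for example when its socket closes.
  remove(connectionId: string): void {
    this.byConnection.delete(connectionId);
  }

  // Return the connections that may receive data for userId right now.
  // Expired records are skipped so nothing is pushed to a stale session.
  connectionsFor(userId: string, now: number = Date.now()): string[] {
    const result: string[] = [];
    this.byConnection.forEach((record, connectionId) => {
      if (record.userId === userId && record.expiresAt > now) {
        result.push(connectionId);
      }
    });
    return result;
  }
}
```

When the server wants to push something to a user, it would ask the store for that user's live connections and only send to those.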
Generally speaking the Explicit Authentication Message strategy is a good choice for how to establish authentication on WebSocket connections. It requires less overhead in each message resulting in less traffic on the network and less overhead being required for the server to process each message. WebSockets are a stateful protocol and the server needs to hold information about the clients connected to it, and I don't see a reason why storing data about the authentication would be less secure as long as you're not storing the actual JWT.
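As a rough sketch of the explicit authentication message strategy, the handler below treats an `authenticate` message as special and requires every other message to arrive on a connection that has already authenticated. The message shapes, `handleMessage`, `ConnectionState` and the `VerifyToken` callback are all hypothetical names; in practice the callback would wrap whichever JWT library you use to check the signature and decode the claims.

```typescript
// Hypothetical claims shape; exp is in seconds, as in a JWT.
interface Claims { sub: string; exp: number }
// Hypothetical verifier supplied by the caller; returns null for an invalid token.
type VerifyToken = (jwt: string) => Claims | null;

// Hypothetical message types for this sketch.
type ClientMessage =
  | { type: "authenticate"; token: string }
  | { type: "subscribe"; topic: string };

// Per-connection state held by the server.
interface ConnectionState { userId?: string; expiresAt?: number }

function handleMessage(
  state: ConnectionState,
  raw: string,
  verify: VerifyToken,
  now: number = Date.now()
): string {
  const message = JSON.parse(raw) as ClientMessage;

  if (message.type === "authenticate") {
    const claims = verify(message.token);
    if (claims === null || claims.exp * 1000 <= now) return "rejected";
    // Store the identity and expiry, not the JWT itself.
    state.userId = claims.sub;
    state.expiresAt = claims.exp * 1000;
    return "authenticated";
  }

  // Every other message type requires a live authenticated connection.
  if (state.userId === undefined || (state.expiresAt ?? 0) <= now) {
    return "unauthenticated";
  }
  return `handled ${message.type} for ${state.userId}`;
}
```

Only the identity and expiry are kept on the connection, which matches the point above about not storing the actual JWT.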
The second strategy is to include authentication in each message. In this model the client adds a property containing the JWT to every object it sends. The server then processes that JWT as part of the message handling and updates its data store to represent the current state of the connection's authentication.
The case for Authentication In Each Message is strongest when the order of processing cannot be guaranteed. While WebSockets are a stateful protocol, there is nothing enforcing that the thing terminating the WebSocket is the same thing that is processing the message. Messages may be put onto a queue or passed off to a load-balanced set of handlers. In this case including authentication tokens in each message may be simpler than orchestrating an authentication process before messages can be sent.
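Here is a small sketch of what authentication in each message could look like from the handler's side. The `Envelope` shape, the `processEnvelope` function and the `VerifyToken` callback are hypothetical; the callback stands in for a real JWT library. Because the function needs nothing but the message itself, it could run on any worker that pulls messages from a queue.

```typescript
// Hypothetical claims shape; exp is in seconds, as in a JWT.
interface Claims { sub: string; exp: number }
// Hypothetical verifier supplied by the caller; returns null for an invalid token.
type VerifyToken = (jwt: string) => Claims | null;

// Hypothetical envelope: every message carries its own token.
interface Envelope { token: string; type: string; payload: unknown }

type Outcome =
  | { ok: true; userId: string; type: string }
  | { ok: false; reason: "invalid_token" | "expired_token" };

// Validates the token carried by a single message, with no connection state.
function processEnvelope(
  raw: string,
  verify: VerifyToken,
  now: number = Date.now()
): Outcome {
  const envelope = JSON.parse(raw) as Envelope;
  const claims = verify(envelope.token);
  if (claims === null) return { ok: false, reason: "invalid_token" };
  if (claims.exp * 1000 <= now) return { ok: false, reason: "expired_token" };
  return { ok: true, userId: claims.sub, type: envelope.type };
}
```

On a successful outcome the caller would also refresh the stored expiry for that connection, so that server-initiated pushes keep using up-to-date authentication state.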
Another key difference when using authenticated WebSockets rather than authenticated REST requests is that we need to consider how we notify the client that their authentication needs renewing. This problem has two parts. The first is how the server responds when it receives a message that is not authenticated, or that arrives on a connection whose authentication has expired. With WebSockets there is no explicit response tied to the receipt of a message, so there is nowhere obvious to put an "authentication expired" response. The second part of the problem is what the server does when the authentication has expired and it wants to send messages to the client.
There are a few options that could be used in either or both of these cases.
The simplest "solution" is to do nothing: the server just continues to function. It accepts messages from and pushes messages to the client. It accepts that if the socket user's identity has been validated in the past then that is good until the socket closes. It risks pushing user data to the client even after the authentication has expired, but makes this trade-off in exchange for greater simplicity.
The second simplest solution is to close the socket. When the socket is closed the client will be notified and it can re-establish the socket and re-authenticate. The downside of this is that the server needs to make a decision about any messages yet to be processed. If the server doesn't want to lose these messages it's going to need a way to tie them to the next connection.
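A minimal sketch of the close-the-socket option might look like the following. The `closeIfExpired` helper and the `4001` close code are assumptions made for this example; the WebSocket protocol reserves the 4000 to 4999 range for application-defined codes, so any value in that range would do. The `ClosableSocket` interface only describes the `close(code, reason)` method that the browser `WebSocket` and common server libraries expose.

```typescript
// Minimal shape of a socket that can be closed with a code and reason.
interface ClosableSocket {
  close(code?: number, reason?: string): void;
}

// Hypothetical application-defined close code (4000-4999 is the private-use range).
const AUTH_EXPIRED_CLOSE_CODE = 4001;

// Closes the socket if its authentication has expired.
// Returns true when the socket was closed, false when it is still valid.
function closeIfExpired(
  socket: ClosableSocket,
  expiresAt: number,
  now: number = Date.now()
): boolean {
  if (expiresAt > now) return false;
  socket.close(AUTH_EXPIRED_CLOSE_CODE, "authentication expired");
  return true;
}
```

A client that sees this code in its close event could then fetch a fresh token, reconnect and authenticate again.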
The third solution is to send a reauthentication required message to the client. The client is then able to get a new authentication token and pass it back. If the server doesn't want to lose any messages it needs to keep them aside until the authentication is updated and then pass them on to the client. This is the most complex of the options but comes with the greatest level of security.
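To illustrate the reauthentication option, here is a sketch of a wrapper that holds outgoing messages while authentication is expired and flushes them once it has been renewed. The `GuardedConnection` class, its methods and the `reauthentication_required` message type are hypothetical, and the sketch assumes the new token has already been validated elsewhere before `renew` is called.

```typescript
// Minimal shape of something we can send text to.
interface Sender {
  send(data: string): void;
}

// Hypothetical wrapper around one authenticated connection.
class GuardedConnection {
  private pending: string[] = [];
  private reauthRequested = false;

  constructor(private socket: Sender, private expiresAt: number) {}

  // Push data if authentication is current. Otherwise hold the data and
  // ask the client (once) to re-authenticate.
  push(data: string, now: number = Date.now()): void {
    if (this.expiresAt > now) {
      this.socket.send(data);
      return;
    }
    this.pending.push(data);
    if (!this.reauthRequested) {
      this.reauthRequested = true;
      this.socket.send(JSON.stringify({ type: "reauthentication_required" }));
    }
  }

  // Call after a fresh token has been validated. Flushes held messages in
  // order and returns true, or returns false if the new expiry is already past.
  renew(newExpiresAt: number, now: number = Date.now()): boolean {
    if (newExpiresAt <= now) return false;
    this.expiresAt = newExpiresAt;
    this.reauthRequested = false;
    const held = this.pending;
    this.pending = [];
    for (const data of held) this.socket.send(data);
    return true;
  }
}
```

In a real system the pending queue would need a size or age limit, since a client that never re-authenticates would otherwise cause it to grow without bound.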
In the end, WebSockets aren't that hard to authenticate. We can continue to use the access-token-based authentication that we are used to from building REST-based single page applications. WebSockets come with some amazing advantages in being able to send data both ways, but these advantages come with their own complexity around how we manage authentication.