Why are WebSockets so hard?

#frontend #momento #websockets

A couple of years ago I worked on a project to bring real-time notifications into my web application. I was excited at the idea of “real-time” and immediately knew I was going to get a chance to implement WebSockets.

I knew what WebSockets did, but I didn’t really know what they were - meaning I knew you could send messages from a server to a browser, but I had no idea how. I didn’t know much more than the fact there were “connections” that you could use to push data both to and from the back-end.

I set off to build what I thought was going to be a two-day task. What could possibly go wrong?

I then plunged into a downward spiral of complexity that made me rethink being a software engineer. Let’s talk about it.

WebSocket API structure

I come from a REST background. Endpoints have resource-based paths with intent shown by which http method you’re using (GET = load data, POST = create data, PUT = update data, etc…).

The first thing I saw in the AWS API Gateway documentation were these weird $connect and $disconnect routes. By naming convention I assumed what these routes did, but I didn’t know what to do with them.

It wasn’t intuitive to me how to uniquely identify a user who was trying to connect. I didn’t know if data would freely pass back and forth across this connection once it was established. I also had no idea how to keep track of the connection or if I even needed to keep track. It was just one rabbit hole after another.

Eventually I discovered that with AWS API Gateway, the connections are managed by the service itself, but you (the developer) are responsible for keeping track of who is connected and what information they receive. I also learned that data does not just freely flow back and forth.

For interactions going from the client to the server, you have to define your own routes and point them to backing compute resources. Each route required an API Gateway V2 Route, API Gateway V2 Integration, Lambda function, and Lambda function permission resource defined in my Infrastructure as Code, which was about 50 lines per route.

For data going from the server to the client, you can send anything you want. You need to develop a convention for identifying different types of messages so you can handle them appropriately.

The disparity between client-to-server and server-to-client threw me for a loop. One was very rigid and structured, while the other was loosey goosey. It doesn’t quite feel like a way to build scalable, maintainable software.

Connection management

As I said earlier, API Gateway manages maintaining connections for you, but you’re responsible for figuring out what data to send to which connection. Let’s take an example:

Imagine our user, Mallory, wants to be notified when tickets for Taylor Swift, Adele, or Ed Sheeran become available. When she connects to our ticket vendor site we save 4 records into our database:

One record that identifies the connection and user metadata
One record for each artist she wants to be notified for

For the artist records, the pk is her connection id and the sk indicates it’s a subscription record. We add the artist name as a GSI so when we get an event indicating that Ed Sheeran tickets are on sale, we can immediately notify all the connections subscribed to him.

To notify the subscribers with an AWS serverless back-end, we’d trigger a Lambda function on an EventBridge event saying which artist had tickets available. The function would query the artist GSI in DynamoDB to find all the connections subscribed to the incoming artist. Then, we’d iterate over each record publishing the ticket information to the connected users. That’s a lot of work!

When the user disconnects, we can query the database for all records with the pk containing the connection id and delete them. In case we miss the disconnect event from API Gateway, we set a time to live (TTL) on the connection records for 24 hours (or whatever fits your use case) to delete them automatically.

This is a lot of infrastructure for something with “technically” no business value. This is simply a microservice that alerts users. This is code that you have to maintain over time that could get stale, slow, or deprecated. Code is a liability, after all.

Security

I come from a GovTech background. An app isn’t secure until it’s overly secure. So when I found out that the only route on a WebSocket API that supports auth is $connect, I was a little taken aback. Once a connection is established, it has free reign to call any route it wants without passing in an auth header or any other form of credentials.

I’ve had a while to stew on this, and it makes sense in theory. Since WebSocket connections are stateful, you shouldn’t need to reauthenticate every time you make a call. That would be like knocking on someone’s door, saying your name, then after you’re inside, restating who you are every time you do something. Doesn’t really make sense.

Passing in an auth header to a WebSocket isn’t as easy as you’d think either. Popular clients like SocketIO don’t really support auth headers well unless you use it for both the client and server. Best way I found to pass a bearer token through to a WebSocket hosted in AWS was to use a query string parameter. You could also repurpose the Sec-WebSocket-Protocol header to accept both a subprotocol and the auth token, but that is against the grain and one of those “just because you could doesn’t mean you should” moments.

Client-side SDKs

People seem to love SocketIO. It has over 4 million weekly downloads on npm and is arguably one of the better ways to connect to a WebSocket. But just because it’s popular doesn’t mean it’s easy.

For whatever reason I struggled big time to get it working with API Gateway. Something with the WebSocket protocol (wss instead of https) and the way AWS set up the API just didn’t get along well.

Through much trial and error, shifting auth around, and a few rage quits, I’ve been able to get WebSockets hooked up to my user interfaces once or twice. But every time I do it, I have to relearn the tricks of getting it just right. Sometimes when things do everything, like SocketIO, they lose a bit of their intuitiveness and developer experience.

An easier way

With Momento Topics, all the hard parts of WebSockets are abstracted away. There is no API structure to build. Subscribers can connect and register for updates to specific channels with a single API call:

await topicClient.subscribe('websocket', 'mychannel', {
  onItem: (data) => { handleItem(data.valueString()); },
  onError: (err) => { console.error(err); }
});

To publish to a channel, the call is even simpler:

await topicClient.publish('websocket', 'mychannel', JSON.stringify({ detail }));

You can connect service to service, service to browser, even browser to browser with Topics. Since the service uses Momento’s servers for connection management, you have options available that haven’t been possible before, like having two browser sessions communicate without getting a server involved. This leaves you with two responsibilities: publishing data when it’s ready and subscribing for updates.

As with other serverless services, Momento Topics comes with security at top of mind, but also leaves you with flexible options to restrict access. With fine-grained access control, you can configure your API tokens to be scoped as narrowly as possible. An example access policy might be:

const tokenScope = {
  permissions: [
    {
      role: 'subscribeonly',
      cache: 'websocket',
      topic: 'mychannel'
    }
  ]
};

An API token created with this set of permissions would only be allowed to subscribe to the mychannel topic in the websocket cache. If someone intercepted the token and attempted to publish data or subscribe to a different topic, they would receive an authorization error.

Momento has a plethora of SDKs for you to integrate with. For browsers, you can use the Web SDK. For server-side development, the Topics service is available for TypeScript/JavaScript, Python, and Go, with support for .NET, Java, Elixir, PHP, Ruby, and Rust coming soon.

What’s the catch?

Hopefully that sounds too good to be true. It did to me at first. Heck, it still does. But there is no catch. Momento’s mission is to provide best-in-class developer experience for their serverless services and take as much of the burden off of developers as possible.

You don’t need to spend weeks building notification services that handle complex connection management and event routing. Let SaaS providers like Momento take the operational overhead from you so you can focus on what really matters.

Pricing is simple, $.50/GB of data transfer in and out, with a 5GB perpetual free tier. There’s no reason not to try it!

Looking for examples? Check out this fully functional chat application built with Topics in Next.js. You can also try our work-in-progress game Acorn Hunt, built on both Momento Cache and Topics.

If you have any questions, feedback, or feature requests, please feel free to let the team know via Discord or through the website. These services are for all of us and we want to build the best possible product to get you to production safely and quickly.

Happy coding!