mkadirtan for Noop Today

Posted on Feb 1, 2023 • Originally published at nooptoday.com

Scalable Websocket Server Implemented by ChatGPT


In the previous post, Why Websockets are Hard to Scale, we talked about the problems with scaling websockets and some of the existing solutions. The post mentioned a load balancing technique called consistent hashing, which is a very elegant solution to this problem. I promised you a post about how to implement a scalable websocket server with consistent hashing.

Well, the day has come, so grab your coffee and follow along. Special thanks to ChatGPT: almost all of the code you will find in this post was written by ChatGPT. I only made some adjustments so that the code actually works.

The plan is to use code generated by ChatGPT and make as few modifications to it as possible. If you've read the previous post, I'm sure you have an outline of what we will do. But as a reminder, here are the steps for our implementation:

Create a websocket server
Create a hashring with server addresses, update when necessary
Retrieve list of servers, and keep it updated for all servers
Redirect clients to correct servers
Extra: Create a client that can redirect itself

New favourite IDE unlocked: The Prompter

I was reluctant to try ChatGPT, but it turned out to be very helpful: intelligent enough to give me code blocks that I can work on, but still dumb enough not to take my job ^^. Whether we like it or not, it is already here. As with all other technologies, if you embrace it sooner rather than later, you can use it to your advantage. At least, that is what I realized from my experience.

ChatGPT can be challenging to work with at times, but with practice and more specific prompts, it is possible to unlock its full potential. I recommend having a look at awesome-chatgpt-prompts by my dear colleague, Fatih Kadir Akın, for inspiration on how to communicate effectively with ChatGPT.

How to Better Use ChatGPT for Code Generation

Since ChatGPT will not create a complete project for us, each implementation step covers a single concept. That way, we can get a working code block for each step and glue them together!

Hashring Implementation

I started with the hashring and asked ChatGPT to create a hashring implementation in TypeScript, with support for virtual nodes. The first output from ChatGPT seemed okay, but it used a > comparison for strings. Also, the hash algorithm didn't produce random, evenly distributed results. First, I asked it to fix the bug with the > comparison. Then, I asked it to use a hash function with better distribution.

import { createHash } from 'crypto';

class HashRing {
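  private nodes: string[];
  private ring: { [key: string]: string };
  // Ring positions (hashes) in sorted order, used for lookups in getNode
  private sortedKeys: string[];
  private replicas: number;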

  constructor(nodes: string[], replicas: number) {
    this.nodes = nodes;
    this.replicas = replicas;
    this.ring = {};

    // Create a hash for each node and add it to the ring
    for (const node of nodes) {
      for (let i = 0; i < replicas; i++) {
        const hash = createHash('sha1')
          .update(node + ':' + i)
          .digest('hex');
        this.ring[hash] = node;
      }
    }

    // Sort the keys in the ring to ensure that they are in clockwise order
    this.sortedKeys = Object.keys(this.ring).sort((a, b) =>
      a.localeCompare(b, undefined, { numeric: true })
    );
  }

  // Get the node responsible for a given key
  getNode(key: string): string {
    const hash = createHash('sha1')
      .update(key)
      .digest('hex');
    const keys = this.sortedKeys;

    // Find the first node with a position greater than the hash
    let index = keys.findIndex((k) => k.localeCompare(hash, undefined, { numeric: true }) > 0);

    // If no such node is found, the key belongs on the first node
    if (index === -1) {
      index = 0;
    }

    return this.ring[keys[index]];
  }
}

With that complete, I copied the code into hashring.ts and exported the class. Whenever the server list is updated, this class will be recreated. Then we can iterate through all the connected sockets and send them a redirect message if they are not supposed to be on this server.

Full conversation with hashring implementation: https://sharegpt.com/c/kxhBEp6
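
The redirect pass described above could look roughly like the sketch below. It is not taken from the final repository: `findMisplacedClients`, the `Redirect` shape, and the `lookup` callback are hypothetical names, and `lookup` simply stands in for the `getNode` method of a freshly rebuilt `HashRing`.

```typescript
// Hypothetical helper: decide which connected clients must be redirected
// after the ring has been rebuilt. `lookup` stands in for hashRing.getNode.
type Lookup = (clientKey: string) => string;

interface Redirect {
  clientKey: string;
  server: string;
}

function findMisplacedClients(
  clientKeys: Iterable<string>,
  currentServer: string,
  lookup: Lookup,
): Redirect[] {
  const redirects: Redirect[] = [];
  for (const clientKey of clientKeys) {
    const server = lookup(clientKey);
    // Only clients that now belong elsewhere need a redirect message
    if (server !== currentServer) {
      redirects.push({ clientKey, server });
    }
  }
  return redirects;
}

// Usage with a stand-in lookup: keys starting with "a" belong to node1
const demoLookup: Lookup = (key) => (key.startsWith('a') ? 'node1' : 'node2');
const redirects = findMisplacedClients(['alice', 'bob', 'anna'], 'node1', demoLookup);
console.log(redirects); // [ { clientKey: 'bob', server: 'node2' } ]
```

The server would then send a redirect message over the socket of each client in the returned list, and leave everyone else connected.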

Websocket Server Implementation

Next, I asked ChatGPT to create a Websocket server. It immediately created a Hello World example using the ws library.

import * as WebSocket from 'ws';

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws: WebSocket) => {
  ws.on('message', (message: string) => {
    console.log(`Received message => ${message}`);
  });

  ws.send('Hello! Message From Server');
});

ChatGPT tends not to use default imports, but that is easy to fix. With the basic server set up, I asked ChatGPT to add a feature for redirecting clients based on the hashring algorithm.

import * as crypto from 'crypto';

const serverList = ['server1', 'server2', 'server3'];

function getServer(clientIp: string) {
  const hash = crypto.createHash('md5').update(clientIp).digest('hex');
  const index = parseInt(hash, 16) % serverList.length;
  return serverList[index];
}

const clientIp = '192.168.0.1';
const server = getServer(clientIp);
console.log(`Redirecting client to ${server}`);

But my request was a little unclear, so the output had the general idea but was missing the server and client interaction. So, I asked ChatGPT to merge the two outputs. After a few prompts, it created a working example.

import * as WebSocket from 'ws';
import * as crypto from 'crypto';

const serverList = ['server1', 'server2', 'server3'];

function getServer(clientIp: string) {
  const hash = crypto.createHash('md5').update(clientIp).digest('hex');
  const index = parseInt(hash, 16) % serverList.length;
  return serverList[index];
}

const wss = new WebSocket.Server({ port: 8080 });

// ws 3.0 removed `ws.upgradeReq`; the HTTP request is now the second argument
wss.on('connection', (ws: WebSocket, req) => {
  const clientIp = req.socket.remoteAddress ?? '';
  const server = getServer(clientIp);
  console.log(`Redirecting client ${clientIp} to ${server}`);

  ws.send(`You have been redirected to ${server}`);

  ws.on('message', (message: string) => {
    console.log(`Received message from ${clientIp} => ${message}`);
  });
});

The example looked OK, but sending a plain text message to clients is not very flexible. So, I asked ChatGPT to use a more manageable message body. Also, it was sending the redirect message whether or not the client was on the correct server, so I asked ChatGPT to fix that.

import * as WebSocket from 'ws';
import * as crypto from 'crypto';

const serverList = ['server1', 'server2', 'server3'];

function getServer(clientIp: string) {
  const hash = crypto.createHash('md5').update(clientIp).digest('hex');
  const index = parseInt(hash, 16) % serverList.length;
  return serverList[index];
}

const wss = new WebSocket.Server({ port: 8080 });

// ws 3.0 removed `ws.upgradeReq`; the HTTP request is now the second argument
wss.on('connection', (ws: WebSocket, req) => {
  const clientIp = req.socket.remoteAddress ?? '';
  const correctServer = getServer(clientIp);
  console.log(`Client ${clientIp} should be connected to ${correctServer}`);

  if (req.headers.host !== correctServer) {
    console.log(`Redirecting client ${clientIp} to ${correctServer}`);

    const message = {
      type: 'redirect',
      server: correctServer,
    };
    ws.send(JSON.stringify(message));

    ws.on('message', () => {});
  } else {
    console.log(`Client ${clientIp} is already connected to ${correctServer}`);

    ws.on('message', (message: string) => {
      console.log(`Received message from ${clientIp} => ${message}`);
    });
  }
});

Voila! This looks great, but there is still more work to do. For example, the server list in this example is static, and we need to manually replace the getServer function with the getNode method from HashRing. One amazing side note: I asked ChatGPT if it could create a client-side example that can interact with this server. And it did. It is a very intelligent tool that will keep amazing me for a long time.

Full conversation with websocket server and websocket client implementation: https://sharegpt.com/c/pIc7amA

import * as WebSocket from 'ws';

// Wrap the connection logic in a function so that the handlers are
// re-attached to the new socket after a redirect.
function connect(url: string): void {
  const ws = new WebSocket(url);

  ws.on('open', () => {
    console.log(`Connected to WebSocket server at ${url}`);
  });

  ws.on('message', (data) => {
    // Recent versions of ws deliver messages as Buffers, so convert to a string first
    const message = JSON.parse(data.toString());
    if (message.type === 'redirect') {
      console.log(`Received redirect message. Disconnecting from current server and connecting to ${message.server}`);
      ws.close();
      connect(`ws://${message.server}:8080`);
    } else {
      console.log(`Received message => ${data}`);
    }
  });

  ws.on('close', () => {
    console.log(`Disconnected from WebSocket server at ${url}`);
  });
}

connect('ws://localhost:8080');
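
One reason to swap the static getServer function from the server example for the hashring's getNode is how each behaves when the server list changes. The sketch below is not part of the ChatGPT conversation. It counts how many of 1,000 made-up client IDs change servers when a fourth server is added, first with modulo hashing and then with a compact ring. The helper names (`hash32`, `moduloServer`, `buildRing`, `ringServer`, `countMoved`), the 32-bit truncation of SHA-1, and the choice of 100 virtual nodes are all assumptions for illustration.

```typescript
import { createHash } from 'crypto';

// Use the first 8 hex characters (32 bits) so the value fits a JS number exactly
function hash32(value: string): number {
  return parseInt(createHash('sha1').update(value).digest('hex').slice(0, 8), 16);
}

// Modulo placement, as in the static getServer example
function moduloServer(key: string, servers: string[]): string {
  return servers[hash32(key) % servers.length];
}

// Compact ring: sorted [position, server] pairs, with virtual nodes
function buildRing(servers: string[], replicas: number): Array<[number, string]> {
  const ring: Array<[number, string]> = [];
  for (const server of servers) {
    for (let i = 0; i < replicas; i++) {
      ring.push([hash32(`${server}:${i}`), server]);
    }
  }
  return ring.sort((a, b) => a[0] - b[0]);
}

function ringServer(key: string, ring: Array<[number, string]>): string {
  const h = hash32(key);
  // First virtual node clockwise from the key, wrapping around to the start
  const entry = ring.find(([position]) => position > h) ?? ring[0];
  return entry[1];
}

function countMoved(
  keys: string[],
  before: (key: string) => string,
  after: (key: string) => string,
): number {
  return keys.filter((key) => before(key) !== after(key)).length;
}

const keys = Array.from({ length: 1000 }, (_, i) => `client-${i}`);
const three = ['server1', 'server2', 'server3'];
const four = [...three, 'server4'];

const movedModulo = countMoved(keys, (k) => moduloServer(k, three), (k) => moduloServer(k, four));
const ringBefore = buildRing(three, 100);
const ringAfter = buildRing(four, 100);
const movedRing = countMoved(keys, (k) => ringServer(k, ringBefore), (k) => ringServer(k, ringAfter));

console.log(`modulo: ${movedModulo} of ${keys.length} keys moved`);
console.log(`ring:   ${movedRing} of ${keys.length} keys moved`);
```

With modulo hashing, a key keeps its server only when its hash gives the same remainder for 3 and for 4, so roughly three quarters of the keys should move. With the ring, only the keys that now fall to the new server move, which is roughly one quarter in this setup.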

Another important thing to note: code examples from ChatGPT are not flawless, and I don't think they were meant to be. Sometimes the flaws are easily noticed by an IDE or a human, but sometimes ChatGPT does something very unusual and the code doesn't work at all. In that case, you can always ask it to fix its own code and, to my surprise again, it does a good job at that, too!

Implementing Service Registry

To create a hashring, we need the addresses of all the available servers. This might sound simple, but it is really difficult to implement on the servers themselves. You have to consider failure scenarios, update notifications, and so on. That is out of the scope of this post, so as a simpler solution I was ready to use a 3rd party service registry. The problem is, I've never used a service registry before.

At first, I asked ChatGPT to create a service registry implementation in TypeScript. It created a solution using Redis, but the code looked too complicated to me and I didn't want to use it. Instead, I found myself having a conversation with ChatGPT about which service registry tools exist, how they differ, and so on.

**Me:** can you compare eureka, zookeeper and consul

**ChatGPT:** Sure!

Eureka, Zookeeper, and Consul are all tools that can be used for service discovery in a distributed environment.

Eureka is a REST-based service that is primarily used in the Java ecosystem for locating services for the purpose of load balancing and failover of middle-tier servers.

Apache Zookeeper is a distributed coordination service that enables distributed applications to achieve high availability. Zookeeper helps to coordinate distributed applications by providing a central place for them to store and access shared data.

Consul is a distributed, highly available, and data center aware tool that can be used for service discovery, configuration, and orchestration. It is written in the Go programming language and provides a HTTP API for interacting with it.

In summary, Eureka is primarily used in the Java ecosystem, Zookeeper is a distributed coordination service, and Consul is a tool that can be used for service discovery, configuration, and orchestration.

Then I asked for simple examples using each of the tools. In the end, Consul seemed like a good solution for this purpose, and as an added bonus it is written in Go, so I gave it a try.

The Consul implementation from ChatGPT didn't work. Even though I asked ChatGPT to fix the code, something was still not right. I didn't want to debug and fix it myself, because I wanted to use as much code from ChatGPT as possible. The problem was most likely due to breaking changes in the npm package (ChatGPT doesn't have up-to-date information on everything).

So, what did I do? I went back to square one and asked for a Redis implementation, but this time I asked for Service Discovery instead of Service Registry. Apparently, I had been asking for something slightly different (see the related SO post).

import Redis from 'ioredis';

const redis = new Redis({ host: '127.0.0.1' });
const updateInterval = 5; // in seconds
const heartbeatInterval = 10; // in seconds

const serviceList: Set<string> = new Set();
const heartbeatIntervals: { [instance: string]: NodeJS.Timeout } = {};

async function registerServiceInstance(instance: string) {
  await redis.sadd('service-name', instance);
  await redis.set(instance, 'alive', 'EX', heartbeatInterval * 2);
  heartbeatIntervals[instance] = setInterval(() => {
    redis.set(instance, 'alive', 'EX', heartbeatInterval * 2);
  }, heartbeatInterval * 1000);
}

async function deregisterServiceInstance(instance: string) {
  await redis.srem('service-name', instance);
  await redis.del(instance);
  clearInterval(heartbeatIntervals[instance]);
  delete heartbeatIntervals[instance];
}

async function removeStaleServiceInstances() {
  const instances = await redis.smembers('service-name');
  for (const instance of instances) {
    const ttl = await redis.ttl(instance);
    if (ttl === -2 || ttl === -1) {
      await redis.srem('service-name', instance);
    }
  }
}

async function updateServiceList() {
  const instances = await redis.smembers('service-name');
  serviceList.clear();
  for (const instance of instances) {
    serviceList.add(instance);
  }
}

setInterval(removeStaleServiceInstances, updateInterval * 1000);
setInterval(updateServiceList, updateInterval * 1000);

In addition to the raw service discovery implementation, I also asked for automatic updates and removal of stale services. And it did a great job at creating those, too.

Full conversation with service discovery implementation: https://sharegpt.com/c/bGUAItM
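
To connect service discovery with the hashring, each server needs to notice when the list of instances has actually changed, and only then rebuild the ring. Below is a minimal sketch of that check, assuming the instance addresses arrive as an array of strings (the shape `redis.smembers` returns). `diffInstances` and `onInstancesFetched` are hypothetical names, the `node1:8080` style addresses are made up, and the `rebuild` callback stands in for creating a new `HashRing`.

```typescript
interface InstanceDiff {
  added: string[];
  removed: string[];
}

// Compare the previously known instances with a freshly fetched list
function diffInstances(known: Set<string>, fetched: string[]): InstanceDiff {
  const fetchedSet = new Set(fetched);
  return {
    added: fetched.filter((instance) => !known.has(instance)),
    removed: Array.from(known).filter((instance) => !fetchedSet.has(instance)),
  };
}

// Hypothetical glue: call `rebuild` only when the membership changed
function onInstancesFetched(
  known: Set<string>,
  fetched: string[],
  rebuild: (instances: string[]) => void,
): boolean {
  const { added, removed } = diffInstances(known, fetched);
  if (added.length === 0 && removed.length === 0) {
    return false; // nothing changed, keep the current ring
  }
  added.forEach((instance) => known.add(instance));
  removed.forEach((instance) => known.delete(instance));
  // Sorted only to make the output deterministic
  rebuild(Array.from(known).sort());
  return true;
}

const known = new Set(['node1:8080', 'node2:8080']);
let rebuiltWith: string[] = [];
const changed = onInstancesFetched(known, ['node2:8080', 'node3:8080'], (list) => {
  rebuiltWith = list;
});
console.log(changed, rebuiltWith); // true [ 'node2:8080', 'node3:8080' ]
```

Skipping the rebuild when nothing changed also avoids running a redirect pass over all connected clients on every polling interval.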

Putting it All Together: Scalable Websocket Server

So far we've implemented:

Hashring, for determining which server a user belongs to,
Service Discovery, to keep a list of available servers,
Websocket Server, for clients to connect to,
Websocket API, for re-routing clients when necessary,
Websocket Client, to connect to our servers

All of this was done with the help of ChatGPT. We could of course implement these ourselves, but there is nothing wrong with getting help and speeding up the process. As long as you know what you are asking for and can verify that the code works as intended, ChatGPT only speeds up development. Also, it is very fun to work with.

Now that we have all the pieces in hand, we can just glue them together! GitHub repository with the final project and ChatGPT conversations: https://github.com/nooptoday/chatgpt-scalable-websockets

Results

You can clone the repository and test it for yourself! Here is a terminal output from running instances:

node1              | [node1] - [client_size] -> [1884]
node2              | [node2] - [client_size] -> [2237]
node3              | [node3] - [client_size] -> [1879]

6000 connections are initially sent to node1, and the clients are then redirected to the other nodes. In an ideal world, we would expect to see something like:

node1              | [node1] - [client_size] -> [2000]
node2              | [node2] - [client_size] -> [2000]
node3              | [node3] - [client_size] -> [2000]

You can play with the number of virtual nodes, the hash function, or which client parameter to hash, and see how the results change.
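
As a starting point for such experiments, the sketch below counts how 6,000 made-up client IDs spread over three nodes for different numbers of virtual nodes. It is a compact stand-in, not the repository's code: `hash32`, `buildRing`, `distribution`, the 32-bit truncation of SHA-1, and the `client-N` keys are all assumptions for illustration.

```typescript
import { createHash } from 'crypto';

// First 8 hex characters (32 bits) of SHA-1, which a JS number holds exactly
function hash32(value: string): number {
  return parseInt(createHash('sha1').update(value).digest('hex').slice(0, 8), 16);
}

// Sorted [position, node] pairs, with `replicas` virtual nodes per node
function buildRing(nodes: string[], replicas: number): Array<[number, string]> {
  const ring: Array<[number, string]> = [];
  for (const node of nodes) {
    for (let i = 0; i < replicas; i++) {
      ring.push([hash32(`${node}:${i}`), node]);
    }
  }
  return ring.sort((a, b) => a[0] - b[0]);
}

// Count how many keys land on each node
function distribution(keys: string[], nodes: string[], replicas: number): Map<string, number> {
  const ring = buildRing(nodes, replicas);
  const counts = new Map<string, number>(nodes.map((n): [string, number] => [n, 0]));
  for (const key of keys) {
    const h = hash32(key);
    // First virtual node clockwise from the key, wrapping around to the start
    const [, node] = ring.find(([position]) => position > h) ?? ring[0];
    counts.set(node, (counts.get(node) ?? 0) + 1);
  }
  return counts;
}

const nodes = ['node1', 'node2', 'node3'];
const clientKeys = Array.from({ length: 6000 }, (_, i) => `client-${i}`);

for (const replicas of [1, 10, 100, 1000]) {
  const counts = distribution(clientKeys, nodes, replicas);
  console.log(`replicas=${replicas}`, Array.from(counts.entries()));
}
```

More virtual nodes generally give a more even split, at the cost of a larger ring to build and search. Swapping `sha1` for another algorithm in `hash32`, or hashing a different client attribute instead of the made-up IDs, changes the picture in the same way.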

If you have any questions about the implementation details, you can ask here in the comments or create an issue in the GitHub repository. If you are wondering why such a solution is needed, you can read the previous post that led to this implementation: Why Websockets are Hard to Scale

That is it for this post. I hope you enjoyed it and learned something new. Let me know what you think in the comments!