Wagner Mattei

Posted on Sep 16, 2023

How I think CloudGaming works + Building a Game with GO and Streaming the frames via WebRTC.

#webrtc #go #gamedev #javascript

Introduction

Cloud gaming has been gaining momentum in recent years, with companies like Google, Microsoft, and Nvidia investing heavily in the technology. But what is cloud gaming, exactly? In short, it's a way to play video games without needing a powerful gaming computer or console. Instead, the game is run on a remote server and streamed to your device over the internet.

You may notice that the title of this post is not particularly assertive, and you would be correct. My objective is not to teach how to create a game that renders on the server, but rather to speculate on how this might be achieved.

While this technology has existed for a long time, there isn't much information available about it. Big players like Google or Microsoft (Xbox Cloud) offer only a high-level description of how it works, and likely use custom protocols and mixed rendering between server and client to achieve their goals.

So, I decided to give it a try!

TLDR

https://github.com/wmattei/go-snake/tree/dev-to-post

Defining the goals

My main goal for this project was to create a simple game that can be played with very few commands, and does not require a high frame-rate. Therefore, I chose to create the Snake Game.

Artifacts of a server side rendered game

After deciding which game to build, I conducted some research to understand how a cloud game should work. As a result, I came up with a few artifacts.

Commands channel

We require a method for sending commands from the client to the server. For instance, if the player presses the "Up Arrow" key, we want to send a command to the server to make the snake go up.
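One possible wire format for these commands is a small JSON envelope, the same shape the client code later in this post sends over the data channel. The sketch below shows how a server could decode and validate such a payload; the names `commandMessage`, `validCommands`, and `parseCommand` are hypothetical and are not taken from the project.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// commandMessage mirrors the JSON envelope sent by the browser:
// {"type": "command", "data": "UP"}. The struct name is hypothetical.
type commandMessage struct {
	Type string `json:"type"`
	Data string `json:"data"`
}

// validCommands is the set of directions the client can send.
var validCommands = map[string]bool{"UP": true, "DOWN": true, "LEFT": true, "RIGHT": true}

// parseCommand decodes one payload and returns the direction it carries.
func parseCommand(payload []byte) (string, error) {
	var msg commandMessage
	if err := json.Unmarshal(payload, &msg); err != nil {
		return "", fmt.Errorf("decoding command: %w", err)
	}
	if msg.Type != "command" {
		return "", errors.New("unexpected message type: " + msg.Type)
	}
	if !validCommands[msg.Data] {
		return "", errors.New("unknown command: " + msg.Data)
	}
	return msg.Data, nil
}

func main() {
	cmd, err := parseCommand([]byte(`{"type":"command","data":"UP"}`))
	fmt.Println(cmd, err)
}
```

Validating against a fixed set keeps unexpected input away from the game loop.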

Game loop

Games typically have a game loop: an infinite loop that updates the game state on every iteration. New commands change the state, and even when no command arrives the loop still has to advance our snake's position.

Game Renderer

We need to render a new frame to display it back to the client. In our case, we have fixed the FPS to 10 frames per second.

Encoding

The rendered frame will be in a raw RGBA format, which is simply a matrix of pixels, with each pixel holding red, green, blue, and alpha (opacity) values. However, this format is not suitable for streaming because it is not compressed. To stream the frames, we need to encode them into a compressed format. In this case, we have chosen H.264.
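To get a feel for why raw frames are impractical to stream, here is a small back-of-the-envelope calculation. The 640x480 frame size is an assumption made for illustration (the post does not state the real dimensions); the 10 FPS figure is the one used in this post.

```go
package main

import "fmt"

// rawBytesPerSecond returns the uncompressed bandwidth of a video stream.
func rawBytesPerSecond(width, height, bytesPerPixel, fps int) int {
	return width * height * bytesPerPixel * fps
}

func main() {
	// 640x480 is an assumed frame size; RGBA uses 4 bytes per pixel.
	bps := rawBytesPerSecond(640, 480, 4, 10)
	fmt.Printf("%d bytes/s (%.1f MB/s)\n", bps, float64(bps)/1e6)
	// prints "12288000 bytes/s (12.3 MB/s)"
}
```

Even at this modest resolution and frame rate, uncompressed video needs roughly 12 MB per second, which is why a codec is required.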

Streaming

Now that we have the frames in the correct format, we need to stream them back to the client so that they can be displayed in a simple HTML 5 <video> tag.

WebRTC

Two of our game artifacts require communication between the client and the server: the "commands channel" and the "streaming". Fortunately, we have an effective communication protocol that can solve both problems: WebRTC.

WebRTC, which stands for Web Real-Time Communication, is an open-source project that enables real-time communication of audio, video, and data directly between web browsers and mobile applications. It allows for peer-to-peer communication without the need for additional plugins or downloads.

Although WebRTC is mostly used for client-to-client connections, it can also be used as a server-client communication tool. It can be thought of as a websocket on steroids because it has two-way data channels, as well as video and audio tracks that we can use to send our frames to the client.

Encoding

Encoding is one of the most challenging aspects of our game. We need to convert our raw RGBA frames into a compressed format to stream back to the browser as quickly as possible, ensuring low latency.

Let's code

The HTML:

First, we create a basic HTML page that will serve as the starting point for the player. Most importantly, we include the video tag, which will display the game itself.

<!DOCTYPE html>
<html>
  <head>
    <script src="./index.js"></script>
    <link rel="preconnect" href="https://fonts.googleapis.com" />
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
    <link
      href="https://fonts.googleapis.com/css2?family=Press+Start+2P&display=swap"
      rel="stylesheet"
    />
    <style>
      * {
        margin: 0;
        padding: 0;
        background-color: #1a1a1a;
      }
      #welcome_panel {
        display: flex;
        flex-direction: column;
        align-items: center;
        color: white;
        height: 100vh;

        font-family: "Press Start 2P", cursive;
      }

      #welcome_panel h1 {
        font-size: 2rem;
        margin: 2rem;
      }
      #welcome_panel p {
        text-align: center;
        line-height: 2.5rem;
      }

      #start_btn {
        padding: 1rem 2rem;
        border: none;
        border-radius: 0.5rem;
        background-color: #4caf50;
        color: white;
        font-size: 1.5rem;
        cursor: pointer;
        margin-top: 10rem;
      }
      #start_btn:hover {
        background-color: #3e8e41;
      }

      #video {
        display: none;

        width: 100vw;
        height: 100vh;
      }
    </style>
  </head>
  <body>
    <div id="welcome_panel">
      <h1>Welcome to my Snake Game</h1>
      <p>
        The page you are seeing right now is a simple HTML rendered by the
        browser.
      </p>
      <p>
        But as soon as you press the Start button, this whole page will turn
        into a single <b>HTML 5 video</b> tag
      </p>
      <button id="start_btn">Start</button>
    </div>
    <video id="video" autoplay></video>
  </body>
</html>

The page also includes a bit of inline CSS to make it look a bit better.

JavaScript

Since we are using WebRTC, we need to establish a connection with the other peer (our Go server) and capture the player's commands to play the game.

First, let's initialize a few variables that we will use throughout the code. We will explain the meaning of each later in this post.

let dataChannel;
let peerConnection;
let signalingChannel;
let videoElement;
let welcomePanelElement;

window.onload = async function () {
  const startBtn = document.getElementById("start_btn");
  startBtn.addEventListener("click", start);

  videoElement = document.getElementById("video");
  welcomePanelElement = document.getElementById("welcome_panel");

  keyBindings();
};

function keyBindings() {
  document.addEventListener("keydown", (event) => {
    // Send command based on key press
    if (event.key === "ArrowUp") {
      sendCommand("UP");
    } else if (event.key === "ArrowDown") {
      sendCommand("DOWN");
    } else if (event.key === "ArrowLeft") {
      sendCommand("LEFT");
    } else if (event.key === "ArrowRight") {
      sendCommand("RIGHT");
    }
  });
}

In this section, we define several variables and set the HTML element variables with their reference to the DOM. Additionally, when the window loads, we define the key bindings - in our case, we bind the arrow keys - and call the function sendCommand on each key press. We will explore the sendCommand function later on.

Following this, we must manage the WebRTC negotiation involving our counterpart and the signaling server. In our scenario, the counterpart and the signaling server are the same process, since we aren't building a standard client-to-client communication system.

Even though all game communication happens through WebRTC, we first need to share some details with the signaling server using websockets to set up the WebRTC connection.

async function start() {
  // Initialize signaling channel
  signalingChannel = new WebSocket("ws://localhost:4000/ws");

  // Event handler for signaling channel open
  signalingChannel.addEventListener("open", async () => {
    // Initialize peer connection and data channel
    peerConnection = new RTCPeerConnection();
    createDataChannel(peerConnection);

    peerConnection.onicecandidate = handleIceCandidateEvent;
    peerConnection.ontrack = handleTrackEvent;

    // Create and send offer
    const offer = await peerConnection.createOffer({
      offerToReceiveVideo: true,
    });
    await peerConnection.setLocalDescription(offer);
    signalingChannel.send(JSON.stringify({ type: "offer", data: offer.sdp }));
  });

  // Event handler for signaling channel messages
  signalingChannel.addEventListener("message", async (event) => {
    const message = JSON.parse(event.data);

    if (message.type === "ice") {
      handleIceMessage(message.data);
    }

    if (message.type === "answer") {
      handleAnswerMessage(message.data);
    }
  });
}

function createDataChannel(peerConnection) {
  // Create a data channel for commands
  dataChannel = peerConnection.createDataChannel("commandsChannel");

  dataChannel.onerror = (error) => {
    console.log("Error on data channel:", error);
  };

  dataChannel.onclose = () => {
    setTimeout(() => {
      videoElement.style.display = "none";
      welcomePanelElement.style.display = "flex";
    }, 1000);
  };
}

As shown in the code above, we set up a websocket connection to our server (which will be built later).

Once the connection is established, we handle various WebRTC-related tasks:

Initialize a new RTCPeerConnection
Establish a data channel for sending commands
Set up a handler for when new ice candidates are gathered
Set up a handler for when a track is added to the connection
Generate and send an offer to the signaling server through our websocket connection

Additionally, we implement a listener for incoming messages from the websocket. We anticipate two message types: one for receiving the WebRTC answer and another for obtaining the ICE candidates of our counterpart.

See below the implementation of our additional functions:

const handleTrackEvent = (event) => {
  if (event.track.kind === "video") {
    // Display video and hide welcome panel
    videoElement.style.display = "block";
    welcomePanelElement.style.display = "none";

    // Set video stream
    videoElement.srcObject = event.streams[0];
  }
};

const handleIceCandidateEvent = (event) => {
  if (event.candidate) {
    // Send ICE candidate
    signalingChannel.send(
      JSON.stringify({ type: "ice", data: event.candidate.candidate })
    );
  }
};

const handleIceMessage = async (iceData) => {
  const iceCandidate = new RTCIceCandidate({
    ...iceData,
    sdpMLineIndex: 0,
    sdpMid: "0",
  });
  try {
    await peerConnection.addIceCandidate(iceCandidate);
  } catch (error) {
    console.log("Error adding ICE candidate:", error);
  }
};

const handleAnswerMessage = async (answerData) => {
  const remoteDescription = new RTCSessionDescription({
    sdp: answerData,
    type: "answer",
  });
  await peerConnection.setRemoteDescription(remoteDescription);
};

The handleTrackEvent function is responsible for updating the HTML page's view: it hides the welcome panel and displays the video tag. It also assigns the stream carrying the track added by our WebRTC counterpart (our Go server) to the video element's srcObject property.

The handleIceCandidateEvent function transmits the collected ICE candidates from the local WebRTC connection to our counterpart via the WebSocket connection.

The handleIceMessage function receives ICE candidates gathered by our counterpart and adds them to the local peer connection.

The handleAnswerMessage function sets the remote description using the SDP answer received as a WebSocket message.

Lastly, we define our function for sending commands through the WebRTC channel once the negotiation is completed.

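function sendCommand(command) {
  // Ignore key presses until the data channel exists and is open
  if (dataChannel?.readyState !== "open") return;
  dataChannel.send(JSON.stringify({ type: "command", data: command }));
}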

Let's code our server!

In WebRTC negotiations, it is common practice to use a signaling server to exchange SDP descriptions and ICE candidates between peers. That's exactly what we did with the WebSocket you saw earlier.

In our specific case, we'll develop a Golang server that serves as both a signaling server and one of the WebRTC peers.

In this post, I'll omit the detailed implementation of the signaling server and the Golang portion of the negotiation since it closely resembles what we've already covered in JavaScript.
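For readers who want a rough picture of the server side anyway, here is a minimal sketch of decoding the signaling envelope that the JavaScript client sends (`{ type: "offer", data: sdp }` and `{ type: "ice", data: candidate }`). It only uses the standard library and stops before any WebRTC call; `signalingMessage` and `decodeSignal` are hypothetical names, and the real project may structure this differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// signalingMessage is the envelope exchanged over the WebSocket.
// Data stays raw until the message type is known.
type signalingMessage struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

// decodeSignal returns the message type and its string payload
// (the SDP for an offer, the candidate line for an ICE message).
func decodeSignal(frame []byte) (string, string, error) {
	var msg signalingMessage
	if err := json.Unmarshal(frame, &msg); err != nil {
		return "", "", fmt.Errorf("decoding signaling message: %w", err)
	}
	switch msg.Type {
	case "offer", "ice":
		var payload string
		if err := json.Unmarshal(msg.Data, &payload); err != nil {
			return "", "", fmt.Errorf("decoding %s payload: %w", msg.Type, err)
		}
		return msg.Type, payload, nil
	default:
		return "", "", fmt.Errorf("unknown signaling message type %q", msg.Type)
	}
}

func main() {
	kind, payload, err := decodeSignal([]byte(`{"type":"offer","data":"v=0"}`))
	fmt.Println(kind, payload, err)
}
```

After decoding, a server would typically apply an offer as the remote description and reply with an answer, and add each ICE candidate to its peer connection.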

The entry point of our server starts a simple HTTP server and listens for requests on /ws:

func main() {
    http.HandleFunc("/ws", handleWebsocketConnection)
    err := http.ListenAndServe(fmt.Sprintf(":%d", port), nil)
    snake_errors.HandleError(err)
}

snake_errors.HandleError is just a utility function that I created to panic on any error (lol).

Here is the handleWebsocketConnection code:

func handleWebsocketConnection(w http.ResponseWriter, r *http.Request) {
    peerConnection, err := snake_webrtc.CreateAndNegotiatePeerConnection(w, r)
    snake_errors.HandleError(err)

    track := peerConnection.GetSenders()[0].Track().(*webrtc.TrackLocalStaticSample)
    fmt.Println("Peer connection established")

    handleDataChannel(peerConnection, track)
}

The entire WebRTC signaling and negotiation process is encapsulated in snake_webrtc.CreateAndNegotiatePeerConnection. I mentioned that I wouldn't delve into the specifics in this post, but feel free to explore how it operates in the project's GitHub repository. In fact, the function does way more than just creating and negotiating a peer connection, but best practices are beyond the scope of this post.

Once our peerConnection is connected to the other peer (our JavaScript code) and the track for sending video has been created, we start listening to our data channel:

func handleDataChannel(peerConnection *webrtc.PeerConnection, track *webrtc.TrackLocalStaticSample) {
    peerConnection.OnDataChannel(func(dataChannel *webrtc.DataChannel) {
        fmt.Println("Data channel established")
        closeSignal := make(chan bool)

        commandChannel := make(chan string)
        gameStateCh := make(chan *game.GameState)
        pixelCh := make(chan []byte)
        encodedFrameCh := make(chan []byte)

        gameLoop := game.NewGameLoop(&game.GameLoopInit{CommandChannel: commandChannel, GameStateChannel: gameStateCh, CloseSignal: closeSignal})
        go gameLoop.Start()

        go renderer.StartFrameRenderer(gameStateCh, pixelCh)
        go encoder.StartEncoder(pixelCh, encodedFrameCh)
        go stream.StartStreaming(encodedFrameCh, track)

        go handleChannelClose(dataChannel, peerConnection, gameStateCh, commandChannel, pixelCh, encodedFrameCh, closeSignal)

        dataChannel.OnMessage(func(msg webrtc.DataChannelMessage) {
            handleDataChannelMessage(msg, commandChannel)
        })
    })
}

Once a new data channel connection is established, we kickstart our game loop, renderer, encoder, and streamer. We also set up a listener for incoming messages, which will contain players' commands.
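The four goroutines above form a pipeline connected by channels. As a simplified illustration of that pattern (not the project's code), the sketch below chains two stages that operate on integers instead of game states and frames. The `stage` helper is hypothetical; it closes its output when its input closes, so shutdown propagates downstream.

```go
package main

import "fmt"

// stage starts a goroutine that applies fn to every value read from in.
// It closes its output once in is closed, so shutdown propagates.
func stage(in <-chan int, fn func(int) int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- fn(v)
		}
	}()
	return out
}

func main() {
	states := make(chan int)
	frames := stage(states, func(v int) int { return v * 10 })  // stands in for the renderer
	encoded := stage(frames, func(v int) int { return v + 1 }) // stands in for the encoder

	go func() {
		defer close(states)
		for i := 1; i <= 3; i++ {
			states <- i
		}
	}()

	// The final consumer stands in for the streamer.
	for v := range encoded {
		fmt.Println(v)
	}
}
```

Because the channels are unbuffered, a slow stage applies back-pressure to the stages before it.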

func handleDataChannelMessage(msg webrtc.DataChannelMessage, commandChannel chan string) {
    var message snake_webrtc.Message
    err := json.Unmarshal(msg.Data, &message)
    if err != nil {
        fmt.Println("Error unmarshalling message:", err)
        return
    }
    if message.Type != "command" {
        fmt.Println("Channel used for wrong message type:", message.Type)
        return
    }

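    // A checked type assertion avoids a panic if the payload is not a string
    command, ok := message.Data.(string)
    if !ok {
        fmt.Println("Command payload is not a string:", message.Data)
        return
    }

    fmt.Println("Received command:", command)
    commandChannel <- command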
}

The handleDataChannelMessage function decodes each message coming from the browser and, if the message has the correct Type, forwards the command to the commandChannel.

The game loop.

As depicted earlier, we initiated a game loop responsible for computing the game state in each iteration. This game loop must respond to new commands issued by the player, as well as update the game when no commands are sent. For instance, if the snake is moving left and no new commands are received, the snake should continue in that leftward direction.

type GameLoop struct {
    gameState      *GameState
    commandChannel chan string
    gameStateCh    chan *GameState
    closeSignal    chan bool
    frameTicker    *time.Ticker
}

type GameLoopInit struct {
    CommandChannel   chan string
    GameStateChannel chan *GameState
    CloseSignal      chan bool
}

func NewGameLoop(options *GameLoopInit) *GameLoop {
    return &GameLoop{
        gameState:      NewGameState(constants.ROWS, constants.COLS),
        commandChannel: options.CommandChannel,
        gameStateCh:    options.GameStateChannel,
        closeSignal:    options.CloseSignal,
        frameTicker:    time.NewTicker(time.Second / constants.FPS),
    }
}

func (gl *GameLoop) Start() {
    defer gl.frameTicker.Stop()

    for {
        select {
        case command := <-gl.commandChannel:
            gl.handleCommand(command)

        case <-gl.frameTicker.C:
            gl.updateGameState(nil)
            gl.gameStateCh <- gl.gameState
        }
    }
}

func (gl *GameLoop) Close() {
    gl.closeSignal <- true
}

func (gl *GameLoop) handleCommand(command string) error {
    return gl.updateGameState(&command)
}

func (gl *GameLoop) updateGameState(command *string) error {
    gameOver := !gl.gameState.handleCommand(command)
    if gameOver {
        gl.Close()
    }
    return nil
}

This code manages updates to the game state and, upon detecting a game over, sends true on the closeSignal channel, which is also passed to handleChannelClose.
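Note that the Start method shown above has no exit branch of its own. One common way to let such a loop return is an extra `done` channel in the `select`. The sketch below illustrates that idea under my own assumptions; `runLoop` and its parameters are hypothetical and this is not how the project handles shutdown.

```go
package main

import (
	"fmt"
	"time"
)

// runLoop ticks until done is closed and returns how many ticks it processed.
func runLoop(commands <-chan string, done <-chan struct{}, interval time.Duration) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	ticks := 0
	for {
		select {
		case <-done:
			// Shutdown requested: leave the loop so the goroutine can end.
			return ticks
		case cmd := <-commands:
			_ = cmd // a real loop would apply the command to the game state here
		case <-ticker.C:
			ticks++ // a real loop would advance the game and publish the state here
		}
	}
}

func main() {
	commands := make(chan string)
	done := make(chan struct{})

	go func() {
		time.Sleep(350 * time.Millisecond)
		close(done)
	}()

	fmt.Println("ticks before shutdown:", runLoop(commands, done, 100*time.Millisecond))
}
```

Closing a channel, rather than sending a value on it, lets any number of goroutines observe the shutdown without blocking the sender.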

I'll bypass the detailed implementation of the game state in this post, as it exceeds the intended scope. Feel free to examine the complete implementation on GitHub.
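To make the contract used by the loop concrete (a method that takes an optional command and returns false on game over), here is a deliberately tiny, hypothetical game state. It is not the implementation from the repository: it has no food or growth, it moves the snake on every call, and names such as `snakeState` and `step` are my own.

```go
package main

import "fmt"

type point struct{ X, Y int }

// directions maps the command strings to a movement on the grid.
var directions = map[string]point{
	"UP": {0, -1}, "DOWN": {0, 1}, "LEFT": {-1, 0}, "RIGHT": {1, 0},
}

type snakeState struct {
	rows, cols int
	body       []point // body[0] is the head
	dir        point
}

func newSnakeState(rows, cols int) *snakeState {
	return &snakeState{
		rows: rows, cols: cols,
		body: []point{{cols / 2, rows / 2}},
		dir:  directions["RIGHT"],
	}
}

// step applies an optional command, advances the snake by one cell and
// reports whether the game is still running.
func (s *snakeState) step(command *string) bool {
	if command != nil {
		// Ignore unknown commands and 180-degree turns.
		if d, ok := directions[*command]; ok && !(d.X == -s.dir.X && d.Y == -s.dir.Y) {
			s.dir = d
		}
	}
	head := point{s.body[0].X + s.dir.X, s.body[0].Y + s.dir.Y}
	if head.X < 0 || head.X >= s.cols || head.Y < 0 || head.Y >= s.rows {
		return false // hit a wall
	}
	// The last cell is about to be vacated, so it cannot cause a collision.
	for _, p := range s.body[:len(s.body)-1] {
		if p == head {
			return false // ran into its own body
		}
	}
	s.body = append([]point{head}, s.body[:len(s.body)-1]...)
	return true
}

func main() {
	s := newSnakeState(5, 5)
	up := "UP"
	fmt.Println(s.step(&up), s.body[0])
	fmt.Println(s.step(nil), s.body[0])
	fmt.Println(s.step(nil)) // the snake leaves the grid here
}
```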

The game renderer.

The game renderer's role is to receive the game state and generate a frame based on that state. For instance, if the game state indicates that the snake is 5 squares long, the renderer should display the corresponding snake colors in the pixels representing those 5 squares. This rendering process occurs each time the game state is updated.

var (
    snakeHeadColor = color.RGBA{R: 255, G: 0, B: 0, A: 255}
    snakeBodyColor = color.RGBA{R: 255, G: 255, B: 255, A: 255}
    foodColor      = color.RGBA{R: 0, G: 255, B: 0, A: 255}
)

const bytesPerPixel = 3 // RGB: 3 bytes per pixel

func drawRectangle(img *image.RGBA, min, max image.Point, col color.RGBA) {
    for x := min.X; x < max.X; x++ {
        for y := min.Y; y < max.Y; y++ {
            img.Set(x, y, col)
        }
    }
}

func convertRGBAtoRGB(img *image.RGBA) []byte {
    width, height := img.Rect.Dx(), img.Rect.Dy()
    rawRGBData := make([]byte, bytesPerPixel*width*height)

    idx := 0
    for y := 0; y < height; y++ {
        for x := 0; x < width; x++ {
            pixel := img.RGBAAt(x, y)
            rawRGBData[idx] = pixel.R
            rawRGBData[idx+1] = pixel.G
            rawRGBData[idx+2] = pixel.B
            idx += bytesPerPixel
        }
    }

    return rawRGBData
}

func StartFrameRenderer(gameStateCh chan *game.GameState, pixelCh chan []byte) {
    for {
        gameState := <-gameStateCh
        if gameState == nil {
            break
        }

        img := image.NewRGBA(image.Rect(0, 0, constants.FRAME_WIDTH, constants.FRAME_HEIGHT))
        matrix := gameState.GetMatrix()

        for y := 0; y < len(matrix); y++ {
            for x := 0; x < len(matrix[0]); x++ {
                rectMin := image.Point{X: x * constants.CHUNK_SIZE, Y: y * constants.CHUNK_SIZE}
                rectMax := image.Point{X: rectMin.X + constants.CHUNK_SIZE, Y: rectMin.Y + constants.CHUNK_SIZE}
                switch matrix[y][x] {
                case 1:
                    drawRectangle(img, rectMin, rectMax, snakeHeadColor)
                case 2:
                    drawRectangle(img, rectMin, rectMax, snakeBodyColor)
                case 3:
                    drawRectangle(img, rectMin, rectMax, foodColor)
                }
            }
        }

        rawRGBData := convertRGBAtoRGB(img)
        pixelCh <- rawRGBData
    }
}

As shown in the code above, we send raw RGB data to a pixel channel. This RGB data must then be encoded into H.264, as discussed earlier in this post.
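As a possible variation on convertRGBAtoRGB, the sketch below reads the image's Pix slice directly instead of calling RGBAAt for every pixel. In an image.RGBA each pixel occupies 4 bytes (R, G, B, A) and each row starts Stride bytes after the previous one. The names `packRGB` and `demoFrame` are hypothetical, and I have not measured whether this is faster in practice.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// packRGB copies the R, G and B bytes of every pixel into a tightly packed
// buffer, skipping the alpha byte.
func packRGB(img *image.RGBA) []byte {
	width, height := img.Rect.Dx(), img.Rect.Dy()
	out := make([]byte, 0, 3*width*height)
	for y := 0; y < height; y++ {
		row := img.Pix[y*img.Stride : y*img.Stride+4*width]
		for x := 0; x < width; x++ {
			out = append(out, row[4*x], row[4*x+1], row[4*x+2])
		}
	}
	return out
}

// demoFrame builds a 2x1 image: one red pixel followed by one green pixel.
func demoFrame() *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, 2, 1))
	img.Set(0, 0, color.RGBA{R: 255, A: 255})
	img.Set(1, 0, color.RGBA{G: 255, A: 255})
	return img
}

func main() {
	fmt.Println(packRGB(demoFrame())) // prints "[255 0 0 0 255 0]"
}
```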

Encoding.

We decided to use ffmpeg to do our encoding:

const ffmpegCommand = "ffmpeg -hide_banner -loglevel error -f rawvideo -pixel_format rgb24 -video_size %dx%d -framerate %d -i pipe:0 -c:v libx264 -preset ultrafast -tune zerolatency -f h264 pipe:1"

func StartEncoder(pixelCh chan []byte, encodedFrameCh chan []byte) {
    for {
        rawRGBDataFrame, ok := <-pixelCh
        if !ok {
            // Channel closed, exit the loop
            break
        }

        cmd := exec.Command("bash", "-c", fmt.Sprintf(ffmpegCommand, constants.FRAME_WIDTH, constants.FRAME_HEIGHT, constants.FPS))
        cmd.Stderr = os.Stderr

        // Create a pipe for input and output
        inPipe, err := cmd.StdinPipe()
        snake_errors.HandleError(err)
        outPipe, err := cmd.StdoutPipe()
        snake_errors.HandleError(err)

        // Start the command
        if err := cmd.Start(); err != nil {
            snake_errors.HandleError(err)
            continue
        }

        // Write raw RGB data to the input pipe
        _, err = inPipe.Write(rawRGBDataFrame)
        snake_errors.HandleError(err)

        // Close the input pipe to indicate no more input
        inPipe.Close()

        // Read H.264 NAL units from the output pipe and send to the channel
        encodedData, err := readH264NALUnits(outPipe)
        snake_errors.HandleError(err)

        // Wait for the command to finish
        err = cmd.Wait()
        snake_errors.HandleError(err)

        // Send the encoded data to the channel
        encodedFrameCh <- encodedData
    }
}

func readH264NALUnits(outPipe io.Reader) ([]byte, error) {
    h264, err := h264reader.NewReader(outPipe)
    if err != nil {
        return nil, fmt.Errorf("failed to create H.264 reader: %v", err)
    }

    var data []byte
    var spsAndPpsCache []byte

    for {
        nal, h264Err := h264.NextNAL()
        if h264Err == io.EOF {
            // Finished sending frames
            break
        } else if h264Err != nil {
            return nil, fmt.Errorf("error reading H.264 NAL: %v", h264Err)
        }

        nal.Data = append([]byte{0x00, 0x00, 0x00, 0x01}, nal.Data...)
        if nal.UnitType == h264reader.NalUnitTypeSPS || nal.UnitType == h264reader.NalUnitTypePPS {
            spsAndPpsCache = append(spsAndPpsCache, nal.Data...)
            continue
        } else if nal.UnitType == h264reader.NalUnitTypeCodedSliceIdr {
            nal.Data = append(spsAndPpsCache, nal.Data...)
            spsAndPpsCache = []byte{}
        }

        // Append NAL unit data to the result
        data = append(data, nal.Data...)
    }

    return data, nil
}

The encoding code leverages FFmpeg, a powerful multimedia processing tool, to encode raw RGB frames into H.264. As written, StartEncoder launches a new FFmpeg process for every frame: it writes the raw RGB bytes to the process's stdin, closes the pipe, and reads the resulting H.264 NAL units from stdout. The -preset ultrafast and -tune zerolatency options trade compression efficiency for encoding speed and low delay. readH264NALUnits then prefixes each NAL unit with an Annex B start code and places the SPS and PPS parameter sets directly before the IDR (key) frame they describe, so each payload sent to the browser can be decoded on its own. Because each FFmpeg run sees only a single frame, every frame is encoded as a key frame, which is simple but costs more bandwidth and process startup time than a long-lived encoder would.
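To illustrate what the H.264 byte stream coming out of FFmpeg looks like, here is a small standard-library sketch that splits an Annex B stream into NAL units and reads their types. In the post this job is done by the h264reader package; `splitAnnexB` and `nalType` are hypothetical helpers written only for this explanation, and the sample bytes are made up.

```go
package main

import (
	"bytes"
	"fmt"
)

// NAL unit types from the H.264 specification that matter for streaming.
const (
	nalTypeIDR = 5 // key frame slice
	nalTypeSPS = 7 // sequence parameter set
	nalTypePPS = 8 // picture parameter set
)

// splitAnnexB splits a byte stream on 00 00 01 start codes. A 4-byte start
// code (00 00 00 01) is handled by trimming the trailing zero bytes that the
// previous unit would otherwise keep.
func splitAnnexB(stream []byte) [][]byte {
	startCode := []byte{0x00, 0x00, 0x01}
	var units [][]byte
	i := bytes.Index(stream, startCode)
	for i >= 0 {
		stream = stream[i+len(startCode):]
		next := bytes.Index(stream, startCode)
		end := next
		if end < 0 {
			end = len(stream)
		}
		if unit := bytes.TrimRight(stream[:end], "\x00"); len(unit) > 0 {
			units = append(units, unit)
		}
		i = next
	}
	return units
}

// nalType extracts the 5-bit unit type from the first byte of a NAL unit.
func nalType(unit []byte) int {
	return int(unit[0] & 0x1F)
}

func main() {
	// Made-up payload bytes; only the header bytes 0x67, 0x68, 0x65 matter.
	stream := []byte{0, 0, 0, 1, 0x67, 0xAA, 0, 0, 0, 1, 0x68, 0xBB, 0, 0, 1, 0x65, 0xCC}
	for _, unit := range splitAnnexB(stream) {
		fmt.Println(nalType(unit), len(unit))
	}
}
```

For the sample stream, the three units have types 7 (SPS), 8 (PPS), and 5 (IDR), which is the order readH264NALUnits arranges before handing data to the track.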

Streaming

Once a frame has been converted into a streamable format, we send it to the WebRTC track that we created earlier.

func StartStreaming(encodedFrameCh chan []byte, videoTrack *webrtc.TrackLocalStaticSample) {
    for {
        encodedFrame := <-encodedFrameCh
        if encodedFrame == nil {
            break
        }
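        err := videoTrack.WriteSample(media.Sample{Data: encodedFrame, Duration: time.Second / constants.FPS, Timestamp: time.Now()})
        if err != nil {
            // Stop streaming if the track can no longer accept samples
            fmt.Println("Error writing sample:", err)
            return
        }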
    }
}

Every new frame gets written to the track along with information regarding duration and Timestamp.
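The duration passed with each sample matters because video over RTP is timed with a 90 kHz clock, and, as far as I understand, the sender uses the sample duration to advance that timestamp. The arithmetic below is a small illustration; `frameDuration` and `rtpTicks` are hypothetical helpers, not part of any library.

```go
package main

import (
	"fmt"
	"time"
)

// videoClockRate is the RTP clock rate used for H.264 video.
const videoClockRate = 90000

// frameDuration returns how long one frame is shown at the given frame rate.
func frameDuration(fps int) time.Duration {
	return time.Second / time.Duration(fps)
}

// rtpTicks converts a duration into 90 kHz clock ticks using integer math.
func rtpTicks(d time.Duration) uint32 {
	return uint32(int64(d) * videoClockRate / int64(time.Second))
}

func main() {
	d := frameDuration(10)
	fmt.Println(d, rtpTicks(d)) // prints "100ms 9000"
}
```

At 10 FPS each frame lasts 100 ms, so consecutive frames are 9000 ticks apart on the RTP timeline.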

Conclusion

With all our steps finished, we should have a running Snake Game, rendered on the server and streamed to the client with very low latency.

Final considerations:

The goal of this post, as mentioned before, is not to be a tutorial on how to build a cloud game. Rather, it is a report on how I did it.

In a real-world cloud game, I would never mix the game engine with the cloud game server, which is what we are doing here. Instead, I would use some sort of “frame capturing” approach, and my server's only responsibility would be to stream those frames back to the client.

I strongly encourage criticism of this post, as I don't see much content about cloud gaming in the developer community.

Thanks for reading! Bye!!

