Wagner Mattei
CloudGaming Part 2: That code was terrible lol

A few weeks ago, I wrote my first post about Cloud Gaming. My goal was to learn how cloud gaming works: I made a simple game that takes player input, renders the game frames on the server, and streams them to the player as WebRTC video.

I did it! I made a snake game and it worked.

But I only called it a success because a snake game doesn't need more than 10fps.

I knew that a bigger game would have delay issues and might not run smoothly at 30 or 60fps.

So, I chose to make a new game.

TLDR

https://github.com/wmattei/go-snake/tree/dev-to-10-27-2023

The Latency Check Game

I aimed to create an animation where a red rectangle would appear at the player's current mouse position. This meant every change in the mouse position had to be sent to our server. Then, 60 times a second, I needed to update the game state, render a new frame, encode that frame to h264, and stream it via WebRTC.
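The per-frame pipeline described above can be sketched as a fixed-rate loop. This is a minimal sketch: `updateGameState`, `renderFrame`, `encodeH264`, and `streamFrame` are placeholder names, not the project's real functions.

```go
package main

import "time"

// Hypothetical stage functions standing in for the real pipeline: the
// actual project wires these to the game state, a rasterizer, the h264
// encoder, and the WebRTC track.
func updateGameState()           {}
func renderFrame() []byte        { return make([]byte, 4) }
func encodeH264(f []byte) []byte { return f }
func streamFrame(f []byte)       {}

// runLoop drives the update → render → encode → stream pipeline at the
// target FPS for a fixed number of frames and returns how many frames it
// pushed out. A ticker paces the loop, so a fast iteration still waits
// for the next 1/fps slot.
func runLoop(fps, frames int) int {
	ticker := time.NewTicker(time.Second / time.Duration(fps))
	defer ticker.Stop()

	sent := 0
	for i := 0; i < frames; i++ {
		<-ticker.C
		updateGameState()
		raw := renderFrame()
		encoded := encodeH264(raw)
		streamFrame(encoded)
		sent++
	}
	return sent
}
```

At 60fps the whole chain has a budget of roughly 16.6ms per frame, which is why a 40ms encoder (as we're about to see) sinks the ship.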

Here's the outcome with the architecture I used:

This is bad

As evident, the results were poor. I was losing many frames as I couldn't deliver them quickly enough.

The bottleneck

I began timing each step of my process to identify the slow parts and see how to speed them up.

The main issue was the encoder, which took about 40ms for each frame. Here's the code I had in place:

func StartEncoder(pixelCh chan []byte, encodedFrameCh chan []byte, windowWidth, windowHeight int) {
    ffmpegCommand = fmt.Sprintf(ffmpegBaseCommand, windowWidth, windowHeight, constants.FPS)
    for {
        rawRGBDataFrame, ok := <-pixelCh
        if !ok {
            break
        }
        encodedData, err := encodeFrame(rawRGBDataFrame, windowWidth, windowHeight)
        logutil.LogFatal(err)

        encodedFrameCh <- encodedData
    }
}

const ffmpegBaseCommand = "ffmpeg -hide_banner -loglevel error -f rawvideo -pixel_format rgb24 -video_size %dx%d -framerate %d -i pipe:0 -c:v libx264 -preset ultrafast -tune zerolatency -f h264 pipe:1"

var ffmpegCommand string

func encodeFrame(rawFrame []byte, windowWidth, windowHeight int) ([]byte, error) {
    cmd := exec.Command("bash", "-c", ffmpegCommand)
    cmd.Stderr = os.Stderr

    inPipe, err := cmd.StdinPipe()
    logutil.LogFatal(err)
    outPipe, err := cmd.StdoutPipe()
    logutil.LogFatal(err)

    if err := cmd.Start(); err != nil {
        logutil.LogFatal(err)
        return nil, err
    }

    _, err = inPipe.Write(rawFrame)
    if err != nil {
        return nil, err
    }

    inPipe.Close()

    encodedData, err := readH264NALUnits(outPipe)
    if err != nil {
        return nil, err
    }

    err = cmd.Wait()
    if err != nil {
        return nil, err
    }

    return encodedData, nil
}

func readH264NALUnits(outPipe io.Reader) ([]byte, error) {
    h264, err := h264reader.NewReader(outPipe)
    if err != nil {
        return nil, fmt.Errorf("failed to create H.264 reader: %v", err)
    }

    var data []byte
    var spsAndPpsCache []byte

    for {
        nal, h264Err := h264.NextNAL()
        if h264Err == io.EOF {
            break
        } else if h264Err != nil {
            return nil, fmt.Errorf("error reading H.264 NAL: %v", h264Err)
        }

        nal.Data = append([]byte{0x00, 0x00, 0x00, 0x01}, nal.Data...)
        if nal.UnitType == h264reader.NalUnitTypeSPS || nal.UnitType == h264reader.NalUnitTypePPS {
            spsAndPpsCache = append(spsAndPpsCache, nal.Data...)
            continue
        } else if nal.UnitType == h264reader.NalUnitTypeCodedSliceIdr {
            nal.Data = append(spsAndPpsCache, nal.Data...)
            spsAndPpsCache = []byte{}
        }

        data = append(data, nal.Data...)
    }

    return data, nil
}

It was lagging because for every frame, we initiated a new ffmpeg process, wrote the raw frame to an input pipe, and read the result from an output pipe. Using that output, we gathered all the H264 NAL units to set up a batch of data for WebRTC transmission.

I basically built the encoder in the worst possible way lol.

The First Solution

The first fix I attempted was encoding several frames at once by spawning 4 goroutines to grab raw frames from the pixelCh. While this made the game run quicker, it mainly just hid the fact that the encoder itself was too slow.
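That attempt looked roughly like this worker-pool sketch (placeholder channels and a placeholder `encode` function, not the project's actual code). Note it also has a subtle flaw the throughput gain hides: with several workers, frames can finish out of order.

```go
package main

import "sync"

// startWorkers launches n goroutines that each pull raw frames from
// pixelCh, encode them, and push the result to encodedCh. Throughput
// improves, but per-frame latency is unchanged and the output order is
// no longer guaranteed to match the input order.
func startWorkers(n int, pixelCh <-chan []byte, encodedCh chan<- []byte, encode func([]byte) []byte) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for raw := range pixelCh {
				encodedCh <- encode(raw)
			}
		}()
	}
	// Close the output once every worker has drained pixelCh.
	go func() {
		wg.Wait()
		close(encodedCh)
	}()
}
```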

The final solution

After researching how ffmpeg and pipes work, I discovered that I could start a single ffmpeg process when the encoder starts. Then I could continuously "feed" the command by writing raw frames to its input pipe from one goroutine, while a second goroutine pulls the compressed frames from the output pipe and forwards them to our streamer, bypassing the need to parse the NAL units.

Our brand new encoder can now encode our Latency Check™ game frames in less than 2ms.

And here is the code:

func (e *Encoder) Start() {

    debug := "-hide_banner -loglevel error"
    if constants.FFMPEG_BANNER {
        debug = ""
    }

    ffmpegCommand := fmt.Sprintf(ffmpegBaseCommand, debug, e.windowWidth, e.windowHeight, constants.FPS, constants.FPS)
    cmd := exec.Command("bash", "-c", ffmpegCommand)
    cmd.Stderr = os.Stderr
    inPipe, err := cmd.StdinPipe()
    logutil.LogFatal(err)
    outPipe, err := cmd.StdoutPipe()
    logutil.LogFatal(err)
    err = cmd.Start()
    logutil.LogFatal(err)

    e.wg.Add(2)

    go e.writeToFFmpeg(inPipe)
    go e.streamToWebRTCTrack(outPipe)

    go func() {
        _, ok := <-e.closeSignal
        if !ok {
            e.markAsClosed()
            fmt.Println("Closing encoder")

            cmd.Wait()
            e.wg.Wait()
            close(e.encodedFrameCh)
        }
    }()

}

func (e *Encoder) writeToFFmpeg(inPipe io.WriteCloser) {
    defer e.wg.Done()

    for canvas := range e.canvasCh {
        if e.isClosed() {
            return
        }
        select {
        case <-e.encodedFrameCh: // Check if there's a backlog.
            e.debugger.ReportDroppedFrame()
            continue
        default:
            _, err := inPipe.Write(canvas.Data)
            logutil.LogFatal(err)
        }
    }

    inPipe.Close()
}

func (e *Encoder) streamToWebRTCTrack(outPipe io.Reader) {
    defer e.wg.Done()

    buf := make([]byte, 1024*8)
    for {
        if e.isClosed() {
            return
        }
        timestamp := time.Now()
        n, err := outPipe.Read(buf)
        if err == io.EOF {
            break
        } else if err != nil {
            logutil.LogFatal(fmt.Errorf("error reading from FFmpeg: %v", err))
            continue
        }

        // Copy before handing off: buf is reused by the next Read, so
        // sending buf[:n] directly would let later reads overwrite this frame.
        frame := make([]byte, n)
        copy(frame, buf[:n])
        e.encodedFrameCh <- &webrtcutil.Streamable{Data: frame, Timestamp: timestamp}
    }
}

Here is the final result:

It is fast

Some people call this "Blazingly fast".

Never be confident

I was genuinely happy at that point (doesn't happen often). I believed any game could be developed on this updated structure: the game's speed was now only limited by the time it took to render a frame. I even parallelized the frame renderer and arrived at this final setup:

Cloud gaming architecture

It was time to test it and begin creating actual game elements, like gravity.

I aimed to design a basic game where, every time a player clicked the page, a new ball would appear. This ball would continue bouncing until it ran out of energy.

I built the logic (which is not very complex):


func NewBall(x, y int, radius float64, ground int, color *artemisia.Color) *Ball {
    return &Ball{
        Radius: radius,
        Position: Position{
            X: x,
            Y: y,
        },
        Ground:     ground,
        Elasticity: 0.8,
        Color:      color,
    }
}

const gravity = 1

func (b *Ball) Update(dt int64) {
    b.Velocity.Y += float64(gravity)
    b.Position.X += int(b.Velocity.X)
    b.Position.Y += int(b.Velocity.Y)
    if b.Position.Y >= b.Ground {
        b.Position.Y = b.Ground
        b.Velocity.Y = -(float64(b.Velocity.Y) * b.Elasticity)
    }
    if b.Position.Y == b.Ground && int(b.Velocity.Y) == 0 && b.StoppedAt == nil {
        now := time.Now()
        b.StoppedAt = &now
    }

    if b.StoppedAt != nil && time.Since(*b.StoppedAt) > 3*time.Second {
        b.IsDead = true
    }
}

I would even remove the ball after it was dead!

And this is the result I got:

Laggy balls

My "masterpiece" of an engine choked on just two balls, and my spirits plummeted again.

After some research, I found out something pretty basic: the more detailed the frame, the longer it takes to encode. And circles? They're hard for the CPU to handle.

Luckily, there's a simple answer: hardware-accelerated encoding. ffmpeg supports it, so I just changed my command to use a different encoder to convert my raw frames to H264.

Here is the updated command:

ffmpeg -re -f rawvideo -pixel_format rgb24 -video_size %dx%d -framerate %v -r %v -i pipe:0 -pix_fmt yuv420p -c:v h264_videotoolbox -f h264 pipe:1

The h264_videotoolbox encoder uses the GPU on macOS. On other operating systems you'll need a different hardware encoder, and possibly extra drivers or libraries.
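A rough mapping of platforms to ffmpeg hardware encoders might look like this. The encoder names are real ffmpeg encoders, but whether each one actually works depends on how ffmpeg was built and on the local GPU and drivers, so treat this as a starting point, not a guarantee:

```go
package main

// hardwareEncoder picks a plausible hardware H.264 encoder for the given
// GOOS. VideoToolbox ships with macOS, VAAPI targets Intel/AMD GPUs on
// Linux (NVENC is the NVIDIA alternative), and Media Foundation is the
// common Windows path; libx264 is the software fallback.
func hardwareEncoder(goos string) string {
	switch goos {
	case "darwin":
		return "h264_videotoolbox"
	case "linux":
		return "h264_vaapi" // or h264_nvenc with an NVIDIA GPU
	case "windows":
		return "h264_mf" // or h264_nvenc / h264_qsv
	default:
		return "libx264" // software fallback
	}
}
```

In practice you would call it as `hardwareEncoder(runtime.GOOS)` when building the ffmpeg command string.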

Here is my Bouncing Balls game with hardware accelerated encoding:

Fast bouncing balls

Let them balls bounce!

Yeah, I won't get overconfident this time (lesson learned), but this is a smooth, real-time animation driven by user input, and it's really close to the architecture of a real game.

Conclusion

It works now: it runs at 60fps, it uses hardware-accelerated encoding, and I think we can start thinking about an actual game. Stay tuned for the next posts, where we will build a real game.
