DEV Community

Rodolfo Albuquerque


Building a Cost-Effective Valheim Server on Azure with Serverless Discord Bot Integration

In this blog post, I'll walk you through how I built a cost-effective Valheim game server on Azure, complete with a Discord bot that lets players start and stop the server using slash commands. The setup combines Azure's serverless services with spot instances to keep costs low: the game VM runs only when players start it, and the bot runs on pay-per-execution Azure Functions.

Intro

The project's GitHub repository contains the infrastructure code and the Discord bot code for the Valheim game server. The primary goal is to run the server as cost-efficiently as possible by using:

  • Azure Virtual Machine Scale Sets (VMSS) with spot instances for compute.
  • Azure File Share for persistent game server data.
  • Azure Functions for event-driven automation through Discord slash commands.

Azure Functions were chosen for their cost-effectiveness: the Consumption plan includes 1 million free executions per month. However, this choice introduces some complexities, notably cold starts, which I'll discuss later.

Architecture

Interactions and Reactions

The system's interaction flow starts with Discord slash commands, which are handled by an HTTP-triggered Azure Function (interactions). Because Discord requires a response within 3 seconds, this function's only responsibility is to enqueue the command in an events queue and respond immediately.

A second Azure Function (reactions), triggered by the queue, picks up the command, performs the requested task (e.g., starting the server), and reports back to Discord.

Here’s the sequence diagram for the start command:

[Figure: sequence diagram for the /start command]

Game Events

To enhance the experience, the solution monitors Valheim game server logs and reports events such as:

  • Server availability for connections
  • Player connections
  • Player disconnections

This is handled by a script that runs on the VM and is set up through cloud-init.yml. The script follows the game server container's logs, extracts the relevant lines, and enqueues them in the events queue.

Here's how the flow looks:

[Figure: flowchart of game events]
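The extraction step of such a script could look like the Go sketch below. The post doesn't show the script, so its language, the log patterns, and the `GameEvent` message shape are all assumptions; in particular, the regular expressions are placeholders to check against the lines your Valheim server version actually prints. `forwardEvents` reads lines from any reader (for example `os.Stdin` with `docker logs -f` piped in) and enqueues only the matching ones.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"regexp"
)

// GameEvent is a hypothetical message shape for the events queue.
type GameEvent struct {
	Type   string `json:"type"`
	Player string `json:"player,omitempty"` // player name or ID, depending on the line
}

// Assumed log patterns: verify them against your server's real output.
var (
	serverReadyRe = regexp.MustCompile(`Game server connected`)
	playerJoinRe  = regexp.MustCompile(`Got character ZDOID from (.+?) : `)
	playerLeaveRe = regexp.MustCompile(`Closing socket (\d+)`)
)

// parseLogLine maps a log line to a queue message, or returns false for lines to ignore.
func parseLogLine(line string) ([]byte, bool) {
	var ev GameEvent
	if serverReadyRe.MatchString(line) {
		ev = GameEvent{Type: "server_ready"}
	} else if m := playerJoinRe.FindStringSubmatch(line); m != nil {
		ev = GameEvent{Type: "player_connected", Player: m[1]}
	} else if m := playerLeaveRe.FindStringSubmatch(line); m != nil {
		ev = GameEvent{Type: "player_disconnected", Player: m[1]}
	} else {
		return nil, false
	}
	msg, err := json.Marshal(ev)
	if err != nil {
		return nil, false
	}
	return msg, true
}

// forwardEvents enqueues every relevant line read from r until EOF or an enqueue error.
func forwardEvents(r io.Reader, enqueue func([]byte) error) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if msg, ok := parseLogLine(sc.Text()); ok {
			if err := enqueue(msg); err != nil {
				return err
			}
		}
	}
	return sc.Err()
}
```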

Persisting State

To maintain the server state, I chose Azure Table Storage for its simplicity and cost-efficiency. The following attributes are persisted:

  • ip (server IP address)
  • online_players (number of players currently online)
  • status (e.g., running, stopped)

Because Azure Functions can execute in parallel, I implemented optimistic concurrency control using ETags: every update is conditional on the ETag read along with the entity. If multiple events are processed simultaneously, only the first write succeeds; the others fail and are retried against the fresh state. Retries come built in through the queue message's dequeue count (up to 5 attempts), and a visibility timeout gives the state time to reconcile before a failed message is picked up again.

Azure Function OS and Language Choice

One of the biggest challenges was ensuring the bot responded within Discord's 3-second timeout, even during cold starts.

  • Initial Setup: I started with a Python bot on a Linux Azure Function. However, cold starts frequently caused timeouts.
  • Switch to Go: I migrated to Go, which is known for fast startup and execution. Surprisingly, deploying the Go bot on a Windows Function App yielded significantly better cold start times than on Linux.

To quantify this, I measured response times for each of the following setups with a Postman monitor:

  • Python on Linux
  • Go on Linux
  • Go on Windows

Here are the results:

Setup | Average (s) | P95 (s) | P99 (s)
Python (Linux) | 10.43 | 12.02 | 13.35
Go (Linux) | 5.84 | 7.77 | 7.64
Go (Windows) | 1.07 | 1.70 | 1.85

These measurements show that Go on Windows was the only setup whose responses consistently stayed under Discord's 3-second limit, making it the best fit for this use case.
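To produce a summary like this from raw response times, the average and tail percentiles can be computed as below. This sketch uses the nearest-rank percentile method, which may differ from how the Postman monitor computes its figures; `meetsDeadline` simply compares P99 against Discord's 3-second limit.

```go
package main

import (
	"errors"
	"sort"
)

// latencySummary holds response-time statistics in seconds.
type latencySummary struct {
	Average, P95, P99 float64
}

// percentile applies the nearest-rank method to sorted samples; p is an integer percentage.
func percentile(sorted []float64, p int) float64 {
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) using integer arithmetic
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize computes the average, P95 and P99 without modifying the input slice.
func summarize(samples []float64) (latencySummary, error) {
	if len(samples) == 0 {
		return latencySummary{}, errors.New("no samples")
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	total := 0.0
	for _, s := range sorted {
		total += s
	}
	return latencySummary{
		Average: total / float64(len(sorted)),
		P95:     percentile(sorted, 95),
		P99:     percentile(sorted, 99),
	}, nil
}

// discordDeadlineSeconds is Discord's limit for the initial interaction response.
const discordDeadlineSeconds = 3.0

// meetsDeadline reports whether even the slowest 1% of responses stayed under the limit.
func meetsDeadline(s latencySummary) bool { return s.P99 < discordDeadlineSeconds }
```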

Possible Improvements

While the setup works well, there’s room for improvement:

Spot Instance Risks: Azure can evict (preempt) spot instances at any time, and unsaved progress is lost if the game server isn't stopped gracefully. One solution is to monitor the Scheduled Events endpoint of the Azure Instance Metadata Service. Upon detecting a preemption event, the VM can:

  • Stop the Valheim server (docker stop valheim-server) to send a SIGTERM, triggering a world save.
  • Restart the server in a different zone or instance size.

Additional Features:

  • Automating backups of the game world.
  • Adding more granular state persistence, such as player-specific data.
