BrendanLoBuglio

Posted on May 23, 2019 • Originally published at freeformlabs.xyz

Welcome to Rabbit Hell! Reliable AI Locomotion with TDD


Test-driven development (TDD) is a software workflow where code is written alongside small "unit test" programs that check whether each component is working. Although automated unit tests are hard to apply to game development because of randomness, 3D interaction, and unpredictable player input, we were able to use a TDD workflow to write stable and regression-proof creature locomotion for ElemenTerra.

The Basics of Test-Driven Development

Those experienced in TDD can skip this section; for everyone else, here is a basic primer.

Let’s imagine we’re coding a custom function that adds two numbers. In a traditional workflow, we would write it the way we think it should work and move on. With TDD, we instead start by writing a placeholder function and some unit tests:

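```cpp
#include <stdexcept>

// Placeholder implementation: deliberately wrong, so the tests fail at first.
int add(int a, int b) {
    return -1;
}

// Our unit tests that throw errors unless "add" produces correct results:
void runTests() {
    if (add(1, 1) != 2)
        throw std::runtime_error("add(1, 1) should equal 2");
    if (add(2, 2) != 4)
        throw std::runtime_error("add(2, 2) should equal 4");
}
```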

Initially, our unit tests fail because the placeholder function returns -1 for every input. Once we implement add correctly, so that it returns a + b, all our tests pass! This may seem like a roundabout way to do things, but it has a few advantages:

If we were short on sleep and wrote add as a - b, our tests would still fail and we’d know immediately to fix the function. Without the tests, we might not catch the mistake and would instead run into strange behavior that takes time to debug later on.
We can keep our tests around and run them every time we build our code. This means that if another coder accidentally changes add in the future, they’ll know immediately that they need to fix it because the tests will once again fail.

This is overkill for such a simple example, but for complex features like predictable state-machine behavior (after eating 100 food, is isFull true?), TDD saves time and improves a program’s stability.
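To make the state-machine case concrete, here is a minimal C++ sketch of such a test. The `Creature` class, its `eat` method, and the threshold of 100 are hypothetical stand-ins for illustration, not ElemenTerra code. The tests use the standard `assert` macro, which is compiled out when `NDEBUG` is defined.

```cpp
#include <cassert>

// Hypothetical creature with a simple hunger state machine; the names and
// the 100-food threshold are illustrative, not taken from ElemenTerra.
class Creature {
public:
    static constexpr int kFullThreshold = 100;

    void eat(int amount) { food_ += amount; }
    bool isFull() const { return food_ >= kFullThreshold; }

private:
    int food_ = 0;
};

// Unit tests that pin down the hunger behavior before (or while) it is built.
void runCreatureTests() {
    Creature hungry;
    assert(!hungry.isFull());     // a new creature starts empty

    Creature almost;
    almost.eat(99);
    assert(!almost.isFull());     // one short of the threshold

    Creature fed;
    for (int i = 0; i < 100; ++i)
        fed.eat(1);               // 100 food, one bite at a time
    assert(fed.isFull());
}
```

Testing the boundary (99 versus 100) is where such tests earn their keep: an off-by-one in `isFull` would pass a casual playtest but fail here immediately.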

TDD Application in Game Development

There are two problems with TDD in game development. First, many game features have subjective goals that defy measurement. And second, it’s difficult to write tests that cover the entire possibility space of worlds full of complex interacting objects. Developers who want their character movement to “feel good” or their physics simulations to “not look jittery” will have a hard time expressing these metrics as deterministic pass/fail conditions.

However, I believe that workflows more loosely based on TDD principles can still be applied to complex and subjective features like character movement, and in our game ElemenTerra we did just that.

Unit Tests vs Debug Levels

Before I get into my TDD practice, I want to draw a distinction between an automated unit test and a traditional "debug level." It’s a common practice in gamedev to create hidden scenes with contrived circumstances that allow programmers and QA professionals to witness specific events.

A secret debug level full of different objects in The Legend of Zelda: The Wind Waker.

We have many of these in ElemenTerra: a level full of problematic geometry for the player character, levels with special UIs that trigger certain game states, etc. Like unit tests, these debug levels can be used to reproduce and diagnose bugs, but a few aspects separate the two:

Unit tests divide systems into their atomic parts and evaluate each individually, whereas debug levels test features on a more holistic level. After observing a bug in a debug level, developers may still need to search for the point of failure manually.
Unit tests are automated and should produce deterministic results every time, whereas many debug levels are “piloted” by a player. This creates variance between sessions.

None of this is to suggest that unit tests are strictly superior to debug levels; in many cases the latter is a more practical tool. But I also believe that unit testing is underutilized in game development, and should be explored further with systems to which it is not traditionally applied.

Welcome to Rabbit Hell!

In ElemenTerra, players use mystical nature powers to rescue the creatures hurt by a cosmic storm. One of those powers is the ability to create pathways out of the ground which guide creatures to food and shelter. Because these pathways are dynamic player-created meshes, the creature locomotion needs to handle strange geometric edge cases and arbitrarily complex terrain.

Character movement is one of those nasty systems where "everything affects everything else"; if you’ve ever implemented such a system, you’ll know that it’s very easy to break existing functionality when writing new code. Need the rabbits to climb small ledges? Fine, but now they’re jittering up and down slopes! Trying to get your lizards to avoid each other’s paths? It looks like that works, but now their normal steering is all messed up.

As the person responsible for both the AI systems and most of the gameplay code, I knew I didn’t have a lot of time to be surprised by bugs. I wanted to catch regressions immediately as they came up, so Test-Driven Development seemed appealing. The next step was to set up a system where I could easily define each creature locomotion use case as a simulated pass/fail test:

This "Rabbit Hell" scene is composed of 18 isolated corridors, each with a creature body and a course designed to be traversable only if a specific locomotion feature is working. The tests are considered successful if the rabbit is able to continue indefinitely without getting stuck, and considered failures otherwise. Note that we’re only testing the creatures’ bodies ("Pawns" in Unreal terms), not their AI. In ElemenTerra creatures can eat, sleep, and react to the world, but in Rabbit Hell their only instructions are to run between two waypoints.
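The programmatic half of this pass/fail rule could be written as a watchdog that follows the pawn as it shuttles between its two waypoints and fails the course if too much time passes without an arrival. This is a minimal C++ sketch, not the actual Rabbit Hell implementation: `CourseMonitor`, `Vec3`, the arrival radius, and the timeout are all hypothetical, and "continue indefinitely" is approximated as "never stall longer than the timeout."

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

float distanceBetween(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical watchdog for one corridor. The pawn shuttles between two
// waypoints; the course fails if too long passes without reaching one.
class CourseMonitor {
public:
    CourseMonitor(Vec3 a, Vec3 b, float arriveRadius, float timeoutSeconds)
        : waypoints_{a, b}, arriveRadius_(arriveRadius), timeout_(timeoutSeconds) {
        assert(arriveRadius > 0.0f && timeoutSeconds > 0.0f);
    }

    // Call once per simulation step with the pawn's current position.
    void tick(const Vec3& pawnPos, float dt) {
        if (failed_) return;                  // a failure sticks for the run
        sinceArrival_ += dt;
        if (distanceBetween(pawnPos, waypoints_[target_]) <= arriveRadius_) {
            ++arrivals_;
            target_ = 1 - target_;            // turn around
            sinceArrival_ = 0.0f;
        } else if (sinceArrival_ > timeout_) {
            failed_ = true;                   // stuck, circling, or lost
        }
    }

    // The pawn's steering would read this each frame.
    const Vec3& currentTarget() const { return waypoints_[target_]; }
    bool failed() const { return failed_; }
    int arrivals() const { return arrivals_; }

private:
    Vec3 waypoints_[2];
    float arriveRadius_;
    float timeout_;
    int target_ = 0;
    int arrivals_ = 0;
    float sinceArrival_ = 0.0f;
    bool failed_ = false;
};
```

In an engine, `tick` would run every frame with the rabbit's position while the pawn steers toward `currentTarget()`. A timeout catches a creature that is wedged, circling near a target without arriving, or off the course entirely, but it cannot judge jitter or steering smoothness, which still needs a human eye.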

Here are a few examples of these tests:

1, 2, 3: Unobstructed Movement, Static Obstacles, and Dynamic Obstacles

10: "Navmesh Magnet" failsafe for floating creatures

13: Reproduction for a bug where creatures would infinitely circle around nearby targets

14 & 15: Step-Up ability on flat and complex ledges

Let’s talk about the similarities and differences between this implementation and "pure" TDD.

My system was TDD-like in that:

I started features by making failed tests, and then wrote the code needed to pass them.
I kept running old tests as I added new features, preventing me from pushing regressions to source control.
Each test measured only one feature in the system, allowing me to quickly pinpoint issues.
The tests were automated and did not require player input.

My system differed from strict TDD in that:

There was an element of subjectivity in evaluating the tests; while true pathing "failures" (did not get from A to B) could be detected programmatically, things like position popping, animation syncing issues, movement jitters, and whether the steering “looked smooth” required human evaluation.
The tests were mostly but not completely deterministic. Random factors like framerate variation caused small deviations, and a few courses had dynamic elements with randomized timing. Overall, creatures still usually followed the same paths and had the same successes and failures between sessions.

Limitations

Using TDD to write the ElemenTerra creature locomotion was a huge boon for our schedule, but my approach did have a few limitations:

The unit tests evaluated each movement feature in isolation, so bugs arising from combinations of multiple features were not covered; unfortunately, I didn’t have time to build 18² courses! Thus, my unit tests sometimes had to be supplemented by traditional debug levels.
ElemenTerra has four creature species, but the tests only have rabbits. This is an artifact of our production schedule; the other three species were added much later in development. Fortunately, all four share the same movement capabilities, but the larger body of our Mossmork caused a few problems. If I were to do this again, I would make the tests dynamically spawn a chosen species instead of using pre-placed rabbit bodies.

This galloping Mossmork requires a little more room to turn than a rabbit
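Spawning a chosen species at runtime, rather than pre-placing rabbit bodies, amounts to running every course once per species. Here is a minimal C++ sketch of building that test matrix; the `Species`, `Course`, and `TestCase` types and their fields are hypothetical, and the actual spawning at the start of each corridor would be engine-specific.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical descriptors; real species and course data would live in the
// engine (prefabs, scenes, and so on).
struct Species {
    std::string name;
    float bodyRadius;     // larger bodies need more room to turn
};

struct Course {
    int id;               // e.g. 1..18 for the Rabbit Hell corridors
    std::string feature;  // the single locomotion feature it exercises
};

struct TestCase {
    Course course;
    Species species;      // spawned at the course start at runtime
};

// One test case per (course, species) pair, so every locomotion feature
// is exercised by every body type.
std::vector<TestCase> buildTestMatrix(const std::vector<Course>& courses,
                                      const std::vector<Species>& species) {
    assert(!species.empty());  // an empty list would silently yield no tests
    std::vector<TestCase> cases;
    cases.reserve(courses.size() * species.size());
    for (const Course& c : courses)
        for (const Species& s : species)
            cases.push_back({c, s});
    return cases;
}
```

With 18 courses and four species this yields 18 × 4 = 72 runs, so the cost grows linearly with the number of species, unlike the 18² courses that exhaustive feature-pair coverage would require.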

Evaluation: Is Test-Driven Development Right for You?

As developers, we can be tempted to put a little too much work into testing scenes the player will never appreciate; I won’t deny that I had a lot of fun building Rabbit Hell. Internal features like this can waste a lot of effort and jeopardize milestones, so we need to take a hard look at when and whether a given feature warrants a unit testing apparatus. Below I’ve identified a few criteria that, in my eyes, justified TDD for ElemenTerra’s creature locomotion.

1. Are your test cases time-consuming to produce manually?

Before spending time on automated testing, we need to check whether we can evaluate a feature with the regular game controls. If you want to make sure your keys unlock doors, spawn a key and unlock a door with it! Creating unit tests for this feature would be an irresponsible time sink because it takes only seconds to test manually.

2. Are your test cases difficult to produce manually?

Automated unit tests become justified when there are known, difficult-to-produce edge cases. Rabbit Hell course #7 tests creatures stepping off ledges, a circumstance their AI works very hard to avoid. Course #12 simulates the navmesh desyncing from the floor geometry, which only occurs during extreme lag. Such situations can be difficult or impossible to contrive with the game controls, while our tests produce them effortlessly.

3. Do you know the desired results won’t change?

Game design is all about iteration, and the goals of individual features may change as your game is redesigned. Even small changes in intent can invalidate the metrics by which you evaluate your features, and thus any unit tests along with them. While the creatures' eating, sleeping, and player interaction behaviors underwent several redesigns, we always needed them to get from point A to point B. Thus, the locomotion code and its unit tests remained valid throughout development.

4. Are regressions likely to go unnoticed?

Maybe you’ve been in this situation: you’re wrapping up one of your final tasks before shipping a game, and suddenly you find a game-breaking bug in a feature you finished ages ago. Games are giant interconnected systems, and so it’s only natural that adding New Feature B might break Old Feature A.

This isn’t so bad when the broken feature is ubiquitous, like the player’s ability to jump. If your core mechanic breaks, you’re bound to notice immediately. But the bug might slip under the radar if the broken feature is less frequently observed, like what happens when the player steps into a narrow crevice. Bugs detected in late development can jeopardize your schedule, and post-launch they hurt your player experience. Thus, unit tests can be great tools for maintaining edge case behavior, but are often redundant for functionality which already gets a lot of incidental testing.

5. What’s the worst-case cost of having tests, and what’s the worst-case cost of not having tests?

Setting up a testing apparatus is a form of risk management. Let’s imagine that you’re deciding whether to buy insurance for a vehicle. The three questions you need to answer are:

How much do the monthly premiums cost?
How likely is it the vehicle will be damaged?
How expensive would the worst-case scenario be if you were uninsured?

For TDD we can imagine the monthly premiums as the production cost of maintaining our unit tests, the likelihood of vehicle damage as the likelihood of our feature breaking, and the cost of fully replacing the vehicle as the worst-case scenario for a regression bug.

If a feature’s tests take a lot of time to create, the feature is uncomplicated and not likely to be changed, or it would be manageable if it broke in late development, then unit tests may be more trouble than they’re worth. If the tests are simple to put together, the feature is volatile and interconnected, or its bugs would cost a lot of production time, then tests can help keep us on schedule.
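One way to make the insurance analogy concrete is a rough break-even check. The symbols are introduced here for illustration only: $C_{\text{tests}}$ is the cost of building and maintaining the tests (the premiums), $p$ is the likelihood that the feature regresses (the chance of damage), and $C_{\text{bug}}$ is the worst-case cost of a regression caught late or after launch (replacing the vehicle). Under that framing, tests roughly pay for themselves when

$$
C_{\text{tests}} < p \cdot C_{\text{bug}}
$$

With purely hypothetical numbers, tests costing 20 hours to maintain would be worthwhile for a feature with a 30% chance of a regression that would cost 100 hours to fix late in development, since $0.3 \times 100 = 30 > 20$. None of these quantities can be measured precisely in practice, so the inequality is a way to structure the judgment call rather than a formula to compute.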

The Limits of Automation

Unit tests can be a great aid in catching and reducing bugs, but I want to emphasize that they don’t replace the need for professional quality assurance on large-scale games. Proper QA is an art that requires creativity, subjective judgment, and excellent technical communication, which means that you need skilled and well-taken-care-of humans!

Testing the Waters

While not the right choice for every circumstance, Test-Driven Development is a powerful tool that can and should be applied to more game development contexts. Let’s expand the horizons of what and how we test!

If you have opinions on the points made in this article or have stories of using TDD in your own games, please shoot us an email. If you liked what you read and are looking for a skilled group of developers to build, improve, or help finish your game, you can get in touch with Freeform Labs here. Until next time!

Freeform Labs, Inc. is a games and software development team with a commitment to providing the highest quality of digital craftsmanship, and a mission to inspire learning and creativity through cutting-edge experience design. The team is centrally located in Los Angeles, and maintains a network of trusted partners across the globe. With specialized experience in VR, AR, Game AI, and more, Freeform Labs provides an array of services including consultation, work-for-hire software development, and on-site "ship it" assistance. The company's portfolio includes work for Microsoft, Disney, and Starbreeze - as well as award-winning original content.
