In this article we’ll introduce genetic algorithms by teaching a squirrel how to find food and shelter, then see how different fitness functions can influence its behavior. Along the way we’ll discuss concepts of genetic algorithms, the F# programming language, and important design considerations in artificial intelligence applications.
In this article, you’ll learn:
- The basics of genetic algorithms
- The role of crossover and mutation in genetic algorithms
- The nature and importance of fitness functions
- The power of various factors of a genetic algorithm
- The value of a functional programming language in a problem like this
- Aspects of working with sequences in F#
This is the 6th and final article in a series on creating a genetic algorithm in F# and .NET Core. While earlier articles were tutorials, here we’ll be focusing less on the code and more on the overall design.
If you only read one article in the series, I’d recommend this one.
If you’re curious about F# or the .NET Core desktop app I’m using to simulate this, take a look at the previous articles in the series:
- Creating a .NET Core 3.0 F# Console App
- F# Squirrel Brains: Adding Actors and Getting Functional
- F# Unit Testing – Refining the Squirrel Simulation
- WPF Core with F# Libraries
- F# Genetic Algorithm – Defining Squirrel Genes
While prior articles in the series have been more focused on individual steps and language features, this article is a higher-level design view of the application as a whole.
All code is freely available via my GitHub repository.
The application involves a squirrel, a rabbit, a dog, an acorn, and a tree.
The dog remains stationary and kills the rabbit or squirrel if either moves onto a tile next to it.
The rabbit moves around the map randomly.
The squirrel must get the acorn and make it back into its tree in order to not starve.
All of this is implemented simply and takes place on a small square game grid.
The application is hosted in a WPF Core C# user interface that allows you to advance to the next generation or visualize any member of the current generation. This user interface talks to an F# .NET Standard class library containing the application logic.
When I tell people about this project, I typically get one of a few reactions:
- “Wait, you’re building what?”
- “Why on Earth would you build this?”
- “What is wrong with you?”
- “Dude! That’s pretty cool!”
So, long story short: I originally built this application back in 2003. I was living in a rural area with a lot of squirrels around. I happened to be reading about genetic algorithms, one thing led to another, and… yeah, I came up with a very inefficient and clunky application in Windows Forms.
The simulation was a massive success and it managed to profoundly surprise me by coming up with solutions I couldn’t have dreamed up.
I’ve since lost the code, but I wanted to redo the application as a more experienced developer and learn the F# programming language in the process.
So, what IS a genetic algorithm? While I defined it in more detail in my prior article, here’s the definition I’m going with:
A Genetic Algorithm is a form of artificial intelligence that looks to emulate patterns in genetics to come to an optimal solution to a given problem as judged by some measurable criteria.
So, when it comes to our problem domain, we’re trying to find a perfect set of digital genes that produce a squirrel that can seek out an acorn and then get safely to its tree before it starves to death or encounters the dog.
The genes for a particular candidate squirrel are referred to as its chromosome.
In genetic algorithms, a chromosome is just a collection of genes representing a single member of the overall population.
For our example, the squirrel’s “brain” takes in information about the surrounding tiles and determines how attractive each tile is based on a number of factors.
The simulation provides the values to the algorithm, but the weight or importance of each factor is determined by the chromosome.
Here’s a sample squirrel chromosome that performs fairly well:
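The original table of weights doesn’t reproduce in this version of the article, so here’s an illustrative stand-in. Both the values and the exact list of factors are my assumptions; this is only meant to show the shape of a chromosome matching the behavior described below (flee the dog, pursue acorns and rabbits):

```fsharp
// Illustrative gene weights only; real values emerge from evolution.
// Positive values attract the squirrel toward a factor; negative repel it.
let sampleGenes =
    [| 0.9;     // Acorn: strongly pursue
       0.5;     // Tree: head home
       -1.0;    // Dog: flee
       0.4;     // Rabbit: mildly pursue
       -0.7;    // Tile adjacent to the dog: avoid
       0.2 |]   // Tile adjacent to the rabbit
```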
Note that what we’re doing in this chromosome is looking at the positive or negative appeal of a factor. For example, this chromosome causes the squirrel to run away from dogs and pursue acorns and rabbits.
We can represent this structure in F# via the following code:
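The snippet from the original post is missing in this version; a minimal sketch of the structure, with field names assumed, might look like this:

```fsharp
// Sketch of the chromosome structure; field names are my assumptions
type Chromosome =
    { Genes: double[]   // one weight per factor, each between -1 and 1
      Age: int }        // generations this chromosome has survived so far
```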
Note that the value of each individual gene is just a double. In this case, specifically, it’s a value between -1 and 1 with -1 meaning revulsion and 1 meaning total magnetism.
We also capture the age of each chromosome so that we can track data on individual members of the population as they survive from generation to generation.
Before we move on, I should stress how important your choice of which genes to include in your chromosome is. Each gene is effectively another piece of information your entity has available to it during the optimization process.
If you add in worthless data, it will just be noise that slows down convergence on a solution. On the other hand, if you severely limit the types of data fed into the genetic algorithm, you can prevent the algorithm from finding a solution at all, or cause it to settle on a less optimal solution.
As an example, I found that in order to replicate an outcome I encountered back in college, I needed to tell the squirrel whether a tile was next to a rabbit or a dog. Additionally, I eventually removed a random number factor from the genetic structure in order to support a debugger visualizer, but while it was in the application, early squirrels relied heavily on randomness to find things.
Let’s look at how this squirrel sees the world in a sample scenario.
Here the squirrel is evaluating the 8 tiles around itself, plus whether or not to wait in its current spot. It does this by measuring the relevant factors of these tiles and plugging in the importance of each factor.
The result is a numerical score for the tile, which can be compared to the other tiles surrounding it. The squirrel will always choose the best tile as judged by the measurements and its chromosome.
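The scoring described above is essentially a weighted sum: each measured factor for a tile is multiplied by the matching gene weight and the products are totaled. A minimal sketch, assuming factors and genes are parallel arrays (the function names here are mine, not the app’s):

```fsharp
/// Scores one tile by weighting its measured factors by the gene weights.
/// Assumes factors.[i] corresponds to genes.[i].
let scoreTile (genes: double[]) (factors: double[]) =
    Array.map2 (fun gene factor -> gene * factor) genes factors
    |> Array.sum

/// The squirrel always picks the highest-scoring candidate tile
/// (its 8 neighbors plus its current spot).
let chooseBest genes (candidates: ((int * int) * double[]) list) =
    candidates |> List.maxBy (fun (_, factors) -> scoreTile genes factors)
```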
This can be visualized in a heatmap as follows:
Here the squirrel will prefer the darker tiles to the lighter ones due to an arbitrary coloring choice I made while implementing this visualizer.
Think of it like a marble on a hill: it rolls downhill toward the most attractive point, following the curvature of the terrain. As things move around the grid or are added and removed, the attractiveness of tiles changes, but at its core the squirrel is just following a path from one most attractive point to the next.
Genetic algorithms don’t just involve one organism. We structure genetic algorithms in terms of a larger population of varied entities. This population changes over time in the form of generations.
Over time, the best chromosomes live on and produce offspring while those that perform poorly are replaced by children of better chromosomes or completely random chromosomes to keep the gene pool fresh.
Choosing a population size is an important consideration. If you pick a small population, there might not be enough diversity to get a good result in a short period of time. Conversely, if your population size is too large, it will drastically slow down your calculations.
I’ve found that this simulation is most manageable with anywhere from 10 to 25 squirrels, but part of that is due to the simple nature of the problem.
Another thing I should note here is that each time the simulation runs, it uses the same random number generation sequence so that random behavior will be consistent from squirrel to squirrel and from generation to generation.
So, now that we understand how an individual squirrel thinks, let’s take a look at how the squirrel is graded.
The simulation will run a squirrel through a series of turns simulating its lifetime. If the squirrel gets eaten by the dog, that simulation stops. Likewise, if the squirrel makes it to the tree with the acorn, the simulation will stop as well.
After the simulation completes, the squirrel is scored via what is called a fitness function. This function gets a snapshot of every turn in the simulation and comes up with a single numerical score for how well that squirrel chromosome did.
Fitness Functions are one of the most impactful parts of building a genetic algorithm.
What you measure – and how you measure it – drastically shapes the behavior of the algorithm.
In general you want to have a large reward for finding a solution and some gradual rewards for how well that solution fits. You can also penalize poor solutions.
Think about a problem involving sailing a ship from New York City to Europe. You don’t want to only give out points for the first ship to reach Europe, or you’d never reward the traits that help get closer to a solution. Instead, you’d reward for getting closer and closer to Europe. You’d also want to penalize ships that ran aground or went to Africa or South America.
In the realm of our squirrel problem, I initially set up the simulation to reward the squirrel for getting the acorn and reward again for getting to the tree with the acorn. I also penalized getting eaten by the dog or starving to death.
But then I decided to reward squirrels that stayed alive but didn’t win by giving them a point per turn alive (I also subtract the turns used for winning solutions to encourage a quick solution).
What I found was that the extra bonus for staying alive resulted in squirrels that tried to move randomly around the map in order to stay alive as long as possible and rack up points. I later reduced this reward and then eliminated it entirely.
What you measure matters; it encourages certain types of behavior just by the rules and rewards you set. If I reward squirrels for getting killed, I can expect squirrels that seek out the dog like missiles to evolve. If I reward longevity, I should expect to see squirrels that ignore the tree entirely or take an inefficient route to their destination.
Here is the fitness function I eventually settled on:
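The function itself didn’t survive in this version of the post, so here’s a hedged sketch consistent with the rewards described above: a bonus for getting the acorn, a bigger bonus for winning (minus the turns used, to encourage fast solutions), and penalties for dying. The outcome names and every point value are assumptions:

```fsharp
// Outcome cases are assumed names for how a squirrel's run can end
type Outcome =
    | WonGame       // reached the tree holding the acorn
    | EatenByDog
    | Starved

// Point values are illustrative assumptions, not the article's real numbers
let evaluateFitness outcome gotAcorn turnsUsed =
    let acornBonus = if gotAcorn then 100.0 else 0.0
    match outcome with
    | WonGame -> acornBonus + 200.0 - float turnsUsed  // faster wins score higher
    | EatenByDog -> acornBonus - 100.0
    | Starved -> acornBonus - 50.0
```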
Now that we know how an individual squirrel is graded, let’s talk about generations.
Building on our discussion from last article, after every squirrel in a generation is simulated and evaluated by the fitness function, they can be ordered by score.
From there, the elite top performers of the generation will live on to the next generation while the less optimal solutions die out to be replaced by children of the elite or completely random chromosomes.
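The turnover described above can be sketched roughly as follows. The elite proportion and the helper names (makeChild, makeRandom) are my assumptions about the shape of the logic, not the repository’s actual code:

```fsharp
/// Sketch of generation turnover: elites survive, the rest are replaced
/// by children of elites plus one fresh random chromosome.
let nextGeneration makeChild makeRandom (scored: (double[] * double) list) =
    let ranked = scored |> List.sortByDescending snd |> List.map fst
    let eliteCount = max 2 (List.length ranked / 4)  // assumed proportion
    let elites = ranked |> List.truncate eliteCount
    let slots = List.length ranked - eliteCount
    let replacements =
        List.init slots (fun i ->
            if i = slots - 1 then makeRandom ()      // keep the gene pool fresh
            else makeChild elites.[i % eliteCount] elites.[(i + 1) % eliteCount])
    elites @ replacements
```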
The random chromosomes are simple – we just build a random value for each gene in the chromosome like so:
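The snippet itself didn’t survive in this version of the post; based on the description that follows (a 7-element sequence, a getRandomGene call, and a pipe into Seq.toArray), it likely resembled this sketch:

```fsharp
open System

let random = Random()

// Assumed implementation: a random weight between -1 and 1
let getRandomGene () = random.NextDouble() * 2.0 - 1.0

// Build 7 random genes and convert them to the array form the app prefers
let getRandomGenes () =
    Seq.init 7 (fun _ -> getRandomGene ())
    |> Seq.toArray
```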
A few brief notes on the F# syntax here. We start by creating a sequence of 7 new elements and supply a function defining what each value will be; in this case it calls getRandomGene to build a value between -1 and 1. Finally, we pipe the sequence of genes into Seq.toArray, which transforms it into the array form that other parts of our application prefer.
Okay, so that’s the random chromosome – what about the children?
Essentially, we choose two top performers and then merge their genes together into a child. While we could simply take a gene from each parent, I thought simulating crossover a bit closer to a genetic way would be fun.
What we do is start by taking genes only from one parent. After we cross a random threshold, we switch over to the other parent’s genetic sequence and take the rest of the genes from that point onwards. We also allow for identical copies of parents by letting the algorithm take the entirety of one parent or the other.
The code for this crossover looks as follows:
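The original snippet is missing in this version; a sketch of single-point crossover built on Array.mapi2, as described above, might look like this:

```fsharp
open System

let random = Random()

/// Single-point crossover sketch: genes come from mom up to a random
/// crossover point, then from dad. Because the point can be 0 or the
/// full length, an exact copy of either parent is possible.
let crossover (momGenes: double[]) (dadGenes: double[]) =
    let crossoverPoint = random.Next(momGenes.Length + 1)
    Array.mapi2
        (fun index momGene dadGene ->
            if index < crossoverPoint then momGene else dadGene)
        momGenes dadGenes
```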
Note the use of Array.mapi2. This F# function takes two identically-sized arrays and maps a function over both at once, passing in the index along with the pair of items. Here our function just checks whether we’ve hit the point of crossover yet.
Finally, after we have our candidate gene sequence, we introduce the chance of mutation. Mutation modifies values of various genes, allowing the algorithm to break out of local maximums – solutions that are good, but not the best available solution.
Mutation is fairly simple and relies on the Array.map function to take each input value (the gene value) and transform it into a new value.
Also note here that we’re using min to constrain the results to the -1 to 1 range.
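Putting those pieces together, a mutation sketch might look like the following. The mutation rate and nudge size are assumptions, and the clamping pairs min with max to keep values in range:

```fsharp
open System

let random = Random()

/// Mutation sketch: each gene has a small chance of being nudged, then is
/// clamped back into the -1 to 1 range. Rate and nudge size are assumptions.
let mutate (genes: double[]) =
    genes
    |> Array.map (fun gene ->
        if random.NextDouble() < 0.1 then
            let nudged = gene + (random.NextDouble() - 0.5)
            nudged |> max (-1.0) |> min 1.0
        else gene)
```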
Alright, enough discussion – let’s see some results. How did the squirrel do?
Each simulation is different, but the simulation is simple enough that I am frequently able to see very effective squirrels within 10 generations.
Often early successful squirrels will be more interested in the rabbit than I would have expected and effectively follow the rabbit around the map until they reach an interesting destination. Sometimes this works in reverse with squirrels fleeing the rabbit until they accidentally bump into an acorn.
Eventually, a squirrel that reliably goes towards the acorn and then heads back to the tree is found.
Once this type of squirrel emerges, progress slows down significantly as future squirrels only surpass it if they find a slight way to make the route faster or if the set of game worlds the squirrel is being evaluated with changes.
Okay, so what’s with the title? These squirrels only seem interested in survival.
Once I saw how a fitness function could produce a squirrel capable of gathering an acorn and going to a tree without encountering the dog, I wondered what different fitness functions could bring about.
Certainly it’s easy to make a fitness function that rewards squirrel death and encourages squirrels that charge to their death at the dog, but that’s not terribly surprising.
No, what I decided to do was take the survival fitness function and tweak it just a little bit so that the squirrel also got bonus points if the rabbit dies.
Remember that the squirrel cannot attack the rabbit in any way, shape, or form. The squirrel shouldn’t be able to impact the rabbit at all, but I decided to try the simulation anyway.
This is where genetic algorithms wowed me.
Even though I thought a solution was impossible, the application had enough data for it to discover the solution on its own.
Take a look:
Here we have a squirrel that runs right to the rabbit and stays next to it. By remaining next to the rabbit and generally on the side of the rabbit away from the dog, the squirrel is actually subtly herding the rabbit towards the dog by taking away movement options and biasing the rabbit towards movement in the other direction.
Put another way, the squirrel wakes up and decides to herd the rabbit into the dog. Once the rabbit is out of the way, the squirrel’s survival instinct takes over and it runs for the acorn and then the tree.
All of this was possible by using this fitness function:
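That snippet is also lost in this version of the post; the attack variant is just the survival scoring with one extra reward when the rabbit dies. A hedged sketch, where every name and point value is an assumption:

```fsharp
// Outcome cases are assumed names for how a squirrel's run can end
type Outcome =
    | WonGame
    | EatenByDog
    | Starved

// Sketch only: the single change from survival scoring is the rabbitDied
// bonus; the point values here are illustrative assumptions.
let evaluateAttackFitness outcome gotAcorn rabbitDied turnsUsed =
    let survivalScore =
        (if gotAcorn then 100.0 else 0.0) +
        (match outcome with
         | WonGame -> 200.0 - float turnsUsed
         | EatenByDog -> -100.0
         | Starved -> -50.0)
    if rabbitDied then survivalScore + 150.0 else survivalScore
```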
I don’t know about you, but a program that can surprise me with a solution I didn’t even realize was possible is something that blows my mind.
I set out in this series to create a small app replicating the attack squirrel behavior that I originally encountered by accident back in 2003.
Surprisingly, I had a lot of trouble replicating that result, even when I was steering towards it.
While this time I knew that such a solution was possible, a combination of things slowed me down from getting an application that could come to that solution:
- I hadn’t given the squirrel genes to tell if a tile was next to the rabbit or next to the dog.
- There was a small bug in my distance calculation logic causing the squirrel to pay less attention to nearby objects.
- The fitness function excessively rewarded squirrels that stayed alive and waited for the rabbit to accidentally stumble into the dog.
All of this goes to show that when you’re building a genetic algorithm, you need to think carefully about your fitness function as well as the genetic structure of your entities.
Proper testing and ensuring that individual functions are working adequately is important as well since visualization is hard.
Speaking of functions, I found that F# was a language well suited to the problem I was trying to solve. In particular, the immutability it encourages made it easy to visualize a simulation at any state in a turn.
Additionally, being able to chain together functions and compose different functions via partial application in F# helped me reuse or adapt small bits of polished and proven logic.
If you’d like to play more with my code, take a look at the master branch on the GitHub repository.
If you have additional questions, leave a comment or get in touch with me.
I also recommend you look around for books and videos on genetic algorithms. While I don’t have a modern one to recommend, there’s a great deal out there.
Finally, if you build something awesome or something surprising, please let me know. I’d love to hear about it.