After finishing my Coursera specialization on deep learning, I wanted to pick up a machine learning project to sink my teeth into. I wanted something that would be simple to formulate and something I could relate to.
So I decided on predicting match results for my favorite MOBA, Vainglory.
Table of Contents
- The problem
- A little more context
- The dataset
- Training the Model
- Initial results
- Cleaning data
- Later results with clean data
- Difference (or lack thereof) between training with and without talents
- What did we learn?
- Next steps
The problem
Predicting talent effectiveness
I decided to focus on one mode in particular, "Aral", a casual mode in which heroes are picked at random for the players, who then have a selection of talents to pick from. The goal is to destroy the other team's base while keeping your own standing.
To put it simply, I want to answer the following question:
Which talent gives me the best odds at winning a specific match?
With "specific match" defined as the roster of already selected heroes from both teams.
A little more context
What is a MOBA
A MOBA, Multiplayer Online Battle Arena, is a video game genre popularized (maybe even originated) by DOTA, Defense of the Ancients, a mod of Warcraft 3.
The mod was so popular that it spawned quite a large number of follow-up games, most notably DOTA 2 and League of Legends. On the mobile side, Vainglory and Arena of Valor have been pretty successful and, personally, lacking a good PC setup, I gravitated toward these mobile ones.
The game genre plays a little bit like this:
- there are two teams on a map
- each team has a base and some defenses
- the first team that destroys the other team's base wins
What is a hero
Before a match begins, each player has to choose their hero. Each hero has specific traits and abilities that differentiate them from the rest. They differ by movement speed, attack speed, damage output, defenses, special abilities, passive traits and more.
For example, the hero I play the most in Vainglory is Lyra. She is a mage with healing and protective magic. She has a ranged basic attack that slows targets, but not much in the way of personal defenses.
Her usual role is that of the Captain, which basically means she's best at helping other heroes be at their best and keeping them alive.
What is an ability
Each hero in Vainglory has 1 perk and 3 abilities.
The perk is often tied to the hero's basic attack. For example, Lyra's perk, Principle Arcanum, adds a second hit to her basic attack, making it more powerful and slowing down her target.
The abilities are usually activated by the player to produce an effect; they have a cooldown and sometimes a cost (like mana, stamina, rage, etc.).
For example Lyra has these 3 abilities:
- Imperial Sigil: she scribes a sigil on the ground that heals allies and damages enemies
- Bright Bulwark: she puts up a bubble which prevents enemies from using movement-based abilities
- Arcane Passage: she creates a teleportation tunnel between two points
The basic attack, the perk and the abilities are the main ways that a hero affects their environment and the course of the match.
What is a talent
In select modes of Vainglory, namely Aral and Blitz, the players may select one of 3 talents for their heroes. These talents are unique to each hero and affect how their hero plays.
For example here are Lyra's talents:
- Twin Missile: trades the slowdown effect of her basic attack for more damage
- Mobile Bulwark: makes her bubble follow her instead of remaining on the spot where she cast it
- Gythian Ward: grants allies within its perimeter a small barrier and cleanses all their debuffs
These talents are usually a trade between an advantage and a disadvantage. They can also be upgraded to improve their effectiveness.
The dataset
My data consists of matches that have already been played, along with their results.
Three-hot vector for teams composition
To model the hero composition of each team, I ended up using a three-hot vector. (I have no idea if this is a real thing)
What is three-hot?
A one-hot vector is a vector (with vector being essentially an ordered list of numbers) where all values are 0 and only one of them is 1. It looks a bit like this:
vector = [0, 0, 0, 1, 0, 0, 0, 0, 0]
The idea behind a three-hot vector is that it's like a one-hot vector but with three positions set to 1 instead of only one. This makes it possible to represent a set of 3 unique values. It looks like this:
vector = [0, 1, 0, 1, 1, 0, 0, 0, 0]
Another way to explain this is with type theory: a one-hot vector can represent a single enum value, while a three-hot vector can represent a set of exactly three distinct enum values.
Why three-hot?
In our case, a team's roster can only contain one of each hero. It's impossible for example to have a team that consists of two Lyras and one Idris.
Additionally, the heroes are not ordered in the team.
This allows us to express one team's roster with a single three-hot vector whose size is the number of available heroes (45 at the time of writing).
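As a small sketch of what this encoding looks like in code (the hero-to-index mapping here is made up for illustration):
import numpy as np

NUM_HEROES = 45

# hypothetical mapping from hero name to vector position
HERO_INDEX = {"Lyra": 17, "Idris": 21, "Skaarf": 38}

def roster_to_three_hot(roster):
    # a roster holds exactly 3 distinct heroes
    assert len(set(roster)) == 3
    vector = np.zeros(NUM_HEROES)
    for hero in roster:
        vector[HERO_INDEX[hero]] = 1
    return vector

roster_to_three_hot(["Lyra", "Idris", "Skaarf"])  # 45 positions, three of them set to 1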
Talents one-hot vector
When deciding how to structure the hero talents in the dataset, I had a few options.
Since each hero has 3 talents (rare, epic, legendary), I could have modeled the output as a 3-unit softmax layer. I decided against this because, at the time, it felt too complicated for a model to learn all the associations.
Another approach would be to again model the output as a softmax, but over all talents of all heroes (135 in total). This also felt suboptimal because, for any given prediction, 132 of those talents are unavailable to the player.
Instead, I went with a trick I learned in the Coursera course: I put the selected talent in the input and asked the model to predict the outcome given that talent. This way I'll have to make 3 predictions each time to answer my question of "which talent?".
So, I ended up representing the talents as a one-hot vector of 45x3+1=136 positions (45 heroes x 3 talents each + 1 "no talent selected"). Note that it is possible for the player to not have unlocked any talents of their chosen hero; this is the case where the "no talent selected" position would be flipped to ON.
Sidenote: on second thought, the "no talent selected" position on the talents vector could probably be eliminated. I'll have to perform an experiment or two to validate this.
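For illustration, building such a 136-position vector (including that last "no talent selected" position) could look like this; the index scheme is an assumption, not necessarily the one my scripts use:
import numpy as np

NUM_HEROES = 45
TALENTS_PER_HERO = 3
NUM_TALENT_FEATURES = NUM_HEROES * TALENTS_PER_HERO + 1  # 136, last position = "no talent selected"

def talent_to_one_hot(hero_index, talent_index=None):
    # talent_index: 0 (rare), 1 (epic), 2 (legendary) or None when no talent is unlocked
    vector = np.zeros(NUM_TALENT_FEATURES)
    if talent_index is None:
        vector[-1] = 1  # flip the "no talent selected" position ON
    else:
        vector[hero_index * TALENTS_PER_HERO + talent_index] = 1
    return vector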
Label experiments (tanh, sigmoid, softmax)
For the labels I had no idea how to model them, so I ran a few experiments:
| Value range | Output layer activation function |
| --- | --- |
| (-1, 1) | tanh |
| (0, 1) | sigmoid |
| 2 x (0, 1) | softmax |
Of the above, the softmax approach gave the best results (as measured by accuracy on the validation set), so I stuck with it. The difference was small and I can only guess that it was partly down to the way TensorFlow (the Keras backend I'm using) is optimized.
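In Keras terms, the three experiments roughly boil down to swapping the output layer (a sketch; the loss pairings are the standard ones and the hidden layer here is just a stand-in):
from keras.layers import Input, Dense

X = Input(shape=(226,))
H = Dense(256, activation='relu')(X)  # stand-in for the hidden layers

# (-1, 1): one tanh unit, label 1 for a win, -1 for a loss (paired with mean squared error)
Y_tanh = Dense(1, activation='tanh')(H)

# (0, 1): one sigmoid unit, label 1 for a win, 0 for a loss (paired with binary cross-entropy)
Y_sigmoid = Dense(1, activation='sigmoid')(H)

# 2 x (0, 1): two softmax units, label one-hot over win/loss (paired with categorical cross-entropy)
Y_softmax = Dense(2, activation='softmax')(H)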
Putting them all together
To generate the input from one match, I took the match and for each hero I defined:
- my team as a three-hot vector
- other team as a three-hot vector
- my talent as a one-hot vector
- the verdict from the perspective of that hero, as a number from 1 (win) to -1 (loss)
and stacked the vectors to produce a 226-length vector.
For example here is a pre-stacked data entry:
{
"matchID": "ab137e32-e381-11e8-a4e9-02b7582ce766",
"x": {
"ours": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0
],
"theirs": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0
],
"myTalents": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]
},
"y": 1
}
Which represents a match between "our" team with Inara, Petal and Varya and "their" team with Kinetic, Ringo and Skaarf. The selected talent is Petal's "Bounce" and the verdict was a win for our (Petal's) team.
Because we compile each data entry from the perspective of one hero/player, we end up with 6 entries per match (2 teams x 3 heroes).
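Turning one such entry into the final 226-length input vector is then just a concatenation of its parts (a sketch using numpy; the field names follow the JSON example above):
import numpy as np

def stack_entry(entry):
    # concatenate the team and talent vectors into one input vector
    x = np.concatenate([
        entry['x']['ours'],       # 45 positions
        entry['x']['theirs'],     # 45 positions
        entry['x']['myTalents'],  # 136 positions
    ])
    assert x.shape == (226,)
    return x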
API for fetching matches
To compile my dataset I made use of the Vainglory API. It has a handy endpoint to fetch the latest matches and another to fetch a match's telemetry (which includes the talents selected for each hero). With a suite of little scripts running on a remote VM, I managed to gather about 30k data entries per day.
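The scripts themselves boil down to something like the sketch below. The host, paths and headers here are placeholders for illustration, not the actual Vainglory API routes; the real calls live in the Gitlab repo:
import requests

API_BASE = "https://vainglory.example.com/shards/eu"  # placeholder, not the real API host
HEADERS = {"Authorization": "MY_API_KEY", "Accept": "application/json"}

def fetch_latest_matches():
    # one call for a page of the latest matches...
    return requests.get(API_BASE + "/matches", headers=HEADERS).json()

def fetch_telemetry(telemetry_url):
    # ...and one more per match for its telemetry (which holds the selected talents)
    return requests.get(telemetry_url, headers=HEADERS).json()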
Why ScaleWay VM
Initially, I was fetching the data directly to my laptop. This quickly got old, as I kept creating gaps in the fetching schedule by putting the laptop to sleep.
To keep the data fetching going, I got a cheap VM from ScaleWay. At $2 per month it can't get much cheaper, and it lets me fetch data and compile my dataset 24/7.
Split to train/dev/test
To begin training, I split my dataset in 3 parts:
- A 10k test set, to be used after all training has been performed to evaluate the model.
- A 10k dev set (also called validation set sometimes), to be used on every training epoch to evaluate the training progress.
- The rest as a training set, used to train the model.
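In code, the split is little more than slicing the compiled list of data entries (a sketch, assuming dataset is that list):
TEST_SIZE = 10000
DEV_SIZE = 10000

test_set = dataset[:TEST_SIZE]                     # held out until all training is done
dev_set = dataset[TEST_SIZE:TEST_SIZE + DEV_SIZE]  # evaluated on every epoch
training_set = dataset[TEST_SIZE + DEV_SIZE:]      # everything else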
Training the Model
All of my code can be found at https://gitlab.com/gademo/vainglory-stats. The code makes a few assumptions on where things reside. Feel free to fork and edit and run your own experiments.
Why Keras
I opted for Keras on this one because I wanted to focus more on the process of training and evolving a model and less on the details of optimizing it. Keras allows me to define and train a model with very few lines of code:
from keras.activations import relu, softmax
from keras.layers import Input, Dense, Dropout
from keras.models import Model

# two hidden ReLU layers, each followed by a 10% dropout layer
X = Input(shape=(n_x,))
Y = Dense(1024, activation=relu)(X)
Y = Dropout(0.1)(Y)
Y = Dense(256, activation=relu)(Y)
Y = Dropout(0.1)(Y)
Y = Dense(n_y, activation=softmax)(Y)

model = Model(inputs=X, outputs=Y)
# loss and metrics are needed for fit() to report anything useful;
# categorical cross-entropy is assumed here to match the softmax output
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(
    X_train, Y_train,  # the training set
    epochs=num_of_epochs,
    validation_data=(X_dev, Y_dev)  # the dev set
)
With the above snippet, Keras will train the model and print useful metrics like accuracy and loss on every epoch.
The entirety of the training script can be found on Gitlab.
Why FloydHub
Like with compiling the dataset, I also wanted to decouple training the model from my laptop being on. FloydHub is a pretty simple way to achieve this. After setting up an account and installing their CLI, training the model is remarkably simple:
floyd run --data vrinek/datasets/casual-aral:casual_aral "python train.py"
That --data argument comes from having previously uploaded the compiled dataset to FloydHub with:
floyd data init vrinek/casual-aral # one time setup
floyd data upload # every time the dataset updates (about daily)
It may be worth mentioning here that FloydHub comes with a few hours of CPU and GPU usage for free, and more can be purchased.
Initial results
Results at 99% accuracy
So, I had my dataset: I shuffled it, split it into training/dev/test sets and let my training script train the model.
After some experimentation I was able to hit 99% accuracy (as measured on the test set) with a neural network of 2 hidden layers: 1024 ReLU units and 256 ReLU units, each hidden layer followed by a 10% dropout layer.
The whole model looked like this when written in Keras:
X = Input(shape=(n_x,))
Y = Dense(1024, activation=relu)(X)
Y = Dropout(0.1)(Y)
Y = Dense(256, activation=relu)(Y)
Y = Dropout(0.1)(Y)
Y = Dense(n_y, activation=softmax)(Y)
Triumph giving way to skepticism
Results of 99% accuracy are a bit odd for this kind of problem, and even more odd coming from a first-time researcher.
Taking a good look at the results, I noticed that for a fixed roster, varying the talent only affected the result by ~5%. This was not what I was expecting.
To verify this, I re-trained the model, this time without the talents. Taking the talents portion out of the input data did not change the results: accuracy stayed at 99%.
Skepticism intensifying... 🤔
Matches present in both train and dev/test sets
Digging a little deeper, I realized that shuffling the dataset was a mistake. Because each match result is represented in 6 different data entries, shuffling the dataset ended up spreading the data of each match across the training, dev and test sets.
In other words, I was validating my model on the same data that I was training it on.
Cleaning data
My obvious next step was to clean up this mess.
Assertions
My first priority was to introduce some assertions so this does not happen again (I'm used to TDD so this made absolute sense to me).
training_set_ids = {match['matchID'] for match in training_set}
dev_set_ids = {match['matchID'] for match in dev_set}
test_set_ids = {match['matchID'] for match in test_set}

# Assert that the three sets do not overlap
assert len(training_set_ids & dev_set_ids) == 0
assert len(training_set_ids & test_set_ids) == 0
assert len(dev_set_ids & test_set_ids) == 0
Omit shuffling
Obviously, with these lines in place, my training script started failing, which only confirmed my earlier observations. The simplest way to fix this was to stop shuffling my data (and also to tweak the sizes of the sets to a multiple of 6).
Treating data as time-series
I wasn't particularly happy about omitting the shuffling of my data. Trying to debate it with my rubber duck companion, I recalled that when a model is trained on time-series data (e.g. meteorological or financial), the data is not shuffled. In those cases, it's important for the model to be able to predict the future given the past.
In this case, though, the argument was pretty weak. Each match is pretty much independent of the previous ones. It does depend on the previous matches of the players taking part in it, but our model is not built to accommodate that knowledge.
Nevertheless, this debate was enough to let me rest a little and lower the priority of safely shuffling my data. Not shuffling would suffice for now.
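For the record, when I do get around to shuffling safely, the trick will be to shuffle whole matches instead of individual entries, roughly like this (a sketch, assuming the 6 entries of each match sit next to each other in the compiled dataset):
import random
from itertools import groupby

# group the 6 per-match entries together, shuffle whole matches, then flatten back
entries_by_match = [
    list(group)
    for _, group in groupby(dataset, key=lambda entry: entry['matchID'])
]
random.shuffle(entries_by_match)
dataset = [entry for match_entries in entries_by_match for entry in match_entries]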
Later results with clean data
Compared to baseline accuracy of 50%
Re-training the model on the training set (this time with 500k data entries, ~83k matches), the results were pretty bad:
- Dev set accuracy hovered around 56% (a "pick a random result" baseline would succeed about 50% of the time)
- Dev set loss grew steadily while training set loss shrank
Difference (or lack thereof) between training with and without talents
I also compared results between training the model with and without talents:
- https://www.floydhub.com/vrinek/projects/vainglory-stats/31 is the model trained with talents
- https://www.floydhub.com/vrinek/projects/vainglory-stats/30 is the same model, trained without talents
The difference in accuracy on the dev set is marginal. Even on the training set there does not seem to be any benefit from adding the talents to the dataset (80% vs 79.3%).
Contribution of talents in first layer
In order to verify the contribution of the talents, I plotted the weights of the first layer as a heatmap:
With this visualization it was pretty easy to spot the difference: 0-89 on the Y axis represents the hero feature weights and 90-225 the talent ones. On average, the hero-specific features appear to be fitted to more extreme weights than the talent ones.
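For reference, the heatmap is straightforward to produce with matplotlib (a sketch, assuming the model defined earlier, where the first Dense layer sits right after the Input):
import matplotlib.pyplot as plt

# kernel of the first Dense layer: shape (226, 1024),
# one row per input feature, one column per hidden unit
weights, biases = model.layers[1].get_weights()

plt.imshow(weights, aspect='auto', cmap='coolwarm')
plt.ylabel('input feature (0-89: heroes, 90-225: talents)')
plt.xlabel('hidden unit')
plt.colorbar()
plt.show()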
Some random sampling of the results validates this observation: picking one data entry and varying the selected talent usually affected the output value by up to 5%.
What did we learn?
In the end, my motive to work on this problem had more to do with gaining hands-on experience on a machine learning problem top-to-bottom and less with actually solving the problem.
So, what did I learn?
About the problem
The model did not learn much. This could mean a few things:
The model's architecture cannot fit the data
This could be a possibility. Given, though, that the same architecture managed to learn all of its training matches by heart, I doubt a more complex model would be of much benefit.
Not enough data
Quite possible. Right now, with 600k data entries, we stand at 100k matches. This may be a small dataset for this problem.
The obvious solution to this is to gather more data. Given I have a little VM working tirelessly to gather data, this should be a matter of time.
A different mix of features would improve results
As an active Vainglory player, I have a little "expert" knowledge on the subject matter. A match's result is influenced by the team formation (the heroes of each team) but that's hardly the only influencing factor. Other factors can be:
- Player experience

  One of the most important factors when it comes to a match's verdict is the experience of the players involved: Vainglory is very much a skill-based game.

  The Vainglory API exposes data on the players' skill level, but I left it out of the model in an effort to keep things simple (a rough sketch of how such a feature could be appended to the input vector follows after this list).

- Player fitness

  It can be argued that a player performs best at certain times or under certain conditions. These may include time of day (being tired when playing at 2 AM), network connection speed (lag hurts both the performance and the mental stability of the player) and even notifications popping up while playing (especially if the player accidentally taps on one).

  This is not something the Vainglory API has (or can easily have) data on, so our model will not be able to account for it. We could perhaps infer it somehow, though.

- Talent levels

  As I mentioned earlier, talents have levels. It's possible that a talent at level 1 and at level 10 influence the result in different ways.

- Player AFK

  Sometimes a player goes AFK. Sometimes a bot takes over (if the game recognises this behaviour), other times the hero is left idling on the map. In both cases, the team's chances are expected to worsen.

  The Vainglory API has info on this and it could be incorporated into our data pretty easily.

- Player-hero compatibility/experience

  In other words, "how good is this player when they play this particular hero?". Until recently, Vainglory did not provide data on this. It could be inferred by examining a player's historic matches with said hero, but that's a separate problem on its own.

  Recently though, Vainglory started recording a sort of "hero XP" bar that fills up after a match finishes. This could be a good enough proxy for hero/player compatibility.
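As a rough sketch of how some of these could be bolted onto the input (the skill tier scaling and the AFK flag below are illustrative assumptions, not features my scripts currently produce):
import numpy as np

def stack_entry_with_extras(entry, skill_tier, went_afk):
    # the existing 226-length vector...
    base = np.concatenate([
        entry['x']['ours'],
        entry['x']['theirs'],
        entry['x']['myTalents'],
    ])
    # ...plus two hypothetical extra features appended at the end
    extras = np.array([
        skill_tier / 10.0,        # player skill tier, scaled to roughly (0, 1)
        1.0 if went_afk else 0.0  # whether the player went AFK during the match
    ])
    return np.concatenate([base, extras])  # 228 positions instead of 226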
About data hygiene
The only way for a model to learn is through data. If the training set is not sufficiently separate from the dev and test sets, the model will not generalize.
A lack of generalization was exactly my problem, but it manifested differently than I expected: instead of a big gap between training and dev set error, I saw near-perfect scores where I most probably should not have been seeing any.
Next steps
More data
Like I mentioned before, I'll keep gathering more data. Once I hit 1.2m records (200k matches), I'll give this another shot. I don't expect the dev set accuracy to improve much, but I do expect the training set accuracy to fall closer to the dev set's, because with this much data the model will be forced to generalize more.
I'll also try expanding my dataset along the other dimension: adding more features. This will take some thinking about which features to focus on and how to model them.
Application
The expected end result of this experiment is an application that serves this model as a web service. I have identified a couple of useful technologies to get this done (e.g. tensorflow-js) and I'm planning to give them a try while waiting for the dataset to gather itself.