Mage

Posted on Apr 1, 2022

League of Legends rank(ing) guide

#machinelearning #beginners #gamedev

TLDR

Channel your inner scholar and Ryze to the top – leverage machine learning ranking models that can increase your rank in League of Legends through match analysis and predictions, free of code!

Glossary

Intro
Objective
Dataset
Data cleaning
Feature engineering
Model training
Evaluation
Deploying API
Conclusion

Intro

League of Legends is a multiplayer online game played by millions, both casually and professionally. In a competitive 5 vs 5 game where each team fights to take down the other’s home base, it’s no sup-Ryze that choosing the most over-powered (“OP”) champions (the characters you play) gives you a higher likelihood of winning games against players of your skill level.

Ryze (Source: Riot Games)

Thus, if you’re looking to improve in the competitive League of Legends ranked game mode, how about trying an ML analysis to ensure you make the most statistically-backed decisions in your next game? How about learning more about a buzzword in the tech industry, “machine learning (ML)” while enjoying content about your interests?

Like Hextech in Arcane, we at Mage see ML as a tool that can be wielded by anyone and for anything, not just by a few data scientists who excel at math and know how to code. Neither of these is required; you just need to understand the concept of a ranking model.

What is ranking?

Ranking is an application of machine learning that sorts data based on a parameter (like whether you win or lose in a League of Legends game). Depending on the order of how the data is sorted, we can make predictions on future outcomes of new data.

In other words, if you take a list of Teemo games and rank them based on whether he wins or loses, the model will score the champion Teemo as “low” on relevancy to “wins.” This means that Teemo’s presence in the game has little to no effect on whether you win or lose, giving him a low score in relevance to wins. Then, later when you ask the model to predict whether you’ll win this new Teemo game, it'll want more information.

Teemo, the “Beemo” skin (Source: Riot Games)

Objective

Other than making glorified Teemo slander, this guide aims to pique your interest in ML concepts through your interest in League. If we can teach gamers like you ML concepts, such as what ranking models are, you can harness the power of data-driven analysis and show others cool and easy ways AI can be implemented in your own projects.

Therefore, in this guide, we want to answer the following questions using the ML ranking model we are building:

Which champions increase my likelihood of winning plat/diamond* games? In other words, which champions are meta in my “elo” (meaning skill level) and help me climb?
Which factors have the greatest impact on whether I win or lose?
What’s the likelihood that I’ll win a diamond game playing this champion and earning that amount of gold by 10 minutes?
Note: I generated my dataset using high elo games because I wanted to be taken seriously. 😩 Check out my Github if you’d like to learn how to query the Riot Games API to make your own dataset for a more accurate analysis on players in your elo (whether it’s higher or lower 😊).

Dataset

Most data scientists know that the best models originate from a strong and suitable dataset. For this guide, we painstakingly generated 5000 rows and 14 columns of actual platinum and diamond-tier ranked game data that we carefully selected from parsing responses from the Riot Games API. To download it and make your own data analyses, simply wish upon this star. 🌠

You can find more details about how we generated our dataset from the publicly available League of Legends ranked game data on my Github repository.

Stargazer Soraka (Source: Riot Games)

At a glance, this dataset contains 5000 rows, each containing a player's match info from a platinum or diamond-level ranked game. There are 14 columns, containing data like the player's KDA, the champion they picked, their ban, whether a teammate went AFK, gold at 10 minutes, gold by the end of the game, and whether they won or lost.
Now that we have the data, our next step is to prepare it for the model to train on.

Data cleaning

Look out! A tsu-Nami is incoming to wash away excess data 🌊
Nami in her "Program Nami" skin (Source: Riot Games)

Data cleaning is the machine learning step that reduces noise in your data, so your ML model gets clear and concise data to train on. Most of the time, raw data contains empty or unrelated rows or columns that, if left alone, would confuse your model when it's looking for trends in the rows that lead to an outcome (like winning or losing).

The most difficult part of data cleaning and the next step (feature engineering) is that there is no specific formula or order for doing things, which is why it seems so intimidating to get started. Like League of Legends, there's a steep learning curve, but you can definitely do it!

As you go through the following steps, always ask yourself why we are performing each operation, and you'll learn things quickly 😊

Removing duplicates

We gathered data from the Riot Games API by choosing a random plat or diamond-level player, "A", and pulling their 20 most recent ranked games. However, if our list of 25 players also contained player A's party member, "B", then the games A and B played together, including their teammates and opponents, are recorded twice.

Duplicate data can skew prediction results by giving the model an impression that an occurrence of an event happens more frequently. Imagine how falsely a model would predict a Yasuo player's victory if we kept duplicates of a time the one-trick Yasuo main dominated a game.

We can remove any duplicate instances of player A and B's overlapping games by dropping rows that share the same match, the same team, and the same champion.
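If you're curious what this step looks like outside of a UI, here's a minimal Python sketch. It assumes each row is a dictionary and that the match and team live in columns called "match_id" and "team_id"; those two names are placeholders I made up for illustration, while "picks" is the champion column from our dataset.

```python
def remove_duplicates(rows, key_columns=("match_id", "team_id", "picks")):
    """Keep only the first row for each (match, team, champion) combination."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Made-up sample: the Yasuo row shows up twice because players A and B
# were in the same game.
rows = [
    {"match_id": "NA1_1", "team_id": 100, "picks": "Yasuo", "win": True},
    {"match_id": "NA1_1", "team_id": 100, "picks": "Yasuo", "win": True},
    {"match_id": "NA1_1", "team_id": 200, "picks": "Teemo", "win": False},
]
deduped = remove_duplicates(rows)
print(len(deduped))  # 2
```

The first occurrence of each game is kept, so the one-trick Yasuo's stomp only counts once.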

Filter

Sometimes, when we're unsure what data needs to be cleaned, we can go through each of the columns and look for empty, invalid, or incorrect values and filter them.

There are 5 roles to play in the game, but the "position" feature contains a 6th value, "Invalid," for when the Riot Games API was unable to guess what role a player was playing. (Check the right side of the image below to see the unique values of the column "position")

If we kept the rows with the "Invalid" value in the "position" column, we would be telling the model that players can play the "Invalid" role. We don't want the model to train on inaccurate values, as that would reduce the model's accuracy.

So we will use the filter operation to keep only the rows where the role played could be determined.

Similarly, we have a column representing the amount of gold earned by each player's lane opponent at the 10 minute mark, and it contains 35 "0" values. Unfortunately, these 35 rows with empty values in the "ten_min_lane_opponent_gold" column need to be filtered out because their data is incomplete and would lead to inaccurate predictions.
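Both filters can be sketched in a few lines of Python. This is only an illustration under the assumption that rows are dictionaries; the column names "position" and "ten_min_lane_opponent_gold" come from our dataset, but the sample role names and gold values below are made up.

```python
def filter_valid_rows(rows):
    """Keep rows with a known role and a non-zero lane opponent gold value."""
    return [
        row
        for row in rows
        if row["position"] != "Invalid" and row["ten_min_lane_opponent_gold"] > 0
    ]


# Made-up sample rows for illustration.
rows = [
    {"picks": "Nami", "position": "UTILITY", "ten_min_lane_opponent_gold": 2500},
    {"picks": "Teemo", "position": "Invalid", "ten_min_lane_opponent_gold": 3100},
    {"picks": "Jax", "position": "TOP", "ten_min_lane_opponent_gold": 0},
]
clean_rows = filter_valid_rows(rows)
print([row["picks"] for row in clean_rows])  # ['Nami']
```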

Removing unrelated columns

Thinking back on our intention for building a model, we're interested in finding out which champion "picks" maximize a player's chances of winning. So, we will need to remove unrelated columns. We will come back to this step later, since some of the columns will be removed after extracting relevant information from them.

Feature engineering

The point 🗡️ of this step is to transform existing information to be as clear-cut as Fiora's rapier; so, we add columns to show the model what good gameplay in ranked matches looks like. This also generalizes existing data into fewer, simpler columns.

Fiora in her "Bewitching" skin (Source: Riot Games)

Add column based on a conditional

One method of identifying whether a player is performing well is seeing if they beat their lane opponent at 10 minutes, a crucial turning point of the game. Although this doesn't guarantee their victory, it gives them an edge because it lets them use their strength to aid their teammates and pressure the enemy, which can influence whether they win or lose.

Since teamwork is hard to identify and measure, we'll simply label instances where players had an advantage over their lane opponents in the boolean (meaning true or false) column "beat_lane_opponent."

Replacing the "ten_min_gold" column with this boolean column simplifies the numerical range of gold values (the range is 2,223–6,075 gold, to be exact!) into a simple "True" or "False" for whether the player beat their opponent. This value is much easier for the model to understand and determine which factors lead to a favorable outcome.

To add the new column, we simply compare the 2 columns ("ten_min_gold" and "ten_min_lane_opponent_gold") and return "True" if the player beat their opponent.
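As a rough Python sketch, the comparison could look like the snippet below. I'm assuming that "beating" the opponent means having strictly more gold at 10 minutes (so a tie counts as "False"); that tie rule is my assumption, not something Mage specifies.

```python
def add_beat_lane_opponent(rows):
    """Add a boolean column: did the player out-earn their lane opponent by 10 minutes?"""
    return [
        {
            **row,
            "beat_lane_opponent": row["ten_min_gold"] > row["ten_min_lane_opponent_gold"],
        }
        for row in rows
    ]


# Made-up sample rows for illustration.
rows = [
    {"picks": "Fiora", "ten_min_gold": 4200, "ten_min_lane_opponent_gold": 3900},
    {"picks": "Teemo", "ten_min_gold": 3100, "ten_min_lane_opponent_gold": 3600},
]
labeled = add_beat_lane_opponent(rows)
print([row["beat_lane_opponent"] for row in labeled])  # [True, False]
```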

Extract columns from JSON

For those who have experience working with APIs and JSON (a technical term programmers use to describe groups of named textual data), we know that occasionally, we receive data as a JSON and need to write code to dig the relevant data out. If you're wondering why people store data in such a pesky format, explore this lesson about how JSON data types are actually useful in storing auxiliary information.

Rek'sai, the digging queen (Source: Riot Games)

For the scope of this guide, we are simply interested in which data from the JSON-filled "challenges" column would be useful to our goal. Here's an example of the kind of information a single row of JSON contains:

Pretty print generated by carbon.now.sh

Since we want to predict whether a champion is a "carry" and fights well, all we care about is the "teamDamagePercentage."
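For the programmers in the audience, digging that one value out could look roughly like this Python sketch. It assumes the "challenges" column holds a JSON string per row; the "teamDamagePercentage" key is from our data, while the sample values and the extra "kda" key are made up for illustration.

```python
import json


def extract_team_damage(rows):
    """Pull "teamDamagePercentage" out of the JSON column and drop the raw JSON."""
    result = []
    for row in rows:
        challenges = json.loads(row["challenges"])
        new_row = {key: value for key, value in row.items() if key != "challenges"}
        # .get() returns None when the key is missing instead of raising an error.
        new_row["team_damage_percentage"] = challenges.get("teamDamagePercentage")
        result.append(new_row)
    return result


# Made-up sample rows for illustration.
rows = [
    {"picks": "Fiora", "challenges": '{"teamDamagePercentage": 0.31, "kda": 4.5}'},
    {"picks": "Yuumi", "challenges": '{"kda": 2.0}'},
]
extracted = extract_team_damage(rows)
print([row["team_damage_percentage"] for row in extracted])  # [0.31, None]
```

Rows where the key is missing end up with an empty value, which you would then handle in a filter step like the ones above.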

Aggregate sum

Finally, for some Ahri-thmetic calculations!

Academy Ahri skin (Source: Riot Games)

One of the main assessments of how well you played in a game is how loaded 🤑 you are compared to your team, a value we dub "gold percentage."

To do this, we need to aggregate (a fancy term for performing an operation on each group of rows) by match and team, summing the gold earned by each team in total. This operation is called aggregate total sum, and you can read more about it here.

We'll aggregate the "gold" column to find the team's total gold, "team_gold":

Using the same operation, we can find the sum of "ten_min_gold" and call it "team_ten_min_gold."
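Here's a small Python sketch of that aggregate sum, in case seeing it as code helps. It groups rows by match and team using the placeholder column names "match_id" and "team_id" (my own invention for this example) and writes each team's total back onto every row of that team.

```python
from collections import defaultdict


def add_team_total(rows, value_column, total_column):
    """Sum value_column per (match, team) group and attach the total to each row."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["match_id"], row["team_id"])] += row[value_column]
    return [
        {**row, total_column: totals[(row["match_id"], row["team_id"])]}
        for row in rows
    ]


# Made-up sample rows for illustration (real teams have 5 players).
rows = [
    {"match_id": "NA1_1", "team_id": 100, "gold": 12000, "ten_min_gold": 3500},
    {"match_id": "NA1_1", "team_id": 100, "gold": 9000, "ten_min_gold": 3000},
    {"match_id": "NA1_1", "team_id": 200, "gold": 10000, "ten_min_gold": 4000},
]
rows = add_team_total(rows, "gold", "team_gold")
rows = add_team_total(rows, "ten_min_gold", "team_ten_min_gold")
print([row["team_gold"] for row in rows])          # [21000, 21000, 10000]
print([row["team_ten_min_gold"] for row in rows])  # [6500, 6500, 4000]
```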

Divide column values

With the divisor calculated, we can now divide an individual's gold by the team's to find their "gold_percentage."

Again, we will do the same by dividing "ten_min_gold" by "team_ten_min_gold" to calculate the column "gold_ten_min_percentage", which tells us how much of the team's gold a player is contributing at the 10 minute mark.
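The division itself is simple, as this Python sketch shows. One assumption to flag: I'm storing the result as a fraction between 0 and 1 rather than multiplying by 100, and I return 0.0 when the team total is zero to avoid dividing by zero. The sample numbers are made up.

```python
def add_ratio(rows, numerator, denominator, new_column):
    """Add new_column = numerator / denominator, guarding against a zero denominator."""
    result = []
    for row in rows:
        ratio = row[numerator] / row[denominator] if row[denominator] else 0.0
        result.append({**row, new_column: ratio})
    return result


# Made-up sample row for illustration, with team totals already computed.
rows = [
    {
        "picks": "Ahri",
        "gold": 12000,
        "team_gold": 48000,
        "ten_min_gold": 4000,
        "team_ten_min_gold": 16000,
    },
]
rows = add_ratio(rows, "gold", "team_gold", "gold_percentage")
rows = add_ratio(rows, "ten_min_gold", "team_ten_min_gold", "gold_ten_min_percentage")
print(rows[0]["gold_percentage"], rows[0]["gold_ten_min_percentage"])  # 0.25 0.25
```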

Subtract column values

Finally, with the gold percentages, we're interested in seeing whether a player scaled well as the game went on. If a player had a significant gold percentage at 10 minutes, did they leverage the funds they had and "carry" their teammates to victory? Or did they "throw"?

Ahri throwing out a heart (Source: Riot Games)

To see whether the player's gold contribution rose or fell, we subtract "gold_percentage" from "gold_ten_min_percentage." I call this feature "scalability."
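Here's the subtraction as a tiny Python sketch, following the order described above ("gold_ten_min_percentage" minus "gold_percentage"). Note the sign that falls out of that order: a negative "scalability" means the player's share of team gold grew after 10 minutes, and a positive one means it shrank. If you'd rather have "bigger is better", simply flip the operands. The sample values are made up.

```python
def add_scalability(rows):
    """scalability = gold share at 10 minutes minus gold share at the end of the game."""
    return [
        {**row, "scalability": row["gold_ten_min_percentage"] - row["gold_percentage"]}
        for row in rows
    ]


# Made-up sample rows for illustration.
rows = [
    {"picks": "Ahri", "gold_ten_min_percentage": 0.25, "gold_percentage": 0.5},
    {"picks": "Yasuo", "gold_ten_min_percentage": 0.5, "gold_percentage": 0.25},
]
scored = add_scalability(rows)
print([row["scalability"] for row in scored])  # [-0.25, 0.25]
```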

Last data cleaning step

We mentioned in the data cleaning section that we would remove columns once we had finished extracting information from them or had created new columns that summarize the data better.

This leaves only 11 columns, which are "bans", "beat_lane_opponent", "deaths", "gold", "gold_percentage", "gold_ten_min_percentage", "kills", "picks", "scalability", "team_damage_percentage", and "win."

Now that we are (finally) done preparing our data, we get to move on to the model training step!

Model training

A recap of what we're doing with the ranking model:

We are evaluating which champions (in the "picks" column) have the highest likelihood of winning platinum and diamond-level ranked games.
By sorting our rows of data with the wins on top, we can score each champion by how many wins they got.
Using the relative position of each champion's score in our sorted list, we can predict whether you'll win a ranked game playing a certain champion, depending on how high or low that champion placed on the list.
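To make the recap a bit more concrete, here's a deliberately simple Python stand-in: it counts wins and games per champion and sorts by win rate. To be clear, this is not the ranking model Mage trains (that one also looks at the other columns); it's only a sketch of the "score each champion and sort" idea, with made-up rows.

```python
from collections import defaultdict


def rank_champions(rows):
    """Return (champion, wins, games) tuples sorted by win rate, then by games played."""
    stats = defaultdict(lambda: {"wins": 0, "games": 0})
    for row in rows:
        champion = stats[row["picks"]]
        champion["games"] += 1
        champion["wins"] += int(row["win"])
    ranked = sorted(
        stats.items(),
        key=lambda item: (item[1]["wins"] / item[1]["games"], item[1]["games"]),
        reverse=True,
    )
    return [(name, s["wins"], s["games"]) for name, s in ranked]


# Made-up sample rows for illustration.
rows = [
    {"picks": "Jax", "win": True},
    {"picks": "Jax", "win": True},
    {"picks": "Teemo", "win": True},
    {"picks": "Teemo", "win": False},
    {"picks": "Yasuo", "win": False},
]
ranking = rank_champions(rows)
print(ranking)  # [('Jax', 2, 2), ('Teemo', 1, 2), ('Yasuo', 0, 1)]
```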

Setup

We know this is confusing, so we created a visualization that helps guide you through the process of selecting what you're ranking and how:

Train-test split

When a machine learning model is trained, it takes the data we painstakingly put together and uses a majority of the rows to learn how to predict whether players win or lose their games when playing a certain champion and hitting certain performance goals.

To fairly calculate the accuracy of the model, we use approximately 90% of the data for teaching the model how to predict which champions win games and the remaining 10% to test if the model is correct in its predictions.

Found in the Review > Statistics tab of Mage
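If you want to try a 90/10 split by hand, here's a minimal Python sketch. The shuffle with a fixed seed and the exact rounding are my own choices for illustration; Mage may split the data differently under the hood.

```python
import random


def train_test_split(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of the rows, then hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder rows standing in for the real dataset.
rows = [{"row_id": i} for i in range(5000)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 4500 500
```

The model never sees the held-out rows during training, which is what makes the test a fair one.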

If you’d like to know more about the process of splitting and get some Sera-tonin by exposing yourself to some K-pop idol girls, this K-fold cross-validation lesson is perfect for you.

Pop-star Seraphine (Source: Riot Games)

Evaluation

Now that the model has been trained, we can interpret its results. Generally, we evaluate whether a model makes good predictions based on its accuracy, precision, and recall values. However, since those metrics are general, we also use SHAP values to gain insight into how the individual columns affect the outcome.

This is all in an effort to understand and improve model performance in the next retraining step.

General metrics

The at-a-glance values we rely on to evaluate our model performance are:

Showing accuracy (82.04%), precision (82.04%), recall (82.08%)

Now, a final test of your thresh(hold😏) of enduring my puns! 🎃
Thresh (Source: Riot Games)

During the testing step, we can interpret the metrics we just mentioned with the following examples:

When testing the outcomes of a Thresh player’s games, 82% accuracy means that the model correctly predicted the result of a Thresh player’s game 82% of the time, including the games they lost.
Precision is the share of games the model predicted as wins that the Thresh player actually won.
Recall is the share of actual wins that the model identified. For example, an 80% recall rate means that if the testing set had 5 instances of a Thresh player winning a game, we managed to identify 4 of them.

This is a really technical term, but these three metrics are usually linked to a confusion matrix.

Although this matrix is confusing to some, I’m sure it’s useful to some data scientists out there! 😉
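For the curious, all three metrics fall out of the four confusion matrix counts: true positives (wins predicted as wins), false positives (losses predicted as wins), false negatives (wins predicted as losses), and true negatives (losses predicted as losses). Here's a small Python sketch using made-up counts for 10 hypothetical Thresh games, not our actual test results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted wins, how many were real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real wins, how many we found
    return accuracy, precision, recall


# Made-up counts: 5 actual wins (4 found, 1 missed) and 5 actual losses
# (4 predicted correctly, 1 wrongly predicted as a win).
accuracy, precision, recall = classification_metrics(tp=4, fp=1, fn=1, tn=4)
print(accuracy, precision, recall)  # 0.8 0.8 0.8
```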

SHAP values

To find out which columns influenced the outcome the most, Mage displays a list of columns and the trend in relation to the outcomes in the Review > Top features tab.

This list tells us a lot:

A sign that winning lane is not a strong predictor of winning the game in this dataset -> since “beat_lane_opponent” didn’t make the “Top features” list
Column that influences win rate the most is actually “deaths” -> indicates that decreasing your death count increases your odds of victory more than increasing kills
Only 1 or 2 correctly predicted outcomes available for each champion in the testing set -> 5000 rows of data wasn’t enough to make predictions on the 140+ champions in League of Legends

Our top features list also tells us which columns had a minuscule influence on the outcome. Since columns like “bans”, “beat_lane_opponent”, and “team_damage_percentage” did not make the list, we can choose to exclude them in our re-train step to improve our ranking model.

Deploying API

If you’re curious whether certain values of gold and deaths affect your win rate, you can deploy the API in the Predict > Playground tab to make custom predictions!

In the gif below, we entered “4 deaths” and “7000 gold” and wanted to see a ranked list of champions most likely to win. As you can see, the champions Bard, Nautilus, and Alistar are “picks” that would increase your likelihood of winning with those gold and death counts.

Conclusion

Ghostly rider, Hecarim (Source: Riot Games)

Were our questions answered?

After we ran around like headless horsemen (Hecarim) to prepare, train, and evaluate a model, are the results from our ranking model sufficient to answer the questions we asked in the Objective section?

Q: “Which champions increase my likelihood of winning plat/diamond* games?”

A: We don’t have enough data to be confident, since each champion has only 1 or 2 instances, but tentatively, and with a handful of salt, we will say:

Playing Sett, LeBlanc, Fiora, Lux, Shen, and Jax, and
Not playing Yuumi, Vex, and Yasuo will increase your chances of winning.

Q: “Which factors have the greatest impact on whether I win or lose?”

A: Deaths and gold were the most influential columns, so if you want to win your diamond or plat games, earn gold — learn computer science (CS) 😉– and don’t feed! Wow, such million dollar advice.

Q: “What’s the likelihood that I’ll win a diamond game playing this champion and earning that amount of gold by 10 minutes?”

A: Although we cannot make predictions based on champion “picks”, you can experiment with different inputs (like kills, gold, and deaths) in our Mage Playground to see which champions have the highest chance of winning with those stats.

The playground shows that Jax, Lucian, and Karthus have the highest chance of winning with 3 deaths, 7000-ish gold, 1 kill, etc.

About myself

Since you read all the way to the end, I’d like to share a bit about the inspiration and motivation I had for writing this lengthy piece. I was a participant at the 2018 Riot Games Hackathon where I met Phreak and an amazing data scientist who introduced me to what data analysis is.

Fangirling over meeting Phreak!

Since then, I have strongly believed that there are many people like my 2018 self who are intimidated by big buzzwords like machine learning but are curious. You read through this guide, so you more or less understand the basic concepts needed to build and leverage a machine learning model! Unleash those cool ideas 😊

Begin your data-driven journey 🔮✨

… with Mage? Jk, unless?🥺
Although we used Mage in the above operations, we absolutely believe that there are other methods of doing data analytics. Visuals are important: I would rather display operations with a pretty UI than with code, so as not to intimidate non-programmers.

But, as shown in the guide, Mage makes the process of building models easier, so if you are just getting started, I recommend Mage Academy!

We have a plethora of interesting lessons about machine learning concepts. From beginner-friendly intros to advanced topics like this one, becoming an AI expert has never been easier or more fun.

Whenever you feel ready, you can build your first model. Whenever you need help or want to share the cool ways you leverage data, join our AI community on Discord!

Lastly, hope you enjoyed the article, gamers! Thank you for sticking around until the end 🥰
