Intro
Magic: The Gathering, the Magic logo and the Mana icons are property of Wizards of the Coast.
Have you ever heard of Markov chains or Magic: The Gathering?
Magic: The Gathering
Magic is a card game where you summon creatures to fight for you. Usually each player begins with a deck of 60 cards and 20 points of lives, and each turn each one is able to cast spells (creatures/sorceries, you get the idea) in order to defeat your enemy.
It's been more than a year since I've played a game of Magic. The format I used to play is called Standard, which means that cards change quite frequently, so not playing for a year is quite unforgiving. I had to look through 3 new sets to decide what cards I wanted to play! Luckily I love black & white colors, so it reduces the number quite a bit, however there are still 660 white/black cards that I had to choose from! I obviously needed to find a better way.
Markov Chains
I've had some minor lessons on machine learning and statistics, and I used to own a chatbot startup. During that time we were playing around with text generation and learned about Markov chains.
The idea is quite simple really. In its purest form it describes a suite of events that are linked to one another by some probability. In the form of text generation these "events" are words, and the links are the probability of one word following another. I'm sure you already see where this is going.
Basic Idea
The basic idea was then to train a Markov chain model to spit out some deck combination from which I could start building my own.
First of all, I needed a set of data from which the algorithm could learn, and then it was just a matter of counting the cards and getting the odds of them appearing together. After looking for a bit and having chosen my deck type (black/white life) I realized MTGGoldfish was the site I needed. It has a collection of decks (user-submitted and from official ranked play) and a searchable db, and it allows to export the deck in text format from a simple url.
Working the Magic
Data
Using MTGGoldfish I was able to find some 20 something decks that fit the type of deck I wanted. I took the lists and made a simple tool to query the site and retrieve the card list.
This became the de-facto way of retrieving the data, and would allow for an easy way to change the dataset in the future.
The learning algorithm
Once I retrieved the list of cards, the "learning" algorithm was quite simple:
- Make sure every card in the deck appears as many times as it needs (if the deck has 4 X, have an array with 4 times X)
- Loop the deck for every card in it and maintain a count of how many times the other cards appear (obvious optimization here, that I did not do)
- Do the same for every deck the dataset has
Once this is done we end up with objects that look like
{
cardX: {
cardY: 35,
cardZ: 12,
cardA: 13
}
}
The last step is to actually compute the probabilities. I took the simplest approach and just computed a percentage of appearance for each card (times-appeared / sum-of-cards). This gives a result between 0 and 1 that's easy to exploit and ensures that the sum of all probabilities goes up to 1.
The generation algorithm
Markov chains allow to generate a new "path of events" by simply getting a random next-event following the probabilities of it appearing.
Basic random with weighted probability
This becomes an easy task if you think of it as simply the "frequency of times we've seen this card". Imagine you have 3 cards, A B and C, and you've seen A 5 times, B 2 times and C 6 times.
If you want to know what card you'll see next following theses frequencies you can create an array and pick a random card:
let frequencies = [A, A, A, A, A, B, B, C, C, C, C, C, C];
let next_card = frequencies[Math.floor(Math.random() * frequencies.length)];
Now, this is a very naive approach, but let's look at it a little bit closer. Entering [A, B, C]
on an array wouldn't be enough, since they'd each have 33% chance of appearing. The larger array works because there are 5 As before the 1st B, and 2 Bs and 5 As before the first C, which means 5/13 chances of A, 2/13 chances of B and 6/13 chances of C.
What the random picking algorithm really does is generate a random number and compare it to each probability, taking into account the others before it.
If the random number is 12, we know it's a C because 5 < 12 (5 As)and 5 + 2 < 12 (5 As and 2 Bs), but 5 + 2 + 6 > 12 (5 As, 2 Bs and 6 Cs)
Calculating the same random element is easy for us since our probabilities are already in the range 0-1. We just need to keep count of all the items we already saw and sum their probabilities, ensuring we'll see the items respecting their frequencies.
Once that's done, we just need to repeat the process for the next card. The approach I took was to have each card calculate the next one for itself (like a tree of some sort).
First run
For the first run I launched the algorithm manually for the list of decks I chose from MTGGoldfish. I quickly realized I had a problem, since land cards (mana) are much more frequent than the others (for a deck of 60 cards you usually have around 20 lands). So I ended up with 25/35 lands every time, not perfect but usable.
I decided to add an ignore
list, allowing me to tell the algorithm to ignore some cards when learning. After that I just needed to generate 40 cards instead of 60.
I also needed a way to preview the decks I was generating (for some stats), but luckily MTGGoldfish has a deck builder where you can paste text and preview the deck with a single button push.
Learning from itself
I finally asked the algorithm to follow a funnel approach and generate a certain number of decks. Re-learn from them, and generate half as many decks, and then repeat until only 1 deck was left.
This improved a little bit the odds of the cards appearing (at the same time greatly improved the lands appearing, but the ignore flag saved me) and allowed to better choose the cards and exclude some less-used cards.
Here I asked the algorithm to generate a 38-card deck, to write it into a file and
-p
to preview the deck and open the MTGGoldfish deck builder
Final Results
I compared the frequencies from the dataset to the frequencies from 1000 generated decks in this spreadsheet. As you can see, our probabilities worked wonders!
Now, the interesting part is not just the frequencies, but the links between the cards, and comparing which ones appear the most after many rounds.
In the same spreadsheet, the last couple of columns show what happened after I asked the algorithm to create 100 decks and learn from them... 50 times.
As you can see, from the 27 top cards I picked out 21. I will obviously be doing some sorting in order to get a good deck, but this is a proof of concept of a "card selector" that coud help you pick out cards for a themed deck.
The Code
You can find my repo on GitHub with all you need to try the builder yourself. I still need to write a README though.
I decided to make a CLI for a fast run on my computer, but it can easily be transformed into a web app or something.
⚠️ Beware: the code is not tidy, specially in the cli.js
file, I kept adding stuff and not refactoring anything (like the options.rounds
or options.plot
I used to get stats for the spreadsheet).
Limits
You've maybe realized this by now, but this approach still has some limits:
- While learning multiple times, frequent cards become even more frequent (lands for example)
- The algorithm is not smart enough to know that there is a limit of 4 copies of the same card per deck (although this limit is rarely broken, it happens to see 5 or 6 copies of the same card)
- Linked to the 2nd limit is the fact that you usually want to have multiple copies of the same card, so you have more chances of drawing it at some point in the game
Top comments (0)