# Magic: The Gathering Meets Data Science

### Luciano Strika

*Originally published at datastuff.tech*

Magic: The Gathering has been one of my hobbies for years. Its large card base and long history make it a perfect fit for Data Analysis and Machine Learning.

In case you missed my previous article, I applied K-Means Clustering (an Unsupervised Learning technique) to a Magic: The Gathering Dataset I scraped myself from mtgtop8. That article explains the technical side, but doesn't get into the results, because I didn't think my readers would be into them.

Since many people have stood up to voice their disagreement, I will now show you some of the things the Algorithm learned.

This is neither the first nor the last time I'll say that unsupervised learning can be spooky in how much it learns, even when you know how it works.

## The Data

The Dataset I used for this project contained only professional decks from last year, from the Modern format. I did not include sideboards in this analysis. All of the decks I used for training and visualizations are available, alongside the code, in this GitHub project.

If you know of any good Dataset for casual decks, I'd be happy to hear about it in the comments. Otherwise, I may scrape one in the future.

For this analysis, I'm looking at 777 different decks, containing a total of 642 unique cards (counting lands).

## The Results

First of all, I strongly encourage you to pull the repository and try the Jupyter Notebook yourself, as there may be some particular insights you find interesting that I may be missing.

That said, if you want to see what the Data say about a particular card (provided it is part of the competitive meta, which we've seen is small enough), ask me in the comments if you don't see it here!

Now, the first question we'll ask ourselves is…

### What does each Magic: The Gathering cluster look like?

Remember, we clustered decks, not cards, so we would expect each cluster to roughly represent an archetype, particularly one seeing play in the Modern meta.

First of all: here are the counts for each cluster. That is, how many decks fell into each.

We can see right off the bat there are two particularly small clusters, with fewer than 30 decks each. Let's take a closer look.

### Cards in each cluster
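Counts like these can be sketched in a few lines, assuming the clustering step returned one integer label per deck. The `labels` list below is made-up toy data, not the notebook's actual output, and the helper names `cluster_sizes` and `small_clusters` are hypothetical.

```python
from collections import Counter

# Toy labels, one per deck (an assumption for illustration only).
labels = [0, 4, 2, 2, 6, 0, 2, 4, 2, 0]


def cluster_sizes(labels):
    """Return {cluster_id: number_of_decks}, ordered by cluster id."""
    return dict(sorted(Counter(labels).items()))


def small_clusters(labels, threshold):
    """Return the ids of clusters holding fewer than `threshold` decks."""
    return [c for c, n in cluster_sizes(labels).items() if n < threshold]


print(cluster_sizes(labels))
print(small_clusters(labels, threshold=3))
```

On the real data, the same idea with `threshold=30` would single out the two small clusters mentioned above.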

For cluster number 4, I took the 40 most-played cards in each deck, and then intersected those sets to see what the decks all had in common. I repeated that procedure for cluster number 6.

```
Cluster number 4:
{'Devoted Druid', 'Horizon Canopy', 'Ezuri, Renegade Leader', 'Forest', 'Elvish Archdruid', 'Pendelhaven', "Dwynen\\'s Elite", 'Llanowar Elves', 'Collected Company', 'Windswept Heath', 'Temple Garden', 'Westvale Abbey', 'Razorverge Thicket', 'Heritage Druid', 'Elvish Mystic', 'Nettle Sentinel','Eternal Witness', 'Cavern of Souls', 'Chord of Calling', 'Vizier of Remedies', 'Selfless Spirit'}
Cluster number 6:
{'Funeral Charm', 'Liliana of the Veil', "Raven\\'s Crime", 'Fatal Push', 'Thoughtseize', 'Wrench Mind', 'Bloodstained Mire', 'Smallpox', 'Inquisition of Kozilek', 'Mutavault', 'Urborg, Tomb of Yawgmoth','Infernal Tutor', 'Swamp', 'The Rack', "Bontu\\'s Last Reckoning", 'Shrieking Affliction'}
```

It appears one of them is a green deck, built on elves and green lands, while the other is built around discard, with cards like Liliana of the Veil and Inquisition of Kozilek.
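The "top 40 cards per deck, then intersect" procedure can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: each deck is assumed to be a dict mapping card name to number of copies, ties are broken alphabetically, and `top_cards`, `common_cards` and `toy_cluster` are hypothetical names with made-up data.

```python
def top_cards(deck, k=40):
    """Names of the k cards with the most copies in one deck.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(deck.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def common_cards(decks, k=40):
    """Cards that appear in the top-k set of every deck in the cluster."""
    if not decks:
        return set()
    return set.intersection(*(top_cards(deck, k) for deck in decks))


# Toy cluster of three elf-like decks (illustrative data only).
toy_cluster = [
    {"Forest": 10, "Llanowar Elves": 4, "Elvish Archdruid": 4, "Collected Company": 4},
    {"Forest": 8, "Llanowar Elves": 4, "Elvish Archdruid": 3, "Chord of Calling": 2},
    {"Forest": 9, "Llanowar Elves": 4, "Heritage Druid": 4, "Elvish Archdruid": 4},
]

print(sorted(common_cards(toy_cluster)))
```

Since most 60-card decks run fewer than 40 distinct cards, `k=40` keeps nearly every card, so the intersection is mostly "cards every deck in the cluster plays". A smaller `k` would restrict it to each deck's core.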

Here's the result of the previous procedure for all of the clusters. See if you can tell which archetype each belongs to. This also tells us about the distribution of the meta back when I got the data.

The same analysis on a more recent Dataset may even be useful in and of itself, if you're into competitive tournaments.

### Particular Cards

Three cards stood out to me in those lists: "*Mutavault*", "*Inquisition of Kozilek*" and "*Llanowar Elves*".

I wondered whether they're more common in other clusters. I didn't really know *Mutavault* was so common in competitive play, and I think *Llanowar Elves* appearing in a deck tells us something about it.

As always, you can generate these graphs for any of the cards, or ask me if you're interested in a particular one.
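One way to get the numbers behind such a per-card graph is to compute, for each cluster, the share of its decks that run the card. The sketch below assumes decks are dicts of card name to copies with a parallel list of cluster labels. `card_share_by_cluster` is a hypothetical helper, the data is made up, and plotting is left out.

```python
from collections import defaultdict


def card_share_by_cluster(decks, labels, card):
    """Return {cluster_id: fraction of that cluster's decks containing `card`}."""
    totals = defaultdict(int)  # decks per cluster
    hits = defaultdict(int)    # decks per cluster that contain the card
    for deck, label in zip(decks, labels):
        totals[label] += 1
        if deck.get(card, 0) > 0:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Toy data (illustrative only): four decks spread over three clusters.
toy_decks = [
    {"Mutavault": 4, "Swamp": 20},
    {"Swamp": 18},
    {"Mutavault": 2, "Island": 10},
    {"Forest": 12},
]
toy_labels = [6, 6, 1, 4]

print(card_share_by_cluster(toy_decks, toy_labels, "Mutavault"))
```

Using a fraction instead of a raw count keeps small clusters comparable with large ones.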

### Versatile Cards

Lastly, I'll define a new category of card: a card's versatility will mean how many different clusters contain at least one deck that uses it.

Admittedly, that definition could be refined a bit more, for instance by counting appearances instead of just whether the card is in a deck or not.

However, the results this way are coherent enough, so I don't think it needs any more tweaking. Here's a list with the top 10 most versatile cards, after filtering Basic Lands out.

- Dismember
- Ghost Quarter
- Field of Ruin
- Cavern of Souls
- Thoughtseize
- Mutavault
- Sacred Foundry
- Stomping Ground
- Engineered Explosives
- Botanical Sanctum

They're pretty much the ones you'd expect. However, I'm surprised Lightning Bolt didn't make the cut. I wasn't sure whether non-Basic Lands should count, but I left them in in the end.

The fact that I have no idea which card "Engineered Explosives" is proves I'm out of touch with the state of the meta, and maybe I should be playing more, but that's beside the point.
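The versatility score defined above can be sketched as follows, again assuming decks are dicts of card name to copies with one cluster label per deck. The function names, the toy data and the `BASIC_LANDS` set are assumptions for illustration (the set lists only the five classic basics and may need extending, for example with snow-covered variants).

```python
# Assumed filter list: the five classic basic lands only.
BASIC_LANDS = {"Plains", "Island", "Swamp", "Mountain", "Forest"}


def versatility(decks, labels):
    """Return {card: number of distinct clusters with at least one deck using it}."""
    clusters_by_card = {}
    for deck, label in zip(decks, labels):
        for card, copies in deck.items():
            if copies > 0:
                clusters_by_card.setdefault(card, set()).add(label)
    return {card: len(clusters) for card, clusters in clusters_by_card.items()}


def most_versatile(decks, labels, n=10, exclude=BASIC_LANDS):
    """Top-n (card, score) pairs, highest score first, ties broken by name."""
    scores = versatility(decks, labels)
    ranked = sorted(
        ((card, score) for card, score in scores.items() if card not in exclude),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:n]


# Toy data (illustrative only).
toy_decks = [
    {"Thoughtseize": 4, "Swamp": 20},
    {"Thoughtseize": 3, "Dismember": 1, "Forest": 10},
    {"Dismember": 2, "Mountain": 18},
    {"Dismember": 1, "Swamp": 5},
]
toy_labels = [6, 4, 1, 6]

print(most_versatile(toy_decks, toy_labels, n=2))
```

Passing `exclude=set()` keeps Basic Lands in the ranking, which makes it easy to compare both versions of the list.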

## Conclusion

As we expected, Magic: The Gathering can be a fun source of Data, and I think we have all learned a bit by seeing all this.

Personally, I'm still surprised a bit of glorified linear algebra could learn all about the meta of competitive play.

I'd be even more surprised if it learned about archetypes in casual play, where decks are more diverse, though my intuition tells me that with enough clusters, even that should be properly characterized.

What do you think? Would you have liked to see any other bits of information? Were you expecting the algorithm to perform well? And finally, what other domains do you think are fit for a proper Data Analysis, particularly using other Unsupervised Learning Techniques?

Please let me know any or all of that in the comments!

*Follow me on Medium or Twitter for more articles, tutorials and analysis. Please consider supporting my website and my writing habit with a contribution.*

The post Magic: The Gathering Meets Data Science appeared first on Data Stuff.