
What are matrices in Python?

Julien Maury (@jmau111) ・ 3 min read

Matrices are powerful. Let's see why.

Why matrices are useful

Roughly speaking, a matrix is a rectangular grid of numbers arranged in rows and columns. For example, images in the red-green-blue (RGB) color model are built from three matrices, one per color channel.

The final result combines three images: the red one, the green one, and the blue one. Each matrix holds one number per pixel, from 0 to 255 (256 possible values, stored in 8 bits).
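To make this concrete, here is a minimal sketch that builds a tiny 2×2 image from three separate channel matrices. The pixel values are made up for illustration; np.uint8 is NumPy's 8-bit unsigned type, which holds exactly the values 0 to 255.

```python
import numpy as np

# Hypothetical 2x2 image: one matrix per color channel, values 0-255
red = np.array([[255, 0], [0, 255]], dtype=np.uint8)
green = np.array([[0, 255], [0, 255]], dtype=np.uint8)
blue = np.array([[0, 0], [255, 255]], dtype=np.uint8)

# Stack the three channels along a new last axis -> shape (height, width, 3)
image = np.stack([red, green, blue], axis=-1)

print(image.shape)  # (2, 2, 3)
# Top-left pixel: red=255, green=0, blue=0, i.e. pure red
print(image[0, 0].tolist())  # [255, 0, 0]
# Bottom-right pixel: all three channels at 255, i.e. white
print(image[1, 1].tolist())  # [255, 255, 255]
```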

This representation makes it easy to apply transformations, from simple ones such as grayscale conversion to sophisticated filters, to whole groups of related pixels at once instead of modifying each pixel individually.
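As a sketch of the grayscale case: each gray value can be computed as a weighted sum of a pixel's red, green, and blue values. The weights below are the ITU-R BT.601 luma coefficients, one common convention among several, and the image is made up. A single matrix product converts every pixel at once, with no per-pixel loop.

```python
import numpy as np

# Hypothetical 2x2 RGB image, shape (height, width, 3), values 0-255
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# ITU-R BT.601 luma weights for R, G, B (an assumption: other weightings exist)
weights = np.array([0.299, 0.587, 0.114])

# (2, 2, 3) @ (3,) -> (2, 2): a weighted sum over the channel axis for every pixel
gray = (image @ weights).round().astype(np.uint8)

# red -> 76, green -> 150, blue -> 29, white -> 255
print(gray.tolist())
```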

Image compression formats such as JPEG rely on matrix calculations. Compression lets you store large collections of images on your devices without running out of storage space and, of course, host fancy pictures on your website.
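JPEG is one concrete case: it splits an image into 8×8 pixel blocks and applies a discrete cosine transform (DCT) to each block, which can be written as two matrix multiplications. The sketch below builds the orthonormal DCT-II matrix with NumPy and applies it to a made-up uniform block. It shows only the transform step, not the quantization and entropy coding that actually shrink the file.

```python
import numpy as np

N = 8  # JPEG works on 8x8 pixel blocks

# Orthonormal DCT-II matrix: C[k, n] = a(k) * cos(pi * (2n + 1) * k / (2N))
k = np.arange(N).reshape(-1, 1)  # row index (frequency)
n = np.arange(N).reshape(1, -1)  # column index (pixel position)
C = np.cos(np.pi * (2 * n + 1) * k / (2 * N)) * np.sqrt(2 / N)
C[0, :] = np.sqrt(1 / N)  # the first row uses a different scale factor

# Hypothetical smooth block: a uniform gray patch
block = np.full((N, N), 128.0)

# The 2D DCT is two matrix multiplications: C @ block @ C.T
coefficients = C @ block @ C.T

# All the energy lands in the top-left coefficient; the others are (almost) zero
print(round(coefficients[0, 0]))              # 1024
print(np.allclose(coefficients.flat[1:], 0))  # True

# C is orthogonal, so the inverse transform recovers the original block
restored = C.T @ coefficients @ C
print(np.allclose(restored, block))  # True
```

Smooth image regions concentrate their information in a few coefficients like this, which is what makes them cheap to encode.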

Machine learning with Python?

Machine learning relies heavily on matrices. Instead of handling each data point one by one, you can use NumPy to process large amounts of data in a single operation.

Pure Python can be slow and is not optimized for manipulating large lists of numbers. Besides, there is no built-in matrix type, but we can import the NumPy library, whose core is written in C, to do the calculations.
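A quick way to see why a dedicated matrix type matters: Python lists have no notion of element-wise arithmetic, so multiplying a list by 2 repeats it. This sketch contrasts a list of lists with a NumPy array holding the same made-up numbers.

```python
import numpy as np

# A "matrix" built from plain Python lists is just a list of lists
rows = [[1, 2, 3], [4, 5, 6]]
print(rows * 2)  # repeats the outer list: 4 rows instead of doubled values

# With NumPy, the same data becomes a real 2D array
matrix = np.array(rows)
print((matrix * 2).tolist())  # [[2, 4, 6], [8, 10, 12]]: every element doubled
print(matrix.shape)           # (2, 3)

# Getting the same result from plain lists needs explicit loops
doubled_lists = [[value * 2 for value in row] for row in rows]
print(np.array_equal(matrix * 2, doubled_lists))  # True
```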

Machine learning consists of teaching computers how to make decisions based on examples, but it won't work without massive amounts of data.

Big companies such as Spotify and Booking.com use Python libraries for their machine learning, such as scikit-learn, which is built on NumPy.

Getting started with matrices in Python

Make sure you have Python 3 and NumPy installed (NumPy can be installed with python3 -m pip install numpy).
Once that's done, it's pretty easy. First, import the NumPy package:

import numpy as np

Then start coding.

Simple example

Let's say we want to create a matrix with the goals scored by Ronaldo and Messi during the past seven years:

Cristiano_Ronaldo_Goals = [30, 200, 80, 150, 21, 38, 71]
Lionel_Messi_Goals = [22, 11, 26, 12, 19, 31, 15]

# Matrix
Goals = np.array([Cristiano_Ronaldo_Goals, Lionel_Messi_Goals])

The np.array function stacks the lists row by row: each list becomes one row of the matrix. To try it, put the following code in a file named test.py:

import numpy as np

Cristiano_Ronaldo_Goals = [30, 200, 80, 150, 21, 38, 71]
Lionel_Messi_Goals = [22, 11, 26, 12, 19, 31, 15]

Goals = np.array([Cristiano_Ronaldo_Goals, Lionel_Messi_Goals])
print(Goals)

and run it in your terminal:

python3 test.py

You will get:

[[ 30 200  80 150  21  38  71]
 [ 22  11  26  12  19  31  15]]
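To check how np.array arranged the data, you can inspect the matrix's shape. This short sketch reuses the same two lists.

```python
import numpy as np

Cristiano_Ronaldo_Goals = [30, 200, 80, 150, 21, 38, 71]
Lionel_Messi_Goals = [22, 11, 26, 12, 19, 31, 15]
Goals = np.array([Cristiano_Ronaldo_Goals, Lionel_Messi_Goals])

# Each input list became one row: 2 players x 7 years
print(Goals.shape)  # (2, 7)
print(Goals.ndim)   # 2

# Transposing swaps rows and columns: one row per year, one column per player
print(Goals.T.shape)  # (7, 2)
```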

Now you can access individual values, for example:

print(Goals[0][-2])

This displays "38" (first row, second item from the right). But positional indices like these are not very readable. Dictionaries and operations make things easier.
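NumPy also accepts both indices inside one pair of brackets, and slices can select whole rows or columns. A minimal sketch with the same goals matrix:

```python
import numpy as np

Goals = np.array([[30, 200, 80, 150, 21, 38, 71],
                  [22, 11, 26, 12, 19, 31, 15]])

# Comma-separated indices: the idiomatic NumPy form of Goals[0][-2]
print(Goals[0, -2])  # 38

# Slices select whole rows or columns
print(Goals[1].tolist())      # Messi's row, all seven years
print(Goals[:, -2].tolist())  # [38, 31]: both players, second column from the right
print(Goals[0, :3].tolist())  # [30, 200, 80]: Ronaldo's first three years
```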

Dictionaries

Dictionaries let you map readable labels, such as player names and years, to row and column indices:

import numpy as np

# Lists
Cristiano_Ronaldo_Goals = [30, 200, 80, 150, 21, 38, 71]
Lionel_Messi_Goals = [22, 11, 26, 12, 19, 31, 15]

# Dictionaries: readable labels -> row/column indices
Dictionary_Players = {"Cristiano Ronaldo":0, "Lionel Messi":1}
Dictionary_Years = {"2014":0, "2015":1, "2016":2, "2017":3, "2018":4, "2019":5, "2020":6}

# Matrices
Goals = np.array([Cristiano_Ronaldo_Goals, Lionel_Messi_Goals])
print(Goals[Dictionary_Players["Cristiano Ronaldo"], Dictionary_Years["2019"]])  # prints 38

It's just a custom mapping, and there isn't much data here, but the idea is to make lookups more readable.
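The lookups can be wrapped in small functions so the indices never appear in calling code. In this sketch, goals_for and goals_in are hypothetical helpers, not part of NumPy, and the year dictionary is built with a comprehension that produces the same mapping as the hand-written one.

```python
import numpy as np

Goals = np.array([[30, 200, 80, 150, 21, 38, 71],
                  [22, 11, 26, 12, 19, 31, 15]])
Dictionary_Players = {"Cristiano Ronaldo": 0, "Lionel Messi": 1}
# Same mapping as before: "2014" -> 0, ..., "2020" -> 6
Dictionary_Years = {str(year): index for index, year in enumerate(range(2014, 2021))}


def goals_for(player, year):
    """Hypothetical helper: look up one cell by readable labels."""
    return Goals[Dictionary_Players[player], Dictionary_Years[year]]


def goals_in(year):
    """Hypothetical helper: map each player name to their goals in a given year."""
    column = Goals[:, Dictionary_Years[year]]
    return {player: int(column[row]) for player, row in Dictionary_Players.items()}


print(goals_for("Cristiano Ronaldo", "2019"))  # 38
print(goals_in("2020"))  # {'Cristiano Ronaldo': 71, 'Lionel Messi': 15}
```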

Operations

We saw how to use dictionaries, but it gets better with operations. Let's say we want to analyze some player stats:

import numpy as np

# Lists
Cristiano_Ronaldo_Goals = [30, 200, 80, 150, 21, 38, 71]
Lionel_Messi_Goals = [22, 11, 26, 12, 19, 31, 15]
Cristiano_Ronaldo_Games = [20, 21, 25, 18, 14, 42, 7]
Lionel_Messi_Games = [11, 13, 15, 17, 6, 31, 3]

# Dictionaries
Dictionary_Players = {"Cristiano Ronaldo":0, "Lionel Messi":1}
Dictionary_Years = {"2014":0, "2015":1, "2016":2, "2017":3, "2018":4, "2019":5, "2020":6}

# Matrices
Goals = np.array([Cristiano_Ronaldo_Goals, Lionel_Messi_Goals])
Games = np.array([Cristiano_Ronaldo_Games, Lionel_Messi_Games])

# Operations and results
lineRonaldo = Dictionary_Players["Cristiano Ronaldo"]
columnRonaldo = Dictionary_Years["2019"]
# Element-wise division of the two matrices, rounded to 2 decimals
perMatch = np.round(Goals / Games, 2)

print(perMatch[lineRonaldo, columnRonaldo])

Displays "0.9" (38 goals in 42 games, rounded to two decimals).

Of course, this is a basic example, not a production-ready script, and a lot could be refactored. The idea is that the same operations work on large amounts of data without writing complicated (and pretty slow) loops.
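Operations can also aggregate along an axis. This sketch reuses the goals and games matrices to compute totals per player and each player's best year by goals per match; axis=1 means "along each row", so each result has one value per player.

```python
import numpy as np

Goals = np.array([[30, 200, 80, 150, 21, 38, 71],
                  [22, 11, 26, 12, 19, 31, 15]])
Games = np.array([[20, 21, 25, 18, 14, 42, 7],
                  [11, 13, 15, 17, 6, 31, 3]])
years = np.arange(2014, 2021)

# Sum along each row: one total per player
total_goals = Goals.sum(axis=1)
total_games = Games.sum(axis=1)
overall_ratio = np.round(total_goals / total_games, 2)

# Element-wise division, then the column index of each row's maximum
per_match = Goals / Games
best_year = years[per_match.argmax(axis=1)]

print(total_goals.tolist())    # [590, 136]
print(total_games.tolist())    # [147, 96]
print(overall_ratio.tolist())  # [4.01, 1.42]
print(best_year.tolist())      # [2020, 2020]
```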

Visualization

In real life, you would fetch the source data from a database or an API, and it could be thousands of rows. Looping over them manually in pure Python would be slow and verbose. That's why matrix operations are useful and more efficient.
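A minimal sketch of the difference, assuming a randomly generated dataset of 10,000 rows stands in for real data: the manual loop and the single vectorized expression produce the same ratios.

```python
import numpy as np

# Hypothetical dataset: 10,000 rows of (goals, games), randomly generated
rng = np.random.default_rng(seed=0)
goals = rng.integers(0, 50, size=10_000)
games = rng.integers(1, 40, size=10_000)  # at least 1 game, so no division by zero

# Manual loop: one Python-level operation per row
ratios_loop = []
for g, m in zip(goals, games):
    ratios_loop.append(g / m)

# Vectorized: a single expression over the whole arrays
ratios_vectorized = goals / games

print(np.allclose(ratios_loop, ratios_vectorized))  # True
```

The vectorized version is typically much faster because the loop runs in compiled code, but it is worth measuring on your own data (for example with the timeit module) rather than assuming a particular speedup.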

Besides, it's easy to combine matrices with data visualization. There are dedicated packages for that, such as matplotlib.
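A minimal sketch with matplotlib, assuming it is installed (for example with python3 -m pip install matplotlib): each row of the goals matrix becomes one line on the chart. The Agg backend renders without a display, and the PNG is written to an in-memory buffer instead of a file so the example stays self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2014, 2021)
goals = np.array([[30, 200, 80, 150, 21, 38, 71],
                  [22, 11, 26, 12, 19, 31, 15]])
players = ["Cristiano Ronaldo", "Lionel Messi"]

fig, ax = plt.subplots()
# Iterating over a 2D array yields its rows: one line per player
for name, row in zip(players, goals):
    ax.plot(years, row, marker="o", label=name)
ax.set_xlabel("Year")
ax.set_ylabel("Goals")
ax.legend()

# Save to memory; use fig.savefig("goals.png") to write a file instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```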

Machine learning often needs human supervision. Visualizations help you explore data and discover valuable patterns.

Wrap up

I hope you enjoyed this short introduction to matrices in Python. Matrices are also a key concept to understand before diving into machine learning.


Discussion


I did a bunch of robotics and CAD simulation by hand in college using transformation matrices in MATLAB. They are so powerful. Once you let the abstraction take over, you can be super productive and do some really cool things.

I agree with you. Those visualization tools are powerful, and there is pretty good documentation with nice examples.