DEV Community

Cover image for How to use MySQL with Python: A beginners guide
Sunil Kumar Dash
Sunil Kumar Dash

Posted on • Edited on

How to use MySQL with Python: A beginners guide

Introduction

SQL and Python are two of the most valuable tools for data analytics. The real-world data used in machine learning applications are usually unavailable in CSV formats, such as in Kaggle competitions, but rather in databases. The structured data usually are stored in relational databases. To pull data from these Relational Database Systems(RDBMS), SQL (Structured Query Language) is used. We can only do so much with SQL, but with the help of Python, we can do many more things, from analysis to visualization of these data.

This article will discuss how to connect to and use MySQL databases from Python. But before that, let's learn the advantages of using SQL in Python.

Why SQL with Python?

At this point, it's trivial to discuss the impact of Python on Machine Learning. Vast library support and a relatively easy learning curve make it ideal for machine learning as it lets us spend less time juggling with codes and more time optimizing algorithms. Libraries like NumPy and Pandas make data analysis much easier, while Matplotlib, Seaborn, and Plotly can make attractive plots with few lines of code. For any kind of statistical analysis, there are Scipy and Statsmodels, and for machine learning, TensorFlow, Pytorch, and Scikit Learn. On top of all these things, there is a vast, ever-growing community.

SQL, on the other hand, is imperative if you are working with structured data. Like I said before, real-world data usually gets stored in databases, so retrieving those data requires a specific scripting language, and for relational databases, it is SQL. Combining Python with SQL makes it even more flexible when working with data. For example, with Python, we can do any statistical analysis such as Hypothesis testing quite easily and also can fetch data for different Machine Learning applications directly from databases. So, let's get started.

1. Getting Started 🔗

Before proceeding to the code part, let's set up our system. In this tutorial, we will need MySQL and MySQL connectors. If you haven't already installed MySQL in your system, download the MySQL installer and take care of the entire setup. For the MySQL connector, type the following code in your Jupyter Notebook, or you can do it within the shell as well by just skipping '!'.

`!pip install mysql-connector-python`
Enter fullscreen mode Exit fullscreen mode

Import Libraries
As usual, we will be importing the required libraries at the beginning.

`import mysql.connector
import pandas as pd`
Enter fullscreen mode Exit fullscreen mode

Here, the MySQL connector will help us connect to our database and Pandas as usual will be used for some data modifications.

Connecting to MySQL

The next thing is to connect to the MySQL server. This will help us communicate with the MySQL server to work with relational databases.

`mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  password="******",
)
print(mydb)`
<mysql.connector.connection_cext.CMySQLConnection object at 0x0000029B2907FBB0>
Enter fullscreen mode Exit fullscreen mode

Be sure to use the correct username and password. To make it more elegant you can always use it inside a function to make the code reusable.

Now, we will initialize a cursor object. It can be thought of as a bridge between queries and MySQL databases. They are bound to the current connection for a lifetime until the connection is terminated. This help executes MySQL queries.

mycursor = mydb.cursor(buffered=True)
The buffering argument is by default set to False but we set it to true as it will return the outcome rows after being executed.

SQL Queries
Now, let's get to the main thing. In this section, we will be executing SQL queries to create, alter and delete database elements.

We can simply create our database by executing the following code.

`mycursor.execute("create database testdb")`
Enter fullscreen mode Exit fullscreen mode

To view existing databases

`mycursor.execute("SHOW DATABASES")
`for x in mycursor:`
`  print(x)``
Enter fullscreen mode Exit fullscreen mode

To use a database run the following code

`mycursor.execute("use mydatabase")`
Enter fullscreen mode Exit fullscreen mode

To view tables in the database

`mycursor.execute("show tables")
`for x in mycursor:
  `print(x)`
Enter fullscreen mode Exit fullscreen mode

Creating a Table

We can create a table by using the following code

`mycursor.execute("create table student (SId int(10), Name varchar(20), City varchar(20) )")`
Enter fullscreen mode Exit fullscreen mode

If you are not already familiar with SQL, refer to this article for an introduction to different types of queries. The above code creates a table named student in your selected database with field or column names with a specified data type and length.
Adding Data to Table
Data into the table can easily be fed through the following code snippet

`mycursor.execute("""insert into student values ("1","Alex", 
"Patna"), ("2","Alexa", "Georgia") ,("3","Frieza", "Pearl Harbour"), ("4","Siri", "Pretoria")""")``
Enter fullscreen mode Exit fullscreen mode

Here, we used triple quotation marks for multi-line string commands. Let's now see how our data table looks like

`mycursor.execute("select * from student")
st = mycursor.fetchall()
for i in st:
    print(i)
('1', 'Alex', 'Patna')
('2', 'Alexa', 'Georgia')
('3', 'Frieza', 'Pearl Harbour')
('4', 'Siri', 'Pretoria')`
Enter fullscreen mode Exit fullscreen mode

To undo the changes made to the table, we can use the SQL rollback command.

` mydb.rollback()`
Enter fullscreen mode Exit fullscreen mode

Select and Where statements

For this section, we will be using a sample dataset of cities in different countries.

`mycursor.execute("select * from city")
cities = mycursor.fetchmall()
for city in cities:
    print(city)`
Enter fullscreen mode Exit fullscreen mode

In the above code, the fetchall() function retrieves all the rows of the table. There are other similar functions like

fetchmany(size): Fetches the number of rows specified by size. When called repeatedly, this method fetches the next set of rows of a query result
fetchone(): Fetches only one row or none

`columns = [i[0] for i in mycursor.description]
print(columns)
output: ['ID', 'Name', 'CountryCode', 'District', 'Population']`
Enter fullscreen mode Exit fullscreen mode

Let's perform some queries

`query = '''select Name from City where population>500000 limit 10 
mycursor.execute(query)
for city in mycursor:
    print(city)

output:('Kabul',)
('Amsterdam',)
('Rotterdam',)
('Alger',)
('Oran',)
('Luanda',)
('Dubai',)
('Buenos Aires',)
('La Matanza',)
('Córdoba',)`
Enter fullscreen mode Exit fullscreen mode

Querying cities in japan with a population of more than 5,000,000.

`query = '''select Name from City where population>500000 and countrycode = 'JPN' limit 5 '''
mycursor.execute(query)
s = mycursor.fetchmany(5)
for city in s:
    print(city)
output: ('Tokyo',)
('Jokohama [Yokohama]',)
('Osaka',)
('Nagoya',)
('Sapporo',)`
Enter fullscreen mode Exit fullscreen mode

Group by and Order by
Group by is an SQL clause which is used for grouping rows with similar values. Whereas order by is used to order rows selected with a select statement.

`query = '''select countrycode,sum(population) from City group by countrycode'''
mycursor.execute(query)
s = mycursor.fetchmany(5)
for city in s:
    print(city)
output: ('ABW', Decimal('29034'))
('AFG', Decimal('2332100'))
('AGO', Decimal('2561600'))
('AIA', Decimal('1556'))
('ALB', Decimal('270000'))`
Enter fullscreen mode Exit fullscreen mode

Order country codes based on descending order of population.

`query = '''select countrycode,sum(population) as s from City group by countrycode order by s desc'''
mycursor.execute(query)
s = mycursor.fetchmany(5)
for city in s:
    print(city)
output: ('CHN', Decimal('175953614'))
('IND', Decimal('123298526'))
('BRA', Decimal('85876862'))
('USA', Decimal('78625774'))
('JPN', Decimal('77965107'))`
Enter fullscreen mode Exit fullscreen mode

SQL Joins

SQL joins are one of the key concepts of SQL. Usually, we have to work with multiple tables and to query data from them requires joining. There are several join methods but more often than not we will be using an inner join. Different joins are illustrated in the below diagram.

SQL joins

source: Dofactory.com

`query = '''select co.continent,sum(ci.population) as TotalPop from country co inner join city ci on ci.countrycode = co.code group by co.continent 
order by TotalPop desc'''
mycursor.execute(query)
res = mycursor.fetchall()
for i in res:
    print(i)

output:('Asia', Decimal('697604103'))
('Europe', Decimal('241942813'))
('South America', Decimal('172037859'))
('North America', Decimal('168250381'))
('Africa', Decimal('135838579'))
('Oceania', Decimal('13886149'))`
Enter fullscreen mode Exit fullscreen mode

In the above code, we performed an inner join on two tables city and country on the common attribute country code as joining is only feasible when there's a common field or attribute. And later we ordered the outcome in descending order.

Pandas and SQL

Pandas have built-in functions for SQL queries. But with Pandas, we get data frames as output instead of tuples. All we need to do is pass the query and database connection to the function.

`query ='''select * from city limit 10
pd.read_sql(query, con = mydb)`
Enter fullscreen mode Exit fullscreen mode

Or you can simply create a dataset as usual by passing the queried data and columns.

`mycursor.execute("select * from city")
cities = mycursor.fetchmany(10)
df = pd.DataFrame(cities,columns=columns)
df.head()`
Enter fullscreen mode Exit fullscreen mode

data frame

The best thing about Pandas is it makes analysis much easier. If we wish to visualize countries with the highest population we can easily do that with the Pandas plot function.

`query = '''select countrycode,sum(population) as TotalPop 
from City 
group by countrycode 
order by TotalPop desc 
limit 10'''
pd.read_sql_query(query,con=mydb).plot(kind='bar', x = 'countrycode', y='TotalPop' )`
Enter fullscreen mode Exit fullscreen mode

bar chart

import matplotlib.pyplot as plt
query = '''select language,percentage from countrylanguage where countrycode='AFG' '''
data = pd.read_sql_query(query,con=mydb, )
data.plot(kind='pie',x='language',y='percentage', )
plt.legend(data.language)
Enter fullscreen mode Exit fullscreen mode

pie chart

Once the data is converted to a data frame object, we can use any library to get desired results.

After the job is finished the connection can be terminated by running the following command.
mydb.close()

Conclusion

MySQL and Python are two of the most popular and efficient languages at their job. MySQL makes it a lot easier to pull data efficiently from databases while Python makes it convenient to perform complicated analyses on the data. Both have their unique capabilities and by combining them we can leverage the best of both.
Before parting let's have a quick summary of the above article

  • First of all, we learned to establish a connection to the MySQL server from Python.
  • We learned to execute SQL queries from a Python environment using a cursor object.
  • Learned to implement several key SQL concepts like Select, Where, Group by, Order By, Join etc.
  • Converted queried data to Pandas data frame for analysis and visualizations.

So, this was it. Hope you liked the article.

Top comments (0)