Introduction
Are you a data analyst that has heard of Python, everything it is capable of, and would like to try it but you don’t even know where to get started? Then this article is for you. This is the second in a series of articles to help data analysts get started with using Python. In this article we will cover how to connect to data housed in an Excel file. If you do not already have Python installed, check out the first post in this series: Python for Data Analysts: Getting Started.
Before Accessing Data
Before you can access data, let’s cover a few basic concepts around Python that you need to be aware of before proceeding. This series walks you through getting started with and using Python step-by-step by providing code and Jupyter Notebooks, so you don’t have to be an expert at Python or understand everything to follow along. Take it slow and learn by example.
Libraries
Libraries are the bread and butter of Python. You can utilize libraries by importing them. For example:
import pandas as pd
If you followed the tutorial from the first post in this series, then you should already have Anaconda installed. Anaconda by default comes with a lot of useful libraries installed; however, you can also install them. There are a few ways to do this, but in this series, we will focus on accomplishing everything possible from Jupyter Notebooks. In a Notebook you would use the same command as in the Command Prompt, except you put a ‘!’
in front of it:
! pip install pandas
For more information on the pandas
library, check out the official documentation.
Terminology
Below are some key terms used throughout this post, and others in the series, that are important to be aware of. These key terms may or may not be familiar to you, depending on your previous experience with coding and data analysis; therefore, I am including a list of terms used and their definitions in relation to this post so as to aid in your development and understanding.
- Object: An object is something you assign an entity to.
- Entity: An entity is something like a list, table, string, number, etc.
-
Assign: You assign entities to objects by using
‘=’
. -
Index: In Python, indexes start at
0
. - Function: A call to Python to perform an action based on the arguments provided.
- Arguments: Information passed to a function in order for the function to work as intended.
Data in Excel
We all have Excel files with data that we want to perform more in-depth analysis on, right? Well let’s talk about how we get that data into Python in the first place. There are many ways to accomplish this, just like with anything in Python. In this post we will be covering how to get this data using the pandas
library.
Notes:
- First and foremost, you are going to need the
pandas
library. - For this I am pulling a sample Excel file from Contextures, which I will also attach in this post.
- You can utilize your own Excel file if you would like, you do not have to use the sample file I am using.
Step 1:
In order to get started, import the pandas
library into your Jupyter Notebook:
import pandas as pd
Step 2:
Assign the path for the Excel file you would like to gather data from to an object.
excel_path = r'Path\FileName.FileExtension'
A few notes on the above step:
- To test this yourself, replace the path value assigned to
excel_path
below with a path that points to your own excel file. In other words, replace‘Path\FileName.FileExtension’
below with your own path and file name. - It is important to note that the
r
in front of the path is important. It tells Python how to interpret the special characters, such as\
in the path. - It is also important to note that this will not work if you don’t include both the file name and file extension.
Step 3:
If your Excel workbook has more than one sheet you need to identify which sheet has the data you want to look at. If your Excel workbook only has one sheet, then you can ignore this step.
excel_sheet = 'SalesOrders'
Step 4:
Using a built-in pandas
function pull in the data from your Excel workbook into a data frame.
excel_df = pd.read_excel(excel_path, sheet_name = excel_sheet)
A few notes on the above step:
- If your Excel workbook only has one sheet, then you do not need to include the second argument in the function.
- In other words, if your Excel workbook only has one sheet, then delete
, sheet_name = excel_sheet
from the code above. - If your Excel workbook has multiple sheets but you do not specify which sheet should be read, then the function will look at the first sheet in the workbook.
- If you would like to know more about the arguments you can pass into the
pandas
read_excel
function, you can check out the documentation for the function.
Step 5:
Last, but certainly not least, look at your data.
excel_df
If you are using the example dataset, you should get an output that looks like this.
Conclusion
In this post we have covered how to access data housed in an Excel file but there are many different locations data can be housed, with this in mind the next few posts in this series will cover other avenues to access data. However, now that we have accessed some data, I’m sure you’re asking, “What can we do with it?” Though we did not answer that question in this post, stick around over the next few posts, and I will provide more information on what you can do with the data that you now have.
Associated Files:
Credits:
Photo by Christina Morillo
Top comments (0)