Michael_Maranan

Posted on Sep 11, 2023 • Edited on Sep 16, 2023

Introduction to Pandas: Series and DataFrames

#datascience #beginners #jupyter #python

Series and DataFrames are the core data structures you need for data analysis with Python and the Pandas library. You can use both of them for reading, storing, and modifying data, and more. If you want to know more about Series and DataFrames, let's jump right into it.

What are Series and What are DataFrames

Let's first quickly identify what a Series is and what a DataFrame is. Both of them hold data; they just have different shapes. They aren't the same, but they are closely related.

A Series (pandas.Series) is a 1-dimensional dataset. It is much like an array or a list in Python. It can store any kind of object, and the cool thing is that you can customize its index with a different set of numbers (int or float) or with strings (str).

On the other hand, a DataFrame (pandas.DataFrame) is a 2-dimensional dataset. It has rows and columns, which makes it very useful for representing a table. As I said, Series and DataFrames are related to each other: each column in a DataFrame is a Series.
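To make that relationship concrete, here is a minimal sketch. The table and its column names are made up for illustration; the point is only that selecting one column of a DataFrame gives back a Series that shares the DataFrame's index.

```python
import pandas as pd

# Hypothetical two-column table, used only for illustration
table = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

column = table["x"]  # selecting a single column

print(type(column).__name__)    # Series
print(table.ndim, column.ndim)  # 2 1
print(column.name)              # x
```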

Using the Series

Let's import the pandas module first, then use pd.Series() to create a series.

import pandas as pd

sr = pd.Series([1, 23, 34, 24, 51, 15])
sr

Calling our variable sr returns a Series with a default numeric index. To change the index, we can use the index argument when creating the Series, or assign to sr.index after the Series was made.

# customizing after a series was created
sr.index = ['q','w','e','r','t','y']
sr

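As a quick check of what the index replacement does, the sketch below records the default labels first and then looks a value up through its new label. The variable name default_labels is only for illustration.

```python
import pandas as pd

sr = pd.Series([1, 23, 34, 24, 51, 15])
default_labels = sr.index.tolist()
print(default_labels)      # [0, 1, 2, 3, 4, 5]

# Replace the labels; the values stay in the same order
sr.index = ["q", "w", "e", "r", "t", "y"]
print(sr.index.tolist())   # ['q', 'w', 'e', 'r', 't', 'y']
print(sr["w"])             # 23
```

The new index must have exactly as many labels as the Series has values, otherwise pandas raises an error.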

# customizing index while creating
sr = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=['Q', 'W', 'E', 'R', 'T', 'Y'],
    name="Number List"
)
sr


You can also name your Series. A Series name acts as a column name since DataFrame columns are actually Series.

By simply calling the variable sr, we can read the whole Series. To read a specific element, we can look it up by its index label, much like reading a value from a Python dictionary.

>>> sr["Q"]
    1

You can also read objects using a range of index labels. Unlike ordinary Python slices, a label-based slice includes both endpoints.

sr["Q":"T"]


# another way of selecting by index label is the .loc indexer
sr.loc["Q":"R"]


We can still access elements by their numeric position using the .iloc indexer. Here the slice works like ordinary Python: the stop position is excluded.

>>> sr.iloc[2:-1]

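The difference between the two kinds of slices is easy to miss, so here is a small sketch that puts them side by side using the same Series as above. The variable names by_label and by_position are only for illustration.

```python
import pandas as pd

sr = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=["Q", "W", "E", "R", "T", "Y"],
    name="Number List",
)

by_label = sr.loc["Q":"R"]   # label slice: both endpoints are included
by_position = sr.iloc[0:3]   # positional slice: the stop position is excluded

print(by_label.tolist())        # [1, 2, 3, 4]
print(by_position.tolist())     # [1, 2, 3]
print(sr.iloc[2:-1].tolist())   # [3, 4, 5]
```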

Applying a condition to our Series gives us a boolean Series as output, with one True or False per element. We can use it to read objects conditionally.

sr >= 4


sr[sr >= 4]

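Here is a small sketch of the same idea with the intermediate boolean Series stored in a variable (mask is just an illustrative name). The last step, combining two conditions with &, goes slightly beyond the example above and is shown as one possible extension.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6], index=["Q", "W", "E", "R", "T", "Y"])

mask = sr >= 4
print(mask.tolist())      # [False, False, False, True, True, True]
print(sr[mask].tolist())  # [4, 5, 6]

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses
middle = sr[(sr >= 2) & (sr < 5)]
print(middle.index.tolist())  # ['W', 'E', 'R']
```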

To add another object to your Series, you can assign to a new index label, like how we do it with Python dictionaries.

sr["U"] = 78

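Like a dictionary, the same assignment syntax either adds or overwrites, depending on whether the label already exists. A minimal sketch with a shortened Series:

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"])

sr["U"] = 78   # "U" is a new label, so a new element is added at the end
sr["Q"] = 10   # "Q" already exists, so its value is overwritten

print(sr.index.tolist())  # ['Q', 'W', 'E', 'U']
print(sr.tolist())        # [10, 2, 3, 78]
```

Unlike the methods covered later, this assignment changes sr in place.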

You can also combine a Series with another Series. Older tutorials use Series.append() for this, but that method was deprecated in pandas 1.4 and removed in pandas 2.0, so we use pd.concat() instead.

pd.concat([sr, pd.Series([33,44,55])])

As you notice, the index of the older objects remains the same (strings) and the newly added objects have the default numeric index. Each object keeps the index it had in the Series it came from, so if we give the second Series its own index before concatenating, the objects will retain that index.

pd.concat([sr, pd.Series([66,77,88], index=["I","O","P"])])

However, we can discard the index of both Series and renumber everything from 0 by using the ignore_index argument.

pd.concat([sr, pd.Series([55,56,67], index=["a","s","d"])], ignore_index=True)

Note: Operations like pd.concat() return a new Series and don't modify the original. If we call our Series, we will see nothing has changed.

>>> sr
    Q     1
    W     2
    E     3
    R     4
    T     5
    Y     6
    U    78
    Name: Number List, dtype: int64

# saving the result of the operation
sr = pd.concat([sr, pd.Series([55,56,67], index=["a","s","d"])], ignore_index=True)
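The sketch below shows combining two Series with pd.concat(), which works on current pandas releases (Series.append() is no longer available from pandas 2.0 onward). It uses a shortened Series, and the names extra, kept, and renumbered are only for illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"])
extra = pd.Series([55, 56], index=["a", "s"])

kept = pd.concat([sr, extra])                           # each element keeps its label
renumbered = pd.concat([sr, extra], ignore_index=True)  # labels replaced by 0..n-1

print(kept.index.tolist())        # ['Q', 'W', 'E', 'a', 's']
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
print(len(sr))                    # 3
```

The last line confirms that the original Series still has three elements: the combined result only persists if you assign it to a variable.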

Using DataFrames

For dealing with a bigger set of data/objects with multiple columns, we can use DataFrames.

df = pd.DataFrame({
    "Age": [21,19,22,20,23,23,21],
    "Sex": ["M","M","F","M","F","F","M"],
    "GPA": [3.45,2.98,3.72,2.87,3.90,4.00,1.90]
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

df


Calling .head() on our DataFrame or Series returns the first rows of our dataset. Passing an int n to this method returns the first n rows; if you don't pass a number, it returns the first 5 rows by default. If you want the last rows instead, you can use the .tail() method.

# default .head() or .tail() will return 5 rows
df.head()


# adding a number as an argument
df.tail(3)

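To see which rows each call picks, the following sketch prints just the index labels of the results. It uses a two-column version of the DataFrame above to stay short.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

first_five = df.head()    # no argument: first 5 rows
last_three = df.tail(3)   # explicit count: last 3 rows

print(first_five.index.tolist())   # ['James', 'Mark', 'Rebecca', 'David', 'Lucy']
print(last_three.index.tolist())   # ['Lucy', 'Judy', 'Johnny']

# .head() and .tail() work on a Series too
print(df["GPA"].head(2).tolist())  # [3.45, 2.98]
```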

As with the Series, to access rows by their index label we can use the .loc indexer, and to access them by numeric position we can use .iloc.

# single accessing
df.loc["Judy"]


# accessing multiple rows through range
df.iloc[1:-1]


You can also specify which columns you want to pick from those rows.

# return only the columns mentioned
df.loc["David":"Johnny", ["Age", "Sex"]]

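The three selections above return different kinds of objects, which the sketch below makes visible. It rebuilds the same DataFrame so it can run on its own; the names judy, middle, and subset are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "Sex": ["M", "M", "F", "M", "F", "F", "M"],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

judy = df.loc["Judy"]        # a single row comes back as a Series
print(judy.index.tolist())   # ['Age', 'Sex', 'GPA']
print(judy["GPA"])           # 4.0

middle = df.iloc[1:-1]       # positional slice: skips the first and last rows
print(middle.index.tolist()) # ['Mark', 'Rebecca', 'David', 'Lucy', 'Judy']

subset = df.loc["David":"Johnny", ["Age", "Sex"]]  # rows by label, then columns
print(subset.shape)          # (4, 2)
```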

And if we can pick a row, we can also drop a row.

df.drop(["James", "Mark"])

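Note that .drop() returns a new DataFrame rather than changing the one it is called on. A minimal sketch with a shortened table; dropping a column through the columns argument is shown as a related variation that the example above does not cover.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22],
    "Sex": ["M", "M", "F"],
    "GPA": [3.45, 2.98, 3.72],
}, index=["James", "Mark", "Rebecca"])

trimmed = df.drop(["James", "Mark"])   # drop rows by index label
print(len(df), len(trimmed))           # 3 1

without_sex = df.drop(columns=["Sex"]) # drop a column instead of rows
print(without_sex.columns.tolist())    # ['Age', 'GPA']
```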

Another way to access rows/objects is to pass a condition to the DataFrame.

# the condition returns a boolean Series, with one value per row
df["GPA"] >= 3.0


# passing the condition to the DataFrame returns only the rows that meet it
df[df["GPA"] >= 3.0]

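As a sketch of how the filter behaves on the data above, the block below stores the condition in a variable (mask, an illustrative name) and checks which rows survive. Passing the same mask to .loc together with a column name is one way to filter rows and pick a column in a single step.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "Sex": ["M", "M", "F", "M", "F", "F", "M"],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

mask = df["GPA"] >= 3.0
passed = df[mask]
print(passed.index.tolist())   # ['James', 'Rebecca', 'Lucy', 'Judy']

# The same mask works with .loc, optionally with a column selection
ages = df.loc[mask, "Age"]
print(ages.tolist())           # [21, 22, 23, 23]
```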

To add another column to our DataFrame, we're going to use a Series. Let's create another column for our DataFrame using the condition we used earlier.

# creating a new column out of an existing column
is_passed = pd.Series(df["GPA"] >= 3.0, index=df.index)

is_passed


# adding the column to our DataFrame
df["Is Passed"] = is_passed

df

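One detail worth knowing when adding a Series as a column: pandas matches the values to rows by index label, not by position. The sketch below uses a shortened table and a hypothetical ages Series whose labels are deliberately in a different order.

```python
import pandas as pd

df = pd.DataFrame(
    {"GPA": [3.45, 2.98, 3.72]},
    index=["James", "Mark", "Rebecca"],
)

# A comparison already returns a Series with the same index as df
df["Is Passed"] = df["GPA"] >= 3.0
print(df["Is Passed"].tolist())   # [True, False, True]

# Labels are in a different order here; assignment aligns them by label
ages = pd.Series({"Rebecca": 22, "James": 21, "Mark": 19})
df["Age"] = ages
print(df["Age"].tolist())         # [21, 19, 22]
```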

And if you ever want to rename a column or an index label, say because you entered a wrong label or want to change the name of a column, you can use the .rename() method.

df.rename(
    columns = {
        "Is Passed": "Passed GPA"
    },
    index = {
        "James": "J. Doe",
        "Mark": "M. Villar",
        "David": "D. Martinez",
    }
)


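Remember, methods like .drop() and .rename() return a new object and leave the original untouched, so the changes won't be saved unless you assign the result to a variable (or reassign the old variable). Assigning a column, as in df["Is Passed"] = is_passed, does change the DataFrame in place.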
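A minimal sketch of that behavior with .rename(), using a shortened table. The variable name renamed is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"GPA": [3.45, 2.98], "Is Passed": [True, False]},
    index=["James", "Mark"],
)

renamed = df.rename(
    columns={"Is Passed": "Passed GPA"},
    index={"James": "J. Doe"},
)

print(df.columns.tolist())       # ['GPA', 'Is Passed']
print(renamed.columns.tolist())  # ['GPA', 'Passed GPA']
print(renamed.index.tolist())    # ['J. Doe', 'Mark']

# Reassign to keep the change under the original name
df = renamed
```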

One last thing, Pandas Series and DataFrames have many more cool features. In Jupyter or IPython, you can run pd.Series? or pd.DataFrame? to quickly view the documentation; in a plain Python session, help(pd.Series) does the same job.

So there you have it! With Series and DataFrames in your toolkit, you've got the muscle to handle data like a pro. Whether you're diving into data for work, play, or sheer curiosity, Pandas has your back. I hope you find this blog helpful, thanks, and have a good day.

Top comments (2)

sc0v0ne • Sep 13 '23

Hey @codeitmichael, i like your post. If you accept feedback, in the images where there is code as in your explanation of pandas, place the code in the code marking and below the output image, for beginners you will be able to follow the reading and check the result.

example:
input:

df = pd.read_csv('dataset_random.csv')

output:
image

Michael_Maranan • Sep 13 '23

Hey @sc0v1n0, I'm glad you like it! Thanks for your feedback, I will update it and separate the code snippets and outputs to code blocks and pictures.
