Michael_Maranan

Introduction to Pandas: Series and DataFrames

Series and DataFrames are the core data structures you need for data analysis with Python and the Pandas library. You can use both of them for reading, storing, and modifying data, and more. If you want to know more about Series and DataFrames, let's jump right into it.

What are Series and What are DataFrames

Let's first quickly identify what a Series is and what a DataFrame is. Both of them hold data; they just have different shapes. They aren't the same, but they are closely related.

A Series (pandas.Series) is a 1-dimensional dataset. It is similar to an array or a list in Python. It can store any kind of object, and the cool thing is that you can customize its index with a different set of numbers (int or float), or you can use strings (str).
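
To make the custom index idea concrete, here is a minimal sketch. The variable names (prices, readings) and the values are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# A Series whose index labels are strings
prices = pd.Series([1.5, 0.25, 3.0], index=["apple", "banana", "cherry"])

# A Series whose index labels are floats, storing string objects
readings = pd.Series(["low", "mid", "high"], index=[0.1, 0.5, 0.9])

print(prices["banana"])  # look up a value by its string label
print(readings[0.5])     # look up a value by its float label
print(prices.shape)      # a 1-dimensional shape: a tuple with one element
```

Both lookups go through the index labels, not through the position of the value.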

On the other hand, a DataFrame (pandas.DataFrame) is a 2-dimensional dataset. It has rows and columns, which makes it very useful for representing a table. As I said, Series and DataFrames are related to each other: each column in a DataFrame is a Series.
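
As a quick check of that relationship, the sketch below builds a tiny DataFrame and pulls one column out of it. The students variable and its two rows are only sample data for this illustration.

```python
import pandas as pd

# A small two-column table (sample data for illustration)
students = pd.DataFrame(
    {"Age": [21, 19], "GPA": [3.45, 2.98]},
    index=["James", "Mark"],
)

age = students["Age"]   # selecting one column gives back a Series

print(type(age))        # the column is a pandas Series
print(age.name)         # the Series name is the column name
print(students.shape)   # 2-dimensional: (rows, columns)
print(age.shape)        # 1-dimensional: (rows,)
```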

Using the Series

Let's import the pandas module first, then use pd.Series() to create a series.

import pandas as pd
sr = pd.Series([1, 23, 34, 24, 51, 15])

Calling our variable sr will return a Series with a default numeric index (starting at 0). To change the index, we can use the index argument when creating the Series, or assign to sr.index after the Series has been made.

# customizing after a series was created
sr.index = ['q','w','e','r','t','y']
sr

# customizing index while creating
sr = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=['Q', 'W', 'E', 'R', 'T', 'Y'],
    name="Number List"
)
sr

You can also name your Series. A Series name acts as a column name since DataFrame columns are actually Series.
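
One way to see the name acting as a column name is to turn a named Series into a DataFrame with .to_frame(). This is just a small sketch; the frame variable is introduced here for illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"], name="Number List")

# Converting the Series to a one-column DataFrame:
# the Series name becomes the column name
frame = sr.to_frame()

print(list(frame.columns))
print(frame.shape)
```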

By simply calling the variable sr, we can read the whole Series. To read a specific value, we can look it up by its index label, much like we do with a dictionary in ordinary Python.

>>> sr["Q"]
    1

You can also read several values with a slice of index labels. Unlike an ordinary Python slice, a label slice includes the end label.

sr["Q":"T"]

# another way of selecting by label is the .loc indexer
sr.loc["Q":"R"]

We can still access values by their numeric position with the .iloc indexer.

>>> sr.iloc[2:-1]
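
Since the output images may not always be visible, here is a small sketch that compares the two kinds of slicing on the same Series. The variable names by_label, by_position, and trimmed are introduced only for this illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6], index=["Q", "W", "E", "R", "T", "Y"])

# Label slice with .loc: the end label "R" is included
by_label = sr.loc["Q":"R"]

# Position slice with .iloc: the end position 3 is excluded,
# just like slicing a Python list
by_position = sr.iloc[0:3]

# Negative positions count from the end, so this skips the last value
trimmed = sr.iloc[2:-1]

print(by_label.tolist())     # [1, 2, 3, 4]
print(by_position.tolist())  # [1, 2, 3]
print(trimmed.tolist())      # [3, 4, 5]
```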

Applying a condition to our Series gives us a boolean Series as output. We can use it to read values conditionally.

sr >= 4

sr[sr >= 4]
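
Conditions can also be combined. The sketch below goes a little beyond the examples above: it uses & (and), | (or), and ~ (not), with each condition wrapped in parentheses. The names mask, either, and below_four are introduced for illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6], index=["Q", "W", "E", "R", "T", "Y"])

# Both conditions must hold: values from 2 up to (not including) 5
mask = (sr >= 2) & (sr < 5)
print(sr[mask].tolist())      # [2, 3, 4]

# Either condition may hold: the smallest and the largest value
either = sr[(sr < 2) | (sr > 5)]
print(either.tolist())        # [1, 6]

# Negating a condition with ~
below_four = sr[~(sr >= 4)]
print(below_four.tolist())    # [1, 2, 3]
```

Python's plain and/or keywords don't work element by element on a Series, which is why the operators are used here.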

To add another value to your Series, you can assign it to a new index label, like how we do it in Python dictionaries.

sr["U"] = 78

You can also add/merge a Series with another Series using pd.concat(). (Older tutorials use sr.append() for this, but that method was deprecated in pandas 1.4 and removed in pandas 2.0.)

pd.concat([sr, pd.Series([33,44,55])])

In the result, the index of the older values remains the same (strings) and the newly merged values have a default numeric index. Each value keeps the index label it had in its original Series, so if we give the second Series a custom index before concatenating, those labels are retained.

pd.concat([sr, pd.Series([66,77,88], index=["I","O","P"])])

However, we can reset the index of the combined Series by using the ignore_index argument.

pd.concat([sr, pd.Series([55,56,67], index=["a","s","d"])], ignore_index=True)

Note: The result of an operation like pd.concat() won't be saved automatically. If we call our Series, we will see that nothing has changed.

>>> sr
    Q     1
    W     2
    E     3
    R     4
    T     5
    Y     6
    U    78
    Name: Number List, dtype: int64

# saving the operations we did
sr = pd.concat([sr, pd.Series([55,56,67], index=["a","s","d"])], ignore_index=True)
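
Here is a small self-contained sketch of the same idea using pd.concat(), the pandas function for joining Series, so you can check the behaviour without the images. The variable names extra, combined, and renumbered are introduced for illustration.

```python
import pandas as pd

sr = pd.Series([1, 2, 3], index=["Q", "W", "E"])
extra = pd.Series([55, 56], index=["a", "s"])

# Joining returns a new Series; every value keeps its own label
combined = pd.concat([sr, extra])
print(list(combined.index))     # ['Q', 'W', 'E', 'a', 's']

# ignore_index=True throws the labels away and renumbers from 0
renumbered = pd.concat([sr, extra], ignore_index=True)
print(list(renumbered.index))   # [0, 1, 2, 3, 4]

# The original Series is untouched until we reassign it
print(len(sr))                  # 3
```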

Using DataFrames

For dealing with a bigger set of data/objects with multiple columns, we can use DataFrames.

df = pd.DataFrame({
    "Age": [21,19,22,20,23,23,21],
    "Sex": ["M","M","F","M","F","F","M"],
    "GPA": [3.45,2.98,3.72,2.87,3.90,4.00,1.90]
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

df

output: [image: Create a DataFrame]

Calling .head() on our DataFrame or Series will return the first rows of our dataset. Passing an int n to this method will return the first n rows; if you don't pass a number, it returns the first 5 rows by default. If you want to access the last rows instead, you can use the .tail() method.

# default .head() or .tail() will return 5 rows
df.head()

output: [image: head function]

# adding a number as an argument
df.tail(3)

output: [image: tail function]
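
If you want to verify the row counts yourself, this sketch uses a cut-down version of the DataFrame above (only the Age column, to keep it short) and checks which rows come back.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [21, 19, 22, 20, 23, 23, 21]},
    index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"],
)

print(len(df.head()))            # 5 rows by default
print(list(df.head(2).index))    # ['James', 'Mark']
print(list(df.tail(3).index))    # ['Lucy', 'Judy', 'Johnny']

# Asking for more rows than exist simply returns everything
print(len(df.head(10)))          # 7
```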

Like the Series, in order to access rows by index label, we can use the .loc indexer, and for numeric positions, you can use .iloc.

# accessing a single row
df.loc["Judy"]

output: [image: loc function]

# accessing multiple rows through range
df.iloc[1:-1]

output: [image: iloc function]
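
A short sketch of what these selections actually return, using a four-row sample in the same style as the DataFrame above. The names row, cell, and middle are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [21, 19, 23, 23], "GPA": [3.45, 2.98, 3.90, 4.00]},
    index=["James", "Mark", "Lucy", "Judy"],
)

# One label with .loc returns that row as a Series,
# whose index is made of the column names
row = df.loc["Judy"]
print(list(row.index))       # ['Age', 'GPA']

# A row label plus a column label returns a single value
cell = df.loc["Judy", "GPA"]
print(cell)                  # 4.0

# A positional slice with .iloc skips the first and the last row here
middle = df.iloc[1:-1]
print(list(middle.index))    # ['Mark', 'Lucy']
```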

You can also specify which columns you want to pick.

# return only the columns mentioned
df.loc["David":"Johnny", ["Age", "Sex"]]

output: [image: return specified columns only]

And if we can pick a row, we can also drop a row.

df.drop(["James", "Mark"])

output: [image: drop function]
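
The sketch below shows two things about .drop() on a small sample: it returns a new DataFrame instead of changing the original, and it can also drop columns through the columns argument (an extra that the example above doesn't cover). The names without_rows and without_sex are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [21, 19, 22], "Sex": ["M", "M", "F"]},
    index=["James", "Mark", "Rebecca"],
)

# Dropping rows by index label returns a new DataFrame
without_rows = df.drop(["James", "Mark"])
print(list(without_rows.index))     # ['Rebecca']

# Dropping a column by name
without_sex = df.drop(columns=["Sex"])
print(list(without_sex.columns))    # ['Age']

# The original still has every row and column
print(df.shape)                     # (3, 2)
```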

Another way to access rows is by passing a condition to the DataFrame.

# the condition returns a boolean Series, one value per row
df["GPA"] >= 3.0

output: [image: boolean output]

# filtering with the condition returns only the rows that meet it
df[df["GPA"] >= 3.0]

output: [image: Conditional accessing]
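
Conditions on different columns can be combined with & and |, and a condition can also be used inside .loc together with a list of columns. This is a sketch that goes slightly beyond the examples above; the names female_passers and passers are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [21, 19, 22, 20, 23, 23, 21],
    "Sex": ["M", "M", "F", "M", "F", "F", "M"],
    "GPA": [3.45, 2.98, 3.72, 2.87, 3.90, 4.00, 1.90],
}, index=["James", "Mark", "Rebecca", "David", "Lucy", "Judy", "Johnny"])

# Two conditions at once: wrap each one in parentheses
female_passers = df[(df["GPA"] >= 3.0) & (df["Sex"] == "F")]
print(list(female_passers.index))   # ['Rebecca', 'Lucy', 'Judy']

# A condition for the rows plus a list of columns, through .loc
passers = df.loc[df["GPA"] >= 3.0, ["Age", "GPA"]]
print(list(passers.index))          # ['James', 'Rebecca', 'Lucy', 'Judy']
print(list(passers.columns))        # ['Age', 'GPA']
```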

To add another column to our DataFrame, we're going to use a Series. Let's create another column for our DataFrame using the condition we used earlier.

# creating a new column out of an existing column
is_passed = pd.Series(df["GPA"] >= 3.0, index=df.index)

is_passed

output: [image: creating a column out of existing column]

# adding the column to our DataFrame
df["Is Passed"] = is_passed

df

output: [image: adding the column to the DataFrame]
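
As a side note, the comparison already returns a Series, so it can be assigned to a new column directly. The sketch below also shows that a Series is matched to the rows by index label, not by order. The Bonus column and its values are hypothetical, added only to demonstrate that.

```python
import pandas as pd

df = pd.DataFrame({"GPA": [3.45, 2.98]}, index=["James", "Mark"])

# The comparison result is already a boolean Series with the same index
df["Is Passed"] = df["GPA"] >= 3.0
print(df["Is Passed"].tolist())   # [True, False]

# A Series given in a different order is aligned by its labels
bonus = pd.Series([1, 2], index=["Mark", "James"])  # hypothetical data
df["Bonus"] = bonus
print(df["Bonus"].tolist())       # [2, 1]
```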

And if you ever want to rename a column or an index label, let's say you entered a wrong index label or want to change the name of a column, you can use the .rename() method.

df.rename(
    columns = {
        "Is Passed": "Passed GPA"
    },
    index = {
        "James": "J. Doe",
        "Mark": "M. Villar",
        "David": "D. Martinez",
    }
)

output: [image: rename function]

Remember, none of these changes (drop, rename, and so on) will be saved automatically unless you assign the result to a variable (or reassign it to the old variable).
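
A minimal sketch of that point with .rename(), on a two-row sample. The renamed variable is introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"GPA": [3.45, 2.98], "Is Passed": [True, False]},
    index=["James", "Mark"],
)

# .rename() returns a new DataFrame with the new labels
renamed = df.rename(
    columns={"Is Passed": "Passed GPA"},
    index={"James": "J. Doe"},
)

print(list(df.columns))        # the original keeps 'Is Passed'
print(list(renamed.columns))   # ['GPA', 'Passed GPA']
print(list(renamed.index))     # ['J. Doe', 'Mark']

# Reassigning is what actually keeps the change
df = renamed
```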


One last thing: Pandas Series and DataFrames have many more cool features. In IPython or a Jupyter notebook, you can run pandas.Series? or pandas.DataFrame? to quickly view the documentation; in a plain Python shell, help(pandas.Series) does the same.

So there you have it! With Series and DataFrames in your toolkit, you've got the muscle to handle data like a pro. Whether you're diving into data for work, play, or sheer curiosity, Pandas has your back. I hope you find this blog helpful, thanks, and have a good day.

Top comments (2)

sc0v0ne

Hey @codeitmichael, i like your post. If you accept feedback, in the images where there is code as in your explanation of pandas, place the code in the code marking and below the output image, for beginners you will be able to follow the reading and check the result.

example:
input:

df = pd.read_csv('dataset_random.csv')

output:
image

Michael_Maranan

Hey @sc0v1n0, I'm glad you like it! Thanks for your feedback, I will update it and separate the code snippets and outputs to code blocks and pictures.