Mohammad Hasibur Rahman

Pandas Beginner's Guide

Import the libraries

import numpy as np
import pandas as pd

Series
A Series is very similar to a NumPy array. The main difference is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position.

Creating a Series

labels = ['a', 'b', 'c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

In the first line of code, I pass my_list to the Series function as the data argument. The output shows the values of my_list in a column with a default index from 0 to 2.

In the same way, the second line passes my_list as data and labels as index. The index then consists of the elements of labels ('a', 'b', 'c'), and the values are the elements of my_list.

Similarly, you can pass the same objects from the code cell above positionally, without writing the argument names data and index.
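Here is a minimal sketch of the calls described above, assuming the constructor is pd.Series. The result names (s_default, s_labeled, s_positional, s_from_dict) are only for illustration, and the dictionary case is an extra that uses the d variable defined earlier.

```python
import pandas as pd

labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
d = {'a': 10, 'b': 20, 'c': 30}

# Data only: pandas assigns a default integer index 0..2
s_default = pd.Series(data=my_list)

# Data plus index: values come from my_list, index labels from labels
s_labeled = pd.Series(data=my_list, index=labels)

# Same result with positional arguments (data first, then index)
s_positional = pd.Series(my_list, labels)

# A dictionary supplies both at once: keys become the index
s_from_dict = pd.Series(d)

print(s_default)
print(s_labeled)
```

The labeled version lets you write s_labeled['b'] instead of remembering that 20 sits at position 1.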

Using an Index

The key to using a Series is understanding its index. Pandas uses these index names or numbers to allow fast lookups of information (it works like a hash table or dictionary).

The first line of code creates a variable named "ser1" by calling the Series function, which accepts both values and an index. I set the values to 1, 2, 3, 4 and used country names as the index. The second line evaluates ser1 to display its contents.

In the same way, we create a second variable named "ser2" and display its contents.

Fifth line: You can select a specific element by writing the variable name followed by the element's index label in square brackets.

In the last line, I add ser1 and ser2 together. Pandas matches the elements by index label, so the output shows NaN for Italy and USSR, because each of those labels exists in only one of the two Series.
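A rough sketch of these steps follows. Only the values 1, 2, 3, 4 and the labels Italy and USSR come from the text; the other country names and the values in ser2 are assumptions made up for the example.

```python
import pandas as pd

# Labels other than 'USSR' and 'Italy', and the ser2 values, are assumed
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Label-based lookup, much like reading a dictionary key
usa_value = ser1['USA']

# Addition aligns on index labels; a label found in only one
# Series has nothing to be added to, so the result there is NaN
total = ser1 + ser2
print(total)
```

Note that the result is stored as floats, since NaN is a floating-point value.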

DataFrames
We can think of a DataFrame as a bunch of Series objects put together to share the same index.

Import randn to generate random sample data. Name a variable df and call the DataFrame function. Inside DataFrame, I specified the number of rows (5) and columns (4) and set the index to 'A B C D E' and the columns to 'W X Y Z'.

Here split() is the ordinary Python string method. It splits a string such as 'A B C D E' on whitespace into a list of separate labels.

Then we evaluate "df" to display the DataFrame.
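A minimal sketch of this step, assuming randn is imported from numpy.random. The names row_labels and col_labels are just for readability.

```python
import pandas as pd
from numpy.random import randn

# str.split() with no arguments splits on whitespace:
# 'A B C D E'.split() gives ['A', 'B', 'C', 'D', 'E']
row_labels = 'A B C D E'.split()
col_labels = 'W X Y Z'.split()

# randn(5, 4) draws a 5x4 array from the standard normal distribution,
# so the values are different on every run
df = pd.DataFrame(randn(5, 4), index=row_labels, columns=col_labels)
print(df)
```

If you want the same numbers on every run, you can call numpy.random.seed with a fixed integer before randn.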

Selection and Indexing

You can select data by writing the variable name, "df", followed by the label or labels you want in square brackets. You may give the variable a different name; here df is short for DataFrame.
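The sketch below shows a few common selection patterns. It uses a small DataFrame with fixed, made-up values (an assumption, so the results are reproducible), and the row selectors loc and iloc go slightly beyond what the text describes.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values 0..19 instead of random numbers
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('WXYZ'))

w = df['W']            # one column label -> Series
wz = df[['W', 'Z']]    # list of column labels -> DataFrame
row_a = df.loc['A']    # row selected by label
row_2 = df.iloc[2]     # row selected by integer position (row 'C')
cell = df.loc['B', 'Y']  # single value at row 'B', column 'Y'

print(wz)
```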

Creating a new column

Here you can see that a new column named 'new' has been created. Its values are the sum of columns "W" and "Y", added together using "+".
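As a small sketch of this step, using fixed made-up values (an assumption) so the sums are easy to check by eye:

```python
import numpy as np
import pandas as pd

# Fixed illustrative values 0..19 instead of random numbers
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('WXYZ'))

# Assigning to a column name that does not exist yet creates the column;
# the addition is done row by row
df['new'] = df['W'] + df['Y']
print(df)
```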

Removing Columns

Use df.drop() to drop a column and set the axis to 1. Axis = 1 means you are dropping a column, not a row.
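A minimal sketch of dropping the 'new' column, again on fixed made-up values (an assumption). One detail worth knowing: by default drop returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values, plus the 'new' column from the previous step
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('WXYZ'))
df['new'] = df['W'] + df['Y']

# axis=1 tells drop to look for the label among the columns
without_new = df.drop('new', axis=1)

print(without_new.columns.tolist())
print(df.columns.tolist())  # the original still has 'new'
```

To keep the change, assign the result back, for example df = df.drop('new', axis=1).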

Removing Rows

Similarly, use df.drop() but set axis = 0 to specify that you are dropping a row. Here, 'E' is the row that you are dropping.

The second line of code shows another way to drop a row. Here iloc is short for integer location, meaning it selects by position instead of by label. I specified row number 2 and dropped it.

The third line of code shows how you can select multiple rows and drop them.

The fourth line of code selects multiple rows and columns, then drops all of them together.
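The exact calls are not spelled out in the text, so the following is one plausible sketch of the four steps on fixed made-up values. In particular, looking up the label through df.iloc[2].name and the choice of which rows and columns to drop are my assumptions.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values 0..19 instead of random numbers
df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=list('ABCDE'), columns=list('WXYZ'))

# 1. Drop one row by label
no_e = df.drop('E', axis=0)

# 2. Drop the row at integer position 2: iloc finds the row,
#    .name gives its label ('C'), and drop removes that label
no_c = df.drop(df.iloc[2].name, axis=0)

# 3. Drop several rows at once by passing a list of labels
no_ab = df.drop(['A', 'B'], axis=0)

# 4. Drop rows and columns together with the index/columns keywords
trimmed = df.drop(index=['A', 'B'], columns=['W', 'X'])

print(trimmed)
```

None of these calls modify df itself; each one returns a new DataFrame.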

Note: Before you drop a column, make sure that doing so won't hurt the accuracy of the analysis or model you are building on the dataset. There are many other ways of handling columns and rows that have null values.

Conditional Selection

First line: Tests which values are greater than 0, which returns boolean (True or False) values.

Second line: Returns the values that are greater than 0; values that are not greater than 0 are shown as NaN.

Third line: Filters on column "W" and keeps only the rows where the value in W is greater than 0. Here you can see that row "C" has a value less than 0 in W, so row "C" is left out of the result.
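A sketch of the three lines, assuming they look like the usual boolean-mask idioms. The numbers are made up for the example; they are chosen so that row "C" has a negative value in W, as in the description above.

```python
import pandas as pd

# Illustrative values only; row 'C' has a negative entry in column 'W'
df = pd.DataFrame(
    [[ 2.7,  0.6,  0.9,  0.5],
     [ 0.6, -0.3, -0.8,  0.6],
     [-2.0,  0.7,  0.5, -0.5],
     [ 0.1, -0.7, -0.9,  0.9],
     [ 0.2,  1.9,  2.6,  0.7]],
    index=list('ABCDE'), columns=list('WXYZ'))

# 1. Element-wise comparison gives a DataFrame of True/False
mask = df > 0

# 2. Indexing with that mask keeps values where it is True, NaN elsewhere
positive_only = df[df > 0]

# 3. A boolean Series built from one column filters whole rows
w_positive = df[df['W'] > 0]

print(w_positive)
```

The third form is the one you will probably use most, since it keeps complete rows rather than leaving NaN holes.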
