DEV Community

GharamElhendy


Analyzing Data Sets Using Pandas and Matplotlib

1. Looking Through Certain Values in a Column

import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('file_name.csv')

# Preview the first five rows
df.head()
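To try these steps without a CSV file on disk, here is a minimal sketch that passes `pd.read_csv` an in-memory string through `io.StringIO`. The column names `column_title` and `value` and the data are made up for illustration and stand in for the contents of `file_name.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'file_name.csv'
csv_text = """column_title,value
X,10
Y,3
X,14
Y,5
X,12
"""

# read_csv accepts any file-like object, not only a file path
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default
print(df.head())
print(df.shape)  # (5, 2)
```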


For example, if a column contains two values (X and Y), we can create a new DataFrame that contains only the rows with one of those values:

# Keep only the rows where 'column_title' equals 'X'
df_x = df[df['column_title'] == 'X']
df_x.head()


This works by indexing the original DataFrame with a boolean "mask": the comparison produces True or False for every row, and indexing with it returns only the rows where the mask is True, i.e., the rows whose 'column_title' value is 'X'.

# Boolean Series: True for rows whose 'column_title' is 'X'
mask = df['column_title'] == 'X'
print(mask)
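A small self-contained sketch of the same filtering, using a made-up DataFrame with a hypothetical numeric column named `value`, shows that the mask is a boolean Series aligned to the rows and that indexing with it keeps only the True rows. Inverting the mask with `~` to get the Y rows is an extra step added here for comparison.

```python
import pandas as pd

# Hypothetical data: a categorical column with values 'X' and 'Y'
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "value": [10, 3, 14, 5, 12],
})

# Boolean Series: True where the row's 'column_title' is 'X'
mask = df["column_title"] == "X"
print(mask.tolist())  # [True, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
df_x = df[mask]
print(df_x)

# ~mask flips every True/False, selecting the remaining (Y) rows
df_y = df[~mask]
print(len(df_x), len(df_y))  # 3 2
```

Note that `df_x` keeps the original index labels (0, 2, 4); calling `df_x.reset_index(drop=True)` gives it a fresh 0-based index if that is needed.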


2. Getting Summary Statistics

describe() returns the count, mean, standard deviation, minimum, 25th, 50th (median), and 75th percentiles, and maximum of a numeric column:

# 'column' stands for any numeric column in df_x
df_x['column'].describe()
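To see what describe() returns, here is a sketch on the same hypothetical `value` column. The final `groupby` line is an addition not in the original post; it is one way to compute the statistics for X and Y side by side in a single table.

```python
import pandas as pd

# Hypothetical data, as in the filtering example
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "value": [10, 3, 14, 5, 12],
})

df_x = df[df["column_title"] == "X"]

# describe() on a numeric Series returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75%, max
stats = df_x["value"].describe()
print(stats)

# Individual statistics can be read by their labels
print(stats["mean"], stats["50%"])  # 12.0 12.0

# One row of summary statistics per group
print(df.groupby("column_title")["value"].describe())
```

The percentiles use linear interpolation by default, so for the X values 10, 12, and 14 the 25% and 75% entries are 11.0 and 13.0.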


3. Visual Comparisons

import matplotlib.pyplot as plt

# Jupyter/IPython magic: display plots inside the notebook
%matplotlib inline
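The post does not include plotting code, so here is one possible sketch of a visual comparison: overlaid histograms of the hypothetical `value` column for the X and Y groups. The non-interactive `Agg` backend and the output filename `comparison.png` are assumptions that let the script run outside a notebook; in a notebook with `%matplotlib inline`, those lines can be dropped and the figure shows up on its own.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render to a file, no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, as in the earlier examples
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "value": [10, 3, 14, 5, 12],
})

mask = df["column_title"] == "X"
df_x = df[mask]
df_y = df[~mask]

fig, ax = plt.subplots()
# alpha < 1 keeps overlapping bars visible
ax.hist(df_x["value"], alpha=0.5, label="X")
ax.hist(df_y["value"], alpha=0.5, label="Y")
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("Distribution of value by column_title")
ax.legend()

# Save the figure to disk (hypothetical filename)
fig.savefig("comparison.png")
```

Overlaying both groups on one set of axes makes differences in spread and center easier to spot than two separate plots would.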

Note: These plots can't tell us whether one variable causes another, but they can reveal correlations that offer insight into how the variables relate.
