
# Data Visualization with Python pt. ii

The notebook of this lecture is in my GitHub repo:

Part 1: What on earth are "Histograms"?

Suppose you're in charge of making sure that a website always loads fast, and one day the average page loading time in, let's say, June is significantly slower than in the previous 5 months.

This type of scenario is where histograms really shine, because they show how the individual loading times are spread out instead of collapsing them into a single number.

Histograms help you understand the distribution of a numeric value in a way that you cannot with the mean or median alone.
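To make that concrete, here is a small sketch with made-up page loading times (the numbers are purely illustrative, not measurements). Both samples share the same mean, yet the per-bin counts, which are what a histogram draws, look nothing alike. NumPy's `histogram()` is used here only to do the counting:

```python
import numpy as np

# Made-up page loading times in seconds (illustrative values only).
steady = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
spiky = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0])

# Both samples have exactly the same mean...
print(steady.mean(), spiky.mean())

# ...but counting values per 2-second bin reveals the outlier in `spiky`.
bins = [0, 2, 4, 6, 8, 10]
steady_counts, _ = np.histogram(steady, bins=bins)
spiky_counts, _ = np.histogram(spiky, bins=bins)
print(steady_counts)
print(spiky_counts)
```

The single slow request in `spiky` is invisible in the mean but shows up as its own bar in the last bin.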

Part 2: Histograms with Matplotlib

Let's import Pandas and Matplotlib:
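A typical way to write these imports is shown below. The `pd` and `plt` aliases are the common convention and are assumed in the rest of the sketches:

```python
import pandas as pd
from matplotlib import pyplot as plt
```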

For this example I'm going to be using a larger dataset called "dataii.csv". Let's import it:
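With the real file, loading it is most likely a single `pd.read_csv("dataii.csv")` call. The sketch below reads a tiny in-memory stand-in instead so that it runs anywhere. The column names (`country`, `continent`, `year`, `lifeExpectancy`, `gdpPerCapita`) and all values are assumptions, not the actual contents of the file:

```python
import io
import pandas as pd

# With the real file this would be: data = pd.read_csv("dataii.csv")
# Made-up stand-in text; column names and values are assumptions.
csv_text = """country,continent,year,lifeExpectancy,gdpPerCapita
Country A,Asia,2007,70.0,5000.0
Country B,Europe,2007,78.0,30000.0
"""
data = pd.read_csv(io.StringIO(csv_text))

# Quick look at the first rows to confirm the columns loaded as expected.
print(data.head())
```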

For this part, I'll create histograms using the 'subplot()' function.

To check which continents are included within the data I'll use the 'set()' function:
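A minimal sketch of that check, assuming the continent names live in a column called `continent`. The small DataFrame is a made-up stand-in for the real data:

```python
import pandas as pd

# Made-up stand-in for the real dataset (assumed column names).
data = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "continent": ["Asia", "Europe", "Americas", "Asia"],
    "year": [2007, 2007, 2007, 2007],
})

# set() drops duplicates, leaving one entry per continent.
continents = set(data.continent)
print(continents)
```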

And this returns the following output, showing all the continents grouped:

Now, for example, if you need to select the data of Asia and Europe in 2007, first you need to select the data for 2007:
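One way to do this is with a boolean mask on the year column. The column name `year` and the sample rows are assumptions made for illustration:

```python
import pandas as pd

# Made-up stand-in for the real dataset (assumed column names).
data = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B"],
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "year": [1997, 2007, 1997, 2007],
})

# Keep only the rows whose year is 2007.
data2007 = data[data.year == 2007]
print(data2007)
```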

Then select the data for Asia out of the 'data2007' variable, and then the same procedure for Europe:
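The same masking idea works for the continents. This sketch assumes the continent labels are spelled `"Asia"` and `"Europe"` in a `continent` column, and it uses made-up rows in place of the real 'data2007':

```python
import pandas as pd

# Made-up stand-in for the 2007 subset (assumed column names and labels).
data2007 = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "continent": ["Asia", "Europe", "Asia"],
    "year": [2007, 2007, 2007],
})

# One subset per continent, both taken from the 2007 rows.
asia2007 = data2007[data2007.continent == "Asia"]
europe2007 = data2007[data2007.continent == "Europe"]
```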

Check both 'asia2007' and 'europe2007' with the 'head()' function:

To check how many countries are in these two newly created datasets let's use the 'set()' function:

If you don't want to see the complete list of countries, but only the number of countries in each dataset, use the 'len()' function combined with the 'set()' function:
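A short sketch of both checks, assuming a `country` column. The two DataFrames are made-up stand-ins for 'asia2007' and 'europe2007':

```python
import pandas as pd

# Made-up stand-ins for the two subsets (assumed column name).
asia2007 = pd.DataFrame({"country": ["Country A", "Country C", "Country E"]})
europe2007 = pd.DataFrame({"country": ["Country B", "Country D"]})

# set() lists the distinct countries; len() counts them.
print(set(asia2007.country))
print(len(set(asia2007.country)))
print(len(set(europe2007.country)))
```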

Use this combined with the 'print()' function for both datasets, and this should be the output:

Let's now find the mean and median of GDP per Capita in Asia and Europe in 2007:
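Pandas columns have `mean()` and `median()` methods for this. The sketch assumes the column is named `gdpPerCapita`, and the values are made up to show how a single large value pulls the mean away from the median:

```python
import pandas as pd

# Made-up stand-ins for the two subsets (assumed column name, invented values).
asia2007 = pd.DataFrame({"gdpPerCapita": [1000.0, 2000.0, 9000.0]})
europe2007 = pd.DataFrame({"gdpPerCapita": [20000.0, 30000.0, 40000.0]})

print("Mean GDP per capita in Asia:", asia2007.gdpPerCapita.mean())
print("Median GDP per capita in Asia:", asia2007.gdpPerCapita.median())
print("Mean GDP per capita in Europe:", europe2007.gdpPerCapita.mean())
print("Median GDP per capita in Europe:", europe2007.gdpPerCapita.median())
```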

To create a histogram of GDP per capita in Asia, type:
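A minimal sketch using `plt.hist()`. The column name `gdpPerCapita`, the choice of 20 bins, and the sample values are all assumptions:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up stand-in for 'asia2007' (assumed column name, invented values).
asia2007 = pd.DataFrame({"gdpPerCapita": [1000.0, 1500.0, 2000.0, 9000.0, 30000.0]})

# plt.hist() returns the bar heights, the bin edges and the drawn patches.
counts, bin_edges, _ = plt.hist(asia2007.gdpPerCapita, bins=20, edgecolor="black")
plt.title("GDP per capita in Asia, 2007")
plt.xlabel("GDP per capita")
plt.ylabel("Number of countries")
```

In a notebook the figure renders on its own; in a plain script, add `plt.show()` at the end.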

Now, to compare this histogram of the GDP per capita of Asia with the GDP per capita of Europe, both for 2007, let's use the 'subplot()' function:
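One possible layout stacks the two histograms in a 2-row, 1-column grid. The column name, bin count, and values are assumptions. Passing the same `range` to both calls is a design choice of this sketch so that the bins line up and the two panels can be compared directly:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up stand-ins for the two subsets (assumed column name, invented values).
asia2007 = pd.DataFrame({"gdpPerCapita": [1000.0, 2000.0, 9000.0, 30000.0]})
europe2007 = pd.DataFrame({"gdpPerCapita": [20000.0, 30000.0, 40000.0]})

# Shared upper limit so both histograms use identical bins.
upper = max(asia2007.gdpPerCapita.max(), europe2007.gdpPerCapita.max())

plt.subplot(2, 1, 1)  # 2 rows, 1 column, first panel
plt.title("Distribution of GDP per capita in 2007")
plt.hist(asia2007.gdpPerCapita, bins=20, range=(0, upper), edgecolor="black")
plt.ylabel("Asia")

plt.subplot(2, 1, 2)  # second panel, below the first
plt.hist(europe2007.gdpPerCapita, bins=20, range=(0, upper), edgecolor="black")
plt.ylabel("Europe")
```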

And the result is the following histogram:

Part 3: Comparing Complex Histograms

Now, let's compare Europe and America's life expectancy in 1997.

There are many ways to solve this problem, but my approach is the following:

First select only the data of 1997:

Then, from the newly created dataset ('data97'), extract America's and Europe's data:
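A sketch of these two selection steps together. It assumes the columns are named `year` and `continent`, and that the continent label is `"Americas"` (check the output of 'set()' on the continent column for the actual spelling in your data). The rows are made up:

```python
import pandas as pd

# Made-up stand-in for the real dataset (assumed column names and labels).
data = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country A"],
    "continent": ["Americas", "Europe", "Asia", "Americas"],
    "year": [1997, 1997, 1997, 2007],
})

# First keep only 1997, then split by continent.
data97 = data[data.year == 1997]
americas97 = data97[data97.continent == "Americas"]
europe97 = data97[data97.continent == "Europe"]
```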

Now, to check the number of countries in each new dataset:

Now to get the mean and median life expectancy of each new data set:
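The counting and summary steps could look like the sketch below. The column names `country` and `lifeExpectancy` are assumptions, and the values are invented:

```python
import pandas as pd

# Made-up stand-ins for the two 1997 subsets (assumed column names).
americas97 = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "lifeExpectancy": [70.0, 74.0],
})
europe97 = pd.DataFrame({
    "country": ["Country C", "Country D", "Country E"],
    "lifeExpectancy": [75.0, 76.0, 80.0],
})

# Number of distinct countries in each subset.
print(len(set(americas97.country)))
print(len(set(europe97.country)))

# Mean and median life expectancy for each subset.
print(americas97.lifeExpectancy.mean(), americas97.lifeExpectancy.median())
print(europe97.lifeExpectancy.mean(), europe97.lifeExpectancy.median())
```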

Now, finally, to compare both datasets in a histogram:
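A sketch of the comparison, again with two stacked panels. The column name `lifeExpectancy`, the bin count, and the values are assumptions. Both panels share one `range` so the x-axes are directly comparable:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up stand-ins for the two 1997 subsets (assumed column name).
americas97 = pd.DataFrame({"lifeExpectancy": [60.0, 70.0, 74.0, 76.0]})
europe97 = pd.DataFrame({"lifeExpectancy": [72.0, 75.0, 76.0, 80.0]})

# Shared limits so both histograms use identical bins.
lower = min(americas97.lifeExpectancy.min(), europe97.lifeExpectancy.min())
upper = max(americas97.lifeExpectancy.max(), europe97.lifeExpectancy.max())

plt.subplot(2, 1, 1)
plt.title("Distribution of life expectancy in 1997")
plt.hist(americas97.lifeExpectancy, bins=20, range=(lower, upper), edgecolor="black")
plt.ylabel("Americas")

plt.subplot(2, 1, 2)
plt.hist(europe97.lifeExpectancy, bins=20, range=(lower, upper), edgecolor="black")
plt.ylabel("Europe")
```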

The final chart is the following: