Numerical variables, also known as quantitative variables, represent something measurable or countable, such as a frequency or a measurement. Their values are always numbers that can be placed in a meaningful order with consistent intervals.
Examples of quantitative variables include:
- Production units
- Movie Ratings
Numerical variables may be either discrete or continuous.
Discrete values are the result of counting, like when we count how many goals a football team has scored in a season. Here, the data can only take whole-number values, like 60, 65, 72, and so on.
Continuous values, on the other hand, are the result of a measurement. For instance, we may measure the weights of a football team's players in kilograms, and the data can take any value within a range, like 84.1 kg or 74.89483 kg.
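To make the distinction concrete, here is a minimal sketch using pandas. The numbers are hypothetical sample values (the same ones used as examples above, plus one invented weight), and checking for whole numbers is only a rough heuristic, not a formal definition of discreteness.

```python
import pandas as pd

# Hypothetical sample data, not taken from any real team
goals = pd.Series([60, 65, 72], name="goals")                   # counted -> discrete
weights = pd.Series([84.1, 74.89483, 79.5], name="weight_kg")   # measured -> continuous

# Rough heuristic: counted data only contains whole numbers
goals_are_whole = bool((goals % 1 == 0).all())
weights_are_whole = bool((weights % 1 == 0).all())

print(goals.name, goals_are_whole)
print(weights.name, weights_are_whole)
```

Keep in mind that the stored type alone does not settle the question: a continuous measurement rounded to whole kilograms would also pass the whole-number check.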
Buckets (or bins) are how we organize collected numerical data into ordered, equally sized intervals so that we can analyze it and draw insights from it. For example, we might collect the release years of the movies produced in the 20th century and group them into buckets of 10 years. As a result, we could see how the movie industry evolved over the last century.
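The movie example could be sketched with `pandas.cut`, as below. The release years are made up for illustration, and the decade labels (`"1900s"`, `"1910s"`, ...) are just a naming choice for this sketch.

```python
import pandas as pd

# Hypothetical release years, made up for illustration
years = pd.Series([1903, 1927, 1939, 1941, 1955, 1968, 1972, 1977, 1984, 1994, 1999])

# Bucket edges every 10 years: 1900, 1910, ..., 2000
edges = list(range(1900, 2001, 10))
labels = [f"{start}s" for start in edges[:-1]]

# right=False makes each bucket include its left edge: [1900, 1910), [1910, 1920), ...
decades = pd.cut(years, bins=edges, right=False, labels=labels)

# Count the movies in each bucket, in chronological order
movies_per_decade = decades.value_counts().sort_index()
print(movies_per_decade)
```

Using `right=False` is a design choice here: it puts a year like 1970 into the 1970s bucket rather than the 1960s one.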
Using pandas, we will load the Google Play Store dataset, but only the Rating column, which is a typical numerical variable. Users rated the apps from 1.0 to 5.0.
    import pandas as pd
    import plotly.express as px

    # Load only the Rating column
    df = pd.read_csv("./data/googleplaystore.csv", usecols=["Rating"])

    # Drop missing values
    df = df.dropna()

    # Drop the outlier rating of 19.0 at row 10472 (caused by some error)
    df = df.drop(index=10472)

    # Plot a histogram
    fig = px.histogram(
        df,
        x="Rating",
        title="Google Play Store Apps Ratings",
        template="simple_white",
    )
    fig.show()
The chart we see above is a histogram. It looks like the bar chart we plotted in the Categorical Variable post, but the two have some important differences. In a histogram there is no space between the bars, and the intervals are equally spaced, as expected for numerical values.
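To see why the bars touch, it helps to look at the bin edges directly. The sketch below uses `numpy.histogram` on a handful of hypothetical ratings (not the real dataset) and a bin count of 4 chosen just for this example.

```python
import numpy as np

# Hypothetical ratings, made up for illustration
ratings = np.array([3.1, 3.8, 4.0, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.9])

# Four equal-width bins covering the whole rating scale from 1.0 to 5.0
counts, edges = np.histogram(ratings, bins=4, range=(1.0, 5.0))

# Each bin ends exactly where the next one starts, so there are no gaps
widths = np.diff(edges)

print(edges)   # bin boundaries
print(widths)  # width of each bin
print(counts)  # number of ratings in each bin
```

In a bar chart the categories are separate labels, so the gaps between bars carry no meaning. Here the bins tile a continuous axis, which is why an empty bin still occupies its place on the scale.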
The shape of the histogram already gives us useful information. The histogram above is left-skewed (it has a tail to the left), so we may conclude that most apps were rated highly: the tallest bars are on the right side of the histogram, where the highest ratings are (between 4.0 and 5.0).
A histogram can also be right-skewed, symmetric, bimodal, or uniform. Perhaps we will see more examples of histogram shapes in the next posts!
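Skewness can also be checked numerically. The sketch below uses `Series.skew()` from pandas on a small hypothetical sample (not the real dataset). A negative value suggests a left skew, and comparing the mean with the median is a common rule of thumb rather than a guarantee.

```python
import pandas as pd

# Hypothetical ratings with a few low values forming a tail to the left
ratings = pd.Series([2.0, 3.5, 4.0, 4.2, 4.3, 4.4, 4.5, 4.5, 4.6, 4.7])

skewness = ratings.skew()   # sample skewness; negative suggests a left skew
mean = ratings.mean()
median = ratings.median()

print(f"skewness: {skewness:.2f}")
print(f"mean: {mean:.2f}, median: {median:.2f}")
```

In this sample the few low ratings pull the mean below the median, which is what we typically expect from left-skewed data.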