DEV Community

Marcos

Posted on

Understand the difference between quantitative and categorical features

Quantitative and Categorical features

Learn about the different feature types that can be part of a dataset.

In data analysis with pandas DataFrames in Python, it is important to understand the difference between quantitative and categorical features. Let's break down both concepts with short explanations and examples.

Quantitative vs. Categorical

The columns of a DataFrame are the features of the dataset it represents, and each feature can be either quantitative or categorical.

Quantitative features, like height or weight, are measured as numbers for which arithmetic makes sense. These are the features for which we can compute sums, averages, and other numerical summaries. They come in two kinds:

  1. **Continuous:** Can take any value within a range. Examples: height, weight, temperature.
  2. **Discrete:** Can take only distinct, separate values, typically counts. Examples: number of children, number of cars.
```python
import pandas as pd

# Quantitative features: Height is stored as floats,
# Weight and Age as integers
df_quant = pd.DataFrame({
    'Height': [1.70, 1.75, 1.60, 1.80],
    'Weight': [70, 80, 60, 90],
    'Age': [25, 30, 22, 28]
})

print(df_quant)
```
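Because arithmetic is meaningful on quantitative features, pandas can aggregate them directly. The sketch below reuses the same illustrative values and calls the standard `mean()`, `sum()` and `describe()` methods.

```python
import pandas as pd

df_quant = pd.DataFrame({
    'Height': [1.70, 1.75, 1.60, 1.80],
    'Weight': [70, 80, 60, 90],
    'Age': [25, 30, 22, 28]
})

# Column-wise averages: meaningful because the values are quantities
print(df_quant.mean())

# Total of one column
print(df_quant['Weight'].sum())  # 300

# describe() reports count, mean, std, min, quartiles and max per column
print(df_quant.describe())
```

The same calls on a column of labels such as colors would either fail or produce nothing meaningful, which is a quick practical test of whether a feature is really quantitative.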

Categorical features, such as gender or place of birth, take values from a limited set of labels that assign each row to a category. These are the features we would typically pass to the groupby function. They also come in two kinds:

  1. **Nominal:** No intrinsic order. Examples: colors (red, blue, green), genders (male, female).
  2. **Ordinal:** An intrinsic order. Examples: clothing sizes (P, M, G, i.e. small, medium, large), ratings (low, medium, high).
```python
import pandas as pd

# Categorical features: 'Color' and 'Gender' are nominal,
# 'Size' is ordinal (P < M < G)
df_cat = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Yellow'],
    'Size': ['M', 'G', 'P', 'M'],
    'Gender': ['Female', 'Male', 'Female', 'Male']
})

print(df_cat)
```
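To make the nominal/ordinal distinction explicit in pandas, a categorical column can be given the `category` dtype, and an ordinal one an ordered `pd.CategoricalDtype`. This is a minimal sketch that assumes the size order P < M < G; with that order declared, comparisons and sorting follow rank rather than alphabetical order. The last line uses `groupby` on a categorical feature to count rows per gender.

```python
import pandas as pd

df_cat = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green', 'Yellow'],
    'Size': ['M', 'G', 'P', 'M'],
    'Gender': ['Female', 'Male', 'Female', 'Male']
})

# Nominal: an unordered categorical (no color ranks above another)
df_cat['Color'] = df_cat['Color'].astype('category')

# Ordinal: an ordered categorical, assuming the order P < M < G
size_type = pd.CategoricalDtype(categories=['P', 'M', 'G'], ordered=True)
df_cat['Size'] = df_cat['Size'].astype(size_type)

# Ordered categories support comparisons and rank-based sorting
print(df_cat[df_cat['Size'] >= 'M'])
print(df_cat.sort_values('Size'))

# Grouping by a categorical feature: number of rows per gender
print(df_cat.groupby('Gender').size())
```

Sorting the plain text column would instead give G, M, M, P (alphabetical order), which is why declaring the order matters for ordinal features.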

Some features can be interpreted as either quantitative or categorical, depending on the context. For instance, the year of birth can be treated as a quantitative feature when calculating the average birth year. Alternatively, it can serve as a categorical feature to group the data by birth year.
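Both readings of the year of birth can be sketched with a small, made-up DataFrame; the `BirthYear` and `Score` columns below are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: birth year and some measured score per person
df_birth = pd.DataFrame({
    'BirthYear': [1990, 1995, 1990, 2000],
    'Score': [7.5, 8.0, 6.5, 9.0]
})

# Quantitative reading: arithmetic on the years themselves
print(df_birth['BirthYear'].mean())  # 1993.75

# Categorical reading: each distinct year is a group label
print(df_birth.groupby('BirthYear')['Score'].mean())
```

The column's dtype is `int64` in both cases; only the question being asked changes how the feature is treated.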

Identifying Quantitative and Categorical Features

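In pandas, a column's data type (dtype) gives a quick first guess at whether it is quantitative or categorical. Generally, columns with int64 or float64 dtypes are quantitative, while columns of object type (text) are categorical. This is only a heuristic: numeric codes such as a year of birth or a postal code are stored as integers but may be better treated as categories. Categorical columns can also be converted to the category dtype, which stores each distinct label only once and can save memory.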

```python
import pandas as pd

# Creating a mixed DataFrame
df = pd.DataFrame({
    'Height': [1.70, 1.75, 1.60, 1.80],
    'Weight': [70, 80, 60, 90],
    'Color': ['Red', 'Blue', 'Green', 'Yellow'],
    'Size': ['M', 'G', 'P', 'M']
})

# 'number' matches every numeric dtype (int64, float64, int32, ...)
quant_cols = df.select_dtypes(include='number').columns
# Excluding numbers also catches 'category' and string dtypes,
# which include=['object'] would miss
cat_cols = df.select_dtypes(exclude='number').columns

print("Quantitative columns:", list(quant_cols))
print("Categorical columns:", list(cat_cols))
```
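The memory benefit of the category dtype is easiest to see on a longer column. A minimal sketch, assuming a hypothetical Series that repeats the four colors 10,000 times; `memory_usage(deep=True)` also counts the bytes used by the text itself.

```python
import pandas as pd

# Hypothetical larger column: four labels repeated many times
colors = pd.Series(['Red', 'Blue', 'Green', 'Yellow'] * 10_000)

# Convert the text column to the category dtype
colors_cat = colors.astype('category')

# deep=True includes the memory used by the string contents
before = colors.memory_usage(deep=True)
after = colors_cat.memory_usage(deep=True)
print(f"text dtype: {before} bytes, category: {after} bytes")

# Each distinct label is stored once; rows hold small integer codes
print(colors_cat.cat.categories.tolist())
print(colors_cat.cat.codes.dtype)
```

On a four-row DataFrame like the one above the saving is negligible or even negative; the conversion pays off when few distinct labels repeat across many rows.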

  1. **Quantitative:** Numerical values, continuous or discrete.
  2. **Categorical:** Values representing categories or groups, nominal or ordinal.

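Each type of feature calls for its own treatment: averages and standard deviations make sense for quantitative features, while counts per category and groupings make sense for categorical ones. Identifying the types correctly lets you apply the appropriate techniques in your data analysis and predictive modeling.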
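As one illustration of type-specific treatment (a sketch, not the only approach): numeric summaries with `describe()` for quantitative columns, frequency counts with `value_counts()` for categorical ones, and one-hot encoding with `pd.get_dummies()`, a common way to prepare categorical features for predictive models.

```python
import pandas as pd

df = pd.DataFrame({
    'Height': [1.70, 1.75, 1.60, 1.80],
    'Weight': [70, 80, 60, 90],
    'Color': ['Red', 'Blue', 'Green', 'Yellow'],
    'Size': ['M', 'G', 'P', 'M']
})

num = df.select_dtypes(include='number')
cat = df.select_dtypes(exclude='number')

# Quantitative: numeric summaries
print(num.describe())

# Categorical: how often each category occurs
for col in cat.columns:
    print(df[col].value_counts())

# Categorical: one indicator column per category, ready for a model
encoded = pd.get_dummies(df, columns=list(cat.columns))
print(encoded.columns.tolist())
```

One-hot encoding fits nominal features; for an ordinal feature such as `Size`, mapping the categories to their rank is often preferable because it keeps the order information.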
