Shlok Kumar

Descriptive Statistics

Statistics is the backbone of data science, providing the tools and methods needed to extract meaningful insights from raw data. Data scientists rely on statistics for many critical tasks, from cleaning messy datasets to creating visualizations and building predictive models. Without these statistical foundations, turning raw data into actionable insights that drive business decisions would be far harder.

What are Descriptive Statistics?

Descriptive statistics summarize and organize data so that it is easier to interpret. They describe the central tendency, variability, and distribution of a dataset.

Types of Descriptive Statistics

Descriptive statistics can be classified into three primary categories, each serving different purposes:

  1. Measures of Central Tendency
  2. Measures of Variability
  3. Measures of Frequency Distribution

1. Measures of Central Tendency

These statistical values describe the central position within a dataset. The three main measures are:

  • Mean: The average of the observations, calculated as follows:
  x̄ = Σx / n

Where:

  • Σx = Sum of all observations
  • n = Number of observations

Here’s how to find the mean using Python:

  import numpy as np

  # Sample Data
  arr = [5, 6, 11]

  # Mean
  mean = np.mean(arr)
  print("Mean =", mean)

Output: Mean = 7.333333333333333
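
As a quick check of the formula, the same mean can be computed by hand with Python's built-in functions. This is a minimal sketch; the names total, n, and manual_mean are illustrative and do not come from the original example.

```python
# Sample data (same values as the NumPy example)
arr = [5, 6, 11]

# Mean = sum of observations / number of observations
total = sum(arr)   # sum of all observations
n = len(arr)       # number of observations
manual_mean = total / n

print("Manual mean =", manual_mean)
```

The result should agree with np.mean(arr) for the same data.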

  • Mode: The most frequently occurring value in the dataset. Unlike the mean and median, it can also be used with categorical data.
  import scipy.stats as stats

  # Sample Data
  arr = [1, 2, 2, 3]

  # Mode
  mode = stats.mode(arr)
  print("Mode =", mode)

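Output (SciPy before 1.11): Mode = ModeResult(mode=array([2]), count=array([2])). In SciPy 1.11 and later, the result holds scalars instead of arrays, so the mode is 2 with a count of 2.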
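
For categorical data, the standard library's statistics module may be a simpler option. The sketch below assumes Python 3.8 or later (for statistics.multimode) and uses a hypothetical list of colors that is not part of the original post.

```python
import statistics

# Hypothetical categorical data (not from the original post)
colors = ["red", "blue", "red", "green", "blue", "red"]

# statistics.mode returns the single most common value
most_common = statistics.mode(colors)

# statistics.multimode returns every value tied for most common
tied = statistics.multimode([1, 1, 2, 2, 3])

print("Mode =", most_common)
print("Modes when tied =", tied)
```

multimode is handy when a dataset has more than one mode, since a single "most common" value would hide the tie.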

  • Median: The middle value of the sorted dataset, dividing it into two halves. If the number of elements is odd, the median is the center element; if even, it is the average of the two central elements.
  import numpy as np

  # Sample Data
  arr = [1, 2, 3, 4]

  # Median
  median = np.median(arr)
  print("Median =", median)

Output: Median = 2.5
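
The odd/even rule can be sketched without NumPy. The helper manual_median below is a hypothetical name used only for illustration, and it assumes a non-empty list of numbers.

```python
def manual_median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)   # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single center element
        return ordered[mid]
    # Even count: average of the two central elements
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = manual_median([3, 1, 2])       # sorted: [1, 2, 3]
even_result = manual_median([4, 1, 3, 2])   # sorted: [1, 2, 3, 4]

print("Odd-length median =", odd_result)
print("Even-length median =", even_result)
```

Sorting first matters: picking the middle element of unsorted data would not give the median.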

These measures form the foundation for understanding data distribution and identifying anomalies.

2. Measures of Variability: Understanding Data Dispersion

Understanding how data spreads out is crucial. Measures of variability quantify this spread, which is important for identifying outliers and assessing model assumptions. Key measures include:

  • Range: The difference between the largest and smallest data points.
  # Sample Data
  arr = [1, 2, 3, 4, 5]

  # Range = largest value minus smallest value
  maximum = max(arr)
  minimum = min(arr)
  value_range = maximum - minimum
  print("Maximum = {}, Minimum = {} and Range = {}".format(maximum, minimum, value_range))

Output: Maximum = 5, Minimum = 1 and Range = 4

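  • Variance: The average squared deviation from the mean. The example uses statistics.variance, which computes the sample variance (dividing by n - 1); statistics.pvariance computes the population variance (dividing by n).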
  import statistics

  # Sample Data
  arr = [1, 2, 3, 4, 5]
  print("Var =", statistics.variance(arr))

Output: Var = 2.5
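
To see where 2.5 comes from, the variance can be worked out step by step. This is a minimal sketch; the names squared_devs, population_var, and sample_var are illustrative. It shows that the divisor (n versus n - 1) changes the result.

```python
import statistics

arr = [1, 2, 3, 4, 5]

# Step 1: mean of the data
mean = sum(arr) / len(arr)

# Step 2: squared deviation of each value from the mean
squared_devs = [(x - mean) ** 2 for x in arr]

# Step 3: average the squared deviations
population_var = sum(squared_devs) / len(arr)        # divide by n
sample_var = sum(squared_devs) / (len(arr) - 1)      # divide by n - 1

print("Population variance =", population_var)
print("Sample variance =", sample_var)
print("statistics.variance =", statistics.variance(arr))
```

The sample version matches statistics.variance, while the population version matches statistics.pvariance.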

  • Standard Deviation: The square root of the variance, which expresses dispersion in the same units as the data. The example uses statistics.stdev, which computes the sample standard deviation.
  import statistics

  arr = [1, 2, 3, 4, 5]
  print("Std =", statistics.stdev(arr))

Output: Std = 1.5811388300841898
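
The link between standard deviation and variance can be checked directly. The sketch below uses only the standard library; the variable names are illustrative, not part of the original example.

```python
import math
import statistics

arr = [1, 2, 3, 4, 5]

sample_std = statistics.stdev(arr)        # based on the n - 1 divisor
population_std = statistics.pstdev(arr)   # based on the n divisor

# The standard deviation is the square root of the matching variance
root_of_variance = math.sqrt(statistics.variance(arr))

print("Sample std =", sample_std)
print("Population std =", population_std)
print("Matches sqrt(variance):", math.isclose(sample_std, root_of_variance))
```

Note that NumPy's np.var and np.std default to the population versions (ddof=0), so they will not match statistics.variance and statistics.stdev unless ddof=1 is passed.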

3. Measures of Frequency Distribution

A frequency distribution table summarizes how data points are distributed across different categories or intervals. It helps identify patterns, outliers, and the overall structure of the dataset. Key components include:

  • Data intervals or categories
  • Frequency counts
  • Relative frequencies (percentages)
  • Cumulative frequencies
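
The components listed above can be sketched with collections.Counter. The survey responses here are hypothetical data invented for illustration, and the table layout is just one possible choice.

```python
from collections import Counter

# Hypothetical survey responses (not from the original post)
responses = ["A", "B", "A", "C", "B", "A", "B", "A"]

counts = Counter(responses)   # frequency count per category
total = len(responses)

table = []
cumulative = 0
for category in sorted(counts):
    frequency = counts[category]
    cumulative += frequency                # running total of frequencies
    relative = frequency / total * 100     # percentage of all responses
    table.append((category, frequency, relative, cumulative))

# Print a simple frequency distribution table
print(f"{'Category':<10}{'Freq':>6}{'Rel %':>8}{'Cum':>6}")
for category, frequency, relative, cum in table:
    print(f"{category:<10}{frequency:>6}{relative:>8.1f}{cum:>6}")
```

For numeric data, the same idea applies after grouping values into intervals (bins) instead of categories.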

Understanding these measures lays the groundwork for more advanced analytical methods and visualizations such as histograms or pie charts.

For more content, follow me at https://linktr.ee/shlokkumar2303
