
Joseph D. Marhee

Originally published at Medium

Sampling Methods in Python



Sampling is a fundamental concept in statistics and data analysis. It involves selecting a subset of individuals or data points from a larger population for analysis. In Python, there are various sampling methods available that allow us to extract representative samples from a population.

In this document, we will explore different sampling methods in Python and provide code examples for each method. We will cover the following sampling methods:

  1. Simple Random Sampling
  2. Stratified Sampling
  3. Cluster Sampling
  4. Systematic Sampling

Let’s dive into each method and see how it can be implemented in Python.

1. Simple Random Sampling

Simple random sampling selects individuals from a population purely at random, with no selection criteria, so each individual has an equal chance of being chosen. This method is commonly used when the population is homogeneous.

import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 5

sample = random.sample(population, sample_size)
print(sample)

Example output (the values differ from run to run):

[3, 5, 9, 7, 1]
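
Because the draw is random, repeated runs give different samples. A minimal sketch of one way to make the result reproducible, and to contrast sampling without and with replacement, is shown here. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import random

population = list(range(1, 11))

# A dedicated, seeded generator makes the draw repeatable (seed is arbitrary).
rng = random.Random(42)

# random.sample draws WITHOUT replacement: no element appears twice.
without_replacement = rng.sample(population, 5)

# random.choices draws WITH replacement: elements may repeat.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```

Re-creating the generator with the same seed and repeating the same calls returns the same samples, which helps when results need to be compared across runs.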

2. Stratified Sampling

Stratified sampling involves dividing the population into distinct subgroups or strata based on certain characteristics. Then, a random sample is selected from each stratum proportionate to its size. This method ensures that each subgroup is adequately represented in the final sample.

from sklearn.model_selection import train_test_split

data = [...] # Your dataset
labels = [...] # Labels for each data point

# Stratified split with scikit-learn: stratify=labels keeps the class
# proportions of `labels` the same in the train and test sets
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, test_size=0.2, stratify=labels)
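
The scikit-learn call above needs a real dataset in place of the placeholders. To show the idea itself, here is a rough standard-library sketch of proportional stratified sampling. It does not use scikit-learn; the helper `stratified_sample`, the toy data, and the `fraction` parameter are hypothetical names introduced for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(items, labels, fraction, seed=None):
    """Draw roughly `fraction` of the items from each label group (stratum)."""
    rng = random.Random(seed)

    # Group the items by their label to form the strata.
    strata = defaultdict(list)
    for item, label in zip(items, labels):
        strata[label].append(item)

    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one item per stratum.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Toy data (made up): items 0-7 are labelled "a", items 8-11 are labelled "b".
items = list(range(12))
labels = ["a"] * 8 + ["b"] * 4

sample = stratified_sample(items, labels, fraction=0.25, seed=0)
print(sample)
```

With these toy inputs the sample contains two items from stratum "a" and one from stratum "b", which mirrors the 2:1 ratio of the strata in the population.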

3. Cluster Sampling

Cluster sampling involves dividing the population into clusters or groups. Instead of selecting individuals, entire clusters are chosen randomly. This method is useful when it is difficult or expensive to access individual elements of the population.

import random

clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
cluster_sample_size = 2

selected_clusters = random.sample(clusters, cluster_sample_size)
sample = []

for cluster in selected_clusters:
    sample.extend(cluster)

print(sample)
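
In practice the clusters are rarely given as ready-made lists; they usually come from a grouping attribute on each record. The following sketch assumes hypothetical `(student_id, school)` records, builds the clusters from the `school` field, and then samples whole schools. All names and values are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical records: (student_id, school) pairs.
records = [
    (1, "north"), (2, "north"),
    (3, "south"), (4, "south"),
    (5, "east"), (6, "east"),
    (7, "west"), (8, "west"),
]

# Build one cluster per school.
clusters = defaultdict(list)
for student_id, school in records:
    clusters[school].append(student_id)

# Randomly choose whole clusters, then keep every member of each chosen one.
rng = random.Random(42)  # seed is arbitrary, used only for repeatability
chosen_schools = rng.sample(sorted(clusters), 2)
sample = [sid for school in chosen_schools for sid in clusters[school]]

print(chosen_schools)
print(sample)
```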

4. Systematic Sampling

Systematic sampling selects individuals from a population at regular intervals after a random starting point. It is useful when the population is large and ordered, provided the ordering has no periodic pattern that coincides with the sampling interval.

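import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sample_size = 3

# Sampling interval, and a random start within the first interval
step = len(population) // sample_size
start_index = random.randrange(step)
# Take every step-th element, trimming any extra trailing element
sample = population[start_index::step][:sample_size]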

print(sample)
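
When the population size is not an exact multiple of the desired sample size, a fixed integer step can return more or fewer items than requested. One possible variant, sketched here with a hypothetical helper `systematic_sample`, uses a fractional interval so that exactly the requested number of items comes back. Treat it as an illustration under these assumptions, not a definitive implementation.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick `sample_size` items at a fixed fractional interval after a random start."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and len(population)")

    rng = random.Random(seed)
    interval = len(population) / sample_size  # may be fractional
    start = rng.random() * interval           # random offset in [0, interval)
    last = len(population) - 1

    # Floor each position to an index; min() guards against float rounding.
    indices = [min(int(start + i * interval), last) for i in range(sample_size)]
    return [population[i] for i in indices]

population = list(range(1, 11))
sample = systematic_sample(population, 3, seed=1)
print(sample)
```

Because the interval is at least 1, consecutive positions always land on different indices, so the sample contains no repeats.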

These are just a few examples of sampling methods available in Python. Depending on your specific requirements and the characteristics of your data, you may need to adapt or combine these methods to achieve the desired sampling outcome.

Remember that sampling is a powerful technique for analyzing large datasets and making inferences about a population. It allows us to draw meaningful conclusions while reducing computational costs and time.
