DEV Community

3slee
3slee

Posted on • Updated on

Achieving Highly Random Test Data in Python Based Load Testing with Faker

Introduction

Recently I worked with a customer in Video Game Industry on Proof of Concepts (POC) that measures Queries per second (QPS) for database that includes tables with each record at 1MB or more.

Initially, I relied on random methods and custom methods to mimic the test data. At around 1k QPS for up to the first half an hour demonstrated stable environment but quickly things started going out of order after a while. QPS dropped quickly, and Read Amplification spiked (as shown in the pictures below) that has dragged the system to below 100 QPS.

Image description

Image description

Image description

What happened was the custom test data created in this POC created HotSpot in CockroachDB which slowed the whole cluster down.

I realized that I had to look for a new way to achieve highly dispersed and distributed data that will not cause HotSpot. There are recommendations to avoid HotSpot in CockroachDB but I also had additional requirements (more details on the requirement to follow at the bottom of the page)

How to use the Standard Providers in Python

Support for Miscellaneous Data Types

Performance testing for a company in Video Game Industry required support for additional data types to store "game assets" and "map info" in JSON format and many other.

Faker supports over 17 types of miscellaneous data types including CSV, Fixed Width, Image, JSON, MD5, SHA256, TAR, UUID4, ZIP and more. Link

>>> Faker.seed(0)
>>> for _ in range(5):
...     fake.json(data_columns=[('Name', 'name'), ('Points', 'pyint', {'min_value':50, 'max_value':100})], num_rows=1)
...
'{"Name": "Norma Fisher", "Points": 66}'
'{"Name": "Katelyn Hull", "Points": 69}'
'{"Name": "Heather Snow", "Points": 63}'
'{"Name": "Diane Campos", "Points": 89}'
'{"Name": "Victoria Patel", "Points": 95}'
Enter fullscreen mode Exit fullscreen mode

Support for Multi-Locale and Multi-Language

The customer released mobile games that had over millions of downloads in many of the leading mobile game stores in over a dozen of countries. In order to accurately reflect the real world test scenario, this POC had to include test data in multi-locales and multi-languages.

Faker includes over 80 localized providers to cover broad types of addresses, banking information, currencies, date, names, phone numbers, SSNs, time in each locale's unique representation in its language. Link

Example:

  • class faker.providers.person.ko_KR.Provider(generator: Any)
  • Bases: faker.providers.person.Provider
>>> Faker.seed(0)
>>> for _ in range(5):
...     fake.first_name()
...
'정희'
'유진'
'지후'
'영호'
'미경'
Enter fullscreen mode Exit fullscreen mode

Reference

The original Python script, which I have modified in this POC has been written by Glenn Fawcett. Additional details can be found here

Top comments (0)