ugo landini

JR, quality Random Data from the Command line, part II

In the first part of this series, we saw how to use JR in simple use cases to stream random data from predefined templates to standard output and to Apache Kafka on Confluent Cloud.

In this follow-up, we'll take a closer look at the JR data generation process and at how you can use it to generate data that streaming applications can actually use.

Smart functions

We defined quality data across 2 dimensions:

  1. things that must be realistic "in themselves", like an IP address, or a credit card number
  2. things that are realistic only if coherent with other data, like names, companies, emails, cities, zip codes, mobile phones, locale, etc.

Some JR template functions are “smart”, so let's talk a bit about type 2 data. Let's look at the predefined user template for example:

jr template show user

{
  "guid": "{{uuid}}",
  "isActive": {{bool}},
  "balance": "{{amount 100 10000 "€"}}",
  "picture": "http://placehold.it/32x32",
  "age": {{integer 20 60}},
  "eyeColor": "{{randoms "blue|brown|green"}}",
  "name": "{{name}} {{surname}}",
  "gender": "{{gender}}",
  "company": "{{company}}",
  "work_email": "{{email_work}}",
  "email": "{{email}}",
  "about": "{{lorem 20}}",
  "country": "{{country}}",
  "address": "{{city}}, {{street}} {{building 2}}, {{zip}}",
  "phone_number": "{{phone}}",
  "mobile": "{{mobile_phone}}",
  "latitude": {{latitude}},
  "longitude": {{longitude}}
}

The user template doesn't contain any logic to correlate type 2 data, but if you run it, you'll see that everything works as expected. For example, let's run it with the IT locale:

jr run --locale IT user

{
  "guid": "3c37f1d2-c4d4-4a10-ac9e-eefa0d0a4fc1",
  "isActive": false,
  "balance": "€8106.36",
  "picture": "http://placehold.it/32x32",
  "age": 21,
  "eyeColor": "green",
  "name": "Maria Rizzo",
  "gender": "F",
  "company": "Evil Partners",
  "work_email": "maria.rizzo@evilpartners.com",
  "email": "maria.rizzo@hotmail.com",
  "about": "Lorem ipsum dolor sit amet, laoreet ligula. Curabitur id nisl ut Lorem sit amet justo pulvinar aliquet accumsan sit amet",
  "country": "IT",
  "address": "Lodi, Piazza dei Miracoli 80, 26900",
  "phone_number": "0371 95903936",
  "mobile": "3899578232",
  "latitude": -22.4702,
  "longitude": -4.6067
}

As you can see, name, gender, email, country, address, zip code and phones are all coherent. That's because JR, under the hood, keeps track of everything and reuses data previously generated in the template. So, if you generate a work_email, the function will reuse name, surname and company.
The zip code is generated from a regex pattern (a "reverse regex") that is valid for the city, the mobile phone is valid for the country, and so on. Some JR localisations are still in progress, so please contribute if you want to help us!
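To make the "keep track and reuse" idea concrete, here is a minimal Go sketch built on text/template. It is not JR's actual code: the Context type, the remember helper and the fixed sample values are all assumptions made for illustration (a real generator would pick random values). The only point is that email_work reads what name, surname and company stored earlier in the same run.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Context remembers the values generated so far in one template run.
// Hypothetical type for illustration; not JR's internals.
type Context struct {
	values map[string]string
}

// remember stores a generated value under a key and returns it unchanged.
func (c *Context) remember(key, value string) string {
	c.values[key] = value
	return value
}

// workEmail builds an address from the name, surname and company
// that were generated earlier in the same run.
func (c *Context) workEmail() string {
	company := strings.ReplaceAll(c.values["company"], " ", "")
	addr := fmt.Sprintf("%s.%s@%s.com", c.values["name"], c.values["surname"], company)
	return strings.ToLower(addr)
}

// render executes tpl with functions bound to a fresh Context.
// Values are fixed here (an assumption) to keep the output predictable.
func render(tpl string) (string, error) {
	c := &Context{values: map[string]string{}}
	funcs := template.FuncMap{
		"name":       func() string { return c.remember("name", "Maria") },
		"surname":    func() string { return c.remember("surname", "Rizzo") },
		"company":    func() string { return c.remember("company", "Evil Partners") },
		"email_work": c.workEmail,
	}
	t, err := template.New("user").Funcs(funcs).Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(`{{name}} {{surname}} works at {{company}}: {{email_work}}`)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Because the template is evaluated left to right, the email function only works if the fields it depends on appear before it. How JR handles a missing dependency is not covered here.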

This is pretty simple and straightforward, so let's look now at relations between data.

Emitters

So far we have seen simple generation use cases. If you need to generate related data, you need more tools. JR comes preconfigured with some example emitters:

jr emitter list

List of JR emitters:

shoe
shoe_customer
shoe_order
shoe_clickstream

What's an emitter? It's basically a preconfigured JR job, and it's really helpful when you have to generate different entities, each with its own generation parameters, and relations between them.

Let's study the preconfigured shoe example:

jr emitter show shoe

Name:shoe
Locale: us
Num: 0
Frequency: 0s
Duration: 0s
Preload: 100
Output: stdout
Topic: shoes
Kcat: false
Oneline: false
Key Template: null
Value Template: shoe
Output Template: {{.V}}

This generates 100 shoes in the preload phase (i.e. before the generation phase) and nothing more, because frequency and duration are both 0. So this is useful for more static, "table-like" data.

jr emitter show shoe_customer

Name:shoe_customer
Locale: us
Num: 1
Frequency: 1s
Duration: 10s
Preload: 20
Output: stdout
Topic: shoe_customers
Kcat: false
Oneline: false
Key Template: null
Value Template: shoe_customer
Output Template: {{.V}}

For shoe_customer we have a preload of 20, but it also generates one customer per second for 10 seconds. So it's mostly static, but less so than the shoes, which is reasonable: you don't have a new product to sell every second, but you may have new customers.

jr emitter show shoe_clickstream

Name:shoe_clickstream
Locale: us
Num: 1
Frequency: 100ms
Duration: 10s
Preload: 0
Output: stdout
Topic: shoe_clickstream
Kcat: false
Oneline: false
Key Template: null
Value Template: shoe_clickstream
Output Template: {{.V}}

shoe_clickstream is much more dynamic: it emits 1 click every 100ms, with no preload.

jr emitter show shoe_order

Name:shoe_order
Locale: us
Num: 1
Frequency: 500ms
Duration: 10s
Preload: 0
Output: stdout
Topic: shoe_orders
Kcat: false
Oneline: false
Key Template: null
Value Template: shoe_order
Output Template: {{.V}}

shoe_order is similar: no preload and a lower frequency, one order every 500ms.

But wait, this is just a way to simplify the command line and differentiate frequency, duration, preload and other parameters for every template: where are the relations?

Let's look at the shoe template:

jr template show shoe

{{$id:=uuid}}{{add_v_to_list "shoes_id_list" $id}}{
  "id": "{{$id}}",
  "sale_price": "{{amount 200 2000 ""}}",
  "brand": "{{from "sport_brand"}}",
  "name": "{{randoms "Pro|Cool|Soft|Air|Perf"}} {{from "cool_name"}} {{integer 1 20}}",
  "rating": "{{format_float "%.2f" (floating 1 5)}}"
}

Here you can see that a random uuid is assigned to a $id variable, and then added to a shoes_id_list with the add_v_to_list command.
The list is automatically shared with all the running templates, so to have a working relationship you just need to get random ids from this list instead of generating them.
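The mechanism can be sketched in a few lines of Go with text/template. This is only an illustration under stated assumptions, not JR's implementation: SharedLists, its methods and the fake_id helper are hypothetical names, and only the two template function names are taken from the templates in this post.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
	"sync"
	"text/template"
)

// SharedLists is a hypothetical store of named lists, safe for concurrent use.
type SharedLists struct {
	mu    sync.Mutex
	lists map[string][]string
}

func NewSharedLists() *SharedLists {
	return &SharedLists{lists: map[string][]string{}}
}

// Add appends v to the named list. It returns "" so the call renders nothing.
func (s *SharedLists) Add(list, v string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lists[list] = append(s.lists[list], v)
	return ""
}

// Random returns one random value from the named list, or "" if it is empty.
func (s *SharedLists) Random(list string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	vs := s.lists[list]
	if len(vs) == 0 {
		return ""
	}
	return vs[rand.Intn(len(vs))]
}

// Contains reports whether v is in the named list (used to check integrity).
func (s *SharedLists) Contains(list, v string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, x := range s.lists[list] {
		if x == v {
			return true
		}
	}
	return false
}

// render executes tpl with the list functions registered under the names
// used in this post; fake_id is a made-up stand-in for an id generator.
func render(s *SharedLists, tpl string) (string, error) {
	funcs := template.FuncMap{
		"fake_id":            func() string { return fmt.Sprintf("shoe-%04d", rand.Intn(10000)) },
		"add_v_to_list":      s.Add,
		"random_v_from_list": s.Random,
	}
	t, err := template.New("t").Funcs(funcs).Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

const shoeTpl = `{{$id := fake_id}}{{add_v_to_list "shoes_id_list" $id}}{"id": "{{$id}}"}`
const clickTpl = `{"product_id": "{{random_v_from_list "shoes_id_list"}}"}`

func main() {
	lists := NewSharedLists()
	for i := 0; i < 3; i++ { // "preload" three shoes
		shoe, err := render(lists, shoeTpl)
		if err != nil {
			panic(err)
		}
		fmt.Println(shoe)
	}
	click, err := render(lists, clickTpl)
	if err != nil {
		panic(err)
	}
	fmt.Println(click) // product_id is always one of the ids printed above
}
```

The mutex is there because several templates may touch the same list at once. Note that a lookup against an empty list returns an empty string in this sketch, which is one reason a preload phase for the "parent" entity is handy.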

jr template show shoe_clickstream
{
  "product_id": "{{random_v_from_list "shoes_id_list"}}",
  "user_id": "{{random_v_from_list "customers_id_list"}}",
  "view_time": {{integer 10 120}},
  "page_url": "https://www.acme.com/product/{{random_string 4 5}}",
  "ip": "{{ip "10.1.0.0/16"}}",
  "ts": {{counter "ts" 1609459200000 10000 }}
}

In the shoe_clickstream template that's pretty clear: product_id and user_id are not random but come from shoes_id_list and customers_id_list, so there is full referential integrity.

If you need more than 1 value from a list, you can use the random_n_v_from_list function instead of random_v_from_list. This function is guaranteed to pick n different values from the list, so it is ideal for 1:many relationships.
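One common way to pick n different values is to shuffle the indexes and take the first n. The Go sketch below shows that idea; pickDistinct is a hypothetical helper, not the code behind random_n_v_from_list, and it clamps n to the list length, which is an assumption about the edge case.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickDistinct returns n elements taken from distinct positions of values,
// in random order. If n exceeds len(values), every element is returned.
func pickDistinct(values []string, n int) []string {
	if n > len(values) {
		n = len(values)
	}
	if n < 0 {
		n = 0
	}
	out := make([]string, 0, n)
	// rand.Perm yields each index exactly once, so no position repeats.
	for _, i := range rand.Perm(len(values))[:n] {
		out = append(out, values[i])
	}
	return out
}

func main() {
	shoeIDs := []string{"s1", "s2", "s3", "s4", "s5"}
	// One order referencing three different shoes: a 1:many relationship.
	fmt.Println("order items:", pickDistinct(shoeIDs, 3))
}
```

Distinct positions only give distinct values if the list itself has no duplicates, which holds for a list of uuids.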

To start all the emitters, just type:

jr emitter run

A goroutine per emitter will start producing random data, but not too random: coherency and integrity are important for your streaming applications!
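For readers less familiar with Go, the "goroutine per emitter" pattern looks roughly like this. It is a generic sketch and makes no claim about JR's source: runAll and the per-emitter record counts are hypothetical, and timing (frequency, duration) is left out to keep it short.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll starts one goroutine per emitter and returns how many records
// were received from each. counts maps an emitter name to the number of
// records it should produce (hypothetical input for illustration).
func runAll(counts map[string]int) map[string]int {
	type record struct{ emitter string }
	out := make(chan record)
	var wg sync.WaitGroup

	for name, n := range counts {
		wg.Add(1)
		go func(name string, n int) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				out <- record{emitter: name} // a real emitter would send rendered data
			}
		}(name, n)
	}

	// Close the channel once every emitter is done, so the loop below ends.
	go func() {
		wg.Wait()
		close(out)
	}()

	received := map[string]int{}
	for r := range out {
		received[r.emitter]++
	}
	return received
}

func main() {
	got := runAll(map[string]int{"shoe": 3, "shoe_customer": 2, "shoe_order": 4})
	fmt.Println(got)
}
```

All emitters write to one channel here, so records from different entities interleave, much like the mixed output you see on stdout.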

Conclusions

We have seen how to use JR in more advanced use cases, streaming quality random data with referential integrity.
In the next part of this series, we will see how to use REST APIs with JR.
In the meantime, happy streaming!
