DEV Community

Natalia Demianenko

How to prepare realistic test data via OpenAI API in Postman

How do you generate diverse test data? And what if you need a realistic dataset for a product demonstration? Test data generation can be a challenging task, but AI has made it much easier. This article demonstrates how to generate test data using the OpenAI API in Postman and automatically send it to your server. But let's start with an introduction.

In an environment designed specifically for testing, the data often consists of identical values containing the word "test". It is quite difficult to navigate through such a set. We can use random data generators, but then we lose the relevance to the subject area of the tested product. Realistic test data is especially important for a product demo. Creating test data manually is an option, but if the schema of a data object is complex and a large number of objects are needed, it becomes time-consuming and impractical.

So how can we generate realistic and diverse test data without spending a lot of time? AI generation can help. Let's assume we have a web application with a catalog of the most popular laptops. In our test environment the products don't exist yet, so we need to create them. To do this, we will use the public API https://restful-api.dev/, which lets us write our data via a POST request in the following format:

{
   "name": String,
   "data": Object
}


That is, according to the API documentation, the structure should contain a name and any data as an object. Let’s see how to create test data for our application in a few simple steps.

Step 1. Determine test data object structure

Let each product in our application contain the following data:

{
   "name": "Apple MacBook Pro 16",
   "data": {
      "year": 2019,
      "price": 1849.99,
      "currency": "USD",
      "CPU model": "Intel Core i9",
      "Hard disk size": "1 TB"
   }
}

So the structure of the items we should generate is the following:

{
   "name": "String",
   "data": {
      "year": "Number",
      "price": "Number",
      "currency": "String",
      "CPU model": "String",
      "Hard disk size": "String"
   }
}

Admittedly, manually creating a dozen different and realistic items for subsequent testing can be challenging. This is where the OpenAI API comes to our aid.

Let's create a collection called "Create Test Data" and a first request called "Generate Test Data". In the Pre-request script, let's define the JSON structure as an object (so that it is easy to change before converting it to a string) and store it as a collection variable.
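The pre-request script described here might look roughly like the sketch below. The collection variable name `structure` is an assumption made for illustration, and the `typeof pm` guard is there only so the snippet can also run outside Postman.

```javascript
// Pre-request script sketch for the "Generate Test Data" request.
// Leaves hold type names that the model is expected to replace with real values.
const structure = {
    name: "String",
    data: {
        year: "Number",
        price: "Number",
        currency: "String",
        "CPU model": "String",
        "Hard disk size": "String"
    }
};

// Keeping the structure as an object makes it easy to edit;
// it is converted to a string only when it is stored.
const structureJson = JSON.stringify(structure);

// `pm` exists only inside Postman, so guard the call.
if (typeof pm !== "undefined") {
    // "structure" is an assumed variable name, not one taken from the article.
    pm.collectionVariables.set("structure", structureJson);
}
```

Note that the stored string contains double quotes, so it needs escaping if it is substituted directly into a raw JSON request body.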


Step 2. Generate test data object via OpenAI API

To generate the items, we need to send a request to the following endpoint: https://api.openai.com/v1/chat/completions, which allows us to obtain a response for the given chat conversation. Let's construct the body of the request to the OpenAI API.


In the content field, we specify what we want to receive from the model in response, including the exact expected JSON structure defined in Step 1.

The temperature parameter allows us to adjust the predictability of the model's responses. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. It accepts values from 0 to 2.

If the API of your application can accept an array of products for creation, you can add the n parameter, which sets the number of responses (choices) the model returns. Another way to generate a dataset is to change the prompt to something like: Generate an array (length 2) of objects describing laptop items using the following structure… The way you generate the set affects how you extract the information for the next request.

Since our test API only accepts one object at a time, we use the prompt for single object generation and do not pass the n parameter, which defaults to 1.
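As a rough illustration of how these parameters fit together, the request body could also be assembled in a script. The helper name `buildRequestBody` and the variable name `requestBody` are hypothetical; the range check simply mirrors the 0 to 2 range mentioned for temperature.

```javascript
// Hypothetical helper: assemble a chat completions request body.
function buildRequestBody(structure, options = {}) {
    const { temperature = 0.7, n = 1, model = "gpt-3.5-turbo" } = options;
    if (!Number.isFinite(temperature) || temperature < 0 || temperature > 2) {
        throw new RangeError("temperature must be between 0 and 2");
    }
    if (!Number.isInteger(n) || n < 1) {
        throw new RangeError("n must be a positive integer");
    }
    const prompt = "Generate object describing a single laptop item " +
        "using the following JSON structure " + JSON.stringify(structure);
    return {
        model,
        messages: [{ role: "user", content: prompt }],
        temperature,
        n
    };
}

// A lower temperature gives more focused, repeatable output.
const body = buildRequestBody(
    { name: "String", data: { year: "Number" } },
    { temperature: 0.2 }
);

// `pm` exists only inside Postman, so guard the call.
if (typeof pm !== "undefined") {
    // Assumed variable name; the raw body would then be just {{requestBody}}.
    pm.collectionVariables.set("requestBody", JSON.stringify(body));
}
```

Building the whole body in one place sidesteps manual escaping of the quotes inside the prompt, at the cost of hiding the body from the request editor.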

To try the example, you can import the following curl command into your Postman workspace.

curl --location 'https://api.openai.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "Generate object describing a single laptop item using the following JSON structure \"{\"name\": String,\"data\": {\"year\": Number,\"price\": Number,\"currency\": String,\"CPU model\": String,\"Hard disk size\": String}}\""
        }
    ],
    "temperature": 0.7
}'

OPENAI_API_KEY is your authentication key. Please visit your API Keys page to retrieve the API key you will use for your request.

You can play with the parameters to make the result suitable for your purposes. See OpenAI API documentation for more information.

The response to our request will look like the following:

{
    "id": "chatcmpl-7SB7rzphTMzpB9VrLDtUknZyPQjez",
    "object": "chat.completion",
    "created": 1686950307,
    "model": "gpt-3.5-turbo-0301",
    "usage": {
        "prompt_tokens": 51,
        "completion_tokens": 70,
        "total_tokens": 121
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "{\n  \"name\": \"Dell Inspiron 15\",\n  \"data\": {\n    \"year\": 2021,\n    \"price\": 800,\n    \"currency\": \"USD\",\n    \"CPU model\": \"Intel Core i5-1135G7\",\n    \"Hard disk size\": \"512 GB SSD\"\n  }\n}"
            },
            "finish_reason": "stop",
            "index": 0
        }
    ]
}

So, in the content field, we receive the test data that we wanted to generate: realistic and matching our structure. Now, let's store it in a collection variable so that it can be sent in the body of the next request.

Step 3. Store generated data as collection variable

To do this, in the "Tests" tab of the "Generate Test Data" request, we will write the following script:

var jsonData = pm.response.json();
const item = jsonData.choices[0].message.content;
pm.collectionVariables.set("item", item);

Now, after sending the request, the script will execute and store the received response in the collection variable named "item".
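The model occasionally wraps its JSON in extra text, so a slightly more defensive variant of this script may be worth considering. The sketch below is one possible approach, not part of the original collection: the helper name `extractJsonObject` is hypothetical, and it simply parses whatever lies between the first `{` and the last `}` of the reply.

```javascript
// Hypothetical helper: parse the text between the first "{" and the last "}".
// Returns the parsed object, or null when that text is not valid JSON.
function extractJsonObject(content) {
    const start = content.indexOf("{");
    const end = content.lastIndexOf("}");
    if (start === -1 || end <= start) {
        return null;
    }
    try {
        return JSON.parse(content.slice(start, end + 1));
    } catch (err) {
        return null;
    }
}

// `pm` exists only inside Postman, so guard the calls.
if (typeof pm !== "undefined") {
    const reply = pm.response.json().choices[0].message.content;
    const item = extractJsonObject(reply);
    if (item !== null) {
        // Store a clean, re-serialized copy for the next request.
        pm.collectionVariables.set("item", JSON.stringify(item));
    } else {
        // Avoid sending a stale item from a previous iteration.
        pm.collectionVariables.unset("item");
    }
}
```

This handles replies with leading or trailing prose, but not replies that contain several separate JSON objects.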

The test data is ready, and it's time to use it.

Step 4. Send data to your server

Let's create another request in the collection called "Send Data".

curl --location 'https://api.restful-api.dev/objects' \
--header 'Content-Type: application/json' \
--data '{{item}}'

As the request body, we will send the JSON that was saved in the collection variable.
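Because the model's output is not guaranteed to match the expected structure, it can help to check the stored item before sending it. The following Pre-request script for "Send Data" is a minimal sketch under a few assumptions: `matchesStructure` and `expectedStructure` are hypothetical names, the check only covers the "String" and "Number" leaves used in this article, and `pm.execution.skipRequest()` is assumed to be available in your Postman version.

```javascript
// Expected shape from Step 1; leaves are type names.
const expectedStructure = {
    name: "String",
    data: {
        year: "Number",
        price: "Number",
        currency: "String",
        "CPU model": "String",
        "Hard disk size": "String"
    }
};

// Hypothetical validator: true when `item` has every field described by
// `structure` with the matching primitive type. Extra fields are ignored.
function matchesStructure(item, structure) {
    if (typeof structure === "string") {
        return typeof item === structure.toLowerCase();
    }
    if (item === null || typeof item !== "object" || Array.isArray(item)) {
        return false;
    }
    return Object.keys(structure).every(
        (key) =>
            Object.prototype.hasOwnProperty.call(item, key) &&
            matchesStructure(item[key], structure[key])
    );
}

// `pm` exists only inside Postman, so guard the calls.
if (typeof pm !== "undefined") {
    let item = null;
    try {
        item = JSON.parse(pm.collectionVariables.get("item"));
    } catch (err) {
        item = null; // missing or malformed variable
    }
    if (!matchesStructure(item, expectedStructure)) {
        // Assumption: supported in recent Postman versions; check your version.
        pm.execution.skipRequest();
    }
}
```

A full JSON Schema validator would be stricter; this sketch only guards against missing fields and wrong primitive types.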


The collection is ready. To generate multiple products, let's set up the collection run to execute multiple iterations.


As a result, our application now has varied, realistic test data. Quickly and easily!
Please share in the comments how you use the OpenAI API for testing purposes.

Top comments (6)

Phil Ashby • Edited

This is a nice tutorial, and always an interesting topic (test data), thank you!

I would be interested in your thoughts on the following:

  • Reproducibility / coverage. Would test data generated this way provide enough confidence in the system under test? What about comparisons (A/B testing, competitor comparison)?
  • Related to the above, boundary conditions (e.g. from a specification): should these be included as manually added data?
  • Privacy: much of the information coming out of 'AI' (let's call them LLMs please!) is derived from the real information it has been trained on. What guarantees can be provided that people's privacy is not being violated?

Natalia Demianenko

Thanks Phil for your feedback and insightful questions!

The article serves as a starting point to understand the process, but it's essential to improve upon it based on your own requirements.

Upgrading the prompt to consider boundary conditions and combining generated data with other sources, including manually added data, is an excellent way to enhance the coverage and accuracy of the testing process. I think it's a good topic for the next articles.

You raise a valid concern about privacy. As far as I know, OpenAI takes privacy seriously and has guidelines in place to protect user privacy. When using the OpenAI API, it's still essential to handle sensitive or personal data with caution and ensure compliance with privacy regulations.

Thank you once again for your valuable feedback and contributions to the discussion.

Renan Franca

Great work on this tutorial, Natalia! 💪👩‍💻 Your clear explanations and step-by-step guide made it easy to understand. It's impressive how you've utilized the OpenAI API to generate realistic test data, simplifying a task that can often be daunting. 🚀🔥 Keep innovating and sharing your knowledge with us! 🌟📚

Natalia Demianenko

Thanks, Renan 😊 Nice to share useful info 🚀

Igor Boky

Nice approach. Sometimes OpenAI responds differently, with a slightly different JSON structure or with no JSON at all.
Is it possible to prevent this during test data generation?

It could also be helpful to build a small website that acts as a realistic test data generator based on a JSON structure; all QA engineers could use it.

Natalia Demianenko • Edited

Thanks Igor, good point. We can define a JSON schema, add a validator to the Pre-request script of the 'Send Data' request, and cancel the request if validation fails.
And good idea about the website. There are many tools for random test data generation, but I haven't come across such tools for realistic data.