
Generating negative tests with GPT-4

As developers, we strive to write efficient and error-free code, but let's face it - mistakes happen. In order to catch those pesky bugs before they wreak havoc on our applications, we rely on testing. While positive tests ensure our code works as intended, negative tests play a crucial role in validating that our applications are robust enough to handle unexpected input and edge cases. In this blog post, we'll explore the importance of negative tests and how we can leverage the power of GPT-4 to generate them.

I'll use Pythagora to generate positive test data which we will feed into GPT to create negative tests.


Negative tests

Negative tests, while sometimes overlooked, are an essential part of ensuring that our applications are robust and reliable.

So, what exactly are negative tests? These tests are designed to evaluate how well an application handles unexpected inputs and conditions. They aim to ensure that the application won’t crash, produce incorrect results, or return errors when faced with inputs that deviate from what is expected. For example, consider an API request that adds -5 t-shirts to a cart. It might seem like an irrelevant case, but exactly these kinds of requests are the ones that can crash your entire server.

The importance of negative tests cannot be overstated. They help us identify potential issues in our code, which might not be apparent during positive testing. By uncovering these issues, we can improve the performance, security, and overall quality of our applications, leading to increased customer satisfaction. In addition, negative tests can help us pinpoint areas of our code that may need improvement or refactoring.

Creating an extensive suite of negative tests can be quite time-consuming and requires a deep understanding of the application and its potential weak points. This is why the task often falls on the shoulders of the QA team. As expert bug hunters, they dedicate their time and attention to considering all the ways they can break the server and identify vulnerabilities.


Types of negative tests

Our primary focus here is on generating negative integration tests for APIs. To do this, we will manipulate various API request parameters, such as the request body and the URL path. By altering these values, we can create a range of negative test scenarios that can help us uncover potential issues in our applications.

When creating negative tests, it’s essential to consider the different values that could potentially cause problems. For instance, we might try using:

  • Empty or missing required fields
  • Invalid data types for fields (e.g., a number instead of a string, and vice versa)
  • Invalid field values (e.g., exceeding character limits, malformed data)
  • Duplicated data in arrays
  • Extra or irrelevant keys in the payload
  • Potential security vulnerabilities (e.g., XSS, SQL injection)
  • Invalid or missing content-type headers
  • Missing or invalid authentication headers (e.g., Bearer token)
  • Incorrect data structure (e.g., array instead of an object, or vice versa)
  • JSON formatting issues (e.g., unmatched brackets, invalid Unicode characters)

I actually worked with GPT to compile a much longer list of negative test types, which I've listed in full in this post.

By systematically exploring these variations, we can identify potential weak points in our application and ensure that it can handle unexpected input and conditions gracefully.
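As a rough sketch of how these categories translate into concrete values, here's a hypothetical helper (my own names, not part of Pythagora) that derives a handful of negative candidates from a known-good field value:

```javascript
// Hypothetical helper: derive negative test candidates from a valid value,
// mirroring the categories above (missing values, oversized data,
// injection probes, wrong types, structural swaps).
function negativeCandidates(value) {
  const candidates = [
    null,                         // missing value
    "",                           // empty value
    "A".repeat(100000),           // exceeds any sane character limit
    "<script>alert(1)</script>",  // XSS probe
    "' OR '1'='1",                // SQL injection probe
  ];
  if (typeof value === "string") candidates.push(12345);          // number instead of string
  if (typeof value === "number") candidates.push("not-a-number"); // string instead of number
  if (Array.isArray(value)) candidates.push({});                  // array swapped for object
  else if (value !== null && typeof value === "object") candidates.push([]); // object swapped for array
  return candidates;
}
```

A real generator would go further (malformed JSON, bad headers, broken auth), but this is the shape of the brainstorming work we're about to hand off to GPT-4.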

Now, imagine having 50 endpoints and going through all of these trying to create hundreds and hundreds of negative tests for all of them.

Thankfully, GPT-4 can do quite an amazing job with this!

How can GPT help create negative tests

By using GPT-4, we can create a range of negative test scenarios for a given API request, saving time and effort typically spent on brainstorming and manually designing these tests.

For this experiment, we’ll try creating negative tests that completely break the server. The idea is to craft API requests that make the server return a status code of 500 or higher.

If you think about it, we’ll need to send a whole bunch of API requests to achieve this, so we’re going for volume here.
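Concretely, each generated negative test just fires its request and checks the status code. The pass/fail rule can be sketched like this (a minimal sketch with hypothetical names, not Pythagora's actual code):

```javascript
// Given the status codes returned for a batch of negative requests,
// report which ones actually broke the server (any 5xx response means
// the bad input wasn't handled gracefully).
function findServerBreakers(results) {
  return results.filter(({ status }) => status >= 500);
}
```

Anything below 500 (a 400 Bad Request, a validation error) counts as the server surviving; only 5xx responses are the crashes we're hunting for.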

Since the GPT API is neither cheap nor fast, we want the data we exchange with GPT to be as compact as possible. Each Pythagora test contains ~3,000 tokens, which works out to about 9 cents per test, so generating 1,000 full tests would cost around $90. So, my idea is to have GPT return only a list of parameters to change, and to create a function that augments the existing test data with the negative values GPT responds with.
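To sanity-check that arithmetic (assuming GPT-4 prompt pricing of roughly $0.03 per 1K tokens at the time of writing):

```javascript
// Back-of-the-envelope cost of sending full Pythagora tests to GPT-4.
const tokensPerTest = 3000;
const usdPer1kTokens = 0.03; // assumed GPT-4 prompt price at the time
const costPerTest = (tokensPerTest / 1000) * usdPer1kTokens; // ≈ $0.09
const costFor1000 = costPerTest * 1000;                      // ≈ $90
console.log(`$${costPerTest.toFixed(2)} per test, $${costFor1000.toFixed(0)} per 1000 tests`);
```

Sending only the changed parameters instead of whole tests is what brings the real bill down to a couple of dollars, as we'll see later.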

Creating integration tests with Pythagora

Pythagora is super easy to use, so it'll take just a minute to create the data for a couple of integration tests. You can install it with npm i pythagora and then run the command that captures API request data.

npx pythagora --init-script "npm run start"

Now, I’ll just make a couple of API requests. I used the browser and clicked around so that my frontend code makes API requests. It literally took me 5 seconds to capture 7 integration tests.

Ok, now that I have those, I’ll use the Pythagora test data to send it to GPT so it can create negative tests.


Tuning GPT to generate negative test data

To start, we’ll prime the GPT-4 model by providing it with a prompt and an example of an API request along with corresponding negative test data, similar to the types mentioned earlier.

First, the system message:

You are a QA engineer and your main goal is to find ways to break the application you’re testing.

Then, the actual prompt:

I will give you an API request data and I want you to give me the negative test data. Negative test data is a JSON object with an array of possible values for parameters in the API request data that might break the server – they will be used to create negative tests.
For example, if an API request data looks like this:

    {
      "endpoint": "/api/category/add",
      "httpMethod": "POST",
      "body": {
        "name": "sdcvdvc",
        "description": "ccbvc",
        "products": [...]
      },
      "headers": {
        "authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjY0MjU2ZDljNDA4NzgzMzE0NGQ3MWMxNiIsImlhdCI6MTY4MDE3NDU0MiwiZXhwIjoxNjgwNzc5MzQyfQ.hy-D-AdsWSuUg3f1CH03FBHQFMFtwmklRwDCckvbzHk",
        "accept": "application/json, text/plain, */*"
      }
    }

You would return something like this:

  {
    "body.name": [...],
    "body.description": [...],
    "body.products": [...],
    "headers.authorization": [...],
    "headers.accept": [...],
    "endpoint": [...],
    "httpMethod": [...]
  }

Make sure that each array consists only of values that could break the server and, when you answer, that you don’t say anything else except the JSON object.

Also, here are types of negative tests to help you think:
— here I added the list from the blog post I showed above which is too big to paste here —

First, respond only with “Got it” to process this message.

GPT responded with “Got it“. So, in the next message, I sent it the test data:

Here is API request data:

  {
    "endpoint": "/api/address/add",
    "method": "POST",
    "body": {
      "isDefault": true,
      "address": "asasd",
      "city": "asdasd",
      "state": "asdasd",
      "country": "asdasd",
      "zipCode": "12312"
    },
    "query": {},
    "params": {},
    "headers": {
      "content-length": "106",
      "authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjY0MjU2ZDljNDA4NzgzMzE0NGQ3MWMxNiIsImlhdCI6MTY4MDE3NDU0MiwiZXhwIjoxNjgwNzc5MzQyfQ.hy-D-AdsWSuUg3f1CH03FBHQFMFtwmklRwDCckvbzHk",
      "content-type": "application/json",
      "accept": "application/json, text/plain, */*",
      "accept-encoding": "gzip, deflate, br",
      "cookie": "Webstorm-a7738556=37894940-6565-40a7-9ec5-b3a59a3f74e6; connect.sid=s%3AYt5uQ-ZhdJwLJja_sd814I2iAKkGpwzM.B%2F%2Btxr1eUYCZhBOzqQOsBsbTR8j%2FZn%2BbD1VFiJBRzAc"
    }
  }

Give the values for negative tests, but make sure that you don’t give values that might NOT break the server – only values that might trigger an error. Don’t start the response with “```json” and don’t end it with “```” but return ONLY the valid JSON. Focus on adding proper brackets and quotes so that the response is a valid JSON. Make sure that you respond with only a valid, executable JSON object and nothing else (e.g. you can’t write [...] because that cannot be executed). Also, try to give at least 10 values for each parameter.
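For reference, the conversation sent to the API can be assembled like this (a sketch with hypothetical names; `systemMessage` and `tuningPrompt` stand in for the texts above, and the actual chat-completion call is omitted):

```javascript
// Assemble the chat history for the negative-test prompt: system message,
// tuning prompt, the model's "Got it" acknowledgment, then the test data.
function buildMessages(systemMessage, tuningPrompt, testData) {
  return [
    { role: "system", content: systemMessage },
    { role: "user", content: tuningPrompt },
    { role: "assistant", content: "Got it" },
    { role: "user", content: "Here is API request data:\n" + JSON.stringify(testData, null, 2) },
  ];
}
```

Including the "Got it" turn as an assistant message replays the tuning exchange, so each follow-up request only needs to append new test data.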

Finally, after this message, GPT responds with the actual result that we’re looking for:

  {
    "endpoint": [...],
    "method": [...],
    "body.isDefault": [
      "{\"prop\": \"value\"}",
      "[true, true]"
    ],
    "body._id": [
      "{\"id\": \"64256deb4087833144d71c25\"}",
      "64256deb 4087833144d71c25"
    ],
    "body.address": [
      "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliqua.",
      "{\"address\": \"123 Street\"}",
      "하늘을 날아올라 나무에 눈을 맞추다",
      "SELECT * FROM users;"
    ],
    "headers.content-length": [
      "{\"content-length\": 221}"
    ],
    "headers.authorization": [
      "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==",
      "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IiIsImlhdCI6MTY4MDE3NDU0MiwiZXhwIjoxNjgwNzc5MzQyfQ.hy-D-AdsWSuUg3f1CH03FBHQFMFtwmklRwDCckvbzHk",
      "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.invalid_base64.encode",
      "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjY0MjU2ZDljNDA4NzgzMzE0NGQ3MWMxNiIsImlhdCI6MTY4MDE3NDU0MiwiZXhwIjoxNjgwNzc5MzQyfQ==.hy-D-AdsWSuUg3f1CH03FBHQFMFtwmklRwDCckvbzHk",
      "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjY0MjU2ZDljNDA4NzgzMzE0NGQ3MWMxNiIsImlhdCI6MTY4MDE3NDU0MiwiZXhwIjoxNjgwNzc5MzQyfQ.",
      "apikey 1234567890",
      "Digest username=\"Mufasa\", realm=\"\", nonce=\"dcd98b7102dd2f0e8b11d0f600bfb0c093\", uri=\"/dir/index.html\", qop=auth, nc=00000001, cnonce=\"0a4f113b\", response=\"6629fae49393a05397450978507c4ef1\", opaque=\"5ccc069c403ebaf9f0171e9517f40e41\""
    ],
    "headers.content-type": [...],
    "headers.accept": [...],
    "headers.accept-encoding": [...]
  }

This gives me data that I can process to create one test from each item in these arrays. I wrote a short function that takes the original test and iterates through the data GPT responded with. For each value in these arrays, the function creates an entire new test, so every single value GPT returned gets tried.
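That function isn't shown in the post, but the idea can be sketched as follows (my own minimal version, not the actual implementation: deep-copy the original test data and overwrite the value at each dotted path GPT returned):

```javascript
// Set a value at a dotted path like "body.address" or "headers.authorization".
function setAtPath(obj, dottedPath, value) {
  const keys = dottedPath.split(".");
  const last = keys.pop();
  let target = obj;
  for (const key of keys) target = target[key] = target[key] ?? {};
  target[last] = value;
}

// For each "path -> negative values" pair from GPT, clone the original
// test data and overwrite the value at that path, yielding one new
// negative test per value.
function expandNegativeTests(originalTest, negativeData) {
  const tests = [];
  for (const [path, values] of Object.entries(negativeData)) {
    for (const value of values) {
      const variant = JSON.parse(JSON.stringify(originalTest)); // deep copy
      setAtPath(variant, path, value);
      tests.push(variant);
    }
  }
  return tests;
}
```

With this shape, a response containing ten values for each of ten parameters expands one captured test into a hundred negative tests.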

Looking at the results


After setting everything up and running the experiment, the results were quite impressive: we generated 2,001 tests without any manual effort. By combining the power of GPT-4 with Pythagora, we quickly created a comprehensive suite of negative tests for our API server.

Upon analyzing the generated tests, I discovered three endpoints in the MERN e-commerce Github repo that could actually be broken. The errors were caused by:

  1. Sending an array instead of an object for an added address
  2. Sending emojis as a password field (this was a pretty interesting one)
  3. Sending an invalid Mongo object id in the URL

It’s worth noting that generating all 2,001 tests took around 30 minutes. But, you know – if you start the process before taking a break or moving on to another task, you’ll return to a vast array of tests ready to help you uncover potential issues in your system. So, I was pretty satisfied.

As for the cost, generating these tests with GPT-4 amounted to $1.50 in OpenAI tokens. Considering the time and effort saved by automating the creation of these negative tests, that's quite reasonable IMO.

Try it yourself

Finally, if you want to try it yourself, check out the negative_tests branch of the Pythagora Github repo. We didn’t want to merge it into main until we’ve tested it thoroughly.

Everything here is completely open source, so you will have to use your own OpenAI API key. Make sure you have an OPENAI_API_KEY variable in your config; it will be read from the process environment variables like this:

const configuration = new Configuration({
    apiKey: process.env.OPENAI_API_KEY,
});


This was a pretty cool thing to explore for me personally. We’ve now seen how GPT-4 can be used to create negative integration tests for your API server by systematically altering various parameters and generating a wide range of test scenarios.

I introduced Pythagora, an open-source tool that captures server activity and generates automated integration tests, which we then sent to GPT-4 to create negative tests.

I’m working on Pythagora and I’m hoping to help developers spend less time on writing tests and more on creating the core codebase. It would mean the world to me if you could show support by starring the Pythagora Github repo.

If you try Pythagora, I’d be happy to hear your thoughts on it and, of course, help you out if you get stuck somewhere. Just let me know at
