Previously I wrote about transforming an idea into working code using ChatGPT as an assistant. I took an idea and put ChatGPT through a BDD process and an Example Mapping process.
I was very impressed with the results, but also a bit disappointed by the lack of detail and the lack of determinism. It was not reliable enough to depend on, but I do think it is a useful tool for speeding the process up.
What about the second step - transforming an example scenario into code?
Picking up from Requirements
In the last post, ChatGPT helped with creating BDD-type scenarios for a simple email parser. Now we will try to convert some of those scenarios into tests and code, using a typical TDD process. We will use one failure case and one success case. The cases are not exactly as ChatGPT gave them to us because, as we learnt, it was not very good at getting down to the detail level, so between that step and this one some manual review and refinement of the cases was needed.
[In this post we will look at the failure case. The full article has details of implementing a success case.]
The failure case:
Scenario: Email with no file attachments returns error
Given an email with no image file attachments
When the email is sent to 'The System'
Then the system should return an error code with the message "One image file attachment must be present."
Setup
Configure ChatGPT
To turn ChatGPT into a developer, I gave it the following system command:
You are an expert python software developer. You know how to understand BDD scenarios of the form:
Scenario: scenario title
Given (an initial context)
When (an event occurs)
Then (a testable outcome is satisfied)
You know TDD and how to write tests first using pytest and pytest-bdd. When writing code, you favour functional programming and avoid side-effects.
You have been asked to assist another python software developer to develop tests and code for a small library to parse email content according to the given BDD specifications.
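(As an aside: if you drive ChatGPT through the API rather than the web UI, a system command like this is passed as a system role message. A rough sketch using the openai Python package, where the model name and prompt handling are my assumptions:)

import openai

# The system command shown above, abbreviated here
SYSTEM_PROMPT = """You are an expert python software developer. ..."""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Given the following scenario: ..."},
    ],
)
print(response.choices[0].message.content)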
Test 1: Simple failure case
Writing the test case
I gave ChatGPT the following prompt:
Given the following scenario:
<scenario from above>
Write a pytest-bdd test which satisfies this scenario.
and I received the following code:
import pytest
from email_parser import parse_email


@pytest.mark.parametrize("email_content, expected_output", [
    ("Hello World", "One image file attachment must be present."),
    ("", "One image file attachment must be present."),
    ("image.jpg", "One image file attachment must be present."),
])
def test_parse_email_no_file_attachments(email_content, expected_output):
    # Given an email with no image file attachments
    email = {
        "content": email_content,
        "attachments": []
    }

    # When the email is sent to 'The System'
    output = parse_email(email)

    # Then the system should return an error code with the message "One image file attachment must be present."
    assert output == (False, expected_output)
Not quite the start I was expecting. Some things to note:
- It assumed a name for the module and for the function under test, which is fine. We will see if it remembers this.
- It assumed an input format to the function under test as a single json structure
- It assumed an output from the function under test as a tuple.
- It did not use the pytest-bdd library, as we requested in the system message
- The parameterization of the test is not really necessary. The test really only needs one example of having no attachments.
The second item is interesting, and I like it. The design that initially came into my head was that the function in question would parse the entire email message and handle the extraction of body text and attachments. When I initially saw the output from ChatGPT I was annoyed but then it made me think: this is actually a better solution. Perhaps it was even following my direction to program in a functional way?
I like writing things in a functional way. Typically it is good to write simple functions that do only one thing with no side-effects, which makes them easier to test. In the long run I probably would have refactored the code to do this, but the ChatGPT output made me take that decision earlier, and I changed the design in my head: first parse the email into its sub-parts, then process the parts according to my requirements.
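Roughly, the idea is a small helper that splits the raw message into its parts, with parse_email then working only on those parts. A sketch of that shape (split_email is a hypothetical name, using the standard library's email module):

from email import message_from_string

def split_email(raw_email):
    # Break a raw email message into the body text and a list of attachment metadata
    msg = message_from_string(raw_email)
    content = ""
    attachments = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            attachments.append({"name": filename})
        elif part.get_content_type() == "text/plain":
            content += part.get_payload()
    return {"content": content, "attachments": attachments}

# parse_email then only needs to validate the already-extracted parts:
# result = parse_email(**split_email(raw_message))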
To fix the other issues, I instructed ChatGPT to refine the solution, by asking it to:
- use the pytest-bdd library format, providing it with a scenario file name
- return a json structure instead of a tuple
The updated test function was this:
import pytest
from pytest_bdd import given, when, then, scenario
from email_parser import parse_email


@scenario('../features/email-parsing.feature', 'Email with no file attachments returns error')
def test_parse_email_no_file_attachments():
    pass


@given("an email with no image file attachments")
def email_with_no_file_attachments():
    email = {
        "content": "Hello World",
        "attachments": []
    }
    return email


@when("the email is sent to 'The System'")
def send_email_to_system(email_with_no_file_attachments):
    return parse_email(**email_with_no_file_attachments)


@then("the system should return an error code with the message \"One image file attachment must be present.\"")
def check_error_message(send_email_to_system):
    expected_output = {
        "success": False,
        "error_message": "One image file attachment must be present."
    }
    assert send_email_to_system == expected_output
That looks like a pretty good BDD-style test to me.
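One thing ChatGPT did not generate is the feature file referenced by the @scenario decorator. Based on the scenario at the top of this post, a minimal features/email-parsing.feature would look like this (the feature title is mine):

Feature: Email parsing

  Scenario: Email with no file attachments returns error
    Given an email with no image file attachments
    When the email is sent to 'The System'
    Then the system should return an error code with the message "One image file attachment must be present."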
Implementing Code
When I ran the code, I got the usual errors, as you would expect from real TDD:
- No module
- No function
- No scenario file
I fixed these up and then asked ChatGPT to write the code.
Now write the code which satisfies the test function.
The response was this (line-splitting done by me):
def parse_email(content, attachments):
    # Check if there are any image file attachments
    has_image_attachment = False
    for attachment in attachments:
        if attachment.get("name", "").lower().endswith(".jpg") \
                or attachment.get("name", "").lower().endswith(".jpeg") \
                or attachment.get("name", "").lower().endswith(".png"):
            has_image_attachment = True
            break

    # If no image file attachments are present, return an error message
    if not has_image_attachment:
        return {
            "success": False,
            "error_message": "One image file attachment must be present."
        }

    # If one or more image file attachments are present, return a success message
    return {"success": True, "message": "Email parsed successfully."}
The interesting thing about this first implementation is that, because we had been talking about image attachments, it gave me code which already handled different types of image extensions. This was good, but it goes against the TDD philosophy of writing just enough code to pass the test. I decided to leave it in, however.
Personally I don't find this code readable, and in future the code will likely accept some extensions and not others, so we can ask ChatGPT to refactor it to use the split method and the in operator.
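A refactor along those lines might look something like this (my sketch rather than ChatGPT's output):

IMAGE_EXTENSIONS = {"jpg", "jpeg", "png"}

def parse_email(content, attachments):
    # An attachment counts as an image if its file extension is in the allowed set
    has_image_attachment = any(
        attachment.get("name", "").lower().split(".")[-1] in IMAGE_EXTENSIONS
        for attachment in attachments
    )
    if not has_image_attachment:
        return {
            "success": False,
            "error_message": "One image file attachment must be present."
        }
    return {"success": True, "message": "Email parsed successfully."}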
Running the test
The moment of truth... does this work? I ran pytest and got the following error:
E fixture 'email_with_no_file_attachments' not found
Hmmm... why? It turns out that the test code generated by ChatGPT used the pytest 'fixture' style to write the steps, but implemented it incorrectly. It's possible that this style is too new for ChatGPT, or it just didn't know how to do it, but either way it was something I had to dig into and fix, which took about an hour.
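For anyone hitting the same error: in current pytest-bdd versions a step's return value is only exposed to later steps if you declare it explicitly with target_fixture, so the working steps end up looking roughly like this (my reconstruction of the fix, not necessarily the exact code I used):

@given("an email with no image file attachments", target_fixture="email")
def email_with_no_file_attachments():
    return {"content": "Hello World", "attachments": []}


@when("the email is sent to 'The System'", target_fixture="parse_result")
def send_email_to_system(email):
    return parse_email(**email)


@then('the system should return an error code with the message "One image file attachment must be present."')
def check_error_message(parse_result):
    expected_output = {
        "success": False,
        "error_message": "One image file attachment must be present."
    }
    assert parse_result == expected_output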
After fixing that, hey presto, test passed! The question remaining is: considering the time it took to fix the error, would it have been faster for me to write this myself?
Summary
This was a valuable experiment with many new things learnt about using ChatGPT in this way.
The main benefit I found was that it prompted me to think more about design decisions - it did almost feel like I was 'pair programming'. However I was hoping that I would be able to continue thinking at a high level, but that didn't happen; because some of the output had mistakes or was poorly written (in my opinion), I had to jump between design-level and code-level thinking.
This raised an interesting question though - does this in fact fit into the TDD 'Red-Green-Refactor' process? Maybe it does and I have just been executing the 'refactor' stage a bit too early? I admit I sometimes neglected to wait and see whether the code that was produced passed before changing it, thus failing to follow the 'write the minimum code to turn the test green' rule.
Other associated points are:
- defining the expected inputs/outputs for unit tests is a bit laborious
- Some responses introduced assumptions in the details (e.g. I asked it to use URL-encoded-strings and it used outdated formats which I had to fix)
- The output is often verbose, leading to lengthy, duplicated or redundant code. This was especially true when parameterizing tests.
Some of these may be improved in GPT-4; we will have to see.
What about the more complex success case of the experiment? You can read about that at the original post.