Benoit Averty for Zenika

Originally published at benoitaverty.com

Writing Good Automated Tests with AI: A Case Study

The use of AI is spreading among developers. For better or worse, more and more AI-generated code will make its way into production... and into the tests for this production code. In this article, I present an experiment I conducted on writing automated tests using AI, along with the conclusions I drew from it.

Read this article in French on my website

Genesis

A while ago, I came across this post on LinkedIn.

The post is in French; here's a translation:

🦾 Developers, generate your tests using AI

⁉️ But why generate tests?

⚠️ Because some sellers of AI proxies, or even fake AI, make you believe that testing is a tedious activity.

🤝 It can be tempting for a manager, who doesn't know much about it, to be swayed. Especially if their teams don't do testing and perpetuate this idea of its difficulty. The promise then becomes enticing!

⛔️ Spoiler: the generated tests are just a copy of the code: faulty code = green test (when it compiles). However, writing tests requires a deep understanding of the context and product requirements.

🔄 What if we approached things differently: tests as a valuable tool.

👉 Learn to do TDD (Test-Driven Development), rather than coding haphazardly and wanting to generate tests afterward. Use tests as a guide, and you’ll get living documentation, non-regression, and more.

Although I agree with Christophe (the author of the post) that tests should not be seen as a tedious or secondary activity, his claims about the quality of AI-generated tests surprised me. My intuition was quite the opposite: writing tests felt like one of the more promising use cases for AI, similar to writing a SQL query, a pure function, or a slightly complex TypeScript type. So I decided to go beyond intuition and try it out myself to see what the reality was.

My goal in this experiment was to answer the following questions:

  • Is it possible to write good automated tests with AI?
  • What methods help achieve this?
  • Is it worthwhile in terms of time and effort?

Experimentation

The Class to Test

One problem with experiments of this kind is that it's hard to find a realistic scenario: getting AI to write tests for a single class, outside of a real project, wouldn't prove much. So I chose to take a project from one of my clients and generate tests for one of its core domain classes.

The code for this class is presented in the appendix, and here is a version without the method bodies and private members.

/**
 * Represents the successive steps taken to select the best content from a `bundle` returned by the indexing engine.
 *
 * The content selection pipeline as currently implemented only supports text content.
 */
class ContentSelectionPipeline(
    private val parameters: ContentSelectionParameters,
    private val blacklist: Blacklist,
    private val queryingEditor: Editor,
    private val relatedEditors: List<Editor>,
    private val fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<ContentMetadata>,
    private val clock: Clock
) {
    // Private attributes omitted. But the class is indeed stateful.

    suspend fun initializeForDocuments(relatedDocuments: RelatedDocumentsBundle) {
        // ...
    }

    fun execute() {
        // ...
    }

    fun isSuccess(): Boolean {
        // ...
    }

    fun getFinalContents(): List<ContentMetadata> {
        // ...
    }

    // Private methods omitted. The real file is 130 lines long.
}

This class comes from a project aimed at sharing Content (Articles...) between multiple Editors (Websites with press articles, videos...). The role of this class is to select three pieces of content from a set of documents (RelatedDocumentsBundle) returned by an IndexationEngine (think Elasticsearch). To fulfill this role, the class must fetch content metadata (ContentMetadata: URL, word count...) corresponding to the documents, and then apply a number of configurable business rules.

The typical use case can be described as follows:

  1. Instantiate the pipeline with several parameters (the group of editors involved, business parameters, an exclusion list...)
  2. Initialize the pipeline with a set of documents from the indexing engine (initializeForDocuments method)
  3. Execute the pipeline (execute method, which applies all business rules to select the content)
  4. Check the operation status (isSuccess) and retrieve the selected content (getFinalContents)

Steps 2 to 4 can be executed multiple times for a single instance, but of course, they cannot be executed out of order.
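
To make that lifecycle concrete, here is a minimal sketch of what a caller looks like (hypothetical values throughout: parameters, blacklist, editors, fetchMetadata, and bundle are placeholders, not code from the project):

import kotlinx.coroutines.runBlocking
import java.time.Clock

fun main() = runBlocking {
    // Step 1: instantiate the pipeline with its collaborators (placeholders here).
    val pipeline = ContentSelectionPipeline(
        parameters = parameters,              // business parameters
        blacklist = blacklist,                // exclusion list
        queryingEditor = editors.first(),     // the editor making the request
        relatedEditors = editors.drop(1),     // the rest of the editor group
        fetchContentMetadata = fetchMetadata, // suspend function returning ContentMetadata
        clock = Clock.systemUTC()
    )

    // Steps 2 to 4, always in this order (and repeatable for other bundles):
    pipeline.initializeForDocuments(bundle)   // documents from the indexing engine
    pipeline.execute()                        // applies the business rules
    if (pipeline.isSuccess()) {
        val selected = pipeline.getFinalContents()
        println(selected)
    }
}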

Poor Tests

The first step of my experiment was to get a sense of what happens when you put in no effort. I opened my chat agent (JetBrains AI) and asked it to write tests:

Here is a class for which I want to generate tests:

class ContentSelectionPipeline {
 [...]
}

Write the tests for this class.

Again, the full code is in the appendix, but here’s the beginning of the test class and the list of test cases generated:

package fr.sipaof.flink.domain.content.selection

class IaContentSelectionPipelineTests {

    private lateinit var parameters: ContentSelectionParameters
    private lateinit var blacklist: Blacklist
    private lateinit var queryingEditor: Editor
    private lateinit var relatedEditors: List<Editor>
    private lateinit var fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<ContentMetadata>
    private lateinit var clock: Clock
    private lateinit var contentSelectionPipeline: ContentSelectionPipeline

    @BeforeEach
    fun setUp() {
        parameters = mockk<ContentSelectionParameters>()
        blacklist = mockk()
        queryingEditor = mockk()
        relatedEditors = listOf()
        fetchContentMetadata = mockk()
        clock = mockk()

        contentSelectionPipeline = ContentSelectionPipeline(
            parameters,
            blacklist,
            queryingEditor,
            relatedEditors,
            fetchContentMetadata,
            clock
        )
    }

    @Test
    fun `initializeForDocuments should initialize contents and initialBundle`() = runBlocking {
        // ...
    }

    @Test
    fun `initializeForDocuments should throw IllegalStateException for unsupported content`(): Unit = runBlocking {
        // ...
    }

    @Test
    fun `execute should populate steps and selectedDocuments`() = runBlocking {
        // ...
    }

    @Test
    fun `isSuccess should return true when at least 2 documents are selected`() = runBlocking {
        // ...
    }

    @Test
    fun `isSuccess should return false when less than 2 documents are selected`() = runBlocking {
        // ...
    }

    @Test
    fun `getFinalContents should return selected content metadata`() = runBlocking {
        // ...
    }

    @Test
    fun `getExecutionSummary should return pipeline execution summary`() = runBlocking {
        // ...
    }

}

Unsurprisingly, it’s a disaster. These tests are a textbook example of what not to do.

First, the test class declares an attribute for every piece of data provided to the class under test at instantiation. Most of these attributes are initialized as mocks (mockk) before each test, even though most of them are Kotlin data classes that would be much simpler to instantiate directly (and whose real behavior we want exercised, since ContentSelectionPipeline relies on them). There is even a mock for Clock, even though JetBrains AI surely knows about Clock.fixed. In short: too many mocks.
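
For reference, a deterministic clock needs no mock at all; java.time provides one directly (the instant below is arbitrary):

import java.time.Clock
import java.time.Instant
import java.time.ZoneOffset

// A clock frozen at a known instant: every call to instant() returns the same value.
val fixedClock: Clock = Clock.fixed(Instant.parse("2023-04-01T00:00:00Z"), ZoneOffset.UTC)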

Secondly, if you look at the names of the generated test cases, you'll find a lot of vocabulary that I never mentioned when I presented the class to be tested. And for good reason: it corresponds to private classes, functions, and attributes of ContentSelectionPipeline. In fact, the tests do not compile without making several of these attributes public, which is another bad practice. In short, the tests are coupled to the implementation of the class.

Lastly, let’s look at one of these test cases in full:

    @Test
    fun `initializeForDocuments should initialize contents and initialBundle`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)

        // --
        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
        // --

        assertNotNull(contentSelectionPipeline.contents)
        assertNotNull(contentSelectionPipeline.initialBundle)
        assertNull(contentSelectionPipeline.steps)
        assertNull(contentSelectionPipeline.selectedDocuments)
    }

The beginning and end of the test illustrate what I mentioned earlier:

  1. We mock classes that should be included in the system under test.
  2. We make assertions on private attributes.

But the middle part (which I marked with // --) shows another problem: calling only the initializeForDocuments method does not correspond to any real use case of the class. The test does something that makes no sense; there is nothing to observe, which forces us to make assertions on the class's private state.

Why Is It So Bad?

Should we conclude that AI is bad at writing tests? No. It’s just that we cannot expect an AI to succeed at anything with the little information I provided in the prompt.

  • The AI had access only to the ContentSelectionPipeline class, but I did not give it the code for the supporting classes, which are essential to the service provided by ContentSelectionPipeline.
  • The AI had to guess the boundaries of the system under test (I did not specify them), and it couldn't make relevant guesses because, again, I did not provide the code for the related classes. These classes could be entirely independent, justifying a mock rather than their use in the test.
  • The AI had no information on how the class is used in the rest of the application, so it couldn't tell which scenarios made sense and which didn’t (although in this regard, a more powerful model like OpenAI o1 could do much better, even with no additional information).

It's easy to conclude that an AI cannot perform certain tasks based on examples like the ones I just showed. But it is important to remember that an LLM is a computational tool, not magic. Like any tool, it requires a minimum amount of effort to be used properly, and most importantly, one must learn to use it. Typing a one-line prompt required no effort and is the first thing any developer who has not tried to improve their LLM usage would do.

To do better, we need to improve at least two aspects:

  • Provide more information to the AI.
  • Improve our prompt.

A Slightly More Advanced Attempt

First Prompt

We are going to write unit tests for the class #file:ContentSelectionPipeline.kt and its associated classes (#symbol:ContentSelectionParameters, #file:Blacklist.kt, #file:Editor.kt, #symbol:DocumentIndexMetadataWithScore, #file:ContentMetadata.kt, #file:RelatedDocumentsBundle.kt, #file:TextContentMetadata.kt, #symbol:ContentAndDocumentMetadataWithScore).

Private methods must never appear in the tests. Mocks should be avoided unless I explicitly mention them.

First, we will write a utility method that will help us create input data for the tests. The method will take a list of editors as a parameter and the path to a CSV file with the following columns: editorSlug;contentId;title;url;numberOfWords;publicationDate;indexationId;returnedScore;blacklisted. Each line corresponds to a document present in the Bundle and the associated ContentMetadata.

What to note from this prompt:

  1. I included the complete code of all the classes belonging to the system under test as I envisioned it (that's what the special #file:xyz.kt syntax available in JetBrains AI does, but if you're using something else, copying and pasting the files will do just fine). This means I had thought about the system under test in advance; the AI did not do it for me.
  2. I was explicit about the desired testing practices: no mocks, no private attributes in the tests.
  3. I broke down the task. Although I explained initially that the goal was to write a test, I started by writing a utility method to generate input data, as I knew it would be laborious to instantiate (many attributes and parameters). My knowledge of the system under test allowed me to identify this first step; the AI did not think for me.

In response to this prompt, the AI provided a function matching this signature:

data class TestDocument(
    val editor: Editor,
    val contentMetadata: TextContentMetadata,
    val documentIndexMetadata: DocumentIndexMetadataWithScore
)

fun createTestDocumentsFromCsv(editors: List<Editor>, csvFilePath: String): List<TestDocument> {
    // ...
}

Adjustment

It didn’t suit me because the result of this method was not directly usable as input to my system under test. So I clarified:

The method should return the input data as a RelatedDocumentsBundle and a function corresponding to the fetchContentMetadata parameter of the content selection pipeline.

This time, I decided to keep the result. It wasn’t perfect, but a quick println() test at least convinced me that it worked, and there would always be time to refactor later.

data class TestInput(
    val relatedDocumentsBundle: RelatedDocumentsBundle,
    val fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<TextContentMetadata>
)

fun createTestInputFromCsv(editors: List<Editor>, csvFilePath: String): TestInput {
    val csvFile = File(csvFilePath)
    val csvLines = csvFile.readLines()

    // Assumes the first line is the header
    val header = csvLines.first().split(";")
    require(header == listOf("editorSlug", "contentId", "title", "url", "numberOfWords", "publicationDate", "indexationId", "returnedScore", "blacklisted"))

    val editorMap = editors.associateBy { it.slug }
    val dateFormatter = DateTimeFormatter.ISO_ZONED_DATE_TIME

    val documentIndexMetadataMap: MutableMap<Collection, MutableList<DocumentIndexMetadataWithScore>> = mutableMapOf()
    val contentMetadataMap = mutableMapOf<InternalContentId, TextContentMetadata>()

    for (line in csvLines.drop(1)) {
        val columns = line.split(";")
        val editorSlug = columns[0]
        val contentId = columns[1]
        val title = columns[2]
        val url = columns[3]
        val numberOfWords = columns[4].toInt()
        val publicationDate = ZonedDateTime.parse(columns[5], dateFormatter)
        val indexationId = UUID.fromString(columns[6])
        val returnedScore = columns[7].toDouble()
        val blacklisted = columns[8].toBoolean()

        val editor = editorMap[editorSlug] ?: throw IllegalArgumentException("Editor not found: $editorSlug")

        val contentMetadata = TextContentMetadata(
            id = OpaqueId(contentId),
            editor = editor,
            title = title,
            url = Url(url),
            publicationDate = publicationDate,
            numberOfWords = numberOfWords,
            hashedText = null // or compute based on the text if available
        )

        val documentMetadata = DocumentIndexMetadataWithScore(
            documentIndexMetadata = DocumentIndexMetadata(indexationId),
            score = returnedScore
        )

        documentIndexMetadataMap.computeIfAbsent(editor.collection) { mutableListOf() }.add(documentMetadata)
        contentMetadataMap[contentMetadata.internalId] = contentMetadata
    }

    val relatedDocumentsBundle = RelatedDocumentsBundle(documentIndexMetadataMap)
    val fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<TextContentMetadata> = { documentMetadataList ->
        documentMetadataList.mapNotNull { contentMetadataMap[it.documentIndexMetadata.internalId] }
    }

    return TestInput(relatedDocumentsBundle, fetchContentMetadata)
}
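
For the record, the "quick println() test" mentioned above was nothing more sophisticated than this (a rough sketch; the editors list and the CSV path are hypothetical):

import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val editors = listOf(/* one or two hand-built Editor instances */)
    val input = createTestInputFromCsv(editors, "src/test/resources/documents.csv")

    // Eyeball the parsed bundle and the metadata returned by the stub fetch function.
    println(input.relatedDocumentsBundle)
    println(input.fetchContentMetadata(input.relatedDocumentsBundle.allDocuments()))
}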

Generating Data for the First Test Case

With this function ready to generate a lot of test data, it was time to write a test case.

Now let’s write the unit test for the happy path of the content selection pipeline. The scenario is as follows:

  1. There are three editors: fake-editor-1, other-editor, and awesome-editor. They each have a freshnessCoefficient of 0.97, 0.97, and 0.90, respectively.
  2. The input parameters for the pipeline are the same as in the production code (in #file:application.yml).
  3. Each editor has three documents in the input document set.

Please write the CSV file in the format defined earlier that corresponds to this scenario.

Again, instead of directly asking for the test, I asked for a CSV with several lines of data. And the AI responded with a CSV matching the specifications I gave in the prompt. It correctly generated scores between the threshold and ceiling specified in application.yml. It even set the blacklisted column to false throughout, explaining that it was for simplicity. Pretty relevant for the happy path I asked for.

Generated CSV. Don’t feel the need to understand every detail...

editorSlug;contentId;title;url;numberOfWords;publicationDate;indexationId;returnedScore;blacklisted
fake-editor-1;content-1-1;Title 1-1;http://example.com/1-1;500;2023-01-01T10:00:00Z;00000000-0000-0000-0000-000000000001;0.75;false
fake-editor-1;content-1-2;Title 1-2;http://example.com/1-2;1000;2023-01-02T10:00:00Z;00000000-0000-0000-0000-000000000002;0.82;false
fake-editor-1;content-1-3;Title 1-3;http://example.com/1-3;750;2023-01-03T10:00:00Z;00000000-0000-0000-0000-000000000003;0.90;false
other-editor;content-2-1;Title 2-1;http://example.com/2-1;600;2023-02-01T11:00:00Z;00000000-0000-0000-0000-000000000004;0.70;false
other-editor;content-2-2;Title 2-2;http://example.com/2-2;1100;2023-02-02T11:00:00Z;00000000-0000-0000-0000-000000000005;0.78;false
other-editor;content-2-3;Title 2-3;http://example.com/2-3;700;2023-02-03T11:00:00Z;00000000-0000-0000-0000-000000000006;0.85;false
awesome-editor;content-3-1;Title 3-1;http://example.com/3-1;550;2023-03-01T12:00:00Z;00000000-0000-0000-0000-000000000007;0.65;false
awesome-editor;content-3-2;Title 3-2;http://example.com/3-2;1200;2023-03-02T12:00:00Z;00000000-0000-0000-0000-000000000008;0.80;false
awesome-editor;content-3-3;Title 3-3;http://example.com/3-3;800;2023-03-03T12:00:00Z;00000000-0000-0000-0000-000000000009;0.88;false

Writing the Test Code

Finally:

Now, let’s write the unit test that uses this file. The test should use the CSV and the utility method to create the input data, use Clock.fixed, create the parameters according to the real values, then instantiate the pipeline. Then, call the methods in the correct order. Finally, verify that the pipeline is successful and that the correct content has been selected.

Once again, I didn’t hesitate to explain exactly what I expected.

The result:

class ContentSelectionPipelineTest {

    private lateinit var csvFilePath: String

    @BeforeEach
    fun setup() {
        // Create a temporary CSV file with the required data
        val csvContent = """
            editorSlug;contentId;title;url;numberOfWords;publicationDate;indexationId;returnedScore;blacklisted
            fake-editor-1;content-1-1;Title 1-1;http://example.com/1-1;500;2023-01-01T10:00:00Z;00000000-0000-0000-0000-000000000001;0.75;false
            fake-editor-1;content-1-2;Title 1-2;http://example.com/1-2;1000;2023-01-02T10:00:00Z;00000000-0000-0000-0000-000000000002;0.82;false
            fake-editor-1;content-1-3;Title 1-3;http://example.com/1-3;750;2023-01-03T10:00:00Z;00000000-0000-0000-0000-000000000003;0.90;false
            other-editor;content-2-1;Title 2-1;http://example.com/2-1;600;2023-02-01T11:00:00Z;00000000-0000-0000-0000-000000000004;0.70;false
            other-editor;content-2-2;Title 2-2;http://example.com/2-2;1100;2023-02-02T11:00:00Z;00000000-0000-0000-0000-000000000005;0.78;false
            other-editor;content-2-3;Title 2-3;http://example.com/2-3;700;2023-02-03T11:00:00Z;00000000-0000-0000-0000-000000000006;0.85;false
            awesome-editor;content-3-1;Title 3-1;http://example.com/3-1;550;2023-03-01T12:00:00Z;00000000-0000-0000-0000-000000000007;0.65;false
            awesome-editor;content-3-2;Title 3-2;http://example.com/3-2;1200;2023-03-02T12:00:00Z;00000000-0000-0000-0000-000000000008;0.80;false
            awesome-editor;content-3-3;Title 3-3;http://example.com/3-3;800;2023-03-03T12:00:00Z;00000000-0000-0000-0000-000000000009;0.88;false
        """.trimIndent()

        val tempFile = createTempFile(suffix = ".csv")
        Files.writeString(tempFile, csvContent)
        csvFilePath = tempFile.toAbsolutePath().toString()
    }

    @Test
    fun `test content selection pipeline happy path`() = runBlocking {
        // Define editors
        val editorsList = listOf(
            Editor(UUID.randomUUID(), "fake-editor-1", 0.97, UUID.randomUUID()),
            Editor(UUID.randomUUID(), "other-editor", 0.97, UUID.randomUUID()),
            Editor(UUID.randomUUID(), "awesome-editor", 0.90, UUID.randomUUID())
        )

        // Create test input using the factory method
        val testInput = createTestInputFromCsv(editorsList, csvFilePath)

        // Define Content Selection Parameters according to application.yml
        val parameters = ContentSelectionParameters(
            scoreThreshold = 0.48,
            scoreCeiling = 0.95,
            freshnessScoreThreshold = 0.3
        )

        // Fixed Clock
        val fixedClock = Clock.fixed(Instant.parse("2023-04-01T00:00:00Z"), ZoneOffset.UTC)

        // Blacklist implementation (no items blacklisted for simplicity)
        val blacklist = object : Blacklist {
            override val allItems = emptyList<BlacklistItem>()
            override fun contains(content: InternalContentId): Boolean = false
        }

        // Instantiate the pipeline
        val pipeline = ContentSelectionPipeline(
            parameters = parameters,
            blacklist = blacklist,
            queryingEditor = editorsList.first(),
            relatedEditors = editorsList.drop(1),
            fetchContentMetadata = testInput.fetchContentMetadata,
            clock = fixedClock
        )

        // Initialize the pipeline
        pipeline.initializeForDocuments(testInput.relatedDocumentsBundle)

        // Execute the pipeline
        pipeline.execute()

        // Check if the pipeline succeeded
        assertTrue(pipeline.isSuccess())

        // Get the final contents
        val finalContents = pipeline.getFinalContents()

        // Assert the correct contents have been selected
        val expectedContents = listOf(
            "Title 1-1", "Title 1-2", "Title 1-3", 
            "Title 2-1", "Title 2-2", "Title 2-3", 
            "Title 3-1", "Title 3-2", "Title 3-3"
        )
        val finalContentTitles = finalContents.map { it.title }

        assertEquals(expectedContents.sorted(), finalContentTitles.sorted())
    }
}

I find this result quite satisfactory (apart from the CSV inline in the class, but that’s trivial to extract). I still had a few adjustments to make to the test scenario itself: change one or two values in the CSV, change the Clock date, and change the assertion, as the AI didn’t understand that in the happy path, only two or three pieces of content could be selected (the only real disappointment).

However, adjusting the scenario this way is clearly the most complex task. Determining, without executing it, what the result of the pipeline will be based on the input data is precisely what gives the test all its value. It seems normal to me to leave this job to the developer once the AI has quickly generated all this code.
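
To illustrate, the corrected ending of the test looks more like this (the three expected titles are placeholders: the real values have to be derived by hand from the CSV and the business rules):

// The pipeline selects at most three contents (one "longest" pick per selection slot),
// so the happy-path assertion checks those picks, not all nine inputs.
assertTrue(pipeline.isSuccess())

val finalContentTitles = pipeline.getFinalContents().map { it.title }
assertEquals(listOf("Title 1-2", "Title 3-2", "Title 2-2"), finalContentTitles)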

And the final result of the experiment, of course, is that this test:

  1. Compiles;
  2. Passes;
  3. Fails when we introduce bugs into the class.
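
The third point is easy to check by hand with a quick mutation (reverted afterwards). For example, raising the success threshold in isSuccess from 2 to 4 must make the test fail, since the pipeline never selects more than three contents:

fun isSuccess(): Boolean {
    return selectedDocuments?.let {
        it.size >= 4 // mutated from ">= 2": the happy-path test must now go red
    } ?: false
}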

For those who are interested, the complete log of my conversation with JetBrains AI is available at this link.

End of the Experiment

At this point, I stopped my experiment. The first reason is that this class is already well-tested in reality, so the exercise is somewhat pointless. But mostly, I had formed a much clearer opinion on the original question that interested me.

To continue the experiment, I would need to:

  1. Refactor the CSV parsing method, which I find a bit hard to read (see the sketch below).
  2. Create other test scenarios, especially cases that exclude content based on the different business rules.¹
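
For the first point, the refactoring I have in mind is the classic move of extracting a parsing helper, along these lines (a sketch; CsvDocumentLine and parseCsvLine are hypothetical names):

import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.UUID

// Hypothetical helper: turn one CSV line into typed fields, so that the main loop of
// createTestInputFromCsv only assembles domain objects.
private data class CsvDocumentLine(
    val editorSlug: String,
    val contentId: String,
    val title: String,
    val url: String,
    val numberOfWords: Int,
    val publicationDate: ZonedDateTime,
    val indexationId: UUID,
    val returnedScore: Double,
    val blacklisted: Boolean
)

private fun parseCsvLine(line: String): CsvDocumentLine {
    val c = line.split(";")
    return CsvDocumentLine(
        editorSlug = c[0],
        contentId = c[1],
        title = c[2],
        url = c[3],
        numberOfWords = c[4].toInt(),
        publicationDate = ZonedDateTime.parse(c[5], DateTimeFormatter.ISO_ZONED_DATE_TIME),
        indexationId = UUID.fromString(c[6]),
        returnedScore = c[7].toDouble(),
        blacklisted = c[8].toBoolean()
    )
}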

Results

Tips for Using AI Effectively

This experiment was conducted on automated tests, but I hypothesize that the lessons I learned apply to other tasks as well. It would need verification, but my other experiments (although less rigorous) seem to confirm this.

AI Works When the Generated Code is Localized

AI works well for generating code that is localized to one or two places.

  1. Generating a test class and a utility method (as I did for this article)
  2. Generating SQL queries
  3. Generating complex TypeScript types
  4. ...

Needing context from many files is not a problem thanks to IDE features that quickly pass the code from several files to the LLM.

On the other hand, modifying a large number of files proved more challenging, and too laborious to actually save time.

You Must Give the AI a Lot of Context

Providing a single class to generate tests is not enough. You also need to provide the code of the classes used as parameters and attributes to get a correct result.

Think of it as the input data of the problem, just as you would need the full problem statement to solve a math exercise. Similarly, if you want to generate an algorithmic function, the LLM must know the code of any classes used as parameters or return types. If you are designing a web controller, the LLM needs to know which framework you're using, and so on.

Be Explicit About Expectations

By default, the LLM wrote tests with many mocks and assertions on the private state of the class. But this is not an inherent limitation of the model; it simply needed to be told not to do so.

Similarly, if you request a somewhat complex SQL query, do not hesitate to specify the desired output columns, tables to use, indexes that you want the query to prioritize, etc.

My Opinion

I am convinced that it is possible to write excellent tests with the help of an AI. You need to have realistic expectations: the AI will not relieve you from thinking about the system under test, the test cases, or the technical architecture of your testing framework. These are questions you need to ask yourself anyway if you write your tests manually.

I spent about two hours writing this first test. What took the most time was starting over several times to refine my approach and end up with an AI session short enough to summarize for you here. Writing the following test cases would clearly take less time, and in the end, I think this approach would have saved time overall in writing the tests.

Will I adopt AI for all my future automated tests? I don't know. The environmental cost is high, as is the financial cost, given that AI is the new golden goose for companies worldwide. Despite the rapid progress in these technologies, time savings are not yet miraculous (although they are real in some cases).

On the other hand, it seems important to continue experimenting. In my opinion, using LLMs to produce code is a useful skill for a developer in a context where the rush towards AI shows no sign of stopping.

Appendices

The Complete ContentSelectionPipeline Class

/**
 * Represents the successive steps taken to select the best content from a bundle returned by the indexing engine.
 *
 * The ContentSelectionPipeline as currently implemented only supports Text Content.
 */
class ContentSelectionPipeline(
    private val parameters: ContentSelectionParameters,
    private val blacklist: Blacklist,
    private val queryingEditor: Editor,
    private val relatedEditors: List<Editor>,
    private val fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<ContentMetadata>,
    private val clock: Clock
) {
    private var initialBundle: RelatedDocumentsBundleWithContentMetadata? = null
    private var contents: MutableSet<TextContentMetadata>? = null
    private var steps: MutableList<PipelineStep>? = null
    private var selectedDocuments: MutableList<ContentAndDocumentMetadataWithScore>? = null

    suspend fun initializeForDocuments(relatedDocuments: RelatedDocumentsBundle) {
        // Keep only TextContentMetadata (the only one that exists in the DB for now)
        val fetchedContentMetadata = fetchContentMetadata(relatedDocuments.allDocuments())
        if (fetchedContentMetadata.any { it !is TextContentMetadata }) {
            throw IllegalStateException("The content selection pipeline was initialized with unsupported content." +
                    " Pipeline can only handle text content.")
        }

        contents = HashSet(
            fetchedContentMetadata.filterIsInstance<TextContentMetadata>()
        )
        initialBundle = associateContentToBundle(relatedDocuments, contents!!)
        steps = null
        selectedDocuments = null
    }

    fun execute() {
        val execBundle = initialBundle ?: throw IllegalStateException("Executing non-initialized pipeline")
        steps = mutableListOf()

        steps!!.add(PipelineStep(BLACKLIST, execBundle.applyBlacklist(blacklist)))
        steps!!.add(PipelineStep(SCORE, steps!!.last().bundle.filter(::hasScoreBetweenThresholdAndCeiling)))
        steps!!.add(PipelineStep(FRESHNESS, steps!!.last().bundle.applyFreshnessCoeff(relatedEditors, Instant.now(clock))))
        steps!!.add(PipelineStep(EXCLUDE_FRESHNESS, steps!!.last().bundle.filter(::hasScoreAboveFreshnessThreshold)))

        selectContents()
    }

    fun isSuccess(): Boolean {
        return selectedDocuments?.let {
            it.size >= 2
        } ?: false
    }

    fun getFinalContents(): List<ContentMetadata> {
        val selectedDocumentsCopy = selectedDocuments ?: throw IllegalStateException("Cannot get content on a pipeline that hasn't been executed")

        return selectedDocumentsCopy.map(ContentAndDocumentMetadataWithScore::contentMetadata)
    }

    private fun hasScoreBetweenThresholdAndCeiling(doc: ContentAndDocumentMetadataWithScore): Boolean =
        doc.score > parameters.scoreThreshold && doc.score < parameters.scoreCeiling

    private fun hasScoreAboveFreshnessThreshold(doc: ContentAndDocumentMetadataWithScore): Boolean =
        doc.score > parameters.freshnessScoreThreshold

    private fun selectContents() {
        selectedDocuments = mutableListOf()
        val finalBundle = steps!!.last().bundle

        // Select the first item: priority to the querying editor
        selectedDocuments!!.add(
            finalBundle.longestContentFromCollection(queryingEditor.collection)
                ?: finalBundle.longestContent()
                ?: return
        )

        // Select the second and third items: priority to another editor
        selectedDocuments!!.add(
            finalBundle.longestContent(
                excludeCollections = getCollectionsFromMetadata(selectedDocuments!!),
                excludeDocuments = selectedDocuments!!
            )
                ?: finalBundle.longestContent(excludeDocuments = selectedDocuments!!)
                ?: return
        )
        selectedDocuments!!.add(
            finalBundle.longestContent(
                excludeCollections = getCollectionsFromMetadata(selectedDocuments!!),
                excludeDocuments = selectedDocuments!!
            )
                ?: finalBundle.longestContent(excludeDocuments = selectedDocuments!!)
                ?: return
        )
    }

    fun getExecutionSummary(): PipelineExecutionSummary {
        if (selectedDocuments == null) throw IllegalStateException("Cannot get summary of a pipeline that hasn't been executed")

        return PipelineExecutionSummary(
            initialBundle!!,
            steps!!,
            getFinalContents(),
        )
    }

    private fun getCollectionsFromMetadata(content: List<ContentAndDocumentMetadataWithScore>): List<Collection> {
        val editorsTids = content.map { it.documentIndexMetadata.id.editorTid }
        return relatedEditors.filter { editorsTids.contains(it.tid) }.map(Editor::collection)
    }
}

class PipelineStep(
    val name: PipelineStepName,
    val bundle: RelatedDocumentsBundleWithContentMetadata,
)

enum class PipelineStepName {
    BLACKLIST, SCORE, FRESHNESS, EXCLUDE_FRESHNESS
}


The Tests Generated with the Poor Prompt

package fr.sipaof.flink.domain.content.selection

class IaContentSelectionPipelineTests {

    private lateinit var parameters: ContentSelectionParameters
    private lateinit var blacklist: Blacklist
    private lateinit var queryingEditor: Editor
    private lateinit var relatedEditors: List<Editor>
    private lateinit var fetchContentMetadata: suspend (List<DocumentIndexMetadataWithScore>) -> List<ContentMetadata>
    private lateinit var clock: Clock
    private lateinit var contentSelectionPipeline: ContentSelectionPipeline

    @BeforeEach
    fun setUp() {
        parameters = mockk<ContentSelectionParameters>()
        blacklist = mockk()
        queryingEditor = mockk()
        relatedEditors = listOf()
        fetchContentMetadata = mockk()
        clock = mockk()

        contentSelectionPipeline = ContentSelectionPipeline(
            parameters,
            blacklist,
            queryingEditor,
            relatedEditors,
            fetchContentMetadata,
            clock
        )
    }

    @Test
    fun `initializeForDocuments should initialize contents and initialBundle`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)

        assertNotNull(contentSelectionPipeline.contents)
        assertNotNull(contentSelectionPipeline.initialBundle)
        assertNull(contentSelectionPipeline.steps)
        assertNull(contentSelectionPipeline.selectedDocuments)
    }

    @Test
    fun `initializeForDocuments should throw IllegalStateException for unsupported content`(): Unit = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val contentMetadata = mockk<ContentMetadata>() // Not TextContentMetadata

        every { relatedDocumentsBundle.allDocuments() } returns listOf(mockk<DocumentIndexMetadataWithScore>())
        coEvery { fetchContentMetadata(any()) } returns listOf(contentMetadata)

        assertThrows<IllegalStateException> {
            runBlocking {
                contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
            }
        }
    }

    @Test
    fun `execute should populate steps and selectedDocuments`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)

//        every { blacklist.filter(any()) } returns mockk()
        every { clock.instant() } returns Instant.now()
//        every { relatedDocumentsBundle.longestContentFromCollection(any()) } returns mockk()
//        every { relatedDocumentsBundle.longestContent(excludeCollections = any(), excludeDocuments = any()) } returns mockk()
        every { queryingEditor.collection } returns mockk()

        contentSelectionPipeline.execute()

        assertNotNull(contentSelectionPipeline.steps)
        assertNotNull(contentSelectionPipeline.selectedDocuments)
    }

    @Test
    fun `isSuccess should return true when at least 2 documents are selected`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
        contentSelectionPipeline.execute()

        // Assume the pipeline selects 2 documents
        contentSelectionPipeline.selectedDocuments?.add(mockk())
        contentSelectionPipeline.selectedDocuments?.add(mockk())

        assertTrue(contentSelectionPipeline.isSuccess())
    }

    @Test
    fun `isSuccess should return false when less than 2 documents are selected`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
        contentSelectionPipeline.execute()

        // Assume the pipeline selects 1 document
        contentSelectionPipeline.selectedDocuments?.add(mockk())

        assertFalse(contentSelectionPipeline.isSuccess())
    }

    @Test
    fun `getFinalContents should return selected content metadata`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()
        val contentAndDocumentMetadataWithScore = mockk<ContentAndDocumentMetadataWithScore>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)
        every { contentAndDocumentMetadataWithScore.contentMetadata } returns textContentMetadata

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
        contentSelectionPipeline.execute()

        // Assume the pipeline selects 1 document
        contentSelectionPipeline.selectedDocuments?.add(contentAndDocumentMetadataWithScore)

        val finalContents = contentSelectionPipeline.getFinalContents()
        assertEquals(listOf(textContentMetadata), finalContents)
    }

    @Test
    fun `getExecutionSummary should return pipeline execution summary`() = runBlocking {
        val relatedDocumentsBundle = mockk<RelatedDocumentsBundle>()
        val textContentMetadata = mockk<TextContentMetadata>()
        val documentIndexMetadataWithScore = mockk<DocumentIndexMetadataWithScore>()
        val contentAndDocumentMetadataWithScore = mockk<ContentAndDocumentMetadataWithScore>()
        val pipelineStep = mockk<PipelineStep>()

        every { relatedDocumentsBundle.allDocuments() } returns listOf(documentIndexMetadataWithScore)
        coEvery { fetchContentMetadata(any()) } returns listOf(textContentMetadata)
        every { contentAndDocumentMetadataWithScore.contentMetadata } returns textContentMetadata
        every { pipelineStep.bundle } returns mockk()

        contentSelectionPipeline.initializeForDocuments(relatedDocumentsBundle)
        contentSelectionPipeline.execute()

        // Assume the pipeline selects 1 document and steps are populated
        contentSelectionPipeline.selectedDocuments?.add(contentAndDocumentMetadataWithScore)
        contentSelectionPipeline.steps?.add(pipelineStep)

        val summary = contentSelectionPipeline.getExecutionSummary()
        assertNotNull(summary)
    }

}

  ¹ I must say that the AI can probably help with this task as well, although my quick tests were not very convincing: https://chatgpt.com/share/67083a01-e200-8000-bfa2-37b94113fa4c
