Mehran Davoudi

Semantic Tests for SemanticKernel Plugins using skUnit

Exploring SemanticKernel

This week, I had the chance to explore the SemanticKernel code base, particularly the core plugins. SemanticKernel comes equipped with these built-in plugins:

  • ConversationSummaryPlugin
  • FilePlugin
  • HttpPlugin
  • MathPlugin
  • TextPlugin
  • TimePlugin
  • WaitPlugin

When I looked at the Plugins.UnitTests project, I noticed that all the unit tests are passing. But there's something interesting:

Each plugin has a corresponding test file, except for ConversationSummaryPlugin.

You might wonder why!?

Here's the thing. All the other plugins produce deterministic outputs, so an ordinary assertion on the exact result is enough to test them. But ConversationSummaryPlugin is a different story.

For instance, it has a function called SummarizeConversation that does exactly what it says - it summarizes a conversation. But how do you test something like that? The summary comes from a language model and can be worded differently on every run, so you need to check the meaning of the output, not just whether the strings are identical.

Let's consider this test case:

USER: Is Eiffel tall?
AGENT: Yes it is
USER: What about Everest mountain?
AGENT: Yes it is tall too
USER: What about a mouse?
AGENT: No it is not tall.

If you call SummarizeConversation with this input, you should get something like:

Expected output: The conversation is about the heights of different things. Both the Eiffel Tower and Mount Everest are considered tall, while a mouse is not.
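To make the problem concrete, here is a minimal sketch of what happens when an exact-match check meets two equally good summaries. Both strings are only illustrative wordings a model might produce, not captured output, and the snippet assumes a .NET 6+ console project with top-level statements.

```csharp
using System;

// Two summaries that say the same thing in different words.
// Both are illustrative examples, not real model output.
string expected = "The conversation is about the heights of different things. " +
                  "Both the Eiffel Tower and Mount Everest are considered tall, while a mouse is not.";
string actual = "The conversation revolves around the heights of different things. " +
                "Both the Eiffel Tower and Mount Everest get the tall vote, while a mouse doesn't.";

// The kind of check that works for deterministic plugins:
// it compares characters, not meaning.
bool exactMatch = string.Equals(expected, actual, StringComparison.Ordinal);

Console.WriteLine($"Exact match: {exactMatch}"); // prints "Exact match: False"
```

A human reader would accept either summary, yet the string comparison rejects the second one.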

But how do you write a test for that? You need to use semantic assertions, something like:

SemanticAssert.HasCondition(
   output, 
   "It mentions that both the Eiffel Tower and Mount Everest are tall.")
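One way such an assertion could work under the hood is to ask a language model whether the output satisfies the condition and to treat the reply as a yes/no verdict. The sketch below only illustrates that idea; it is not the implementation of SemanticValidation or skUnit. SemanticCheckSketch, HasConditionAsync, and the askModel delegate are hypothetical names, and a canned fake model stands in for a real LLM call so the snippet runs offline.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an LLM call: prompt in, reply out.
// This fake always answers YES; a real test would call an actual model.
Func<string, Task<string>> fakeModel = _ => Task.FromResult("YES");

string output = "Both the Eiffel Tower and Mount Everest are considered tall, while a mouse is not.";

bool ok = await SemanticCheckSketch.HasConditionAsync(
    fakeModel,
    output,
    "It mentions that both the Eiffel Tower and Mount Everest are tall.");

Console.WriteLine(ok ? "Condition met" : "Condition not met");

public static class SemanticCheckSketch
{
    // Builds a yes/no question about the text and interprets the model's reply.
    public static async Task<bool> HasConditionAsync(
        Func<string, Task<string>> askModel, string output, string condition)
    {
        string prompt =
            "Answer only YES or NO.\n" +
            $"Text: {output}\n" +
            $"Does the text satisfy this condition? {condition}";

        string reply = await askModel(prompt);

        // Anything that does not start with YES counts as a failed condition.
        return reply.Trim().StartsWith("YES", StringComparison.OrdinalIgnoreCase);
    }
}
```

Passing the model in as a delegate keeps the check itself deterministic to test: swap in a fake that answers NO and the same call returns false.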

While you can do this now with the SemanticValidation library, I'm going to introduce an even simpler way in this post: using the skUnit library for semantic unit testing. Sounds exciting, right?

Let's Dive into Testing with skUnit

With skUnit, you can whip up scenarios in markdown files. Here's an example:

# SCENARIO Height Discussion

## PARAMETER input
USER: Is Eiffel tall?
AGENT: Yes it is
USER: What about Everest mountain?
AGENT: Yes it is tall too
USER: What about a mouse?
AGENT: No it is not tall.


## ANSWER
The conversation revolves around the heights of different things. Both the Eiffel Tower and Mount Everest get the tall vote, while a mouse doesn't.

## CHECK SemanticCondition
It mentions that both the Eiffel Tower and Mount Everest are tall.

## CHECK SemanticCondition
It mentions that a mouse isn't tall.

As you can see, in a scenario, you can set the parameters and the expected answer. Then, you can specify the semantic conditions that the output should meet. The best part? skUnit can run this test for you automatically, and you can see it acing the test. How cool is that?

[Image: Test result]
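Because a scenario follows such a regular layout (a title line, then headings of the form "## KIND argument" with free text beneath each), splitting one into its parts takes very little code. The following is only an illustrative sketch of that idea, not skUnit's actual parser; ScenarioSketch and ScenarioSection are made-up names, and the inlined scenario is a shortened copy of the one above.

```csharp
using System;
using System.Collections.Generic;

// A shortened scenario, inlined so the sketch is self-contained.
string scenario = @"# SCENARIO Height Discussion

## PARAMETER input
USER: Is Eiffel tall?
AGENT: Yes it is

## ANSWER
Both the Eiffel Tower and Mount Everest are tall.

## CHECK SemanticCondition
It mentions that both the Eiffel Tower and Mount Everest are tall.
";

foreach (var section in ScenarioSketch.Parse(scenario))
    Console.WriteLine($"{section.Kind} {section.Argument}");

// One "## KIND argument" heading plus the text beneath it.
public record ScenarioSection(string Kind, string Argument, string Body);

public static class ScenarioSketch
{
    // Splits the text on lines starting with "## " and collects the body
    // under each heading. Lines before the first heading (the title) are skipped.
    public static List<ScenarioSection> Parse(string markdown)
    {
        var sections = new List<ScenarioSection>();
        var body = new List<string>();
        string? kind = null;
        string? argument = null;

        // Stores the section collected so far, if there is one.
        void Flush()
        {
            if (kind is null) return;
            sections.Add(new ScenarioSection(kind, argument ?? "", string.Join("\n", body).Trim()));
            body.Clear();
        }

        foreach (var rawLine in markdown.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r');
            if (line.StartsWith("## ", StringComparison.Ordinal))
            {
                Flush();
                var parts = line.Substring(3).Split(' ', 2, StringSplitOptions.RemoveEmptyEntries);
                if (parts.Length == 0) { kind = null; continue; } // heading with no keyword
                kind = parts[0];
                argument = parts.Length > 1 ? parts[1].Trim() : "";
            }
            else if (kind is not null)
            {
                body.Add(line);
            }
        }

        Flush();
        return sections;
    }
}
```

Each CHECK section's body is the condition text, so a test runner built on this could hand it straight to a semantic assertion.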

What’s great is that these scenarios are valid .md files. This means they’re not just for the tech-savvy among us - anyone can read and understand them! Isn’t that neat?

[Image: Markdown]

Finally

I enjoy writing semantic tests for SemanticKernel plugins, and I have created a repository to share some of them: https://github.com/mehrandvd/semantic-kernel-skunit-tests

You can find an example of a test scenario for the ConversationSummaryPlugin in that repository.

Top comments (3)

shahryar slg

I loved it.
Here is a question: is Semantic Kernel only for Azure users, or can it be used without Azure?

The last few times our software broke down, it was because the way LLMs tended to respond had changed! We solved it by using OpenAI's "json mode" feature, which made the structure of the response a bit more deterministic. We couldn't go any further in validating the response.
Your posts are piquing my curiosity.

Mehran Davoudi

SemanticKernel is not tied to Azure. You can integrate it with any OpenAI service that you have access to. You are right about the importance of unit testing for OpenAI projects. Since the outputs of these projects can vary significantly depending on the prompts and the models, you need a reliable way to test and verify them. That’s where skUnit comes in. It is a testing tool that lets you write and run scenarios for SemanticKernel units, such as plugin functions and kernels.

shahryar slg

Thank you, Mehran.
It helped a lot. I'll dive into it as soon as I convince others to bring "Validation" tasks up.