Xiaoyun Zhang

Building an AI-Powered Equation Solver with GPT-4o, AutoGen.Net and StepWise

In this post, we'll walk through building an AI-powered equation solver that extracts an equation from an image and solves it with GPT-4o. We'll use AutoGen.Net, OpenAI's GPT-4o, and the StepWise framework to build a workflow that solves the equation in a user-supplied image.

The complete source code for this project is available on GitHub.

The Problem

Mathematical equations are everywhere, from academic papers to whiteboards in meeting rooms. But what if you could simply take a picture of an equation and have an AI solve it for you? That's exactly what we're going to build!

The Solution

We'll create a .NET application that:

  1. Accepts an image input containing an equation
  2. Uses GPT-4o's vision capabilities to extract the equation
  3. Converts the equation to LaTeX format
  4. Solves the equation using GPT-4o

Let's dive into the components and workflow of our solution.

Key Components

  1. AutoGen: A framework for building AI agents and workflows
  2. OpenAI's GPT-4o: A powerful language model with vision capabilities
  3. StepWise: A framework for creating, visualizing and executing workflows

The Workflow

Our equation solver follows these steps:

  1. Image Input: Accept an image containing an equation
  2. API Key Validation: Ensure we have a valid OpenAI API key
  3. Image Validation: Confirm the image contains exactly one equation
  4. Equation Extraction: Extract the equation from the image and convert it to LaTeX
  5. Equation Solving: Solve the extracted equation

Overview

[Figure: overview of the workflow]

Let's look at each step in more detail.

1. Image Input

We use StepWise's StepWiseUIImageInput attribute to create a user interface for image input:

[StepWiseUIImageInput(description: "Please provide the image of the equation")]
public async Task<StepWiseImage?> InputImage()
{
    return null;
}

2. API Key Validation

We provide two options for the OpenAI API key: environment variable or manual input:

[StepWiseUITextInput(description: "Please provide the OpenAI API key if env:OPENAI_API_KEY is not set, otherwise leave empty and submit")]
public async Task<string?> OpenAIApiKey()
{
    return null;
}

[Step(description: "Validate the OpenAI API key")]
public async Task<string> ValidateOpenAIApiKey(
    [FromStep(nameof(OpenAIApiKey))] string apiKey)
{
    // ... (key validation logic)
}
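
The post leaves the validation logic out. As a rough sketch of one way it could work, the helper below prefers a key typed into the UI and falls back to the OPENAI_API_KEY environment variable. ApiKeyResolver, Resolve, and the readEnv parameter are hypothetical names used for illustration; they are not part of StepWise or AutoGen, and the project on GitHub may handle this differently.

```csharp
using System;

// Hypothetical helper for illustration; not part of StepWise or AutoGen.
public static class ApiKeyResolver
{
    // Returns the manually entered key if one was provided, otherwise the
    // value of the OPENAI_API_KEY environment variable.
    // Throws if neither source yields a non-empty key.
    // readEnv is injectable so the lookup can be tested without touching
    // the real process environment.
    public static string Resolve(string? manualInput, Func<string, string?>? readEnv = null)
    {
        readEnv ??= name => Environment.GetEnvironmentVariable(name);

        if (!string.IsNullOrWhiteSpace(manualInput))
        {
            return manualInput.Trim();
        }

        var fromEnv = readEnv("OPENAI_API_KEY");
        if (!string.IsNullOrWhiteSpace(fromEnv))
        {
            return fromEnv.Trim();
        }

        throw new InvalidOperationException(
            "No OpenAI API key found: set OPENAI_API_KEY or enter a key in the UI.");
    }
}
```

This only checks that a key is present. Whether the key is actually accepted by OpenAI can only be confirmed by making a request with it.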

3. Image Validation

We use GPT-4o to confirm that the image contains exactly one equation:

[Step(description: "Validate image input to confirm it contains exactly one equation")]
public async Task<bool> ValidateImageInput(
    [FromStep(nameof(InputImage))] StepWiseImage image)
{
    // ... (image validation logic)
}
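
The validation logic is elided here too. One plausible approach is to ask the model a yes/no question and convert its free-text reply into a bool. The sketch below shows only that conversion, in plain .NET. The prompt wording and the ImageValidationReply class are assumptions for illustration; the post does not show the prompt it actually uses.

```csharp
using System;

// Hypothetical helper for illustration; not part of StepWise or AutoGen.
public static class ImageValidationReply
{
    // Assumed prompt wording; the original post does not show its prompt.
    public const string Prompt =
        "Does this image contain exactly one equation? Answer with only 'yes' or 'no'.";

    // Interprets the model's reply. Anything other than a clear "yes"
    // (ignoring case, surrounding whitespace and trailing '.' or '!')
    // is treated as a failed validation.
    public static bool Parse(string? reply)
    {
        if (string.IsNullOrWhiteSpace(reply))
        {
            return false;
        }

        var normalized = reply.Trim().TrimEnd('.', '!').ToLowerInvariant();
        return normalized == "yes";
    }
}
```

Treating every ambiguous reply as false is a deliberately conservative choice: the workflow stops early rather than trying to solve something that may not be an equation.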

4. Equation Extraction

We use GPT-4o's vision capabilities to extract the equation and convert it to LaTeX:

[Step(description: "Extract the equation from the image into LaTeX format")]
public async Task<string?> ExtractEquationFromImage(
    [FromStep(nameof(ValidateImageInput))] bool valid,
    [FromStep(nameof(InputImage))] StepWiseImage image)
{
    // ... (equation extraction logic)
}
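
Chat models often wrap LaTeX in a Markdown code fence or in math delimiters such as $...$ or \[...\], so it can help to normalize the reply before handing it to the next step. The following is a minimal sketch of such a cleanup, assuming the reply contains a single equation. LatexCleaner is a hypothetical helper and is not taken from the project's source.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of StepWise or AutoGen.
public static class LatexCleaner
{
    // Three backticks, built at runtime to keep this listing easy to embed.
    private static readonly string Fence = new string('`', 3);

    // Delimiter pairs to strip. "$$" is listed before "$" so that display
    // math is not mistaken for inline math.
    private static readonly (string Open, string Close)[] Delimiters =
    {
        ("$$", "$$"), (@"\[", @"\]"), (@"\(", @"\)"), ("$", "$"),
    };

    public static string Clean(string raw)
    {
        var text = raw.Trim();

        // Remove a surrounding Markdown code fence, if present.
        if (text.Length >= 2 * Fence.Length
            && text.StartsWith(Fence, StringComparison.Ordinal)
            && text.EndsWith(Fence, StringComparison.Ordinal))
        {
            text = text.Substring(Fence.Length, text.Length - 2 * Fence.Length);

            // Drop the first line when it is empty or a language tag such as "latex".
            var newline = text.IndexOf('\n');
            if (newline >= 0 && text.Substring(0, newline).Trim().All(char.IsLetter))
            {
                text = text.Substring(newline + 1);
            }

            text = text.Trim();
        }

        // Remove at most one pair of surrounding math delimiters.
        foreach (var (open, close) in Delimiters)
        {
            if (text.Length >= open.Length + close.Length
                && text.StartsWith(open, StringComparison.Ordinal)
                && text.EndsWith(close, StringComparison.Ordinal))
            {
                text = text
                    .Substring(open.Length, text.Length - open.Length - close.Length)
                    .Trim();
                break;
            }
        }

        return text;
    }
}
```

A reply that is already bare LaTeX passes through unchanged apart from trimming.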

5. Equation Solving

Finally, we use GPT-4o to solve the extracted equation:

[Step(description: "Solve the equation")]
public async Task<string?> SolveEquation(
    [FromStep(nameof(InputImage))] StepWiseImage image,
    [FromStep(nameof(ExtractEquationFromImage))] string equation)
{
    // ... (equation solving logic)
}

Putting It All Together

The EquationSolver class combines all these steps into a single workflow. We use AutoGen to create an AI agent backed by GPT-4o, and StepWise to manage the workflow:

public class EquationSolver
{
    private string _apiKey;
    private IAgent _agent;

    // ... (steps implementation)
}
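
To see how the steps fit together, it helps to read the [FromStep] parameters as a dependency graph: a step can run once every step it takes input from has produced a value. The sketch below illustrates that idea with a simple layered topological sort in plain .NET. It is not StepWise's actual scheduler, and the StepOrder class is hypothetical; the dependency table only lists what is visible in the step signatures in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only; this is not how StepWise is implemented.
public static class StepOrder
{
    // Step name -> steps it takes input from, copied from the [FromStep]
    // parameters in the signatures shown in this post.
    public static readonly Dictionary<string, string[]> Dependencies = new()
    {
        ["InputImage"] = Array.Empty<string>(),
        ["OpenAIApiKey"] = Array.Empty<string>(),
        ["ValidateOpenAIApiKey"] = new[] { "OpenAIApiKey" },
        ["ValidateImageInput"] = new[] { "InputImage" },
        ["ExtractEquationFromImage"] = new[] { "ValidateImageInput", "InputImage" },
        ["SolveEquation"] = new[] { "InputImage", "ExtractEquationFromImage" },
    };

    // Repeatedly emits every step whose inputs are all available.
    // Steps that become ready in the same round are sorted by name so
    // the result is deterministic.
    public static List<string> Resolve(IReadOnlyDictionary<string, string[]> deps)
    {
        var done = new List<string>();
        var remaining = new HashSet<string>(deps.Keys);

        while (remaining.Count > 0)
        {
            var ready = remaining
                .Where(step => deps[step].All(done.Contains))
                .OrderBy(step => step, StringComparer.Ordinal)
                .ToList();

            if (ready.Count == 0)
            {
                throw new InvalidOperationException("Cyclic or missing dependency.");
            }

            done.AddRange(ready);
            remaining.ExceptWith(ready);
        }

        return done;
    }
}
```

Calling StepOrder.Resolve(StepOrder.Dependencies) places the two input steps first and SolveEquation last, which matches the order of the five steps described earlier.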

The main application sets up a web server and initializes the workflow:

var host = Host.CreateDefaultBuilder()
    .ConfigureWebHostDefaults(webBuilder =>
    {
        webBuilder.UseUrls("http://localhost:5123");
    })
    .UseStepWiseServer()
    .Build();

await host.StartAsync();

var stepWiseClient = host.Services.GetRequiredService<StepWiseClient>();

var instance = new EquationSolver();
var workflow = Workflow.CreateFromInstance(instance);
stepWiseClient.AddWorkflow(workflow);

await host.WaitForShutdownAsync();

Conclusion

By combining AutoGen, GPT-4o, and StepWise, we've built a prototype equation solver that extracts an equation from an image and solves it. The same pattern of validating the input, extracting structured content with a vision model, and processing it in a later step could be applied to similar image-based tasks.