Mateusz Charytoniuk

Posted on • Originally published at github.com

Structured: Extract Data from Unstructured Input with LLM

The Structured project started as a Go port of Instructor,
but it is a more general-purpose library. It is designed to be extremely easy to use and set up.

It also features a language-agnostic HTTP server that you can set up in front of llama.cpp.

It offers the same features as Instructor with a Go-like API. It is model agnostic: it maps data from an arbitrary JSON schema onto an arbitrary Go struct (or just plain JSON).

It is focused on llama.cpp. Support for other vendor APIs (like OpenAI or Anthropic) might be added in the future.

HTTP API

Start a server and point it to your local llama.cpp instance:

./structured \
    --llamacpp-host 127.0.0.1 \
    --llamacpp-port 8081 \
    --port 8080

The Structured server connects to llama.cpp to extract the data.

Now you can issue requests. Include the schema and the input data in your POST body.
The server responds with JSON that matches your schema:

POST http://127.0.0.1:8080/extract/entity
{
  "schema": {
    "type": "object",
    "properties": {
      "hello": {
        "type": "string"
      }
    },
    "required": ["hello"]
  },
  "data": "Say 'world'"
}

Response:
{
  "hello": "world"
}

Programmatic Usage (Optional)

The API may change until all features are implemented.

Initializing the Mapper

Point it to your local llama.cpp instance:

import (
    "fmt"
    "net/http"

    "github.com/distantmagic/paddler/llamacpp"
    "github.com/distantmagic/paddler/netcfg"
    "github.com/distantmagic/structured/structured"
)

// entityExtractor sends extraction requests to the llama.cpp instance configured below.
var entityExtractor *structured.EntityExtractor = &structured.EntityExtractor{
    LlamaCppClient: &llamacpp.LlamaCppClient{
        HttpClient: http.DefaultClient,
        LlamaCppConfiguration: &llamacpp.LlamaCppConfiguration{
            HttpAddress: &netcfg.HttpAddressConfiguration{
                Host:   "127.0.0.1",
                Port:   8081,
                Scheme: "http",
            },
        },
    },
    MaxRetries: 3,
}

Extracting Structured Data from String

import "github.com/distantmagic/structured/structured"

responseChannel := make(chan structured.EntityExtractorResult)

go entityExtractor.ExtractFromString(
    responseChannel,
    map[string]any{
        "type": "object",
        "properties": map[string]any{
            "name": map[string]string{
                "type": "string",
            },
            "surname": map[string]string{
                "type": "string",
            },
            "age": map[string]string{
                "description": "Age in years.",
                "type":        "integer",
            },
        },
    },
    "I am John Doe - living for 40 years and I still like to play chess.",
)

for result := range responseChannel {
    if result.Error != nil {
        panic(result.Error)
    }

    // Prints something like: map[age:40 name:John surname:Doe]
    fmt.Print(result.Result)
}

Mapping Extracted Result onto an Arbitrary Struct

Once you obtain the result, you can map it onto your own struct:

import (
    "fmt"

    "github.com/distantmagic/structured/structured"
)

// myTestPerson uses JSON tags that match the property names in the schema.
type myTestPerson struct {
    Name    string `json:"name"`
    Surname string `json:"surname"`
    Age     int    `json:"age"`
}

func DoUnmarshalsToStruct(result structured.EntityExtractorResult) {
    var person myTestPerson

    // Copy the extracted fields onto the struct.
    err := structured.UnmarshalToStruct(result, &person)

    if err != nil {
        panic(err)
    }

    fmt.Println(person.Name)    // John
    fmt.Println(person.Surname) // Doe
}

Summary

That's it! :) You can use Structured as a language-agnostic HTTP server or directly as a Go library. Visit the repository and leave a star to show your support and get notified about new developments.
