Tackling JSON Perplexity in LLM Outputs: A Weekend Project

#llm #perplexity #json #yup

This weekend, I dove deep into a problem we often encounter in natural language processing (NLP): ensuring the accuracy and reliability of JSON outputs from large language models (LLMs), particularly when dealing with key/value pairs.

The Challenge

There is usually no direct way to measure perplexity or log probabilities on function calls from LLMs. That makes it hard to trust the JSON these models generate, especially when we need to know how confident the model was in each key and value rather than just whether the output parses.

My Solution

To address this, I wrote a JSON parser that extracts JSON directly from the stream of log probabilities OpenAI provides when generating text outputs that contain JSON. The parser doesn't just pull JSON out of the text: it also calculates the probability and perplexity of each key and value, so we can see how confident the model was in every field. JSON parsing can get a bit complex, and my solution isn't flawless, but it has passed all my tests and is proving robust enough for my needs.

For example, take a JSON object generated by an LLM:

{ formalName: 'Josiah Bryan', nickname: 'Joey', ageGuess: 28 }

For that same object, my parser can generate a metadata object with the following data:

{
  formalName: {
    key: 'formalName',
    value: 'Josiah Bryan',
    keyProb: 0.999996,
    valueProb: 0.999957,
    keyPerplexity: 1.000001,
    valuePerplexity: 1.000014,
    finished: true
  },
  nickname: {
    key: 'nickname',
    value: 'Joey',
    keyProb: 0.999996,
    valueProb: 0.872926,
    keyPerplexity: 1.000004,
    valuePerplexity: 1.070314,
    finished: true
  },
  ageGuess: {
    key: 'ageGuess',
    value: 28,
    keyProb: 0.999994,
    valueProb: 0.594872,
    keyPerplexity: 1.000003,
    valuePerplexity: 1.681035,
    finished: true
  }
}

(The finished prop in this example is useful when parsing a stream of chunks. When parsing JSON from a firehose like that, finished is false while the parser is still consuming tokens for the value. Once the parser hits an end token, such as a comma or a closing quote, it flips finished to true so you know the value is final.)
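
To make the numbers above concrete, here is a minimal sketch of the math, separate from the parser itself. It assumes the standard definitions: the probability of a key or value is the exponential of its summed token logprobs, and its perplexity is the exponential of the negative mean logprob. The function names are hypothetical. The sample values are consistent with this reading (for ageGuess, 1 / 0.594872 is roughly 1.681, which fits a single-token value), but that is an inference from the output, not something taken from the parser source.

```javascript
// Joint probability of a token span: exp of the summed log probabilities.
function spanProbability(logprobs) {
  return Math.exp(logprobs.reduce((sum, lp) => sum + lp, 0));
}

// Perplexity of a token span: exp of the negative mean log probability.
// A value of 1 means the model was certain; higher means less confident.
function spanPerplexity(logprobs) {
  if (logprobs.length === 0) {
    throw new RangeError('need at least one logprob');
  }
  const mean = logprobs.reduce((sum, lp) => sum + lp, 0) / logprobs.length;
  return Math.exp(-mean);
}

// Assumed input: a single token whose probability matches ageGuess above.
const singleToken = [Math.log(0.594872)];
console.log(spanProbability(singleToken).toFixed(6));
console.log(spanPerplexity(singleToken).toFixed(6));
```

Because perplexity averages over tokens, a long value with one shaky token can still score close to 1, which is one reason to keep the joint probability around as well.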

Why It's Cool

What makes this practically useful is a custom yup decorator that lets us actively manage the model's output. If the parser detects that the perplexity of a generated value goes above our comfort threshold, it can automatically tweak the prompt or inject additional grounding into the model's inputs. This helps keep the generated JSON anchored to the facts we supply rather than to the model's guesses.

For example, here's how the schema is specified with custom max perplexity values per field:

const schema = yup.object().shape({
    formalName: yup
        .string()
        .required()
        .description('Formal name')
        .perplexity({ max: 1.125 }),
    nickname: yup
        .string()
        .required()
        .description('Generated nickname')
        .perplexity({ max: 1.5 }),
    ageGuess: yup
        .number()
        .required()
        .description('Generated age guess')
        .perplexity({ max: 99 }),
});
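
The check those thresholds imply can be sketched without yup at all. This is a hypothetical helper, not the decorator from the gist: it assumes the metadata shape shown earlier (valuePerplexity and finished per field) and a plain map of per-field limits, and it returns failures in the { type, path } shape that the callback in the next snippet receives.

```javascript
// Hypothetical helper: compare annotated fields against per-field perplexity limits.
function findPerplexityFailures(objectWithMetadata, maxPerplexity) {
  const failures = [];
  for (const [path, max] of Object.entries(maxPerplexity)) {
    const field = objectWithMetadata[path];
    // Skip fields that are missing or whose value is still streaming in.
    if (!field || !field.finished) continue;
    if (field.valuePerplexity > max) {
      failures.push({ type: 'perplexity', path, actual: field.valuePerplexity, max });
    }
  }
  return failures;
}

// Value perplexities copied from the sample metadata earlier in the post.
const metadata = {
  formalName: { valuePerplexity: 1.000014, finished: true },
  nickname: { valuePerplexity: 1.070314, finished: true },
  ageGuess: { valuePerplexity: 1.681035, finished: true },
};

// Same limits as the schema above.
const limits = { formalName: 1.125, nickname: 1.5, ageGuess: 99 };

console.log(findPerplexityFailures(metadata, limits));
console.log(findPerplexityFailures(metadata, { ...limits, ageGuess: 1.5 }));
```

With the limits from the schema nothing fails, since a max of 99 on ageGuess is loose enough to accept almost any guess. Tightening it to 1.5 flags that field.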

Then, when passing that to the coaxLlm method, we can also include a callback that adds more grounding when perplexity is too high on a given field:

const { content, object, objectWithMetadata, failure } = await coaxLlm({
    prompt,
    schema,
    logger,
    langfuseTrace,
    cacheMode: 'save',
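    // Called when a field fails a check; returns extra grounding strings
    // to inject, or an empty array to add nothing.
    failureInjectCallback: async ({ type, path }) => {
        if (
            type === 'perplexity' &&
            ['nickname', 'formalName'].includes(path)
        ) {
            // `authorization` is assumed to be in scope from the surrounding app code.
            return [`My name is: "${authorization.user.name}"`];
        }

        return [];
    },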
});
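
For a rough idea of what a coaxing loop like this does, here is a simplified sketch. It is not the coaxLlm implementation from the gist: coaxSketch, generate, check, and maxAttempts are all hypothetical names, and the model call is injected so the example runs without an API key. The loop generates, checks for failures, asks the callback for grounding, appends it to the prompt, and tries again.

```javascript
// Hypothetical retry loop: regenerate with extra grounding until checks pass.
async function coaxSketch({ prompt, generate, check, failureInjectCallback, maxAttempts = 3 }) {
  let currentPrompt = prompt;
  let result;
  let failures = [];
  let attempts = 0;

  while (attempts < maxAttempts) {
    attempts += 1;
    result = await generate(currentPrompt);
    failures = check(result);
    if (failures.length === 0) break; // every field is within its limit

    // Ask the caller for grounding for each failed field.
    const grounding = [];
    for (const failure of failures) {
      grounding.push(...(await failureInjectCallback(failure)));
    }
    if (grounding.length === 0) break; // nothing new to add, so retrying won't help

    currentPrompt += `\n\nAdditional context:\n${grounding.join('\n')}`;
  }

  return { result, failure: failures[0] ?? null, attempts, finalPrompt: currentPrompt };
}

// Fake model for the demo: it only becomes "confident" once the prompt contains the name.
async function fakeGenerate(prompt) {
  const grounded = prompt.includes('My name is');
  return { nickname: { value: 'Joey', valuePerplexity: grounded ? 1.05 : 2.1 } };
}

// Fake check: flag the nickname when its perplexity exceeds 1.5.
const checkNickname = (result) =>
  result.nickname.valuePerplexity > 1.5 ? [{ type: 'perplexity', path: 'nickname' }] : [];

coaxSketch({
  prompt: 'Guess my nickname.',
  generate: fakeGenerate,
  check: checkNickname,
  failureInjectCallback: async ({ type, path }) =>
    type === 'perplexity' && path === 'nickname' ? ['My name is: "Josiah Bryan"'] : [],
}).then(({ attempts, failure }) => console.log({ attempts, failure }));
```

In this demo the first attempt fails the nickname check, the callback supplies the name, and the second attempt passes. Capping attempts matters in practice, since each retry is another paid model call.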

Just in time for a busy upcoming week, this tool has earned a place in my toolkit. It improves the grounding of LLM outputs and significantly speeds up JSON generation, a win-win for any developer.

Check Out the Code

Interested in seeing this in action or integrating it into your own projects? Here's the full code for coaxing and re-grounding the LLM: coax-llm.js.

Bonus: Real-Time Streaming

This parser also works with streaming outputs from LLMs. This means we can read JSON objects and their log probabilities in real time, without waiting for the entire text generation to complete. That makes it possible to adjust the prompt or handle errors immediately instead of after the fact.
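
As a rough illustration of the streaming idea, here is a toy annotator that consumes { token, logprob } pairs one at a time. It is deliberately much simpler than logprobsToAnnotatedJson.js and is not derived from it: it assumes a flat object, double-quoted keys, and values that are either strings without escape sequences or plain numbers. All names are hypothetical. Each token's logprob is attributed to whichever key or value its characters land in.

```javascript
// Toy streaming annotator. Assumptions: flat object, double-quoted keys, and
// values that are strings without escapes or plain numbers (no booleans, null, nesting).
function createStreamingAnnotator() {
  const fields = {};      // key -> annotated entry, updated as tokens arrive
  let state = 'seekKey';  // seekKey | inKey | seekColon | seekValue | inString | inNumber
  let keyText = '';
  let keyLogprobs = [];
  let entry = null;       // the entry whose value is currently being read

  const sum = (xs) => xs.reduce((a, b) => a + b, 0);
  const prob = (xs) => (xs.length ? Math.exp(sum(xs)) : null);
  const perplexity = (xs) => (xs.length ? Math.exp(-sum(xs) / xs.length) : null);

  // Recompute the value-side numbers after each new character.
  function refreshValue() {
    entry.value = entry.isNumber ? Number(entry.raw) : entry.raw;
    entry.valueProb = prob(entry.valueLogprobs);
    entry.valuePerplexity = perplexity(entry.valueLogprobs);
  }

  function push({ token, logprob }) {
    let counted = null; // the logprob list this token was already added to
    const count = (list) => {
      if (counted !== list) { list.push(logprob); counted = list; }
    };
    for (const ch of token) {
      if (state === 'seekKey') {
        if (ch === '"') { state = 'inKey'; keyText = ''; keyLogprobs = []; }
      } else if (state === 'inKey') {
        if (ch !== '"') { keyText += ch; count(keyLogprobs); continue; }
        entry = {
          key: keyText, value: undefined,
          keyProb: prob(keyLogprobs), keyPerplexity: perplexity(keyLogprobs),
          valueProb: null, valuePerplexity: null, finished: false,
          raw: '', isNumber: false, valueLogprobs: [],
        };
        fields[keyText] = entry;
        state = 'seekColon';
      } else if (state === 'seekColon') {
        if (ch === ':') state = 'seekValue';
      } else if (state === 'seekValue') {
        if (ch === '"') { state = 'inString'; refreshValue(); }
        else if (/[-0-9]/.test(ch)) {
          state = 'inNumber'; entry.isNumber = true; entry.raw = ch;
          count(entry.valueLogprobs); refreshValue();
        }
      } else if (state === 'inString') {
        if (ch === '"') { entry.finished = true; state = 'seekKey'; }
        else { entry.raw += ch; count(entry.valueLogprobs); refreshValue(); }
      } else if (state === 'inNumber') {
        if (/[0-9.eE+-]/.test(ch)) { entry.raw += ch; count(entry.valueLogprobs); refreshValue(); }
        else { entry.finished = true; state = 'seekKey'; } // any other character ends the number
      }
    }
  }

  return { push, fields };
}

// Made-up tokens and logprobs, purely for illustration.
const annotator = createStreamingAnnotator();
const tokens = [
  { token: '{"', logprob: -0.01 }, { token: 'age', logprob: -0.02 },
  { token: 'Guess', logprob: -0.03 }, { token: '":', logprob: -0.01 },
  { token: ' 28', logprob: -0.5 }, { token: '}', logprob: -0.01 },
];
for (const t of tokens) {
  annotator.push(t);
  console.log(JSON.stringify(t.token), annotator.fields.ageGuess?.finished);
}
```

Note that a number only becomes finished when the next character arrives, since there is no closing quote to signal its end. That is why the flag matters when reading values mid-stream.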

Dive Deeper

For those who love digging into the nuts and bolts, here’s a direct link to the parser itself: logprobsToAnnotatedJson.js.

While I haven't made the underlying detailed benchwork public, the gists provided are self-contained and ready for real-world use. I'm running them in production myself (pushing them to my k8s cluster tonight, even as I type).

Looking forward to your thoughts and any feedback you might have!