Amal Shaji

Posted on • Originally published at amal.sh

Fine-tuning GPT: Building a better and cheaper assistant?

From OpenAI docs

Fine-tuning lets you get more out of the models available through the API by providing:

  1. Higher quality results than prompting
  2. Ability to train on more examples than can fit in a prompt
  3. Token savings due to shorter prompts
  4. Lower latency requests

In this post, we will build an assistant with and without fine-tuning and compare the results.

Code

https://github.com/amalshaji/finetune-gpt

Use-case

Let's build an assistant capable of generating template messages for engaging with our users. The assistant will generate messages based on the required tone and use pre-defined template tags wherever necessary.

e.g.:

Happy Birthday {{ first_name }}! We're thrilled to celebrate this special day with you. As a valued customer, your happiness is our top priority. Enjoy this day to the fullest, and here's to another year of great experiences with us. Cheers to you on your birthday!

Assume the following tags are valid: first_name, email, last_name
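Since only these three tags are valid, it can be useful to check generated messages before sending them to users. The following is a minimal sketch of such a check; the helper name `find_invalid_tags` and the `coupon_code` tag are made up for illustration and are not part of the original code.

```python
import re

# The valid tags come from the post; everything else here is illustrative.
VALID_TAGS = {"first_name", "last_name", "email"}

# Matches {{ tag }} with optional whitespace inside the braces.
TAG_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_invalid_tags(message: str) -> set:
    """Return the template tags used in `message` that are not in VALID_TAGS."""
    return set(TAG_PATTERN.findall(message)) - VALID_TAGS


ok_message = "Happy Birthday {{ first_name }}! Cheers to you."
bad_message = "Hi {{ first_name }}, your code is {{ coupon_code }}."

print(find_invalid_tags(ok_message))   # empty set: every tag is valid
print(find_invalid_tags(bad_message))  # contains the unknown tag
```

A message with a non-empty result could be rejected or regenerated before it reaches a user.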

Without fine-tuning

Generate a message in less than 50 words using the following parameters:
    occasion: {occasion}
    tone: {tone}

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |

The program sets the occasion and tone before passing the prompt to the OpenAI endpoint.

import asyncio
import openai
import os
from string import Template

openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = Template(
    """
Generate a message in less than 50 words using the following parameters:
    occasion: $occasion
    tone: $tone

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |
"""
)


async def run(occasion: str, tone: str):
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": prompt.substitute({"occasion": occasion, "tone": tone}),
            }
        ],
    )
    print(response)


asyncio.run(run(occasion="birthday", tone="texas accent"))
  • The prompt asks for a birthday message in a Texas accent
  • The chat completion API is used to access the gpt-3.5-turbo model
  • gpt-3.5-turbo is used because fine-tuning does not support gpt-4 at the time of writing
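A side note on the prompt code above: `string.Template` is a convenient choice here because it only substitutes `$name` placeholders and leaves the `{{ first_name }}` tags untouched. The sketch below uses a shortened prompt (not the one from the post) to show the difference from `str.format`, which treats doubled braces as escapes.

```python
from string import Template

# Shortened prompt for illustration only; the real prompt is shown earlier.
prompt = Template("occasion: $occasion, tone: $tone. Greet {{ first_name }}.")

# Template only replaces $occasion and $tone, so the tag survives intact.
rendered = prompt.substitute(occasion="birthday", tone="formal")
print(rendered)

# str.format collapses "{{" and "}}" into single braces, breaking the tag.
formatted = "tone: {tone}. Greet {{ first_name }}.".format(tone="formal")
print(formatted)
```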

Output

{
  "id": "chatcmpl-817dhqQoNvTmBHlDMKDkiQIYezgRn",
  "object": "chat.completion",
  "created": 1695278385,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope y'all have a rootin' tootin' day filled with joy and good ol' Texas charm. Yeehaw! \ud83e\udd20"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 99,
    "completion_tokens": 39,
    "total_tokens": 138
  }
}
  • Time to complete: ~3s
  • Total cost for this execution: (99/1000 x 0.0015 + 39/1000 x 0.002) = $0.0002265.
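The cost arithmetic above can be reproduced from the `usage` block of any response. This is a small sketch; the function name is made up, and the per-1K-token rates are the ones used in this post, so they should be treated as inputs that may change.

```python
def completion_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
) -> float:
    """Cost in dollars for one call, given per-1K-token rates."""
    input_cost = prompt_tokens / 1000 * input_rate_per_1k
    output_cost = completion_tokens / 1000 * output_rate_per_1k
    return input_cost + output_cost


# Token counts taken from the "usage" block of the response above.
usage = {"prompt_tokens": 99, "completion_tokens": 39}

cost = completion_cost(
    usage["prompt_tokens"],
    usage["completion_tokens"],
    input_rate_per_1k=0.0015,  # rate used in this post
    output_rate_per_1k=0.002,  # rate used in this post
)
print(f"${cost:.7f}")
```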

Fine-tuning

Preparing the dataset

OpenAI fine-tuning requires a dataset of at least ten examples. It accepts the input as a JSONL (JSON Lines) file, where each line is a valid JSON object, e.g.:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "What's the largest planet in our solar system?"},{"role": "assistant", "content": "Jupiter, but don't worry, it's not like it's taking up half the solar system or anything."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "Who discovered penicillin?"},{"role": "assistant", "content": "Alexander Fleming. But don't get too excited; it's just one of the most important discoveries in medical history."}]}

Here is the training data we will use to fine-tune our model.
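For reference, a file in this format could be assembled along the following lines. This is only a sketch: the helper names are made up, and the example messages are placeholders rather than the actual training data used in this experiment.

```python
import json

MIN_EXAMPLES = 10  # minimum dataset size mentioned above


def build_example(system: str, user: str, assistant: str) -> dict:
    """Build one training example in the chat fine-tuning format."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }


def to_jsonl(examples: list) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    if len(examples) < MIN_EXAMPLES:
        raise ValueError(f"need at least {MIN_EXAMPLES} examples, got {len(examples)}")
    return "\n".join(json.dumps(example) for example in examples) + "\n"


# Placeholder examples; a real dataset would vary occasion, tone, and reply.
examples = [
    build_example(
        "You write short template messages.",
        f"occasion: birthday, tone: placeholder-{i}",
        "Happy birthday, {{ first_name }}!",
    )
    for i in range(10)
]

jsonl = to_jsonl(examples)
print(jsonl.splitlines()[0])
```

Writing `jsonl` to `data.jsonl` would produce a file in the shape the upload step expects.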

Create a new file

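async def create_training_file():
    # Use a context manager so the file handle is closed after the upload
    with open("data.jsonl", "rb") as training_data:
        file_create_output = await openai.File.acreate(
            file=training_data, purpose="fine-tune"
        )
    print(f"Created training file: {file_create_output}")
    return file_create_output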

The response should contain the file ID and status. If the status is not processed, keep checking it manually with await openai.File.aretrieve(file_id) until it is.
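The manual check can also be automated with a small polling loop. The sketch below is generic and runs offline: `fetch_status` is a hypothetical callable that you would supply, and the fake status sequence exists only so the example can run without calling the API.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_processed(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll `fetch_status` until it returns "processed" or attempts run out."""
    for _ in range(max_attempts):
        status = await fetch_status()
        if status == "processed":
            return status
        await asyncio.sleep(interval)
    raise TimeoutError("file was not processed in time")


# Stand-in for the real API call; the intermediate status names are placeholders.
_fake_statuses = iter(["uploaded", "uploaded", "processed"])


async def fake_fetch_status() -> str:
    return next(_fake_statuses)


final_status = asyncio.run(wait_until_processed(fake_fetch_status, interval=0.01))
print(final_status)
```

With the real API, `fetch_status` could wrap `openai.File.aretrieve(file_id)` and return the status field of its response.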

Fine-tune

We can schedule a fine-tuning job with OpenAI using the uploaded file ID.


async def start_finetune_job(training_file: str):
    ft_job = await openai.FineTuningJob.acreate(
        training_file=training_file, model="gpt-3.5-turbo"
    )
    print(f"Created fine tuning job: {ft_job}")
    return ft_job

Navigate to https://platform.openai.com/finetune to see the job status. Once complete, you'll see the relevant metrics, including the tokens used for training.

fine tune dashboard

  • Tokens used for training: 12,350
  • Total cost: (12350 / 1000 x 0.0080) = $0.0988.

Working with the new model

We run something similar to the non-fine-tuned version, replacing the model with our custom model.

import timeit


async def run(occasion: str, tone: str):
    start = timeit.default_timer()
    response = await openai.ChatCompletion.acreate(
        model="ft:gpt-3.5-turbo-0613:personal::819PHd1U",
        messages=[
            {
                "role": "system",
                # Must be a plain string here, not the Template object
                # from the non-fine-tuned version
                "content": prompt,
            },
            {
                "role": "user",
                "content": f"occasion: {occasion}, tone: {tone}",
            },
        ],
    )
    print(response)
    print(f"Time taken: {timeit.default_timer() - start}s")


asyncio.run(run(occasion="birthday", tone="informal"))

Other imports and setup are the same as in the non-fine-tuned version.

{
  "id": "chatcmpl-81AzImR0c0f9h3Kud3sHeHo8rmRyl",
  "object": "chat.completion",
  "created": 1695291256,
  "model": "ft:gpt-3.5-turbo-0613:personal::819PHd1U",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope your special day is filled with lots of cake, presents, and non-stop fun."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 96,
    "completion_tokens": 26,
    "total_tokens": 122
  }
}

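There's a slight reduction in the number of tokens (122 vs. 138). At the base gpt-3.5-turbo rates used earlier, the cost works out to (96/1000 x 0.0015 + 26/1000 x 0.002) = $0.000196. Note that OpenAI bills fine-tuned models at higher per-token rates than the base model, so the actual charge will be higher; check the current pricing page.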

Comparison

| | non fine-tuned | fine-tuned |
|---|---|---|
| Cost | Higher if you need to pass examples in the system message | Cheaper, as the model can be fine-tuned on the examples |
| Availability | Always available | After training, the API raised a lot of server overload exceptions before giving the correct response |
| Speed | Slow; on average, inference takes ~3 seconds | Faster; most queries finish in less than a second, though some outliers took more than 10 seconds |
| Ease of use | Straightforward: write a prompt and query | Prepare training and validation data, create files, start training, and wait for training to finish (all of which can be automated) |
| Limitations | No known limitations (used as the base for comparison) | Fine-tuning for gpt-4 and function calling is not available yet (expected later this year) |
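One common way to cope with the server overload exceptions mentioned in the comparison is to retry with exponential backoff. The following is a generic, synchronous sketch, not the code used in this experiment: `Overloaded` and `flaky_call` are stand-ins, and in practice you would pass the exception types your client library raises.

```python
import random
import time


def retry_with_backoff(call, retryable=(Exception,), max_attempts=5, base_delay=0.5):
    """Call `call()` and retry on `retryable` errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


class Overloaded(Exception):
    """Stand-in for a server overload error."""


attempts = {"count": 0}


def flaky_call() -> str:
    # Fails twice, then succeeds, to simulate a briefly overloaded server.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise Overloaded("server overloaded")
    return "ok"


reply = retry_with_backoff(flaky_call, retryable=(Overloaded,), base_delay=0.01)
print(reply, attempts["count"])
```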

Conclusion

In this experiment, we fine-tuned gpt-3.5 on a small dataset to generate template messages. To achieve the same with the non-fine-tuned version, you must pass examples with each prompt. Imagine the cost of doing this for 100,000-1,000,000 API calls. Although this experiment did not show any difference in results (as the task was easy for gpt), a complex task would likely have benefited from fine-tuning.
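To put a rough number on that, the sketch below projects what the in-prompt examples alone would cost at scale. The 500-token figure is a made-up assumption for illustration, and the input rate is the base gpt-3.5-turbo rate used earlier in this post.

```python
def prompt_overhead_cost(
    extra_prompt_tokens: int, calls: int, input_rate_per_1k: float
) -> float:
    """Dollars spent on extra prompt tokens (e.g. examples) across all calls."""
    return extra_prompt_tokens / 1000 * input_rate_per_1k * calls


EXAMPLE_TOKENS = 500  # hypothetical size of the examples added to each prompt
INPUT_RATE = 0.0015   # base input rate per 1K tokens, as used in this post

for calls in (100_000, 1_000_000):
    overhead = prompt_overhead_cost(EXAMPLE_TOKENS, calls, INPUT_RATE)
    print(f"{calls:>9,} calls: ${overhead:,.2f} spent on in-prompt examples")
```

A fair comparison would set this overhead against the one-time training cost and the per-token rates charged for the fine-tuned model.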

Fine-tuning GPT-3.5 models can significantly improve their performance on complex tasks, making them well suited for building low-cost, efficient assistants. Fine-tuning requires some additional effort, but the benefits in cost and speed can make it a worthwhile investment. As support for GPT-4 and function calling is added, the potential applications of fine-tuned models will continue to expand. Still, carefully consider whether fine-tuning is the right approach for your specific use case, and weigh the pros and cons before committing to it.
