
Sasmitha Manathunga


ChatGPT Plugin Development Tutorial

In this tutorial, we'll make a simple ChatGPT plugin that generates QR codes.

Before starting, you should have some familiarity with Python, FastAPI, and Poetry. If you aren't familiar with them, you can learn the basics here: Python Tutorial, FastAPI Tutorial, and Poetry Tutorial. You'll also need developer access to ChatGPT plugins. If you don't have access yet, you can join the waitlist; OpenAI will email you when you receive access. You can confirm your access by going to the ChatGPT plugin store and checking whether the "Develop your own plugin" option is available.

You can find the final source code for this plugin here.

How ChatGPT plugins work

OpenAI plugins let you connect ChatGPT to third-party applications by interacting with APIs defined by developers to perform a wide range of actions.

Given instructions on how to use an API, ChatGPT intelligently calls an API backend to perform actions such as looking up products in a search catalog, retrieving information from a knowledge base, fetching real-time information and analyzing it—basically, anything that you have defined in your API can be integrated with ChatGPT.

To make a ChatGPT plugin you'll need three things:

  • API Backend: We're using FastAPI for this tutorial
  • Manifest File: Contains metadata and instructions for ChatGPT on how to interact with your plugin
  • OpenAPI Schema: Specifies information on all the endpoints of your API backend including natural language descriptions of the endpoints and requests that ChatGPT can understand

The manifest file, named ai-plugin.json, contains essential information about the plugin such as name, description, URL of the API backend, URL of the OpenAPI schema file, authentication information, contact information, etc.

We'll use the ChatGPT QR Code Plugin Starter Template for our plugin so we can primarily focus on plugin development rather than backend development. So let's dive right in!

Setting up the development environment

Let's quickly set up our development environment to get up and running.

We'll manage our dependencies with Poetry, so install it if you haven't already:

pip install poetry

Clone the starter template and navigate to the project directory:

git clone -b starter-template --single-branch https://github.com/mmz-001/chatgpt-qr-code-plugin
cd chatgpt-qr-code-plugin

Note: The starter template is in the starter-template branch, not the main branch.

Create and activate a virtual environment and install the dependencies:

poetry shell
poetry install

Run the API backend locally:

poetry run start

Now, the API backend should be accessible at http://localhost:8000. You can view the interactive documentation at http://localhost:8000/docs (which is empty right now). Also, the OpenAPI schema file is located at http://localhost:8000/openapi.json. We'll need this later.
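
If you'd rather check this from a script than from the browser, here's a minimal sketch using only Python's standard library (it assumes the server is running locally on port 8000):

import json
from urllib.request import urlopen

# Fetch the auto-generated OpenAPI schema and inspect it
with urlopen("http://localhost:8000/openapi.json") as response:
    schema = json.load(response)

print(schema["info"]["title"])  # "QR Code"
print(schema["paths"])          # {} (no endpoints defined yet)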

In the next section, let's go through the starter template and explore its contents.

Exploring the starter template

You'll see several files and folders in the starter template. Let's go through the most important files one by one.

ai-plugin.json

This is the manifest file for our plugin. It contains metadata and instructions for ChatGPT on how to interact with our plugin.

{
  "schema_version": "v1",
  "name_for_human": "QR Code",
  "name_for_model": "qr_code",
  "description_for_human": "Generate QR codes.",
  "description_for_model": "Generate QR codes. The generated QR code link should be displayed as a markdown image.",
  "auth": {
    "type": "none"
  },
  "api": {
    "type": "openapi",
    "url": "$host/openapi.json",
    "is_user_authenticated": false
  },
  "logo_url": "$host/.well-known/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "http://www.example.com/legal"
}

Let's go through the most important fields.

  • schema_version: The version of the manifest schema. Currently, only v1 is supported.
  • name_for_human: The name of the plugin shown to the user.
  • name_for_model: The name the model uses to refer to the plugin. No spaces are allowed; only letters, numbers, and underscores.
  • description_for_human: The description of the plugin shown to the user.
  • description_for_model: The description shown to the model, used to instruct it on how and when to use the plugin.
  • auth: Authentication information for the plugin. The auth type is set to none since we don't require authentication for this simple plugin. I'll talk about authentication in a future tutorial.
  • api: Information about the API backend. We'll dynamically replace $host with the URL of our API backend.

The other fields are pretty much self-explanatory. You can find more information about the manifest file here.

routers/well_known.py

This file defines a router that handles requests under the /.well-known path, serving our logo and the ai-plugin.json manifest file. ChatGPT looks for the manifest file at /.well-known/ai-plugin.json. So if your plugin is hosted at http://example.com, ChatGPT looks for the manifest file at http://example.com/.well-known/ai-plugin.json. Since we're running our plugin locally, the manifest file is served at http://localhost:8000/.well-known/ai-plugin.json.

import json
from string import Template
from fastapi import APIRouter, Request
from fastapi.responses import FileResponse, Response

well_known = APIRouter(prefix="/.well-known", tags=["well-known"])


def get_host(request: Request):
    # Prefer the X-Forwarded-* headers (set by reverse proxies) so the
    # manifest advertises the public-facing host and protocol.
    host_header = request.headers.get("X-Forwarded-Host") or request.headers.get("Host")
    protocol = request.headers.get("X-Forwarded-Proto") or request.url.scheme
    return f"{protocol}://{host_header}"


def get_ai_plugin():
    with open("ai-plugin.json", encoding="utf-8") as file:
        return json.loads(file.read())


@well_known.get("/logo.png", include_in_schema=False)
async def logo():
    return FileResponse("logo.png", media_type="image/png")


@well_known.get("/ai-plugin.json", include_in_schema=False)
async def manifest(request: Request):
    ai_plugin = get_ai_plugin()
    return Response(
        content=Template(json.dumps(ai_plugin)).substitute(host=get_host(request)),
        media_type="application/json",
    )

Note that we're dynamically replacing the $host string in the manifest file with the URL of our API backend.
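
To see the substitution in action, you can fetch the manifest from the running server and inspect the URLs. A minimal sketch using Python's standard library (assuming the server is running locally):

import json
from urllib.request import urlopen

# Fetch the manifest served by the well-known router
with urlopen("http://localhost:8000/.well-known/ai-plugin.json") as response:
    manifest = json.load(response)

# $host has been replaced with the host and protocol of the incoming request
print(manifest["api"]["url"])  # http://localhost:8000/openapi.json
print(manifest["logo_url"])    # http://localhost:8000/.well-known/logo.png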

services/qr.py

In this file, we'll define utility functions for QR code generation using the qrcode library.

import qrcode
from io import BytesIO


def generate_qr_code_from_string(string: str) -> BytesIO:
    """Generate a QR code from a string and return a BytesIO object"""
    img = qrcode.make(string)
    img_buffer = BytesIO()
    img.save(img_buffer, format="PNG")

    return img_buffer
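
As a quick standalone check, you can call this helper outside the server and write the result to disk. A small sketch, run from the project root (the output file name is just an example):

from services.qr import generate_qr_code_from_string

# Generate a QR code and write the PNG bytes to a file
img_buffer = generate_qr_code_from_string("https://chat.openai.com/")
with open("example_qr.png", "wb") as file:
    file.write(img_buffer.getvalue())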

server/main.py

The most important parts of our API backend live here. The start() function launches the server with Uvicorn and serves the API backend at http://localhost:8000.

from fastapi import FastAPI, Request, Query
from pydantic import BaseModel, Field
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from routers.well_known import well_known, get_ai_plugin, get_host
from hashlib import md5
from io import BytesIO

from services.qr import generate_qr_code_from_string

ai_plugin = get_ai_plugin()

app = FastAPI(
    title=ai_plugin["name_for_human"],
    description=ai_plugin["description_for_human"],
    version="0.1.0",
)

app.include_router(well_known)

# We'll add our API endpoints here

def start():
    import uvicorn

    uvicorn.run("server.main:app", host="localhost", port=8000, reload=True)


I've defined a Poetry script in pyproject.toml (under [tool.poetry.scripts]) that calls this start() function, so you can run the server with the shell command poetry run start.

SwaggerUI

Now, start the server and go to http://localhost:8000/docs to view the interactive documentation. You'll see that no endpoints are defined yet, but the plugin's name and description are shown, since we read them from ai-plugin.json and pass them to the FastAPI constructor, which generates the OpenAPI schema.

To view the OpenAPI schema file, go to http://localhost:8000/openapi.json. You'll see the following JSON file:

{
  "openapi": "3.0.2",
  "info": {
    "title": "QR Code",
    "description": "Generate QR codes.",
    "version": "0.1.0"
  },
  "paths": {}
}

ChatGPT uses the OpenAPI schema file to retrieve information about the available API endpoints, and this information guides how ChatGPT interacts with our backend. Because the OpenAPI schema can get large, only the essential information about the endpoints is sent to the plugin prompt. We'll see exactly how the plugin prompt is generated later.

The amazing thing about FastAPI is that it automatically generates the OpenAPI schema file, and you can test the API endpoints from the interactive documentation (as we'll see later). The interactive docs are also more readable than the raw schema, which makes plugin development much easier.

logo.png

This logo will be shown to users when they open the plugin store.

Creating the API endpoints

Our simple plugin does only one thing: generate QR codes from text. The diagram below shows the architecture of our plugin.

Architecture Diagram

When the user asks ChatGPT to generate a QR code, ChatGPT calls the /generate endpoint of our API, passing the string to be encoded as a parameter. The endpoint returns a link to the generated QR code image, which ChatGPT displays to the user as a markdown image.

All the generated images are stored in a simple Python dictionary. The dictionary key is the image's hash, and the value is the image data itself, represented as bytes in PNG format. I'm keeping things simple and storing the images in memory as this is just an introductory tutorial. In a real-world application, you'll want to store the images in a suitable database.

Now, let's add the API endpoints to server/main.py.



_IMAGE_CACHE: dict[str, BytesIO] = dict()


class GenerateQRCodeRequest(BaseModel):
    string: str


@app.get("/image/{img_hash}.png")
def get_image(img_hash: str):
    img_bytes = _IMAGE_CACHE[img_hash]
    img_bytes.seek(0)  # Reset the buffer to the beginning
    return StreamingResponse(img_bytes, media_type="image/png")


@app.post("/generate")
def generate_qr_code(data: GenerateQRCodeRequest, request: Request):
    img_bytes = generate_qr_code_from_string(data.string)
    img_hash = md5(img_bytes.getvalue()).hexdigest()
    _IMAGE_CACHE[img_hash] = img_bytes
    return {"link": f"{get_host(request)}/image/{img_hash}.png"}


Note that the /image/{img_hash}.png endpoint fetches the image from the cache and returns it as a PNG image. When ChatGPT displays the image, it will use this endpoint to fetch the image.

Testing the API endpoints with SwaggerUI

One of the most useful features of FastAPI is automatic OpenAPI schema generation with interactive documentation for manual testing.

Remember, you can view the interactive documentation at http://localhost:8000/docs.

SwaggerUI

Expand the POST /generate endpoint, click the "Try it out" button, and enter a string to encode in the QR code. Then click the "Execute" button to see the response from the API.

SwaggerUI

The link in the response points to the QR code image generated by the API:

QR Code
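
If you prefer scripting the test instead of clicking through SwaggerUI, here's a minimal end-to-end sketch using only Python's standard library (it assumes the server is running locally; the output file name is just an example):

import json
from urllib.request import Request, urlopen

# Ask the /generate endpoint for a QR code link
payload = json.dumps({"string": "https://chat.openai.com/"}).encode("utf-8")
request = Request(
    "http://localhost:8000/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urlopen(request) as response:
    link = json.load(response)["link"]

print(link)  # e.g. http://localhost:8000/image/<hash>.png

# Download the generated image from the returned link
with urlopen(link) as response, open("generated_qr.png", "wb") as file:
    file.write(response.read())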

Excellent! Our API endpoints are working as expected, so we can move on to the next step.

Running the plugin locally in ChatGPT

Before we jump into ChatGPT to install the plugin, we need to do one more thing: make our API backend accessible to ChatGPT by allowing cross-origin requests from https://chat.openai.com. We can do this easily with FastAPI's CORSMiddleware. Add the following lines to server/main.py just after defining the app object.

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://chat.openai.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Now, to install the plugin, navigate to the plugin store and select "Develop your own plugin". Then enter localhost:8000 as the domain and continue.

Note: Make sure the backend is running before continuing.

Install Plugin

After the manifest file and OpenAPI spec are validated, install the plugin.

Install Plugin

Start a new chat session and enable the QR Code plugin. Now, let's see our plugin in action by generating a QR code.

Generate QR Code

As you can see, the plugin generated a QR code from the string we entered. You can view the requests and responses by clicking the down arrow next to the plugin name.

Generate QR Code

Interestingly, ChatGPT called the /image/{img_hash}.png endpoint, which returns the raw PNG image. But ChatGPT isn't supposed to send requests to this endpoint directly; it should use the link returned by the /generate request to display the image as a markdown image.

The issue is that the OpenAPI spec doesn't fully describe how ChatGPT should interact with the API backend. We need to add descriptions for the routes and requests to the OpenAPI spec, and we also need to hide the /image/{img_hash}.png endpoint from the spec since ChatGPT shouldn't call it directly.

Adding natural language descriptions to the endpoints and requests

When a user makes a query, ChatGPT looks at the descriptions of the endpoints in the OpenAPI specification along with the description_for_model field in the manifest file to figure out how to interact with the API backend.

As we discussed earlier, the whole OpenAPI spec isn't passed to the plugin prompt due to context size limitations. The plugin prompt is a TypeScript-like specification generated from the information in the manifest file and the OpenAPI spec.

Let's explore how to view the plugin prompt that's generated and passed to ChatGPT.

Using the plugin devtool

You can use the developer console to view the manifest, OpenAPI spec, and the plugin prompt (TypeScript spec) when developing on localhost. The plugin devtool can be opened by going to "Settings" and toggling "Open plugin devtool."

Plugin devtool

Currently, the prompt for our plugin looks like this:

// Generate QR codes. The generated QR code link should be displayed as a markdown image.
namespace qr_code {

// Get Image
type get_image_image__img_hash__png_get = (_: {
img_hash: string,
}) => any;

// Generate Qr Code
type generate_qr_code_generate_post = (_: {
string: string,
}) => any;

} // namespace qr_code

As you can see, the generated schema doesn't provide much information about the endpoints. The comment at the top of the schema is taken from the description_for_model field in the manifest file, and the comments above the endpoints are derived from the function names we gave them. Since this is a simple API, ChatGPT can figure out how to use these endpoints, but for more complex APIs we'll need to provide detailed descriptions and meaningful names for the endpoints and request types.

An extremely useful feature of FastAPI is that it provides several ways to add these descriptions to the OpenAPI spec, so you don't need to edit the schema file by hand.

Let's see how we can add these descriptions using FastAPI in the next section.

Note: When you make changes to your OpenAPI schema file or ai-plugin.json manifest file, you'll need to refresh the plugin from the plugin devtools and start a new chat session for the changes to take effect.

Adding descriptions to the endpoints using docstrings

The docstrings of the endpoints are automatically added to the OpenAPI spec as a description field. Let's add descriptions to the endpoints with docstrings.


@app.get("/image/{img_hash}.png")
def get_image(img_hash: str):
    """Get an image from the cache."""
    img_bytes = _IMAGE_CACHE[img_hash]
    img_bytes.seek(0)  # Reset the buffer to the beginning
    return StreamingResponse(img_bytes, media_type="image/png")


@app.post("/generate")
def generate_qr_code(data: GenerateQRCodeRequest, request: Request):
    """Generate a QR code from a string and return a
    link to the image."""
    img_bytes = generate_qr_code_from_string(data.string)
    img_hash = md5(img_bytes.getvalue()).hexdigest()
    _IMAGE_CACHE[img_hash] = img_bytes
    return {"link": f"{get_host(request)}/image/{img_hash}.png"}


Now, if you view the interactive documentation you'll see that the descriptions are added to the endpoints.

SwaggerUI

Note that the interactive documentation is generated from the OpenAPI spec, so if you go to http://localhost:8000/openapi.json you'll see the same text under each endpoint's description.

OpenAPI Spec

We'll use the interactive documentation to view the information in the OpenAPI spec since it's more readable.

After you refresh the plugin, the plugin prompt will be updated with the new descriptions.

// Generate QR codes. The generated QR code link should be displayed as a markdown image.
namespace qr_code {

// Get an image from the cache.
type get_image_image__img_hash__png_get = (_: {
img_hash: string,
}) => any;

// Generate a QR code from a string and return a
// link to the image.
type generate_qr_code_generate_post = (_: {
string: string,
}) => any;

} // namespace qr_code

Adding descriptions to the requests using Pydantic models

Since this is a simple QR code app, most of our request types are self-explanatory. But in a real-world application, you'll want to add descriptions to the requests so ChatGPT can understand them.

Let's add a description to the GenerateQRCodeRequest model's string field.

class GenerateQRCodeRequest(BaseModel):
    string: str = Field(
        ...,
        description="The string to encode in the QR code",
    )

We can view the added description in the schemas section of the interactive documentation.

SwaggerUI

Now, the plugin prompt will show the added description.

// Generate QR codes. The generated QR code link should be displayed as a markdown image.
namespace qr_code {

// Get an image from the cache.
type get_image_image__img_hash__png_get = (_: {
img_hash: string,
}) => any;

// Generate a QR code from a string and return a
// link to the image.
type generate_qr_code_generate_post = (_: {
// The string to encode in the QR code
string: string,
}) => any;

} // namespace qr_code

Hiding endpoints from the OpenAPI spec

ChatGPT doesn't know anything about our API other than what's defined in the OpenAPI spec. When building larger plugins or integrating with existing APIs, you may want to hide certain endpoints from the OpenAPI spec.

Remember that ChatGPT erroneously sent a request directly to the /image/{img_hash}.png endpoint, which should only be used to display the image. So let's hide it from the OpenAPI spec by adding the include_in_schema=False parameter to the endpoint's decorator.

@app.get("/image/{img_hash}.png", include_in_schema=False)
def get_image(img_hash: str):
  # Endpoint code here
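
A quick way to confirm the route is gone from the schema (and therefore from the plugin prompt) is to re-fetch openapi.json and look at its paths. A minimal sketch, assuming the server is still running locally:

import json
from urllib.request import urlopen

# /image/{img_hash}.png should no longer appear in the schema
with urlopen("http://localhost:8000/openapi.json") as response:
    schema = json.load(response)

print(list(schema["paths"]))  # ['/generate']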

You will no longer see the endpoint in the interactive documentation, but you can still access it directly. Now, let's see if our plugin works as expected.

ChatGPT UI

Great! Our plugin works perfectly. Make sure to test it several times with different prompts to confirm everything works correctly.

Understanding ChatGPT's request mechanism

When I was developing this plugin, one burning question I had was how ChatGPT actually sends requests to my API. After a little experimenting, I found out some interesting things.

The plugin prompt is the only source of information ChatGPT has about our API, and, as you may have noticed, it doesn't contain the endpoint URLs or any details on how to transmit the request data. When ChatGPT wants to send a request, it outputs a special structured format that the ChatGPT backend interprets as a command.

For example, if ChatGPT wants to send a request to create a QR code for https://chat.openai.com/, it produces the structured output shown below. This output is parsed and processed by the ChatGPT backend, which sends the request to our API. Note that I've omitted some special tokens that are used to differentiate between a conversational response and a structured response.

  assistant to=qr_code.generate_qr_code_generate_post
  {
    "string": "https://chat.openai.com/"
  }

Here we have the name_for_model we defined in the manifest file, followed by the operationId of the endpoint from the OpenAPI spec. All communication with the ChatGPT backend happens through JSON objects: the request data is a JSON object, and so is the response.

Wrapping up

And that's it. You've created your first ChatGPT plugin for generating QR codes.

The starter template we've used in this tutorial is great for getting up and running quickly, but in a real-world application, you'll need to follow best practices such as modularizing your codebase, handling authentication, validating requests, handling errors, writing tests, and much more.

I hope you enjoyed this article, and if you have any questions feel free to leave a comment below.

Happy coding!

🙌 Hey! If you enjoy my content and want to show some love, feel free to buy me a coffee. Each cup helps me create more useful content for incredible developers like you!
