DEV Community

Eric Allam for Trigger.dev

Posted on • Originally published at trigger.dev

✨ Train ChatGPT on your Documentation 🪄 ✨

TL;DR

ChatGPT’s training data only goes up to 2022.

But what if you want it to give you information specifically about your website? Until recently, that wasn’t really possible, but not anymore!

OpenAI has introduced a new feature: Assistants.

You can now easily index your website and then ask ChatGPT questions about it. In this tutorial, we will build a system that indexes your website and lets you query it. We will:

  • Scrape the documentation sitemap.
  • Extract the information from all the pages on the website.
  • Create a new assistant with the new information.
  • Build a simple ChatGPT frontend interface and query the assistant.



Your background job platform 🔌

Trigger.dev is an open-source library that enables you to create and monitor long-running jobs for your app with Next.js, Remix, Astro, and so many more!

 


Please help us with a star 🥹.
It would help us to create more articles like this 💖

Star the Trigger.dev repository ⭐️


Let’s get started 🔥

Let’s set up a new Next.js project.

 npx create-next-app@latest

💡 We use the new Next.js App Router. Please make sure you have Node.js 18 or later installed before creating the project.

Let's create a new database to save the assistant and the scraped pages.
For our example, we will use Prisma with SQLite.

It is super easy to install; just run:

npm install prisma @prisma/client --save

Then initialize a Prisma schema that uses an SQLite database:

npx prisma init --datasource-provider sqlite

Open prisma/schema.prisma and replace its contents with the following schema:

// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "sqlite"
  url      = env("DATABASE_URL")
}

model Docs {
  id         Int      @id @default(autoincrement())
  content    String
  url        String @unique
  identifier String
  @@index([identifier])
}

model Assistant {
  id         Int      @id @default(autoincrement())
  aId        String
  url        String   @unique
}

And then run

npx prisma db push

That will create a new SQLite database (a local file) with two tables: Docs and Assistant.

  • Docs contains all the scraped pages.
  • Assistant contains the documentation URL and the internal ChatGPT assistant ID.

Let’s add our Prisma client.

Create a new folder called helper and add a new file called prisma.client.ts (the name the later imports use) with the following code:

import {PrismaClient} from '@prisma/client';

export const prisma = new PrismaClient();

We can later use that prisma variable to query our database.
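One common refinement: in development, Next.js hot reloading re-evaluates modules, so every reload runs new PrismaClient() again and opens another client. Prisma’s documentation for Next.js recommends keeping a single instance on globalThis. Below is a minimal, generic sketch of that idea; the singleton helper is hypothetical and not part of Prisma’s API:

```typescript
// Hypothetical helper: return a value cached on globalThis under `key`,
// creating it only on first use. Modules may be re-evaluated on hot reload,
// but globalThis survives, so the factory runs once per process.
function singleton<T>(key: string, create: () => T): T {
  const store = globalThis as unknown as Record<string, T | undefined>;
  const existing = store[key];
  if (existing !== undefined) {
    return existing;
  }
  const instance = create();
  store[key] = instance;
  return instance;
}

// Usage in the Prisma helper file (assuming the PrismaClient import shown above):
// export const prisma = singleton("prisma", () => new PrismaClient());
```

Caching on globalThis in production is harmless, because the module is only evaluated once there anyway.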



Scrape & Index

Create a Trigger.dev account

Scraping and indexing the pages is a long-running task. We need to:

  • Scrape the website’s main page to find the sitemap URL in its metadata.
  • Extract all the pages inside the sitemap.
  • Go to each page and extract the content.
  • Save everything to the ChatGPT assistant.

For that, let’s use Trigger.dev!

Sign up for a Trigger.dev account.

Once registered, create an organization and choose a project name for your job.


Select Next.js as your framework and follow the process for adding Trigger.dev to an existing Next.js project.


Otherwise, click Environments & API Keys on the sidebar menu of your project dashboard.


Copy your DEV server API key and run the code snippet below to install Trigger.dev.

Follow the instructions carefully.

npx @trigger.dev/cli@latest init

Run the following code snippet in another terminal to establish a tunnel between Trigger.dev and your Next.js project.

npx @trigger.dev/cli@latest dev

Install ChatGPT (OpenAI)

We will use the OpenAI Assistants API, so we need to add OpenAI to our project.

Create a new OpenAI account and generate an API Key.


Click View API key from the dropdown to create an API Key.


Next, install the Trigger.dev OpenAI integration by running the code snippet below.

npm install @trigger.dev/openai

Add your OpenAI API key to the .env.local file.

OPENAI_API_KEY=<your_api_key>

Inside the helper directory, add a new file, open.ai.ts, with the following content:

import {OpenAI} from "@trigger.dev/openai";

export const openai = new OpenAI({
    id: "openai",
    apiKey: process.env.OPENAI_API_KEY!,
});

That’s our OpenAI client, wrapped by the Trigger.dev integration.

Building the background jobs

Let’s go ahead and create a new background job!

Go to jobs and create a new file called process.documentation.ts. Add the following code:

import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "@openai-assistant/trigger";
import {object, string} from "zod";
import {JSDOM} from "jsdom";
import {openai} from "@openai-assistant/helper/open.ai";

client.defineJob({
  // This is the unique identifier for your Job; it must be unique across all Jobs in your project.
  id: "process-documentation",
  name: "Process Documentation",
  version: "0.0.1",
  // This is triggered by an event using eventTrigger. You can also trigger Jobs with webhooks, on schedules, and more: https://trigger.dev/docs/documentation/concepts/triggers/introduction
  trigger: eventTrigger({
    name: "process.documentation.event",
    schema: object({
      url: string(),
    })
  }),
  integrations: {
    openai
  },
  run: async (payload, io, ctx) => {
  }
});

We have defined a new job, process-documentation, which runs whenever a process.documentation.event event is sent. Its payload has one required parameter, url: the documentation URL we will send later.

As you can see, the job is empty, so let’s add the first task to it.

We need to find the website’s sitemap URL and return it.
Fetching the website returns HTML that we need to parse.
To do so, let’s install JSDOM.

npm install jsdom @types/jsdom --save

And import it at the top of our file:

import {JSDOM} from "jsdom";

Now, we can add our first task.

It’s important to wrap our code in io.runTask, which lets Trigger.dev track each task separately and cache its result. Trigger.dev runs a job across multiple requests and resumes it after the last completed task, so the job as a whole is not limited by Vercel’s serverless timeout. Here is the code for the first task:

const getSiteMap = await io.runTask("grab-sitemap", async () => {
  const data = await (await fetch(payload.url)).text();
  const dom = new JSDOM(data);
  const sitemap = dom.window.document.querySelector('[rel="sitemap"]')?.getAttribute('href');
  return new URL(sitemap!, payload.url).toString();
});
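  • We fetch the entire HTML of the URL with an HTTP request.
  • We parse it into a DOM with JSDOM.
  • We look for the sitemap URL in the <link rel="sitemap"> tag.
  • We resolve it against the page URL (the href may be relative) and return it as an absolute URL.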
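The last step matters because the href of the <link rel="sitemap"> tag is often relative, and the task above assumes the tag exists (sitemap!). Below is a minimal sketch of that resolution step as a pure function. The function name is hypothetical, and falling back to /sitemap.xml when the tag is missing is our own assumption (a common convention, not something every site provides):

```typescript
// Hypothetical helper: turn the href found in <link rel="sitemap" href="...">
// into an absolute URL. ASSUMPTION: if the page has no such tag (href is
// null or undefined), fall back to "/sitemap.xml" at the site root.
function resolveSitemapUrl(href: string | null | undefined, pageUrl: string): string {
  // new URL(value, base) resolves relative, root-relative, and absolute values
  return new URL(href ?? "/sitemap.xml", pageUrl).toString();
}

// Root-relative href: resolved against the docs site's origin
console.log(resolveSitemapUrl("/sitemap.xml", "https://docs.example.com/guide/intro"));
// → https://docs.example.com/sitemap.xml

// No tag found: the fallback is used
console.log(resolveSitemapUrl(null, "https://docs.example.com/guide/intro"));
// → https://docs.example.com/sitemap.xml
```

A plain relative href such as sitemap.xml resolves against the current page’s directory rather than the site root, which is exactly how a browser would interpret it.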

Next, we need to load the sitemap, extract all the URLs, and return them.
Let’s install Lodash, a utility library whose chunk function splits an array into groups.

npm install lodash @types/lodash --save

Here is the code of the task:

// At the top of the file
import {chunk} from "lodash";

// Generate a random alphanumeric identifier of the given length
export const makeId = (length: number) => {
    let text = '';
    const possible = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

    for (let i = 0; i < length; i += 1) {
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    }
    return text;
};

const {identifier, list} = await io.runTask("load-and-parse-sitemap", async () => {
    // Matches absolute http(s)/ftp URLs anywhere in the sitemap text
    const urls = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/g;
    const identifier = makeId(5);
    const data = await (await fetch(getSiteMap)).text();

    // Deduplicate the URLs, keep only those under the documentation URL, and split them into chunks of 25
    const pages = Array.from(new Set(data.match(urls) ?? []))
        .filter((url) => url.includes(payload.url))
        .map((url) => ({identifier, url}));

    return {identifier, list: chunk(pages, 25)};
});
  • We create a helper function, makeId, that generates a random identifier shared by all the pages of this run.
  • We create a new task with a regex that matches every absolute URL.
  • We send an HTTP request to load the sitemap and extract all of its URLs.
  • We deduplicate them and keep only the URLs that contain the documentation URL.
  • We split the URLs into chunks of 25 (for example, 100 URLs become four chunks of 25).
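The regex matches any absolute URL in the sitemap text. Because a standard XML sitemap lists each page inside a <loc> element, a slightly stricter alternative is to read only those elements. Here is a minimal, dependency-free sketch of the whole step (extract, deduplicate, filter by the docs URL, chunk into groups of 25). The function names are hypothetical, chunkArray only mirrors what Lodash’s chunk does, and the sketch assumes a plain sitemap rather than a sitemap index pointing to further sitemaps. It also does not decode XML entities such as &amp;:

```typescript
// Hypothetical helper: split an array into groups of at most `size` items (like Lodash's chunk)
function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: read the page URLs from the sitemap's <loc> elements,
// deduplicate them, and keep only the ones under the documentation URL
function extractDocUrls(sitemapXml: string, docsUrl: string): string[] {
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    found.add(match[1]);
  }
  return Array.from(found).filter((url) => url.startsWith(docsUrl));
}

// Build the same { identifier, url } payloads as the job, grouped for batching
function buildBatches(sitemapXml: string, docsUrl: string, identifier: string, size = 25) {
  const payloads = extractDocUrls(sitemapXml, docsUrl).map((url) => ({ identifier, url }));
  return chunkArray(payloads, size);
}

const xml = `<urlset>
  <url><loc>https://docs.example.com/intro</loc></url>
  <url><loc>https://docs.example.com/intro</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>`;
// One chunk containing the single deduplicated docs page
console.log(JSON.stringify(buildBatches(xml, "https://docs.example.com", "abc12")));
```

Using startsWith instead of the job’s includes is a design choice: it only keeps pages that live under the documentation URL, not pages that merely mention it somewhere in their path or query string.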

Next, let’s create a new job to process each URL.

Here is the complete code:

function getElementsBetween(startElement: Element, endElement: Element) {
    let currentElement = startElement;
    const elements = [];

    // Traverse the DOM until the endElement is reached
    while (currentElement && currentElement !== endElement) {
        currentElement = currentElement.nextElementSibling!;

        // If there's no next sibling, go up a level and continue
        if (!currentElement) {
            // @ts-ignore
            currentElement = startElement.parentNode!;
            startElement = currentElement;
            if (currentElement === endElement) break;
            continue;
        }

        // Add the current element to the list
        if (currentElement && currentElement !== endElement) {
            elements.push(currentElement);
        }
    }

    return elements;
}

const processContent = client.defineJob({
  // This is the unique identifier for your Job; it must be unique across all Jobs in your project.
  id: "process-content",
  name: "Process Content",
  version: "0.0.1",
  // This is triggered by an event using eventTrigger. You can also trigger Jobs with webhooks, on schedules, and more: https://trigger.dev/docs/documentation/concepts/triggers/introduction
  trigger: eventTrigger({
    name: "process.content.event",
    schema: object({
      url: string(),
      identifier: string(),
    })
  }),
  run: async (payload, io, ctx) => {
    return io.runTask('grab-content', async () => {
        // We first grab a raw html of the content from the website
        const data = await (await fetch(payload.url)).text();

        // We load it with JSDOM so we can manipulate it
        const dom = new JSDOM(data);

        // We remove all the scripts and styles from the page
        dom.window.document.querySelectorAll('script, style').forEach((el) => el.remove());

        // We grab all the titles from the page
        const content = Array.from(dom.window.document.querySelectorAll('h1, h2, h3, h4, h5, h6'));

        // We grab the last element so we can get the content between the last element and the next element
        const lastElement = content[content.length - 1]?.parentElement?.nextElementSibling!;
        const elements = [];

        // We loop through all the elements and grab the content between each title
        for (let i = 0; i < content.length; i++) {
            const element = content[i];
            const nextElement = content?.[i + 1] || lastElement;
            const elementsBetween = getElementsBetween(element, nextElement);
            elements.push({
                title: element.textContent, content: elementsBetween.map((el) => el.textContent).join('\n')
            });
        }

        // We create a raw text format of all the content
        const page = `
        ----------------------------------
        url: ${payload.url}\n
        ${elements.map((el) => `${el.title}\n${el.content}`).join('\n')}

        ----------------------------------
        `;

        // We save it to our database
        await prisma.docs.upsert({
            where: {
                url: payload.url
            }, update: {
                content: page, identifier: payload.identifier
            }, create: {
                url: payload.url, content: page, identifier: payload.identifier
            }
        });
    });
  },
});
  • We fetch the content of the URL (previously extracted from the sitemap).
  • We parse it with JSDOM.
  • We remove every <script> and <style> element on the page.
  • We grab all the headings on the page (h1, h2, h3, h4, h5, h6).
  • We iterate over the headings and take the content between each heading and the next one. We don’t take the entire page content because it might contain irrelevant content.
  • We build our own raw-text version of the page and save it to our database.
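The core idea of this job, grouping each heading with the text that follows it until the next heading, is easier to see without the DOM. The sketch below assumes the page has already been flattened into an ordered list of blocks (the Block type is hypothetical, not something JSDOM returns), so unlike getElementsBetween it does not climb up to parent elements. It renders the sections in roughly the same raw-text shape the job stores in the Docs table:

```typescript
// Hypothetical flattened page: headings (h1-h6) and text elements, in document order
type Block = { tag: string; text: string };

interface Section {
  title: string;
  content: string;
}

const SEPARATOR = "----------------------------------";

// Group text blocks under the most recent heading. Text before the first
// heading is skipped, since the job also only collects content after a title.
function groupByHeadings(blocks: Block[]): Section[] {
  const sections: Section[] = [];
  for (const block of blocks) {
    if (/^h[1-6]$/i.test(block.tag)) {
      sections.push({ title: block.text, content: "" });
    } else if (sections.length > 0) {
      const current = sections[sections.length - 1];
      current.content = current.content ? `${current.content}\n${block.text}` : block.text;
    }
  }
  return sections;
}

// Render the sections as one raw-text page: url line, then each title and its content
function renderPage(url: string, sections: Section[]): string {
  const body = sections.map((s) => `${s.title}\n${s.content}`).join("\n");
  return `${SEPARATOR}\nurl: ${url}\n${body}\n${SEPARATOR}`;
}

const page = renderPage("https://docs.example.com/intro", groupByHeadings([
  { tag: "h1", text: "Intro" },
  { tag: "p", text: "Welcome." },
  { tag: "h2", text: "Install" },
  { tag: "pre", text: "npm install example" },
]));
console.log(page);
```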

Now, let’s run this job for every URL in the sitemap.
Trigger.dev provides a method called batchInvokeAndWaitForCompletion.
It invokes a job once for each item in a batch (we send batches of 25), processes them simultaneously, and waits until all of them have completed. Here are the next lines of code:

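let i = 0;
for (const item of list) {
    // Invoke process-content once per URL in this chunk and wait for all of them to finish
    await processContent.batchInvokeAndWaitForCompletion(
        'process-list-' + i,
        item.map((payload) => ({
            payload,
            // Give each run up to 24 hours (86,400 seconds) to complete
            timeoutInSeconds: 86_400,
        })),
    );
    i++;
}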

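We invoke the previously created process-content job for every URL, one chunk of 25 at a time: the URLs in a chunk are processed in parallel, and the next chunk starts only after the current one has finished.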
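The control flow of that loop (parallel inside a chunk, sequential across chunks) can be reproduced with plain promises. This is only an illustration of the scheduling, not how Trigger.dev executes the runs (there, each URL becomes a separate run of the process-content job), and the worker function below is hypothetical:

```typescript
// Process chunks one after another; items inside a chunk run concurrently,
// and the next chunk starts only after every item of the current one has resolved.
async function processInChunks<T, R>(
  chunks: T[][],
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const chunk of chunks) {
    // Promise.all waits for the whole chunk (and rejects on the first failure)
    const chunkResults = await Promise.all(chunk.map((item) => worker(item)));
    results.push(...chunkResults);
  }
  return results;
}

// Hypothetical worker standing in for one process-content run
const fakeWorker = async (page: { url: string }) => page.url.length;

processInChunks([[{ url: "a" }, { url: "bb" }], [{ url: "ccc" }]], fakeWorker).then((lengths) => {
  console.log(lengths); // [ 1, 2, 3 ]
});
```

Processing chunk by chunk caps how many pages are fetched at the same time, which is gentler on the documentation site than requesting every URL at once.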

Once that’s completed, let’s take all the content we have saved to our database and concatenate it:

const data = await io.runTask("get-extracted-data", async () => {
    return (await prisma.docs.findMany({
        where: {
            identifier
        },
        select: {
            content: true
        }
    })).map((d) => d.content).join('\n\n');
});

We filter by the identifier we generated earlier, so we only get the pages scraped in this run.

Now, let’s upload the data to OpenAI as a new file:

const file = await io.openai.files.createAndWaitForProcessing("upload-file", {
  purpose: "assistants",
  file: data
});

createAndWaitForProcessing is a task provided by the Trigger.dev OpenAI integration: it uploads the file and waits until OpenAI has finished processing it. If you use the openai package directly, without the integration, you must upload the file as a stream yourself.

Now let’s create or update our assistant:

const assistant = await io.openai.runTask("create-or-update-assistant", async (openai) => {
    const currentAssistant = await prisma.assistant.findFirst({
        where: {
            url: payload.url
        }
    });
    if (currentAssistant) {
        return openai.beta.assistants.update(currentAssistant.aId, {
            file_ids: [file.id]
        });
    }
    return openai.beta.assistants.create({
        name: identifier,
        description: 'Documentation',
        instructions: 'You are a documentation assistant, you have been loaded with documentation from ' + payload.url + ', return everything in an MD format.',
        model: 'gpt-4-1106-preview',
        tools: [{ type: "code_interpreter" }, {type: 'retrieval'}],
        file_ids: [file.id],
    });
});
  • We first check whether we already have an assistant for that URL.
  • If we do, we update the assistant with the new file.
  • If not, we create a new assistant.
  • We pass instructions telling it that it is a documentation assistant. Notice that we ask for the final output in Markdown so we can display it nicely later.

For the final piece of the puzzle, let’s save the new assistant to our database.

Here is the code:

await io.runTask("save-assistant", async () => {
    await prisma.assistant.upsert({
        where: {
            url: payload.url
        },
        update: {
            aId: assistant.id,
        },
        create: {
            aId: assistant.id,
            url: payload.url,
        }
    });
});

If a row for this URL already exists, we update it with the new assistant ID; otherwise, we create one.

Here is the full code of the file:

import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "@openai-assistant/trigger";
import {object, string} from "zod";
import {JSDOM} from "jsdom";
import {chunk} from "lodash";
import {prisma} from "@openai-assistant/helper/prisma.client";
import {openai} from "@openai-assistant/helper/open.ai";

const makeId = (length: number) => {
    let text = '';
    const possible = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

    for (let i = 0; i < length; i += 1) {
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    }
    return text;
};

client.defineJob({
  // This is the unique identifier for your Job; it must be unique across all Jobs in your project.
  id: "process-documentation",
  name: "Process Documentation",
  version: "0.0.1",
  // This is triggered by an event using eventTrigger. You can also trigger Jobs with webhooks, on schedules, and more: https://trigger.dev/docs/documentation/concepts/triggers/introduction
  trigger: eventTrigger({
    name: "process.documentation.event",
    schema: object({
      url: string(),
    })
  }),
  integrations: {
    openai
  },
  run: async (payload, io, ctx) => {

    // The first task to get the sitemap URL from the website
    const getSiteMap = await io.runTask("grab-sitemap", async () => {
      const data = await (await fetch(payload.url)).text();
      const dom = new JSDOM(data);
      const sitemap = dom.window.document.querySelector('[rel="sitemap"]')?.getAttribute('href');
      return new URL(sitemap!, payload.url).toString();
    });

    // We parse the sitemap; instead of using an XML parser, we use a regex to get the URLs and return them in chunks of 25
    const {identifier, list} = await io.runTask("load-and-parse-sitemap", async () => {
        const urls = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/g;
        const identifier = makeId(5);
        const data = await (await fetch(getSiteMap)).text();
        // Deduplicate the URLs and keep only those under the documentation URL
        const pages = Array.from(new Set(data.match(urls) ?? []))
            .filter((url) => url.includes(payload.url))
            .map((url) => ({identifier, url}));
        return {identifier, list: chunk(pages, 25)};
    });

    // We go into each page and grab the content; we do this in batches of 25 and save it to the DB
    let i = 0;
    for (const item of list) {
        await processContent.batchInvokeAndWaitForCompletion(
            'process-list-' + i,
            item.map((payload) => ({
                payload,
                // Give each run up to 24 hours (86,400 seconds) to complete
                timeoutInSeconds: 86_400,
            })),
        );
        i++;
    }

    // We get the data that we saved in batches from the DB
    const data = await io.runTask("get-extracted-data", async () => {
        return (await prisma.docs.findMany({
            where: {
                identifier
            },
            select: {
                content: true
            }
        })).map((d) => d.content).join('\n\n');
    });

    // We upload the data to OpenAI with all the content
    const file = await io.openai.files.createAndWaitForProcessing("upload-file", {
      purpose: "assistants",
      file: data
    });

    // We create a new assistant or update the old one with the new file
    const assistant = await io.openai.runTask("create-or-update-assistant", async (openai) => {
        const currentAssistant = await prisma.assistant.findFirst({
            where: {
                url: payload.url
            }
        });
        if (currentAssistant) {
            return openai.beta.assistants.update(currentAssistant.aId, {
                file_ids: [file.id]
            });
        }
        return openai.beta.assistants.create({
            name: identifier,
            description: 'Documentation',
            instructions: 'You are a documentation assistant, you have been loaded with documentation from ' + payload.url + ', return everything in an MD format.',
            model: 'gpt-4-1106-preview',
            tools: [{ type: "code_interpreter" }, {type: 'retrieval'}],
            file_ids: [file.id],
        });
    });

    // We update our internal database with the assistant
    await io.runTask("save-assistant", async () => {
        await prisma.assistant.upsert({
            where: {
                url: payload.url
            },
            update: {
                aId: assistant.id,
            },
            create: {
                aId: assistant.id,
                url: payload.url,
            }
        });
    });
  },
});

export function getElementsBetween(startElement: Element, endElement: Element) {
    let currentElement = startElement;
    const elements = [];

    // Traverse the DOM until the endElement is reached
    while (currentElement && currentElement !== endElement) {
        currentElement = currentElement.nextElementSibling!;

        // If there's no next sibling, go up a level and continue
        if (!currentElement) {
            // @ts-ignore
            currentElement = startElement.parentNode!;
            startElement = currentElement;
            if (currentElement === endElement) break;
            continue;
        }

        // Add the current element to the list
        if (currentElement && currentElement !== endElement) {
            elements.push(currentElement);
        }
    }

    return elements;
}

// This job will grab the content from the website
const processContent = client.defineJob({
  // This is the unique identifier for your Job; it must be unique across all Jobs in your project.
  id: "process-content",
  name: "Process Content",
  version: "0.0.1",
  // This is triggered by an event using eventTrigger. You can also trigger Jobs with webhooks, on schedules, and more: https://trigger.dev/docs/documentation/concepts/triggers/introduction
  trigger: eventTrigger({
    name: "process.content.event",
    schema: object({
      url: string(),
      identifier: string(),
    })
  }),
  run: async (payload, io, ctx) => {
    return io.runTask('grab-content', async () => {
        try {
            // We first grab a raw HTML of the content from the website
            const data = await (await fetch(payload.url)).text();

            // We load it with JSDOM so we can manipulate it
            const dom = new JSDOM(data);

            // We remove all the scripts and styles from the page
            dom.window.document.querySelectorAll('script, style').forEach((el) => el.remove());

            // We grab all the titles from the page
            const content = Array.from(dom.window.document.querySelectorAll('h1, h2, h3, h4, h5, h6'));

            // We grab the last element so we can get the content between the last element and the next element
            const lastElement = content[content.length - 1]?.parentElement?.nextElementSibling!;
            const elements = [];

            // We loop through all the elements and grab the content between each title
            for (let i = 0; i < content.length; i++) {
                const element = content[i];
                const nextElement = content?.[i + 1] || lastElement;
                const elementsBetween = getElementsBetween(element, nextElement);
                elements.push({
                    title: element.textContent, content: elementsBetween.map((el) => el.textContent).join('\n')
                });
            }

            // We create a raw text format of all the content
            const page = `
            ----------------------------------
            url: ${payload.url}\n
            ${elements.map((el) => `${el.title}\n${el.content}`).join('\n')}

            ----------------------------------
            `;

            // We save it to our database
            await prisma.docs.upsert({
                where: {
                    url: payload.url
                }, update: {
                    content: page, identifier: payload.identifier
                }, create: {
                    url: payload.url, content: page, identifier: payload.identifier
                }
            });
        }
        catch (e) {
            console.log(e);
        }
    });
  },
});

We have finished creating the background job to scrape and index the files 🎉

Question the assistant

Now, let’s create the job to question our assistant.

Go to jobs and create a new file, question.assistant.ts. Add the following code:

import {eventTrigger} from "@trigger.dev/sdk";
import {client} from "@openai-assistant/trigger";
import {object, string} from "zod";
import {openai} from "@openai-assistant/helper/open.ai";

client.defineJob({
    // This is the unique identifier for your Job; it must be unique across all Jobs in your project.
    id: "question-assistant",
    name: "Question Assistant",
    version: "0.0.1",
    // This is triggered by an event using eventTrigger. You can also trigger Jobs with webhooks, on schedules, and more: https://trigger.dev/docs/documentation/concepts/triggers/introduction
    trigger: eventTrigger({
        name: "question.assistant.event", schema: object({
            content: string(),
            aId: string(),
            threadId: string().optional(),
        })
    }), integrations: {
        openai
    }, run: async (payload, io, ctx) => {
        // Create or use an existing thread
        const thread = payload.threadId ? await io.openai.beta.threads.retrieve('get-thread', payload.threadId) : await io.openai.beta.threads.create('create-thread');

       // Create a message in the thread
       await io.openai.beta.threads.messages.create('create-message', thread.id, {
            content: payload.content,
            role: 'user',
        });

       // Run the thread
        const run = await io.openai.beta.threads.runs.createAndWaitForCompletion('run-thread', thread.id, {
            model: 'gpt-4-1106-preview',
            assistant_id: payload.aId,
        });

        // Check the status of the thread
        if (run.status !== "completed") {
            console.log('not completed');
            throw new Error(`Run finished with status ${run.status}: ${JSON.stringify(run.last_error)}`);
        }

        // Get the messages from the thread
        const messages = await io.openai.beta.threads.messages.list("list-messages", run.thread_id, {
            query: {
                limit: "1"
            }
        });

        const content = messages[0].content[0];
        if (content.type === 'text') {
            return {content: content.text.value, threadId: thread.id};
        }
    }
});
  • The event takes three parameters:
    • content: the message we want to send to our assistant.
    • aId: the internal ID of the assistant we previously created.
    • threadId: the thread ID of the conversation. It is optional because we don’t have a thread ID yet when we send the first message.
  • We then retrieve the existing thread, or create a new one.
  • We add our question to the thread as a new user message.
  • We run the thread and wait for the run to complete.
  • We list the thread’s messages, limited to 1: messages are returned newest first, so the first one is the assistant’s latest reply.
  • We return the message content and the thread ID, so the next question can continue the same conversation.
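Limiting the list to one message works because the Assistants API lists messages newest first by default, so after a completed run the first item is the assistant’s reply. A message’s content is an array of parts, and only parts with type "text" carry text.value. The sketch below extracts the reply from such a list; the types are a hypothetical, trimmed-down version of the message objects (not the SDK’s own types), and unlike the job it joins all text parts instead of reading only content[0]:

```typescript
// Hypothetical, trimmed-down message shapes (only the fields used here)
type MessagePart =
  | { type: "text"; text: { value: string } }
  | { type: "image_file"; image_file: { file_id: string } };

interface ThreadMessage {
  role: "user" | "assistant";
  content: MessagePart[];
}

// Return the text of the newest assistant message, assuming `messages` is ordered newest first
function latestAssistantText(messages: ThreadMessage[]): string | undefined {
  const reply = messages.find((m) => m.role === "assistant");
  if (!reply) return undefined;
  const texts = reply.content
    .filter((part): part is Extract<MessagePart, { type: "text" }> => part.type === "text")
    .map((part) => part.text.value);
  return texts.length > 0 ? texts.join("\n") : undefined;
}

const example: ThreadMessage[] = [
  { role: "assistant", content: [{ type: "text", text: { value: "Use npm install." } }] },
  { role: "user", content: [{ type: "text", text: { value: "How do I install it?" } }] },
];
console.log(latestAssistantText(example)); // Use npm install.
```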

Add routing

We need to create 3 API routes for our application:

  1. Send a new assistant for processing.
  2. Get a specific assistant by URL.
  3. Add a new message to an assistant.

Create a new folder inside of app/api called assistant, and inside, create a new file called route.ts. Add the following code inside:

import {client} from "@openai-assistant/trigger";
import {prisma} from "@openai-assistant/helper/prisma.client";

export async function POST(request: Request) {
    const body = await request.json();
    if (!body.url) {
        return new Response(JSON.stringify({error: 'URL is required'}), {status: 400});
    }

    // We send an event to the trigger to process the documentation
    const {id: eventId} = await client.sendEvent({
        name: "process.documentation.event",
        payload: {url: body.url},
    });

    return new Response(JSON.stringify({eventId}), {status: 200});
}

export async function GET(request: Request) {
    const url = new URL(request.url).searchParams.get('url');
    if (!url) {
        return new Response(JSON.stringify({error: 'URL is required'}), {status: 400});
    }

    const assistant = await prisma.assistant.findFirst({
        where: {
            url: url
        }
    });

    return new Response(JSON.stringify(assistant), {status: 200});
}

The POST method receives a URL and starts the process-documentation job by sending a process.documentation.event event with that URL.

The GET method looks up the assistant in our database by the URL sent from the client.

Now, let’s create the route to add a message to our assistant.
Inside of app/api create a new folder message and add a new file called route.ts, then add the following code:

import {prisma} from "@openai-assistant/helper/prisma.client";
import {client} from "@openai-assistant/trigger";

export async function POST(request: Request) {
  const body = await request.json();

  // Check that we have the assistant id and the message
  if (!body.id || !body.message) {
      return new Response(JSON.stringify({error: 'Id and Message are required'}), {status: 400});
  }

  // get the assistant id in OpenAI from the id in the database
  const assistant = await prisma.assistant.findUnique({
      where: {
          id: +body.id
      }
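  });

  // Stop early if no assistant matches this id; the job needs a valid OpenAI assistant id
  if (!assistant) {
      return new Response(JSON.stringify({error: 'Assistant not found'}), {status: 404});
  }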

  // We send an event to Trigger.dev to ask the assistant our question
  const {id: eventId} = await client.sendEvent({
      name: "question.assistant.event",
      payload: {
          content: body.message,
          aId: assistant?.aId,
          threadId: body.threadId
      },
  });

  return new Response(JSON.stringify({eventId}), {status: 200});
}

This is very basic code: we get the message, the assistant ID, and the thread ID from the client and send them to our previously created question.assistant.event.
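The route only checks that id and message are present. If you want stricter validation before sending the event, one option is a small parser like the hypothetical parseMessageRequest below (since the project already depends on zod, a zod schema would work equally well). It keeps the route’s error message and also rejects ids that are not integers, because the route converts the id with +body.id:

```typescript
// Hypothetical shape of the JSON body the /api/message route expects
interface MessageRequest {
  id: number; // Row id of the assistant in our database (not the OpenAI assistant id)
  message: string; // The user's question
  threadId?: string; // OpenAI thread id; absent on the first message
}

type ParseResult =
  | { ok: true; value: MessageRequest }
  | { ok: false; error: string };

function parseMessageRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Id and Message are required" };
  }
  const { id, message, threadId } = body as Record<string, unknown>;

  // Accept a number or a numeric string, mirroring the route's +body.id conversion
  const numericId =
    typeof id === "number" ? id : typeof id === "string" && id.trim() !== "" ? Number(id) : NaN;

  if (!Number.isInteger(numericId) || typeof message !== "string" || message.trim() === "") {
    return { ok: false, error: "Id and Message are required" };
  }

  // threadId is optional, but when present it must be a string
  let thread: string | undefined;
  if (typeof threadId === "string") {
    thread = threadId;
  } else if (threadId !== undefined) {
    return { ok: false, error: "threadId must be a string" };
  }
  return { ok: true, value: { id: numericId, message, threadId: thread } };
}

console.log(parseMessageRequest({ id: "3", message: "How do I install it?" }));
```

In the route, a failed result would map to the existing 400 response, and a successful one provides typed values for the database lookup and the event payload.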

The last thing to do is create a function to get all our assistants.

Inside the helper folder, create a new file called get.list.ts and add the following code:

import {prisma} from "@openai-assistant/helper/prisma.client";

// Get the list of all the available assistants
export const getList = () => {
    return prisma.assistant.findMany();
}

This simply returns all the assistants.

We have finished with the backend 🥳

Let’s move to the front.



Creating the Frontend

We are going to create a basic interface to add URLs and show the list of added URLs.

The main page

Replace the content of app/page.tsx with the following code:

import {getList} from "@openai-assistant/helper/get.list";
import Main from "@openai-assistant/components/main";

export default async function Home() {
  const list = await getList();
  return (
     <Main list={list} />
  )
}

This straightforward code grabs the list from the database and passes it to our Main component.

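Next, let’s create the Main component.

Create a new folder called components next to the helper folder (so it matches the @openai-assistant/components imports) and add a new file called main.tsx. The component uses react-hook-form and @trigger.dev/react, so install them if you haven’t already (npm install react-hook-form @trigger.dev/react). Add the following code: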

"use client";

import {Assistant} from '@prisma/client';
import {useCallback, useState} from "react";
import {FieldValues, SubmitHandler, useForm} from "react-hook-form";
import {ChatgptComponent} from "@openai-assistant/components/chatgpt.component";
import {AssistantList} from "@openai-assistant/components/assistant.list";
import {TriggerProvider} from "@trigger.dev/react";

export interface ExtendedAssistant extends Assistant {
    pending?: boolean;
    eventId?: string;
}
export default function Main({list}: {list: ExtendedAssistant[]}) {
    const [assistantState, setAssistantState] = useState(list);
    const {register, handleSubmit} = useForm();

    const submit: SubmitHandler<FieldValues> = useCallback(async (data) => {
        const assistantResponse = await (await fetch('/api/assistant', {
            body: JSON.stringify({url: data.url}),
            method: 'POST',
            headers: {
                'Content-Type': 'application/json'
            }
        })).json();

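        // Add a pending entry; the API response carries the eventId we will listen to
        setAssistantState((current) => [...current, {...assistantResponse, url: data.url, pending: true}]);
    }, [])

    const changeStatus = useCallback((val: ExtendedAssistant) => async () => {
        // Load the finished assistant from the database (encode the URL so it survives the query string)
        const assistantResponse = await (await fetch(`/api/assistant?url=${encodeURIComponent(val.url)}`, {
            method: 'GET',
            headers: {
                'Content-Type': 'application/json'
            }
        })).json();
        // Replace only the pending entry for this URL; other pending entries stay in the list
        setAssistantState((current) => [...current.filter((v) => v.url !== val.url), assistantResponse]);
    }, [])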

    return (
        <TriggerProvider publicApiKey={process.env.NEXT_PUBLIC_TRIGGER_PUBLIC_API_KEY!}>
            <div className="w-full max-w-2xl mx-auto p-6 flex flex-col gap-4">
                <form className="flex items-center space-x-4" onSubmit={handleSubmit(submit)}>
                    <input className="flex-grow p-3 border border-black/20 rounded-xl" placeholder="Add documentation link" type="text" {...register('url', {required: 'true'})} />
                    <button className="flex-shrink p-3 border border-black/20 rounded-xl" type="submit">
                        Add
                    </button>
                </form>
                <div className="divide-y-2 divide-gray-300 flex gap-2 flex-wrap">
                    {assistantState.map(val => (
                        <AssistantList key={val.url} val={val} onFinish={changeStatus(val)} />
                    ))}
                </div>
                {assistantState.filter(f => !f.pending).length > 0 && <ChatgptComponent list={assistantState} />}
            </div>
        </TriggerProvider>
    )
}

Let’s see what’s going on here:

  • We created a new interface called ExtendedAssistant with two optional fields, pending and eventId. When we create a new assistant, we don't have the final value yet, so we store only the eventId and listen to the job until it finishes.
  • We take the list from the server component and store it in a state variable so we can modify it later.
  • We wrap everything in a TriggerProvider so child components can listen for event completion and receive the resulting data.
  • We use react-hook-form to manage the form for adding new assistants.
  • The form has a single URL input for submitting new assistants for processing.
  • We iterate over all the existing assistants and show each one.
  • On form submission, we send the URL to the previously created route to add the new assistant.
  • Once the event is completed, we trigger changeStatus to load the assistant from the database.
  • Finally, we render the ChatGPT component, which is only displayed once at least one assistant has finished processing (!f.pending).
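The state handling in Main boils down to three small transitions: append a pending entry when the form is submitted, swap it for the stored assistant when its job finishes, and decide which assistants the chat box may use. Here is a dependency-free sketch of those rules. It assumes each entry is identified by its url (as the `key={val.url}` in the list suggests) and that the POST response carries the eventId. `AssistantEntry`, `addPending`, `resolveEntry`, and `readyAssistants` are hypothetical helpers; the component itself keeps this logic inline in its React state updates:

```typescript
// Simplified stand-in for ExtendedAssistant: only the fields the UI logic reads.
interface AssistantEntry {
  id?: string;
  url: string;
  pending?: boolean;
  eventId?: string;
}

// Form submitted: we only have an eventId so far, so add a pending placeholder.
function addPending(list: AssistantEntry[], url: string, eventId: string): AssistantEntry[] {
  return [...list, { url, eventId, pending: true }];
}

// Job finished: replace the placeholder for that url with the row loaded from the database.
function resolveEntry(list: AssistantEntry[], loaded: AssistantEntry): AssistantEntry[] {
  return [...list.filter((entry) => entry.url !== loaded.url), loaded];
}

// Only finished assistants can be selected in the chat box.
function readyAssistants(list: AssistantEntry[]): AssistantEntry[] {
  return list.filter((entry) => !entry.pending);
}
```

Filtering by url (rather than keeping every entry that already has an id) means a second assistant that is still processing stays in the list, and the chat box can be shown whenever `readyAssistants(list).length > 0`.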

Let’s create our AssistantList component.

Inside components, create a new file called assistant.list.tsx and add the following content:

"use client";

import {FC, useEffect} from "react";
import {ExtendedAssistant} from "@openai-assistant/components/main";
import {useEventRunDetails} from "@trigger.dev/react";

export const Loading: FC<{eventId: string, onFinish: () => void}> = (props) => {
    const {eventId} = props;
    const { data, error } = useEventRunDetails(eventId);

    useEffect(() => {
        if (!data || error) {
            return ;
        }

        if (data.status === 'SUCCESS') {
            props.onFinish();
        }
    }, [data]);

    return <div className="pointer bg-yellow-300 border-yellow-500 p-1 px-3 text-yellow-950 border rounded-2xl">Loading</div>
};

export const AssistantList: FC<{val: ExtendedAssistant, onFinish: () => void}> = (props) => {
    const {val, onFinish} = props;
    if (val.pending) {
        return <Loading eventId={val.eventId!} onFinish={onFinish} />
    }

    return (
        <div key={val.url} className="pointer relative bg-green-300 border-green-500 p-1 px-3 text-green-950 border rounded-2xl hover:bg-red-300 hover:border-red-500 hover:text-red-950 before:content-[attr(data-content)]" data-content={val.url} />
    )
}

Main renders an AssistantList entry for every assistant. If the assistant has already been created, we just display its URL. If not, we render the <Loading /> component.

The Loading component shows a "Loading" badge on the screen and keeps checking with the server until the event is finished.

We use the useEventRunDetails hook from Trigger.dev to know when the event is finished.

Once the event is finished, the component calls onFinish to update our client with the newly created assistant.
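Conceptually, the Loading component waits for a run status to reach SUCCESS and then calls onFinish once. The sketch below expresses that idea as a plain polling loop, with no React or Trigger.dev involved. `RunStatus` (apart from the `SUCCESS` value used above), `waitForSuccess`, the injected `fetchStatus` function, and the interval and attempt limits are all assumptions for illustration; this is not how useEventRunDetails is implemented:

```typescript
// Hypothetical status values; only "SUCCESS" is taken from the component above.
type RunStatus = "PENDING" | "STARTED" | "SUCCESS" | "FAILURE";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `fetchStatus` until it reports SUCCESS, then calls `onFinish` exactly once.
// Rejects if the run fails or the attempt budget runs out.
async function waitForSuccess(
  fetchStatus: () => Promise<RunStatus>,
  onFinish: () => void,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === "SUCCESS") {
      onFinish();
      return;
    }
    if (status === "FAILURE") {
      throw new Error("Run failed");
    }
    await sleep(intervalMs);
  }
  throw new Error("Gave up waiting for the run to finish");
}
```

Capping the number of attempts keeps a stuck job from polling forever; in the React version, the hook and the `data.status === 'SUCCESS'` check inside useEffect play the same role.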


Chat Interface

Now, let's add the ChatGPT component and start asking our assistant questions! The component needs to:

  • Let us select the assistant we would like to use
  • Show the list of messages
  • Provide an input for the message we want to send, plus a submit button

Inside components, add a new file called chatgpt.component.tsx.

Let’s draw our ChatGPT chat box:

"use client";
import {FC, FormEvent, useCallback, useEffect, useRef, useState} from "react";
import {ExtendedAssistant} from "@openai-assistant/components/main";
import Markdown from 'react-markdown'
import {useEventRunDetails} from "@trigger.dev/react";

interface Messages {
    message?: string
    eventId?: string
}

export const ChatgptComponent = ({list}: {list: ExtendedAssistant[]}) => {
    const url = useRef<HTMLSelectElement>(null);
    const [message, setMessage] = useState('');
    const [messagesList, setMessagesList] = useState<Messages[]>([]);
    // Empty until the server creates a thread for the first message
    const [threadId, setThreadId] = useState('');

    const submitForm = useCallback(async (e: FormEvent<HTMLFormElement>) => {
        e.preventDefault();
        // Render our own message immediately
        setMessagesList((messages) => [...messages, {message: `**[ME]** ${message}`}]);
        setMessage('');

        const messageResponse = await (await fetch('/api/message', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({message, id: url.current?.value, threadId}),
        })).json();

        if (!threadId) {
            setThreadId(messageResponse.threadId);
        }

        // The answer is not ready yet, so store the eventId and let MessageComponent wait for it
        setMessagesList((messages) => [...messages, {eventId: messageResponse.eventId}]);
    }, [message, threadId]);

    return (
        <div className="border border-black/50 rounded-2xl flex flex-col">
            <div className="border-b border-b-black/50 h-[60px] gap-3 px-3 flex items-center">
                <div>Assistant:</div>
                <div>
                    <select ref={url} className="border border-black/20 rounded-xl p-2">
                        {list.filter(f => !f.pending).map(val => (
                            <option key={val.id} value={val.id}>{val.url}</option>
                        ))}
                    </select>
                </div>
            </div>
            <div className="flex-1 flex flex-col gap-3 py-3 w-full min-h-[500px] max-h-[1000px] overflow-y-auto overflow-x-hidden messages-list">
                {messagesList.map((val, index) => (
                    <div key={index} className={`flex border-b border-b-black/20 pb-3 px-3`}>
                        <div className="w-full">
                                {val.message ? <Markdown>{val.message}</Markdown> : <MessageComponent eventId={val.eventId!} onFinish={setThreadId} />}
                        </div>
                    </div>
                ))}
            </div>
            <form onSubmit={submitForm}>
                <div className="border-t border-t-black/50 h-[60px] gap-3 px-3 flex items-center">
                    <div className="flex-1">
                        <input value={message} onChange={(e) => setMessage(e.target.value)} className="read-only:opacity-20 outline-none border border-black/20 rounded-xl p-2 w-full" placeholder="Type your message here" />
                    </div>
                    <div>
                        <button className="border border-black/20 rounded-xl p-2 disabled:opacity-20" disabled={message.length < 3}>Send</button>
                    </div>
                </div>
            </form>
        </div>
    )
}

export const MessageComponent: FC<{eventId: string, onFinish: (threadId: string) => void}> = (props) => {
    const {eventId} = props;
    const { data, error } = useEventRunDetails(eventId);

    useEffect(() => {
        if (!data || error) {
            return ;
        }

        if (data.status === 'SUCCESS') {
            props.onFinish(data.output.threadId);
        }
    }, [data]);

    if (!data || error || data.status !== 'SUCCESS') {
        return (
            <div className="flex justify-end items-center pb-3 px-3">
                <div className="animate-spin rounded-full h-3 w-3 border-t-2 border-b-2 border-blue-500" />
            </div>
        );
    }

    return <Markdown>{data.output.content}</Markdown>;
};

A few interesting things are going on here:

  • When we send a new message, we render it on the screen right away as "our" message. The assistant's answer isn't available yet, so we push the event id instead and let MessageComponent wait for the result. That's why we use {val.message ? <Markdown>{val.message}</Markdown> : <MessageComponent eventId={val.eventId!} onFinish={setThreadId} />}
  • We wrap our messages in a Markdown component. If you remember, in the previous steps we told ChatGPT to format everything as Markdown so we can render it correctly.
  • Once the event has finished processing, we update the thread id so that the next message continues the same conversation.
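The message flow above comes down to two rules. The user's text is stored immediately, while the assistant's reply starts as an eventId placeholder. The first threadId returned by the server is kept and sent with every later message. The sketch below models both rules without React, assuming the same message shape as the Messages interface. `ChatState`, `addUserMessage`, `addReplyPlaceholder`, and `resolveReply` are hypothetical helpers. Unlike the component, which renders the finished reply inside MessageComponent without writing it back to state, `resolveReply` stores the reply in the list. That is a design choice of this sketch only:

```typescript
// Mirrors the Messages interface: either final text or a pending eventId.
interface ChatMessage {
  message?: string;
  eventId?: string;
}

interface ChatState {
  messages: ChatMessage[];
  threadId: string; // "" until the server creates a thread
}

// User submits text: render it right away as "our" message.
function addUserMessage(state: ChatState, text: string): ChatState {
  return { ...state, messages: [...state.messages, { message: `**[ME]** ${text}` }] };
}

// Server answered with an eventId (and, on the first message, a threadId).
function addReplyPlaceholder(state: ChatState, eventId: string, threadId: string): ChatState {
  return {
    messages: [...state.messages, { eventId }],
    // Keep the first thread id so later messages share the same conversation.
    threadId: state.threadId || threadId,
  };
}

// Run finished: swap the placeholder for the assistant's Markdown answer.
function resolveReply(state: ChatState, eventId: string, content: string): ChatState {
  return {
    ...state,
    messages: state.messages.map((m) => (m.eventId === eventId ? { message: content } : m)),
  };
}
```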

And we are done 🎉



Let's connect! 🔌

As an open-source developer, you can join our community to contribute and engage with maintainers. Don't hesitate to visit our GitHub repository to contribute and create issues related to Trigger.dev.

The source for this tutorial is available here:

https://github.com/triggerdotdev/blog/tree/main/openai-assistant

Thank you for reading!

Top comments (18)

Tal Borenstein

Wow! thank you for the in-depth explanation. Going to try it as well @nathan_tarbert

Sk Md Safwaan Uddin

Failed to compile
./src/app/page.tsx:1:0
Module not found: Can't resolve '@openai-assistant/helper/get.list'

1 | import {getList} from "@openai-assistant/helper/get.list";
2 | import Main from "@openai-assistant/components/main";
3 |
4 | export default async function Home() {

nextjs.org/docs/messages/module-no...
This error occurred during the build process and can only be dismissed by fixing the error.

Saurabh Rai

I really like this attempt at making fine-tuning simpler.

Matija Sosic

Assistants are definitely a very interesting concept. Thanks for the detailed tutorial!

Nathan Tarbert

Great tutorial, I'm going to code this out. Thanks!

Nevo David

This is really a game-changer, I can finally drop all the vector databases :)

Saurabh Rai

Ha ha 😂

BrainQuest

Great writing, thanks for such a detailed post. The assistant is really a powerful tool.

Mike Pearson

Haven't read the article yet. Just wondering: Is this really training/fine-tuning it, or is it just providing it context to search like with a custom agent?

Nevo David

The assistant does not work with a context but indexes everything in their vector DB.
It's seamless to you :)

Jeffrey Ip

Great article!

Biplob Sutradhar

Detailed tutorial 🔥🔥👍. Learned so much..
Thinking about openai.beta.assistants.create..

Marine

I was actually thinking about how to do this; thanks for this tutorial!

Pranava Bhat

This is great stuff

Bap

Wow - thanks for going into so much detail, definitely going to try it out.

FGPROD

Very interesting, thanks a lot !

Shakil Ahmed

Exciting opportunity! Eager to explore how training ChatGPT on documentation can enhance knowledge transfer and streamline communication. Count me in for the journey!