hil for SerpApi

Posted on • Originally published at

Create a super fast AI assistant with Groq (Without a database)


Last week, I tried to build a voice AI assistant using OpenAI's Assistants API. It took a while to generate each response, which isn't suitable for a voice assistant, so I went looking for a faster alternative. That's how I found Groq. This post covers how I built an AI assistant using Groq.

Pros and Cons summary

Pros:

Easy to implement with only one API (Groq API).

Responses are fast.

Cons:

The longer we chat, the higher the chance that we lose some context along the way.

What is Groq?

Groq is a service that provides a super fast engine for running AI applications. Groq itself is not an AI model! It lets us run different AI models such as Llama, Mixtral, Gemma, and more.

Ref: Why Groq?

How I build a fast AI assistant

Many AI models exist, but OpenAI's Assistants API is one of the few offerings that keeps track of the conversation for us. Plain chat completion calls are stateless: by default, the model doesn't know the context of our previous messages. So, we have to resend that context with every request if we want the AI to understand each new message.

There are some alternatives out there, such as using LangChain chat history. But I prefer to find a simple way (*with the caveat, of course). Luckily, I found some ideas on the internet (Thank you, Internet!).

The idea below can be implemented for any AI model/engine, not just Groq. You can try this with OpenAI itself, Mixtral, Claude, and so on.

chat flow illustration

Here is the flow:

  • The user sends the initial message
  • The AI responds to the message
  • We ask the AI to summarize the conversation
  • We send the response and summary back to the user
  • The user sends the summary back later alongside the new message
  • The AI now replies to the fresh message, using the conversation summary as context.
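The flow above can be sketched in a few lines of JavaScript. This is only an illustration, not the final code: handleTurn, askModel, and summarize are hypothetical names, and the fake functions inside demo stand in for real model calls so the flow can be tried without any API key.

```javascript
// Sketch of the summary-passing flow. `askModel` and `summarize` are
// hypothetical stand-ins for real model calls, injected as parameters.
async function handleTurn({ message, latestReply = '', summary = '' }, askModel, summarize) {
  // Answer the new message, using the previous reply and summary as context
  const reply = await askModel(message, latestReply, summary);
  // Fold this exchange into a fresh summary
  const newSummary = await summarize(message, reply, summary);
  // Return both; the client stores them and sends them back on the next turn
  return { reply, summary: newSummary };
}

// Two turns in a row, with the client carrying the state forward
async function demo() {
  // Fake model: shows the summary it received, if any
  const fakeAsk = async (msg, _latestReply, summary) =>
    summary ? `(${summary}) answer to: ${msg}` : `answer to: ${msg}`;
  // Fake summarizer: just appends the new user message
  const fakeSummarize = async (msg, _reply, summary) =>
    summary ? `${summary}; ${msg}` : msg;

  const first = await handleTurn({ message: 'hello' }, fakeAsk, fakeSummarize);
  const second = await handleTurn(
    { message: 'and then?', latestReply: first.reply, summary: first.summary },
    fakeAsk,
    fakeSummarize
  );
  return { first, second };
}
```

Notice that the server keeps nothing between turns. Everything the second turn knows about the first arrives in the request itself.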

The caveat of this method

By summarizing a conversation, we may lose some information along the way. That's why, in certain cases, it's a good idea to store the message history in a database (for example, a vector database).

One way I reduce this shortcoming is by attaching the most recent reply from the AI. I've also read an article that suggests keeping the latest 2-3 exchanges and providing them as additional context later.
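Here is a rough sketch of that "keep the latest 2-3 exchanges" idea. I don't use it in the final code, and the names pushExchange and buildContext are made up for this example. It assumes older exchanges are already covered by the running summary.

```javascript
// Keep only the most recent `maxTurns` exchanges (hypothetical helper).
// Older exchanges are assumed to live on in the conversation summary.
function pushExchange(recent, userMessage, aiReply, maxTurns = 3) {
  const next = [...recent, { user: userMessage, ai: aiReply }];
  return maxTurns > 0 ? next.slice(-maxTurns) : [];
}

// Turn the summary plus the recent exchanges into chat messages
function buildContext(summary, recent, newMessage) {
  const messages = [];
  if (summary) {
    messages.push({
      role: 'system',
      content: `Our conversation's summary so far: """${summary}"""`,
    });
  }
  for (const turn of recent) {
    messages.push({ role: 'user', content: turn.user });
    messages.push({ role: 'assistant', content: turn.ai });
  }
  // The fresh message always goes last
  messages.push({ role: 'user', content: newMessage });
  return messages;
}
```

The trade-off is a larger request on every turn in exchange for exact wording of the most recent messages, which a summary alone can blur.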

Code implementation

I'll use Node.js for this tutorial. Feel free to use any language you want. The final code is available on GitHub:

GitHub -assistants-api-with-groq-ai

  1. Install dependencies
npm i express groq-sdk dotenv --save
  • Express for creating a route for the endpoint
  • groq-sdk is the official package for using Groq in JavaScript
  • dotenv to load our API key from a .env file
  2. Add API Key

Create a new .env file. Add your Groq API key in this file like this:

GROQ_API_KEY=your_api_key_here

Make sure to sign up for Groq to get your API key.

  3. Basic Setup

Let's create a new index.js file, and we'll write everything in this file. We prepare one endpoint called chat where we'll send these parameters:

- message: user's message

- latestReply: The latest reply from AI

- messageSummary: The conversation summary so far

In this endpoint, we'll do two things:

- Respond to new user message (with latestReply and messageSummary as context)

- Create a new conversation summary by providing the fresh reply from AI.

const express = require('express');
require('dotenv').config(); // load GROQ_API_KEY from the .env file

// Express Setup
const app = express();
app.use(express.json()); // parse JSON request bodies into req.body
const port = 3000

const { GROQ_API_KEY } = process.env;

// GROQ Setup
const Groq = require("groq-sdk");
const groq = new Groq({
    apiKey: GROQ_API_KEY
});

async function chatWithGroq() { } // soon
async function summarizeConversation() { } // soon

app.post('/chat', async (req, res) => {
    const { message, latestReply, messageSummary } = req.body;

    // request chat completion
    const reply = await chatWithGroq(message, latestReply, messageSummary)

    // request chat summary
    const summary = await summarizeConversation(message, reply, messageSummary)

    // Always return chat history/summary
    // (the response field names here are assumed)
    res.json({ reply, summary })
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
});
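The endpoint reads message, latestReply, and messageSummary straight from the request body, so a missing or malformed field goes directly into the prompt. As an optional extra, here is a minimal sketch of a validation step. parseChatBody is a hypothetical helper of my own, not part of Express or groq-sdk.

```javascript
// Hypothetical helper: validate and normalize the /chat request body.
// latestReply and messageSummary are optional and default to ''.
function parseChatBody(body) {
  const source = body ?? {};
  const message = source.message;
  const latestReply = source.latestReply ?? '';
  const messageSummary = source.messageSummary ?? '';

  if (typeof message !== 'string' || message.trim() === '') {
    return { ok: false, error: 'message must be a non-empty string' };
  }
  if (typeof latestReply !== 'string' || typeof messageSummary !== 'string') {
    return { ok: false, error: 'latestReply and messageSummary must be strings' };
  }
  return { ok: true, value: { message, latestReply, messageSummary } };
}
```

Inside the route handler, one option is to call parseChatBody(req.body) first and answer with a 400 status and the error text when ok is false.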
  4. Chat with Groq method

Here is the chatWithGroq method implementation:

async function chatWithGroq(userMessage, latestReply, messageHistory) {
    let messages = [{
        role: "user",
        content: userMessage
    }]

    // Add the summary as context only when we have one.
    // A truthiness check also covers a missing (undefined) summary.
    if(messageHistory) {
        messages.unshift({
            role: "system",
            content: `Our conversation's summary so far: """${messageHistory}""". 
                     And this is the latest reply from you """${latestReply}"""`
        })
    }

    console.log('original message', messages)

    const chatCompletion = await groq.chat.completions.create({
        messages,
        model: "llama3-8b-8192"
    });

    const respond = chatCompletion.choices[0]?.message?.content || ""
    return respond
}
  • We only provide a conversation summary when we have one (look at the if statement). So, it won't be included in our first message.
  5. Conversation summary method

Here is the summarizeConversation method implementation:

async function summarizeConversation(message, reply, messageSummary) {
    // First message: summarize only this exchange
    let content = `Summarize this conversation 
                    user: """${message}""",
                    you(AI): """${reply}"""
                    `

    // For N+1 message: fold the new exchange into the previous summary
    if(messageSummary) {
        content = `Summarize this conversation: """${messageSummary}"""
                    and last conversation: 
                    user: """${message}""",
                    you(AI): """${reply}"""
                    `
    }

    const chatCompletion = await groq.chat.completions.create({
        messages: [
            {
                role: "user",
                content: content
            }
        ],
        model: "llama3-8b-8192"
    });

    const summary = chatCompletion.choices[0]?.message?.content || ""
    console.log('summary: ', summary)
    return summary
}

In this method, we ask the AI to create a new summary based on the previous summary plus the latest message and reply.
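Since the summary is sent back on every request, it can be worth asking the model to keep it short. The sketch below is a variation I have not benchmarked: buildSummaryPrompt and the maxWords default are my own assumptions, and the model may not follow the word limit exactly.

```javascript
// Hypothetical variant of the summary prompt with an explicit word budget.
// The model is asked, not forced, to respect the limit.
function buildSummaryPrompt(message, reply, previousSummary = '', maxWords = 150) {
  const latest = `user: """${message}"""\nyou(AI): """${reply}"""`;
  const limit = `Keep the summary under ${maxWords} words and preserve names, numbers, and decisions.`;

  // First message: there is no earlier summary to fold in
  if (!previousSummary) {
    return `Summarize this conversation\n${latest}\n${limit}`;
  }
  // N+1 message: combine the earlier summary with the latest exchange
  return `Summarize this conversation: """${previousSummary}"""\nand last conversation:\n${latest}\n${limit}`;
}
```

The returned string would replace the content variable that is sent as the single user message in the summary request.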

Demo Time!

You can use any API client, like Postman, Thunder Client (VS Code), etc.

Don't forget to run your program with node index.js

Create a POST request to the /chat endpoint and provide the first message parameter.

initial message illustration

We can display the reply from the response on our user interface. This is the actual reply to our message.

We'll save the summary for the next request.

Now, this is what the JSON looks like for the N+1 message:

N+1 message parameters

The next messages should include the latestReply and messageSummary as parameters.

  • message: Don't forget to add a new message. This is you talking to the AI. Notice that I use the word "here" in my question, to check that the AI knows the previous context.
  • latestReply: Send the latest reply from AI (from previous response)
  • messageSummary: Send the conversation summary so far (from previous response)
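If you prefer a script over an API client, the same requests can be sent from code. This is a sketch under a few assumptions: the server runs on localhost:3000, it answers with a JSON object containing reply and summary fields, and Node 18+ is used so that fetch is available globally. buildChatRequest and sendChat are hypothetical names.

```javascript
// Builds the JSON body for a /chat request. On the first turn there is no
// previous response, so latestReply and messageSummary are empty strings.
function buildChatRequest(message, previous) {
  return {
    message,
    latestReply: previous ? previous.reply : '',
    messageSummary: previous ? previous.summary : '',
  };
}

// Sends one turn. `fetchImpl` defaults to the global fetch (Node 18+).
// The URL and the { reply, summary } response shape are assumptions.
async function sendChat(message, previous, fetchImpl = fetch, url = 'http://localhost:3000/chat') {
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(message, previous)),
  });
  if (!res.ok) {
    throw new Error(`chat request failed with status ${res.status}`);
  }
  return res.json();
}
```

For a follow-up message, pass the object returned by the previous sendChat call as the previous argument, so the latest reply and summary travel with the new message.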

Here is the result to this request:

Summary conversation and reply example

As you can see, the AI knows that when I said "here" I was talking about Indonesia. You can try to send a follow-up message (create a new request) by asking something like "Can you tell me more about number 4?" as an example. But don't forget that we always need to update the latestReply and messageSummary on each request.

To return the response and summarize the conversation, I only need to wait around 2 seconds in total. This is much faster than what I got with OpenAI's Assistants API.
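To check the timing on your own machine, a small wrapper around each call is enough. This is a minimal sketch: timed is a hypothetical helper, and Date.now() only gives millisecond precision, which is fine for waits of a second or two.

```javascript
// Hypothetical helper: runs an async function and reports how long it took.
async function timed(label, fn) {
  const start = Date.now();
  const result = await fn();
  const ms = Date.now() - start;
  console.log(`${label}: ${ms} ms`);
  return { result, ms };
}
```

In the /chat handler, the chat call and the summary call could each be wrapped, for example timed('chat', () => chatWithGroq(message, latestReply, messageSummary)), to see which of the two takes up most of the wait.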


- Build a smart AI voice assistant

- Basic tutorial: Assistants API by OpenAI
