RAG AI: Enhancing Customer Service with DeskDingo

#rag #llm

Customer service is all about efficiency and satisfaction. While traditional chatbots can handle basic inquiries, complex questions often require human intervention. Enter Retrieval-Augmented Generation (RAG) AI, a powerful technology that elevates chatbots to a whole new level.

In this post, I'll share how we set up a RAG AI chatbot for DeskDingo users to automate customer support 24/7.

Building the Knowledge Base

The foundation of a RAG chatbot is a robust knowledge base. Unlike traditional AI models that require chunked information, our approach leverages whole-article embeddings. Each article in a user's knowledge base undergoes a process that generates a vector of 1024 32-bit floating-point numbers. This vector is stored in a simple data store, indexed only by the article id and org id.
While vector databases offer robust scalability for handling millions of articles, they can be overkill for most use cases. Given the relatively small size of typical knowledge bases (often in the thousands of articles), a full scan can be performed in a matter of milliseconds. This eliminates the need for a dedicated index on the embeddings, significantly simplifying the architecture and reducing costs.

Retrieval for Informed Responses

The magic happens during each user interaction. Every message, both from the user and the chatbot, is utilized for retrieval. Here's the breakdown:

Message Embeddings: Each message is individually converted into its own vector representation.
Knowledge Base Search: The entire knowledge base (up to 10,000 articles in most cases) is queried.
Similarity Check: We compare the cosine similarity between the message embedding and all article embeddings. This metric tells us how "close" the message is to each article in terms of meaning.
Top Candidates: We accumulate articles with a similarity score above a pre-defined threshold (0.1 works well). This threshold ensures we retrieve relevant articles.
Prioritizing the Best: The retrieved articles are then sorted based on their similarity score, with the top 3 being prioritized.
Contextual Awareness: Next, we check if any of the top 3 articles have already been presented in the current conversation. This prevents redundant information overload.
Add the New Articles: Finally, we add the full article's text to the conversation as context in a system message.

The Power of Large Language Models (LLMs)

Once we have the top 3 most relevant articles added to the conversation context, it's time for the LLM to shine. DeskDingo allows seamless integration with LLMs, enabling the chatbot to generate informed, human-quality responses based on the retrieved knowledge and the overall conversation flow.

The Results

The RAG AI chatbot on DeskDingo has delivered impressive results, providing instant and accurate responses to user queries. The system's ability to retrieve relevant articles from the knowledge base in real-time has significantly enhanced customer satisfaction.

However, there are a few caveats when creating the knowledge base. It helps to keep articles in the knowledge base relatively short (about a thousand words works fine) and focused on a single topic. This approach facilitates efficient retrieval and prevents the chatbot from becoming overwhelmed with information that may not be directly relevant to the user's query.

Conclusion

Since enabling the AI bot on DeskDingo's website, we've witnessed a remarkable improvement in customer service. The combination of powerful retrieval with intelligent generation paves the way for a future where AI assistants can handle an even wider range of user inquiries, ultimately leading to a more positive customer experience.

Have you explored RAG AI for your customer service needs? Share your experiences in the comments below!

DEV Community

RAG AI: Enhancing Customer Service with DeskDingo

Building the Knowledge Base

Retrieval for Informed Responses

The Power of Large Language Models (LLMs)

The Results

Conclusion

Top comments (0)

Read next

Understanding SafeTensors: A Secure Alternative to Pickle for ML Models

Accelerate 1-bit LLM Inference with BitNet on WSL2 (Ubuntu)

How do LLMs like GPT Generate Human-Like Text?

Time Waits for No Document: 5 ways to speed up your work