Using vector embeddings for context
ChatGPT is a great resource for learning and growth when it knows about the technology you are using.
Initially, I wanted ChatGPT to help me build Hilla applications. However, since it was trained on data only up to 2021, it made up answers that did not correspond to reality.
Another complication is that Hilla supports both React and Lit for the front end, so I needed to ensure that the replies took the relevant framework into account.
Here's my approach to building an assistant that supplies the most recent documentation to ChatGPT as context so it can give relevant answers.
Key Concept: Embeddings
ChatGPT, like other large language models, has a limited context size that must fit your question, relevant background information, and the response. For example, gpt-3.5-turbo
has a maximum of 4,096 tokens, roughly equivalent to 3,000 words. Including only the most helpful pieces of documentation in the prompt is therefore critical to getting meaningful replies.
Embeddings are a practical way to identify these critical documentation sections. An embedding encodes the meaning of a text as a vector that represents a location in a multidimensional space. Texts with similar meanings end up close together, while texts with different meanings end up further apart.
The idea is similar to a color picker. Each color can be represented by a three-element vector of red, green, and blue values. Colors with similar vectors look alike, whereas colors with very different vectors look distinct.
For this article, it's enough to know that the OpenAI API can convert text into embeddings. If you want to understand how embeddings work in more depth, this article is a good place to start.
Once you've generated embeddings for your content, you can quickly find the most relevant pieces to include in the prompt by locating the sections whose embeddings are closest to the embedding of the query.
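To make "closest" concrete: similarity between two embeddings is usually measured with cosine similarity. The snippet below is only an illustration with made-up three-element vectors; real OpenAI embeddings have 1,536 dimensions, and in this project the vector database does the comparison for us.

// Cosine similarity between two vectors: close to 1 = similar direction/meaning
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const magnitude = v => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (magnitude(a) * magnitude(b));
}

// Toy example with 3-dimensional "embeddings"
const query = [0.9, 0.1, 0.2];
const sectionA = [0.85, 0.15, 0.25]; // similar meaning -> high score
const sectionB = [0.1, 0.9, 0.4];    // different meaning -> low score

console.log(cosineSimilarity(query, sectionA)); // ~1.00
console.log(cosineSimilarity(query, sectionB)); // ~0.28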
Overview: Supplying Documentation as Context for ChatGPT
Here are the high-level steps required to have ChatGPT use your documentation as context when answering questions:
Creating Embeddings for Your Documentation
- Divide your documentation into smaller chunks, such as by heading, then generate an embedding (vector) for each one.
- Save the embedding, source text, and other metadata in a vector database.
Providing Responses with Documentation as Context
- Create an embedding for the user's question.
- Using the embedding, search the vector database for the N documentation sections most relevant to the question.
- Create a prompt that instructs ChatGPT to answer the question using only the provided documentation.
- Call the OpenAI API to generate a completion for the prompt.
In the sections that follow, I'll go into more detail about how I implemented each of these steps.
Tools used
The project uses the OpenAI API for embeddings and chat completions, Pinecone as the vector database, Asciidoctor and JSDOM for processing the documentation, and Next.js for the application itself.
Source code
I'll only highlight the most important parts of the code below. You can find the full source code on GitHub.
Documentation Processing
The Hilla documentation is written in AsciiDoc. The steps required to convert it into embeddings are:
- Process the AsciiDoc files with Asciidoctor to resolve code snippets and other includes.
- Split the resulting document into sections based on the HTML document structure.
- Convert the content to plain text to save tokens.
- Split sections into smaller chunks if needed.
- Create an embedding vector for each chunk of text.
- Save the embedding vectors and the source text in Pinecone.
Processing AsciiDoc
async function processAdoc(file, path) {
  console.log(`Processing ${path}`);

  const frontMatterRegex = /^---[\s\S]+?---\n*/;

  const namespace = path.includes('articles/react') ? 'react' : path.includes('articles/lit') ? 'lit' : '';
  if (!namespace) return;

  // Remove front matter. The JS version of asciidoctor doesn't support removing it.
  const noFrontMatter = file.replace(frontMatterRegex, '');

  // Run through asciidoctor to get includes
  const html = asciidoctor.convert(noFrontMatter, {
    attributes: {
      root: process.env.DOCS_ROOT,
      articles: process.env.DOCS_ARTICLES,
      react: namespace === 'react',
      lit: namespace === 'lit'
    },
    safe: 'unsafe',
    base_dir: process.env.DOCS_ARTICLES
  });

  // Extract sections
  const dom = new JSDOM(html);
  const sections = dom.window.document.querySelectorAll('.sect1');

  // Convert section html to plain text to save on tokens
  const plainText = Array.from(sections).map(section => convert(section.innerHTML));

  // Split section content further if needed, filter out short blocks
  const docs = await splitter.createDocuments(plainText);
  const blocks = docs.map(doc => doc.pageContent)
    .filter(block => block.length > 200);

  await createAndSaveEmbeddings(blocks, path, namespace);
}
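processAdoc() receives a file's contents and its path. The driver that walks the documentation tree isn't shown above; a minimal sketch of what it could look like, assuming the docs live under DOCS_ARTICLES and the helper names match, is:

import { promises as fs } from 'fs';
import path from 'path';

// Hypothetical driver: recursively walk the docs folder and process every .adoc file.
// The real repository may organize this differently.
async function processAllDocs(dir) {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      await processAllDocs(fullPath);
    } else if (entry.name.endsWith('.adoc')) {
      const file = await fs.readFile(fullPath, 'utf8');
      await processAdoc(file, fullPath);
    }
  }
}

await processAllDocs(process.env.DOCS_ARTICLES);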
Create embeddings and save them
async function createAndSaveEmbeddings(blocks, path, namespace) {
  // OpenAI suggests removing newlines for better performance when creating embeddings.
  // Don't remove them from the source.
  const withoutNewlines = blocks.map(block => block.replace(/\n/g, ' '));
  const embeddings = await getEmbeddings(withoutNewlines);
  const vectors = embeddings.map((embedding, i) => ({
    id: nanoid(),
    values: embedding,
    metadata: {
      path: path,
      text: blocks[i]
    }
  }));
  await pinecone.upsert({
    upsertRequest: {
      vectors,
      namespace
    }
  });
}
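The pinecone object above is a handle to a Pinecone index. Its initialization isn't shown in the snippet; with the Pinecone JavaScript client that was current at the time, it would look roughly like this (the index name and environment variables are assumptions):

import { PineconeClient } from '@pinecone-database/pinecone';

const client = new PineconeClient();
await client.init({
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT
});

// The rest of the code calls upsert/query on this index handle
const pinecone = client.Index(process.env.PINECONE_INDEX);

Note that the vectors are stored in a react or lit namespace, which is what later allows the search to be restricted to the framework the user has selected.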
Get embeddings from OpenAI
export async function getEmbeddings(texts) {
  const response = await openai.createEmbedding({
    model: 'text-embedding-ada-002',
    input: texts
  });
  return response.data.data.map((item) => item.embedding);
}
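The search code later calls a createEmbedding() function for a single string. It isn't listed in this article, but it can be a thin wrapper around getEmbeddings(); a minimal sketch, assuming the same newline handling as during indexing:

export async function createEmbedding(text) {
  // Embed a single string; strip newlines as recommended by OpenAI
  const [embedding] = await getEmbeddings([text.replace(/\n/g, ' ')]);
  return embedding;
}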
Searching with context
So far, we've divided the documentation into manageable chunks and stored them in a vector database. When a user asks a question, we need to do the following:
- Create an embedding for the question.
- Search the vector database for the ten documentation sections most relevant to the question.
- Create a prompt with as many documentation sections as fit into 1,536 tokens, leaving 2,560 tokens for the response.
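The vector search itself happens in findSimilarDocuments(), which is used by getMessagesWithContext() below but not listed in this article. With the same Pinecone client as above, a sketch of it could look like this (the parameter and field names are assumptions):

async function findSimilarDocuments(embedding, topK, namespace) {
  // Query Pinecone for the closest document sections in the given namespace
  const result = await pinecone.query({
    queryRequest: {
      vector: embedding,
      topK,
      includeMetadata: true,
      namespace
    }
  });
  // Each match carries the original text and path in its metadata
  return result.matches ?? [];
}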
async function getMessagesWithContext(messages: ChatCompletionRequestMessage[], frontend: string) {
  // Ensure that there are only messages from the user and assistant, trim input
  const historyMessages = sanitizeMessages(messages);

  // Send all messages to OpenAI for moderation.
  // Throws exception if flagged -> should be handled properly in a real app.
  await moderate(historyMessages);

  // Extract the last user message to get the question
  const [userMessage] = historyMessages
    .filter(({role}) => role === ChatCompletionRequestMessageRoleEnum.User)
    .slice(-1);

  // Create an embedding for the user's question
  const embedding = await createEmbedding(userMessage.content);

  // Find the most similar documents to the user's question
  const docSections = await findSimilarDocuments(embedding, 10, frontend);

  // Get at most 1536 tokens of documentation as context
  const contextString = await getContextString(docSections, 1536);

  // The messages that set up the context for the question
  const initMessages: ChatCompletionRequestMessage[] = [
    {
      role: ChatCompletionRequestMessageRoleEnum.System,
      content: codeBlock`
        ${oneLine`
          You are Hilla AI. You love to help developers!
          Answer the user's question given the following
          information from the Hilla documentation.
        `}
      `
    },
    {
      role: ChatCompletionRequestMessageRoleEnum.User,
      content: codeBlock`
        Here is the Hilla documentation:
        """
        ${contextString}
        """
      `
    },
    {
      role: ChatCompletionRequestMessageRoleEnum.User,
      content: codeBlock`
        ${oneLine`
          Answer all future questions using only the above
          documentation and your knowledge of the
          ${frontend === 'react' ? 'React' : 'Lit'} library
        `}
        ${oneLine`
          You must also follow the below rules when answering:
        `}
        ${oneLine`
          - Do not make up answers that are not provided
            in the documentation
        `}
        ${oneLine`
          - If you are unsure and the answer is not explicitly
            written in the documentation context, say
            "Sorry, I don't know how to help with that"
        `}
        ${oneLine`
          - Prefer splitting your response into
            multiple paragraphs
        `}
        ${oneLine`
          - Output as markdown
        `}
        ${oneLine`
          - Always include code snippets if available
        `}
      `
    }
  ];

  // Cap the messages to fit the max token count, removing earlier messages if necessary
  return capMessages(
    initMessages,
    historyMessages
  );
}
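getContextString() isn't shown above. It concatenates the retrieved sections until the 1,536-token budget is used up. A minimal sketch, assuming the matches returned by Pinecone and a GPT tokenizer such as gpt-3-encoder (the field names and tokenizer choice are assumptions):

import { encode } from 'gpt-3-encoder';

// Pack as many documentation sections as fit within maxTokens
async function getContextString(docSections: any[], maxTokens: number) {
  let tokenCount = 0;
  let contextText = '';

  for (const section of docSections) {
    const text = section.metadata?.text ?? '';
    tokenCount += encode(text).length;
    if (tokenCount > maxTokens) break;
    contextText += `${text.trim()}\n---\n`;
  }

  return contextText;
}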
When a user asks a question, we use getMessagesWithContext()
to get the messages that need to be sent to ChatGPT. We then call the OpenAI API to get a completion and stream the response back to the client.
export default async function handler(req: NextRequest) {
  // All the non-system messages up until now along with
  // the framework we should use for the context.
  const {messages, frontend} = (await req.json()) as {
    messages: ChatCompletionRequestMessage[],
    frontend: string
  };
  const completionMessages = await getMessagesWithContext(messages, frontend);
  const stream = await streamChatCompletion(completionMessages, MAX_RESPONSE_TOKENS);
  return new Response(stream);
}
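On the client, the streamed response can be read incrementally and rendered as it arrives. The endpoint path and callback below are placeholders; this is only a sketch of consuming the stream returned by the handler above:

import { ChatCompletionRequestMessage } from 'openai';

async function askQuestion(
  messages: ChatCompletionRequestMessage[],
  frontend: string,
  onChunk: (partialAnswer: string) => void
) {
  // '/api/chat' is a placeholder for wherever the handler above is mounted
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({messages, frontend})
  });

  // Read the streamed completion chunk by chunk as it arrives
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let answer = '';

  while (true) {
    const {done, value} = await reader.read();
    if (done) break;
    answer += decoder.decode(value);
    onChunk(answer); // e.g. update the assistant's message in the chat UI
  }

  return answer;
}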
Thank you for sticking with me till the end. You're a fantastic reader!
Ahsan Mangal
I hope you found it informative and engaging. If you enjoyed this content, please consider following me for more articles like this in the future. Stay curious and keep learning!