DEV Community

Tim Pap

🚀 🤖 Let's Retrieve Data and Talk: A Full-stack RAG App with Create-Llama and LlamaIndex.TS

🦙 LlamaIndex is an open-source data framework that allows developers to build applications powered by large language models like GPT-3 using their own custom data.

🦾 The create-llama CLI tool scaffolds a full LlamaIndex app with a single command. It sets up a Next.js frontend, your choice of backend (Next.js, Express, or Python), data ingestion and indexing, and LLM integration out of the box, so you can start building quickly. With LlamaIndex and create-llama, you can take advantage of powerful LLMs while grounding their answers in your own domain-specific data. I'm excited to start experimenting with building conversational apps enhanced by LlamaIndex's data retrieval capabilities!

In this hands-on tutorial, we’ll use create-llama to scaffold a custom search assistant app. I’ll show you how I set up data ingestion, indexing, and storage, all using LlamaIndex’s TypeScript library.


The easiest way to leverage create-llama is by invoking its interactive mode:

npx create-llama@latest

When create-llama runs, it will prompt you to name your project and set additional configuration:

✔ What is your project named? … llamaindex-ts-first-steps
✔ Which template would you like to use? › Chat with streaming
✔ Which framework would you like to use? › NextJS
✔ Which UI would you like to use? › Just HTML
✔ Which model would you like to use? › gpt-3.5-turbo
✔ Which data source would you like to use? › Use an example PDF
✔ Would you like to use a vector database? › No, just store the data in the file system
✔ Please provide your OpenAI API key (leave blank to skip): #### Add your OpenAI key here
✔ Would you like to use ESLint? … No / Yes

Running the Example App

Navigate into the llamaindex-ts-first-steps directory:

cd llamaindex-ts-first-steps


Next, open the project in your editor - for example using VS Code:

code .


To provide custom data for LlamaIndex to ingest, first create a text file named my-file.txt within the data/ directory. Add whatever content you would like LlamaIndex to have access to - this can be any freeform text.

For example:

hey my name is Tim
the secret number is 12


Then run:

npm run generate


This preprocesses the documents and creates vector embeddings to enable semantic search.

With data indexed, start the application:

npm run dev


You can now ask natural language questions about your documents!

Note: Re-run generate whenever new data is added to update the index.

This application utilizes Vercel's AI SDK for the text generation components.

Indexing Logic

The main indexing logic resides in generate.mjs. Let's examine how it works:

const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: STORAGE_DIR,
});

// Build the vector index from the loaded documents
await VectorStoreIndex.fromDocuments(documents, {
  // ...options elided in this excerpt
});


The SimpleDirectoryReader provides a straightforward method to ingest local files into LlamaIndex. While more robust Readers from LlamaHub may be better suited for production systems, the SimpleDirectoryReader offers a simple on-ramp to start loading data and experimenting with LlamaIndex.

A Document encapsulates a data source such as a PDF, API response, or database query result. Within LlamaIndex, data is divided into discrete Node objects representing atomic semantic units. For example, a node could contain a paragraph of text or table from a document.

Nodes maintain metadata linking them to their parent Document and any related Nodes. These links between nodes, and from each node back to its source document, let retrieval return a passage together with the document it came from.
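To make the Document and Node relationship more concrete, here is a minimal sketch in plain TypeScript. The `ToyDocument` and `ToyNode` types and the `splitIntoNodes` helper are names I made up for illustration; they are not LlamaIndex classes, and the real library splits text with its own parsers rather than on blank lines as assumed here.

```typescript
// Toy types for illustration only; these are NOT LlamaIndex's actual classes.
interface ToyDocument {
  id: string;
  text: string;
}

interface ToyNode {
  id: string;
  text: string;
  sourceDocId: string; // link back to the parent document
  prevId?: string; // link to the preceding node, if any
  nextId?: string; // link to the following node, if any
}

// Assumption: one node per non-empty paragraph (paragraphs separated by blank lines).
function splitIntoNodes(doc: ToyDocument): ToyNode[] {
  const paragraphs = doc.text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const nodes: ToyNode[] = paragraphs.map((text, i) => ({
    id: `${doc.id}-node-${i}`,
    text,
    sourceDocId: doc.id,
  }));

  // Wire up the neighbour relationships between consecutive nodes.
  nodes.forEach((node, i) => {
    if (i > 0) node.prevId = nodes[i - 1].id;
    if (i < nodes.length - 1) node.nextId = nodes[i + 1].id;
  });

  return nodes;
}

const exampleNodes = splitIntoNodes({
  id: "my-file",
  text: "hey my name is Tim\n\nthe secret number is 12",
});
```

Each entry in `exampleNodes` carries its text plus enough metadata to walk back to the source document or across to its neighbours.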

Vector stores play a vital role in retrieval-augmented generation by efficiently indexing vector embeddings. You'll leverage vector stores, whether directly or behind the scenes, in most LlamaIndex applications.

A vector store ingests Node objects, analyzing the data to construct an optimized search index.

The most straightforward approach for indexing data uses the VectorStoreIndex.fromDocuments method. Pass in your documents and it handles building the index.
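Conceptually, building an index from documents boils down to "split, embed, store". The sketch below mimics that flow in plain TypeScript. `buildToyIndex`, `EmbedFn`, and `lengthEmbed` are hypothetical names of mine, the line-by-line splitting is an assumption, and the placeholder embedding (character count and word count) carries no semantic meaning; a real app would call an embedding model instead.

```typescript
// A function that turns text into a vector. Real apps call an embedding model here.
type EmbedFn = (text: string) => number[];

interface IndexedEntry {
  docId: string; // which document the chunk came from
  text: string; // the chunk itself
  embedding: number[]; // its vector representation
}

// Assumption: each non-empty line of a document becomes one chunk.
function buildToyIndex(
  docs: { id: string; text: string }[],
  embed: EmbedFn,
): IndexedEntry[] {
  const entries: IndexedEntry[] = [];
  for (const doc of docs) {
    for (const line of doc.text.split("\n")) {
      const text = line.trim();
      if (text.length === 0) continue; // skip blank lines
      entries.push({ docId: doc.id, text, embedding: embed(text) });
    }
  }
  return entries;
}

// Placeholder embedding: [character count, word count]. Not semantic at all.
const lengthEmbed: EmbedFn = (text) => [text.length, text.split(/\s+/).length];

const toyIndex = buildToyIndex(
  [{ id: "my-file.txt", text: "hey my name is Tim\nthe secret number is 12\n" }],
  lengthEmbed,
);
```

Passing the embedding function in as a parameter keeps the pipeline independent of any particular model, which is also why swapping models does not change the indexing code.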

Indexes and Embeddings

After loading data, LlamaIndex indexes it to optimize retrieval. Indexing transforms the raw content into vector embeddings - numeric representations of semantic meaning. These embeddings are stored in a vector store built for efficient similarity search; in this project, that store is simply persisted to the file system, as chosen during setup.

The index may also track extra metadata like relationships between nodes. This supplementary information bolsters the relevance of fetched content.

To locate relevant context for a query, LlamaIndex first converts the search terms into an embedding vector. It then identifies stored nodes with the closest matching embeddings to the query vector. This vector similarity search allows retrieving the most contextually related data points for any natural language query.
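The similarity search described above can be sketched with cosine similarity and a top-K cut-off. This is only an illustration under my own assumptions: the hand-written 3-dimensional vectors stand in for real embeddings, and `cosineSimilarity`, `topK`, and `EmbeddedNode` are names I chose, not LlamaIndex APIs.

```typescript
interface EmbeddedNode {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every node against the query vector and keep the k best matches.
function topK(query: number[], nodes: EmbeddedNode[], k: number): EmbeddedNode[] {
  return nodes
    .map((node) => ({ node, score: cosineSimilarity(query, node.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.node);
}

// Hand-made vectors standing in for real embeddings.
const store: EmbeddedNode[] = [
  { id: "n1", text: "hey my name is Tim", embedding: [0.9, 0.1, 0.0] },
  { id: "n2", text: "the secret number is 12", embedding: [0.1, 0.9, 0.2] },
  { id: "n3", text: "unrelated text", embedding: [0.0, 0.1, 0.9] },
];

// Pretend this is the embedding of "what is the secret number?".
const queryEmbedding = [0.2, 0.8, 0.1];
const hits = topK(queryEmbedding, store, 2);
```

Here `hits` starts with the "secret number" node because its vector points in nearly the same direction as the query vector. A brute-force scan like this is fine for a handful of nodes; dedicated vector databases exist to make the same lookup fast at scale.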

The chat functionality lives in app/api/chat/engine/index.ts, which exports the createChatEngine factory function. It constructs a ChatEngine instance for conducting conversational flows:

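export async function createChatEngine(llm: LLM) {
  // Load the index that backs the chat
  const index = await getDataSource(llm);
  // Fetch the 5 most similar nodes for each message
  const retriever = index.asRetriever();
  retriever.similarityTopK = 5;

  return new ContextChatEngine({
    chatModel: llm,
    retriever,
  });
}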

The ContextChatEngine enables conversational interactions powered by retrievals over indexed data.

Its chat loop works as follows for each user message:

  1. Retrieve the most relevant text passages from the index based on the message text.
  2. Incorporate those texts as context when formulating the system prompt.
  3. Generate a response to the user message using the context-aware prompt.
  4. Return the reply to the user.

This simplicity makes ContextChatEngine suitable for queries directly related to the knowledge base and basic conversational flows. By basing each response on contextual retrievals, it can directly leverage the indexed data.
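The four steps of that chat loop can be sketched in a few lines of plain TypeScript. To be clear, this is not how ContextChatEngine is implemented internally: `chatOnce`, `buildSystemPrompt`, the keyword-based retriever, and the echoing model are all stand-ins I invented, and the sketch is synchronous for brevity whereas a real engine is asynchronous.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type Retriever = (query: string) => string[];
type ChatModel = (messages: ChatMessage[]) => string;

// Step 2: fold the retrieved passages into a system prompt.
function buildSystemPrompt(passages: string[]): string {
  return [
    "Context information is below.",
    "---",
    ...passages,
    "---",
    "Answer using the context above.",
  ].join("\n");
}

function chatOnce(
  message: string,
  history: ChatMessage[],
  retrieve: Retriever,
  model: ChatModel,
): string {
  const passages = retrieve(message); // Step 1: retrieve relevant passages
  const system: ChatMessage = { role: "system", content: buildSystemPrompt(passages) };
  const user: ChatMessage = { role: "user", content: message };
  const reply = model([system, ...history, user]); // Step 3: generate a response
  history.push(user, { role: "assistant", content: reply }); // remember the turn
  return reply; // Step 4: return the reply
}

// Stand-in retriever: keyword overlap instead of vector similarity.
const passagesDb = ["hey my name is Tim", "the secret number is 12"];
const keywordRetriever: Retriever = (query) => {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return passagesDb.filter((p) => words.some((w) => p.toLowerCase().includes(w)));
};

// Stand-in model: echoes the system prompt it received instead of calling an LLM.
const echoModel: ChatModel = (messages) => `I was given this context:\n${messages[0].content}`;

const chatHistory: ChatMessage[] = [];
const answer = chatOnce("What is the secret number?", chatHistory, keywordRetriever, echoModel);
```

Because the system prompt is rebuilt on every turn, each reply only sees the passages retrieved for the latest message, which is why this pattern works best for questions that map directly onto the indexed data.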

The full source code for this application is available at:

Feel free to reference the repository to see the complete working example of initializing LlamaIndex, configuring indexing and retrieval, and enabling conversational interactions.

It demonstrates an end-to-end pipeline from ingesting custom data through to asking questions via a chat interface powered by vector similarity search.
