RAG Step-by-Step: Open Source Edition

In a previous post, I demonstrated how to implement RAG using the Load-Transform-Embed-Store workflow. I've updated the RAG Step-by-Step repository with an open-source example.

Why open source?

The most common reason for using open source versus a service is cost. This is especially true with generative AI. Services often require a credit card to access their API, and every request costs. Charges are typically small and fractional but add up when writing and debugging code. Another reason is keeping your data private. For example, you may be working with data restricted to a particular country. Open source is one way to meet legal requirements. Finally, open source requires hands-on work, so you get a much deeper understanding of what the software is doing than when your code calls an API.

What's the difference?

Building an open-source version of RAG Step-by-Step wasn't too different from building the SaSS version. Here's a list of changes.

Instead of using OpenAI GPT3.5 and Pincone, I used the localstack_ai which is a containerized environment with Ollama, llama2, and PostgreSQL with pgvector.
To create embeddings, sentence-transformers was used instead of OpenAI's text-embedding-3-small model.
PostgreSQL requires instantiating a database, adding the pgvector extension, and creating the table to hold the data and embeddings.
The Streamlit app defines several SQL queries with different simiarity metrics supported by pgvector.
The prompt uses the llama2 prompt instead of the OpenAI Chat prompt format.

What's stopping you?

Open-source foundation models are released literally every week. The barriers to running a model have decreased significantly, allowing you to play and learn about generative AI quickly and with minimal cost and effort. SaSS or open source, give RAG Step-by-Step a spin.