Anthony M. for Stripe

Using Stripe Docs in your RAG pipeline with LlamaIndex

Generative AI applications become extremely powerful when you augment them with up-to-date, domain-specific, or private data. This technique is called Retrieval Augmented Generation (RAG).

In this post we’ll build a Python script that uses StripeDocsReader, a LlamaIndex loader, to create vector embeddings of Stripe’s documentation in Pinecone. This allows a user to ask an LLM (in this case, OpenAI’s) questions about Stripe Docs and receive a generated response.

These techniques are similar to what powers Stripe Docs AI today, which you can try for yourself on Stripe Docs while logged in to a Stripe account.

Requirements

This project requires an account and API key from both OpenAI and Pinecone. OpenAI provides the LLM that will generate an output based on a question. Pinecone provides a vector database that lets us search for relevant Stripe documentation before injecting it into our LLM prompt.

Setup

For this project we’ll use Python, a popular language for AI development.

1. Create an OpenAI account and get your API key.

2. Create a Pinecone account and get your API key.

3. Save your API keys in a .env file.

PINECONE_API_KEY=abc_12345
OPENAI_API_KEY=abc_12345

4. Create a requirements.txt file with the following dependencies:

llama-hub==0.0.77
llama-index==0.9.40
pinecone-client==3.0.2
python-dotenv==1.0.1

5. Install the dependencies.

pip install -r requirements.txt

Building your vector store and index

In order to determine which Stripe Docs are relevant to a user’s question, we’ll first have to create a vector store and index in Pinecone. The entire Stripe documentation corpus will be used to create vector embeddings, which will be searched over when determining what documents relate to a user’s question.
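
For context, each chunk of documentation gets converted into an embedding vector by OpenAI’s embedding model. The default model, text-embedding-ada-002, returns 1536-dimensional vectors, which is why the Pinecone index below is created with dimension=1536. Here’s a minimal sketch of what a raw embedding call looks like (assuming the openai 1.x client, which llama-index 0.9.x installs; LlamaIndex does this for you under the hood):

# Minimal sketch: embed a single string with OpenAI (assumes openai>=1.0)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
emb = client.embeddings.create(
    model="text-embedding-ada-002",
    input="How do I create a PaymentIntent?",
)
print(len(emb.data[0].embedding))  # 1536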

Parsing all Stripe Docs and turning them into embeddings is no small feat. With StripeDocsReader, you don’t have to worry about any of that.

The loader navigates to Stripe’s sitemap, recursively loads all of Stripe’s documentation, and processes it so that creating embeddings is straightforward.

Note: building the Pinecone index takes quite a bit of time, so you might want to grab a coffee while you wait.

1. Create a build.py file and load the appropriate packages and environment variables.

import os
from pinecone import Pinecone, ServerlessSpec
from llama_index import VectorStoreIndex, download_loader, StorageContext
from llama_index.vector_stores import PineconeVectorStore
from dotenv import load_dotenv

load_dotenv()

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Make sure the API keys have been loaded correctly
if not OPENAI_API_KEY:
    raise Exception("OPENAI_API_KEY environment not set")

if not PINECONE_API_KEY:
    raise Exception("PINECONE_API_KEY environment not set")

# Set the OpenAI API key
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

2. Set up the StripeDocsReader loader from LlamaIndex.

StripeDocsReader = download_loader("StripeDocsReader")
loader = StripeDocsReader()

# Load all Stripe Docs using StripeDocsReader
documents = loader.load_data()

You can pass the filters parameter to StripeDocsReader to narrow the corpus to a subset of Stripe Docs. See the StripeDocsReader documentation for an example.
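
For example, to limit loading to the payments section of the docs (a sketch; we’re assuming filters accepts a list of URL path prefixes, as shown in the loader’s documentation):

# Only load documentation pages under /docs/payments (illustrative path)
documents = loader.load_data(filters=["/docs/payments"])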

3. Initialize a Pinecone index that will be used to search for relevant Stripe documentation.

# Initialize Pinecone index
# https://docs.llamaindex.ai/en/stable/examples/vector_stores/PineconeIndexDemo.html
pc = Pinecone(api_key=PINECONE_API_KEY)
if "stripe-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="stripe-docs",
        dimension=1536,
        metric='euclidean',
        spec=ServerlessSpec(
            cloud='aws',
            region='us-west-2'
        )
    )

pinecone_index = pc.Index("stripe-docs")

4. Create a vector store with your Pinecone index and add Stripe Docs to it.

# Create the vector store and index
# https://docs.llamaindex.ai/en/stable/understanding/indexing/indexing.html
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, show_progress=True, storage_context=storage_context)

5. Run your build script:

python build.py

Using RAG with Stripe documentation

Now that you’ve set up your Pinecone vector store and created an index out of Stripe’s documentation, it’s time to query the index. When answering a user’s question, LlamaIndex will automatically retrieve relevant Stripe Docs, post-process them, and send them to the OpenAI model along with your prompt.

1. Create a query.py file and load the appropriate packages and environment variables.

import os
from pinecone import Pinecone
from llama_index import VectorStoreIndex
from llama_index.vector_stores import PineconeVectorStore
from dotenv import load_dotenv

load_dotenv()

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Make sure the API keys have been loaded correctly
if not OPENAI_API_KEY:
    raise Exception("OPENAI_API_KEY environment not set")

if not PINECONE_API_KEY:
    raise Exception("PINECONE_API_KEY environment not set")

# Set the OpenAI API key
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

2. Initialize the existing Pinecone index.

# Initialize Pinecone index
pc = Pinecone(api_key=PINECONE_API_KEY)
pinecone_index = pc.Index("stripe-docs")

3. Initialize a query engine from the Pinecone vector store.

# Create the vector store and load the existing index
# (from_vector_store builds its own storage context internally)
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# Create the query engine
query_engine = index.as_query_engine(response_mode="refine")

We use response_mode="refine" to create more detailed responses.
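
Refine makes one LLM call per retrieved chunk, iteratively improving the answer. LlamaIndex ships other response modes that trade detail for speed and cost; for example:

# "compact" (the default) packs as many retrieved chunks as fit into each
# LLM call; "tree_summarize" recursively summarizes chunks into one answer.
compact_engine = index.as_query_engine(response_mode="compact")
summary_engine = index.as_query_engine(response_mode="tree_summarize")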

4. Prompt for user input and query the query engine.

print("How can I help you today?")
q = input()

# Querying has the following steps:
# 1. Retrieve documents from the index
# 2. Post-process the documents
# 3. Send the prompt + documents to an LLM
# https://docs.llamaindex.ai/en/stable/understanding/querying/querying.html
res = query_engine.query(f"""
You are a world-class expert at Stripe integrations.

Your job is to provide detailed answers to help Stripe users integrate their products with Stripe.

I will provide you with relevant Stripe documentation. You will provide detailed answers to the questions asked.

NEVER tell users to read the documentation or contact Stripe support. Always provide the answer directly.

Use citations when possible.

Use real code examples when applicable.

You have been asked the following question: {q}""")

print(f"""\n{res}""")
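
The response object also tells you which documents were retrieved, which is handy for debugging retrieval quality. A quick sketch (the exact metadata keys depend on what StripeDocsReader stores on each document):

# Print each retrieved chunk's similarity score and metadata
for source in res.source_nodes:
    print(source.score, source.node.metadata)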

5. Run your query script to try asking the model a question about Stripe, augmented by the Stripe Docs:

python query.py

Conclusion

In this post, we’ve explored how to augment generative AI applications using the Stripe documentation. Although RAG is a complex process, LlamaIndex and the StripeDocsReader loader make it easy to get started. Using the default settings for document chunking, retrieving, and querying, we can create something that resembles Stripe Docs AI. The quality of responses can be greatly improved by tweaking the knobs provided by LlamaIndex.
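
For example, two common knobs are how many chunks are retrieved per question and how documents are chunked at index time. A sketch against the llama-index 0.9 API (the values are illustrative, not recommendations):

# Retrieve more chunks of context per question (query.py)
query_engine = index.as_query_engine(similarity_top_k=5, response_mode="refine")

# Index with smaller chunks (build.py)
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)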

Take a look at the code on GitHub.
