DEV Community

Femi-ige Muyiwa for Hackmamba

Basics of RAG with pgVector and Langchain

Retrieval augmented generation (RAG) is a technique that improves the accuracy and reliability of generative AI models by supplementing their built-in knowledge with facts retrieved from external sources. With RAG, a large language model (LLM) can ground its responses in that retrieved material instead of relying only on its training data.

In this article, we’ll demonstrate how to use the RAG technique in a modern application. To do so, we’ll create a Flutter application using Langchain for the LLM framework and pgVector, an open-source Postgres extension for vector similarity search.

Before beginning, you’ll need a few things:

Demystifying some concepts

With the aid of databases that support vector operations, such as Neon, we can use the RAG technique to help LLMs deliver accurate answers to an end user. Neon is a fully managed serverless Postgres that separates storage and compute to offer autoscaling, branching, and bottomless storage. Neon is open source under the Apache 2.0 license, and its code is available in the neondatabase organization on GitHub.

Let’s first demystify some concepts, starting with pgVector. pgVector is a Postgres extension that works with vector embeddings for storage, similarity search, and more. Enabling the pgVector extension in your Neon database lets you store vector embeddings and query them by distance, using operators such as the negative inner product (<#>) or cosine distance (<=>).
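To make the two operators concrete, here is a minimal pure-Dart sketch of the quantities they compute. The function names are ours and only illustrative; inside Postgres, pgVector does this work for us. Note that this sketch throws on a zero vector, which is our own choice for the example.

```dart
import 'dart:math' as math;

/// Dot product of two equal-length vectors.
double dot(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// The quantity behind `<#>`: the negative inner product.
/// Smaller values mean the vectors are more similar.
double negativeInnerProduct(List<double> a, List<double> b) => -dot(a, b);

/// The quantity behind `<=>`: 1 - cosine similarity.
/// 0 means same direction, 1 means orthogonal.
double cosineDistance(List<double> a, List<double> b) {
  final norms = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b));
  if (norms == 0) {
    throw ArgumentError('Cosine distance is undefined for a zero vector');
  }
  return 1 - dot(a, b) / norms;
}

void main() {
  final query = [1.0, 0.0];
  final sameDirection = [2.0, 0.0];
  final orthogonal = [0.0, 3.0];

  print(cosineDistance(query, sameDirection)); // 0.0
  print(cosineDistance(query, orthogonal)); // 1.0
  print(negativeInnerProduct(query, sameDirection)); // -2.0
}
```

Cosine distance ignores vector length, which is why the first result is 0 even though the two vectors differ in magnitude.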

Langchain itself is not an LLM but a framework that aids application development with LLMs. Thus, it enables context-aware applications that need language models to reason.

That raises a burning question: How do these parts relate to one another?

RAG applications usually consist of two components: indexing and retrieval.

The indexing process involves loading the external data source, splitting it into smaller pieces, embedding each piece as a vector, and storing the results.

Langchain handles the splitting itself and produces the embeddings by giving the application access to OpenAI’s embedding API. Neon comes into play in the storage process.

For the retrieval process, pgVector uses its vector similarity search to measure the distance between the query vector and the stored vectors in the Neon database and returns the closest matches. Then Langchain uses an OpenAI model as the LLM to generate a natural-language answer to the query from those matches.

The following sections will cover all the steps in building our application, from creating a Neon database to building the Flutter application. Let us set up a Neon account and create our database without further ado.

Creating a Neon database

After creating a Neon account, as specified earlier, let’s proceed to sign in to the account by selecting one of the methods provided for user authentication.

Neon sign-in

After a successful sign-in, we’ll be redirected to a Create Project screen on the home page, where we fill in our desired project name, Postgres version, and database name. We can also expand more options to change the branch name, but let’s leave it as main for now and click Create project.

Create project

More options

Afterward, we are redirected to the home page, where a popup shows the connection details for the Neon project we just created. We need these details to access the Neon project from our application, so copy them to a safe file. And with that, we have successfully created a Neon database for our Flutter application.

Connection details

Neon provides three database management methods: the Neon CLI (command line interface), the Neon API, and SQL. For SQL, Neon provides an SQL Editor that runs SQL commands directly in the console. We will use SQL to manage our Neon database, but we’ll do so via a Postgres connection from our application rather than through the console.

The Flutter application is a simple chatbot that responds to queries based on the data from the external data source—in this case, a PDF file. Therefore, in the coming sections, we will clone a Flutter template, connect the template to the Neon database, and add the functionalities to implement the RAG technique within the app.

Creating the Flutter application

To begin, we will use a Flutter template application containing a display area, a text area where we will type our query, and a drawer with a button to upload our desired PDF.

To clone the project, run the command below in a terminal:

git clone

After cloning the project, run the following command:

flutter pub get

This command obtains all the dependencies listed in the pubspec.yaml file in the current working directory and their transitive dependencies.

This project uses the Model View Controller (MVC) architecture to handle specific development aspects of the application. The architecture helps us maintain readability by separating the business (core) logic from the UI (presentation layer).

Template result

Template result drawer

To make things easier to locate, here’s an ASCII representation of the lib folder structure:

├─ home/
  ├─ controller/
  ├─ model/
  ├─ view/
    ├─ widgets/
      ├─ display_area.dart
      ├─ text_area.dart
    ├─ home_page.dart
  ├─ view_model/
├─ core/
  ├─ dependency_injection/
├─ main.dart

Since we are using the MVC architecture, the UI code is placed in the lib/home/view folder. To proceed, we need to add some external dependencies necessary for building the application to the pubspec.yaml file.


After successfully doing this, we’ll create an abstraction for all the services needed throughout this project. Let’s call this abstract class LangchainService; within it, we will declare the steps involved in the RAG technique. So, next, locate the lib/home/view_model folder and create a Dart file named langchain_service.dart within it. To define the abstraction, add the code below to the file:

abstract class LangchainService {
  // Method signatures are added in the following sections.
}


The load process involves integrating the document into the system, which is usually offline. Thus, to achieve this, we will do the following:

  • Use the file_picker package to select the files from a local device
  • Use the syncfusion_flutter_pdf package to read the document (PDF) and convert it to text
  • Use the path_provider package to find commonly used filesystem locations such as the temp or app data directories

Compared to the other services, the load process is offline; thus, we will perform this operation separately from the other processes. To load a file, create an index_notifier.dart file in the lib/home/controller directory. Next, we create a ChangeNotifier class, IndexNotifier, that holds a LangchainService instance. We will also add two private nullable String fields, _filepath and _fileName, and a getter for _fileName.

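import 'package:flutter/material.dart';

import '../view_model/langchain_service.dart';

class IndexNotifier extends ChangeNotifier {
  // Service that implements the RAG steps; injected by the caller.
  final LangchainService langchainService;
  IndexNotifier({required this.langchainService});

  // Path and name of the most recently picked PDF.
  String? _filepath;
  String? _fileName;
  String? get fileName => _fileName;
}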

Because it extends ChangeNotifier, this class will be one of two classes that handle state management for the application. Next, we will implement a method that returns a Document from the Langchain package. The method picks a PDF document from our local device and assigns the file path and name to the String fields created earlier.

Also, we will have a Future function that converts PDFs to text, which is loaded as Documents using the TextLoader class from Langchain.

import 'dart:io';
import 'dart:typed_data';

import 'package:file_picker/file_picker.dart';
import 'package:langchain/langchain.dart';
import 'package:path_provider/path_provider.dart';
import 'package:syncfusion_flutter_pdf/pdf.dart';

class IndexNotifier extends ChangeNotifier {
  // do something

  // Lets the user pick a PDF, converts it to text, and loads it as a Document.
  Future<Document> _pickedFile() async {
    FilePickerResult? result = await FilePicker.platform
        .pickFiles(type: FileType.custom, allowedExtensions: ['pdf']);
    if (result != null) {
      _filepath = result.files.single.path;
      // File name without the extension, lowercased.
      _fileName =
          result.files.single.name.replaceAll('.pdf', '').toLowerCase();
      final textfile =
          _filepath!.isNotEmpty ? await _readPDFandConvertToText() : "";
      final loader = TextLoader(textfile);
      final document = await loader.load();
      // Keep the last loaded document.
      Document? docs;
      for (var doc in document) {
        docs = doc;
      }
      return docs!;
    } else {
      throw Exception("No file selected");
    }
  }

  // Extracts the PDF text, writes it to output.txt, and returns that path.
  Future<String> _readPDFandConvertToText() async {
    File file = File(_filepath!);
    List<int> bytes = await file.readAsBytes();
    final document = PdfDocument(inputBytes: Uint8List.fromList(bytes));
    String text = PdfTextExtractor(document).extractText();
    document.dispose();
    final localPath = await _localPath;
    File createFile = File('$localPath/output.txt');
    final res = await createFile.writeAsString(text);
    return res.path;
  }

  // Directory where the app may store its own files.
  Future<String> get _localPath async {
    final directory = await getApplicationDocumentsDirectory();
    return directory.path;
  }
}

We can load a PDF as a Langchain Document file with the code above.

Split and embed
Now, we need to split and embed the document and store it. To split and embed a Langchain document, we will return to the abstraction created in the langchain_service.dart. There, we will update it with the code below:

abstract class LangchainService {
  // Splits a loaded document into smaller chunks.
  List<Document> splitDocToChunks(Document doc);
  // Returns one embedding vector per chunk.
  Future<List<List<double>>> embedChunks(List<Document> chunks);
}

We will create another file within the same directory called langchain_service_impl.dart to implement this abstraction. Within this file, we’ll implement the LangchainService abstraction created earlier. splitDocToChunks takes in a parameter Document, which is returned from the _pickedFile method in the IndexNotifier class earlier. It then gets the page content.

Then, we use the RecursiveCharacterTextSplitter object to split the text into chunks of at most 1,000 characters and return them as a Document list.

Next, we will pass the Document list to the embedChunks method, which creates a vector embedding for each chunk and returns them as a List<List<double>>.

Below is how the code should look:

class LangchainServicesImpl extends LangchainService {
  final OpenAIEmbeddings embeddings;

  LangchainServicesImpl({
    required this.embeddings,
  });

  @override
  List<Document> splitDocToChunks(Document doc) {
    final text = doc.pageContent;
    const textSplitter = RecursiveCharacterTextSplitter(chunkSize: 1000);
    final chunks = textSplitter.createDocuments([text]);
    // Replace newlines with spaces and carry over the source metadata.
    return chunks
        .map(
          (e) => Document(
            pageContent: e.pageContent.replaceAll(RegExp(r'\n'), "  "),
            metadata: doc.metadata,
          ),
        )
        .toList();
  }

  @override
  Future<List<List<double>>> embedChunks(List<Document> chunks) async {
    final embedDocs = await embeddings.embedDocuments(chunks);
    return embedDocs;
  }
}
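To build some intuition for what chunking does, here is a minimal pure-Dart sketch of a fixed-size splitter with overlap. It is a simplified stand-in, not Langchain's RecursiveCharacterTextSplitter, which additionally tries to break on separators such as paragraphs and sentences. The function name and the overlap parameter are our own assumptions.

```dart
/// Splits [text] into chunks of at most [chunkSize] characters, where
/// consecutive chunks share [overlap] characters. Lengths are counted in
/// UTF-16 code units, which is how Dart indexes strings.
List<String> splitIntoChunks(
  String text, {
  int chunkSize = 1000,
  int overlap = 0,
}) {
  if (chunkSize <= 0) {
    throw ArgumentError('chunkSize must be positive');
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw ArgumentError('overlap must be in the range [0, chunkSize)');
  }
  final chunks = <String>[];
  final step = chunkSize - overlap;
  for (var start = 0; start < text.length; start += step) {
    final end =
        start + chunkSize < text.length ? start + chunkSize : text.length;
    chunks.add(text.substring(start, end));
    if (end == text.length) break; // the last chunk reached the end
  }
  return chunks;
}

void main() {
  final text = 'a' * 2500;
  final chunks = splitIntoChunks(text, chunkSize: 1000, overlap: 200);
  print(chunks.map((c) => c.length).toList()); // [1000, 1000, 900]
}
```

Overlap is a common choice because it keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.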

Equally, we will update the IndexNotifier class to control the state of our application while going through all these processes:


So far, we’ve successfully enabled loading, splitting, and embedding the PDF document. Now, we need to store the split and embedded data, which is where the Neon database we created earlier comes in. To do this, we will update the LangchainService abstraction with the code below:

abstract class LangchainService {
  // the abstraction above

  // Existence checks for the pgVector extension and the document table.
  Future<bool> checkExtExist();
  Future<bool> checkTableExist(String tableName);
  // Setup and cleanup of the extension and table.
  Future<String> createNeonVecorExt();
  Future<String> createNeonTable(String tableName);
  Future<String> deleteNeonTableRows(String tableName);
  // Stores each chunk together with its embedding.
  Future<void> storeDocumentData(Document doc, List<Document> chunks,
      List<List<double>> embeddedDoc, String tableName);
}

The checkExtExist method checks whether the vector extension exists and returns the result as a boolean. Likewise, the checkTableExist method checks whether a table (named after the private String variable _fileName created earlier) exists within the Neon database and returns a boolean. To do this, we will add the code below to implement the LangchainService in the langchain_service_impl.dart file:


Note: Earlier, we mentioned that Neon allows us to write SQL commands directly on the console through their SQL Editor. Equally, we can execute these SQL commands programmatically from Flutter using the Postgres package.

The methods createNeonVecorExt, createNeonTable, and deleteNeonTableRows handle, respectively, the creation of the pgVector extension, the creation of a Neon database table (named after the private String variable _fileName created earlier), and the deletion of any stored rows (for the case where the user uploads a document whose name clashes with an existing table and wants to replace its contents). When creating the Neon table, we will also create a vector index using the ivfflat algorithm from the pgVector extension. This algorithm provides efficient approximate nearest neighbor search over high-dimensional data like embeddings.
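As a rough sketch of the SQL such methods could send, the pure-Dart helpers below only build the statements as strings. Everything here is an assumption for illustration: the helper names, the public schema, the column layout (id, metadata, embedding), the 1536 dimensions (the size of OpenAI's text-embedding-ada-002 vectors), and the lists value. How the strings are executed depends on the Postgres driver in use.

```dart
/// Throws unless [name] is a plain lowercase SQL identifier. Table names
/// cannot be bound as query parameters, so we validate before interpolating.
void ensureSafeIdentifier(String name) {
  if (!RegExp(r'^[a-z_][a-z0-9_]*$').hasMatch(name)) {
    throw ArgumentError.value(name, 'name', 'not a safe SQL identifier');
  }
}

/// One boolean row: is pgVector (registered as `vector`) installed?
const extensionExistsSql =
    "SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'vector');";

/// Enables pgVector in the current database.
const createExtensionSql = 'CREATE EXTENSION IF NOT EXISTS vector;';

/// One boolean row: does [tableName] exist in the `public` schema?
String tableExistsSql(String tableName) {
  ensureSafeIdentifier(tableName);
  return 'SELECT EXISTS (SELECT 1 FROM information_schema.tables '
      "WHERE table_schema = 'public' AND table_name = '$tableName');";
}

/// Creates the table that holds one row per chunk.
String createTableSql(String tableName, {int dimensions = 1536}) {
  ensureSafeIdentifier(tableName);
  return 'CREATE TABLE IF NOT EXISTS $tableName '
      '(id text PRIMARY KEY, metadata jsonb, embedding vector($dimensions));';
}

/// Adds an ivfflat index built for cosine distance (`<=>`).
String createIndexSql(String tableName, {int lists = 100}) {
  ensureSafeIdentifier(tableName);
  return 'CREATE INDEX IF NOT EXISTS ${tableName}_embedding_idx '
      'ON $tableName USING ivfflat (embedding vector_cosine_ops) '
      'WITH (lists = $lists);';
}

/// Removes all rows but keeps the table and its index.
String deleteRowsSql(String tableName) {
  ensureSafeIdentifier(tableName);
  return 'DELETE FROM $tableName;';
}

void main() {
  print(extensionExistsSql);
  print(createExtensionSql);
  print(tableExistsSql('report'));
  print(createTableSql('report'));
  print(createIndexSql('report'));
  print(deleteRowsSql('report'));
}
```

One design note: pgVector's documentation suggests building an ivfflat index after the table already contains some data, so creating it together with an empty table trades some recall for simplicity.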


For storeDocumentData, we will pass the Langchain Document, the chunks, the embedded chunks, and the table name to it and execute an INSERT command in a transaction.
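To illustrate the shape of such an INSERT, the sketch below prepares the statement and its parameters in pure Dart without touching a database. The helper names, the id scheme, and the pageContent metadata key are hypothetical; the statement uses native Postgres positional parameters, and a given driver may expect a different placeholder syntax.

```dart
import 'dart:convert';

/// Formats an embedding as a pgVector text literal such as `[0.1,0.25,1.0]`.
String toVectorLiteral(List<double> embedding) {
  if (embedding.isEmpty || embedding.any((v) => !v.isFinite)) {
    throw ArgumentError('Embedding must be non-empty and finite');
  }
  return '[${embedding.join(',')}]';
}

/// INSERT statement with positional parameters. The table name cannot be a
/// bound parameter, so it is validated before being interpolated.
String insertChunkSql(String tableName) {
  if (!RegExp(r'^[a-z_][a-z0-9_]*$').hasMatch(tableName)) {
    throw ArgumentError.value(tableName, 'tableName', 'unsafe identifier');
  }
  return 'INSERT INTO $tableName (id, metadata, embedding) '
      r'VALUES ($1, $2::jsonb, $3::vector);';
}

/// Builds one parameter list (id, metadata JSON, vector literal) per chunk.
List<List<String>> buildInsertParams(
  String docName,
  List<String> chunkTexts,
  List<List<double>> embeddings,
) {
  if (chunkTexts.length != embeddings.length) {
    throw ArgumentError('Each chunk needs exactly one embedding');
  }
  return [
    for (var i = 0; i < chunkTexts.length; i++)
      [
        '${docName}_$i', // hypothetical id scheme: document name + index
        jsonEncode({'pageContent': chunkTexts[i]}),
        toVectorLiteral(embeddings[i]),
      ],
  ];
}

void main() {
  print(insertChunkSql('report'));
  final params = buildInsertParams('report', ['hello'], [
    [0.1, 0.25, 1.0],
  ]);
  print(params.first);
}
```

Running all the inserts for one document inside a single transaction means a failure partway through leaves no half-stored document behind.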


Now, we will update the IndexNotifier to use the new LangchainService methods. We will use checkExtExist and checkTableExist as conditional checks that decide whether to run createNeonVecorExt, createNeonTable, and deleteNeonTableRows. Here is the updated code below:


We have successfully stored the PDF data within the database table as an id(text), Metadata (Map or JSON), and embedding.

To utilize the ChangeNotifier class within our application, we will mount the ChangeNotifier class using Provider for dependency injection. In this process, we will connect the Neon database and our Flutter application using the Postgres package.

The way to do this is by wrapping the initial stateless widget in the main.dart with a MultiProvider. Doing this mounts our Providers and ChangeNotifierProviders to the widget tree, allowing us to monitor the state of our application easily. Thus, we will head to the lib/core/dependency_injection/ folder, create a file called provider_locator.dart, and paste the code below:


The ProviderLocator class does the following:

  • Defines a method getProvider that:
    • Creates a LangchainService instance.
    • Returns a MultiProvider with a LangchainService provider and a ChangeNotifierProvider for IndexNotifier.
  • Defines a method _createLangchainService that:
    • Creates a PostgreSQL connection.
    • Creates an OpenAIEmbeddings instance.
    • Creates an OpenAI instance.
    • Returns a LangchainServicesImpl instance with the created connection, embeddings, and OpenAI.
  • Defines a method createPostgresConnection that:
    • Tries to establish a PostgreSQL connection with specified settings from the Neon connection details earlier.
    • If the connection fails, it retries up to a maximum number of times.
    • If the connection is not established after maximum retries, it throws an exception.
  • Defines a method _createEmbeddings that returns an OpenAIEmbeddings instance.
  • Defines a method _createOpenAIConnection that returns an OpenAI instance.
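The retry behaviour described for createPostgresConnection can be sketched as a small generic helper in pure Dart. The retry function, its parameters, and the simulated connection in main are hypothetical and only illustrate the pattern; in the application, the action would open the Postgres connection.

```dart
import 'dart:async';

/// Runs [action] up to [maxAttempts] times, waiting [delay] between
/// failed attempts, and throws if every attempt fails.
Future<T> retry<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration delay = const Duration(seconds: 2),
}) async {
  if (maxAttempts < 1) {
    throw ArgumentError('maxAttempts must be at least 1');
  }
  Object? lastError;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await Future<void>.delayed(delay);
      }
    }
  }
  throw Exception('Failed after $maxAttempts attempts: $lastError');
}

Future<void> main() async {
  var calls = 0;
  // Simulated connection that succeeds on the third attempt.
  final result = await retry(() async {
    calls++;
    if (calls < 3) throw StateError('connection refused');
    return 'connected';
  }, delay: Duration.zero);
  print('$result after $calls attempts'); // connected after 3 attempts
}
```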

Note: For security reasons, we will use a .env file to secure our passkey. Kindly follow this article to learn more about how to use flutter_dotenv.

Now, let’s update the main.dart file with the code below:



Retrieval is a streamlined process commonly divided into two processes:

  • Retrieve: We embed the user query and compare that vector with the embeddings stored in the database, using cosine similarity to measure how close two vectors are. The closest results are then passed to the second process.
  • Generate: After getting the closest results, we pass them to the LLM as context so that it generates a response based on that particular information.
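The two processes can be sketched in memory with pure Dart, leaving out the database and the LLM call. The class, function names, prompt wording, and toy two-dimensional embeddings below are all assumptions for illustration; in the application, pgVector does the ranking and Langchain builds the prompt.

```dart
import 'dart:math' as math;

/// A stored chunk of text together with its embedding.
class StoredChunk {
  const StoredChunk(this.text, this.embedding);
  final String text;
  final List<double> embedding;
}

/// Cosine similarity of two equal-length, non-zero vectors.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Retrieve: the [k] chunks most similar to [queryEmbedding].
List<StoredChunk> retrieve(
  List<double> queryEmbedding,
  List<StoredChunk> chunks, {
  int k = 2,
}) {
  // Sort a copy in descending order of similarity.
  final ranked = [...chunks]
    ..sort((x, y) => cosineSimilarity(queryEmbedding, y.embedding)
        .compareTo(cosineSimilarity(queryEmbedding, x.embedding)));
  return ranked.take(k).toList();
}

/// Generate (input side): place the retrieved text ahead of the question.
String buildPrompt(String question, List<StoredChunk> context) {
  final contextText = context.map((c) => c.text).join('\n');
  return 'Answer using only this context:\n$contextText\n\n'
      'Question: $question';
}

void main() {
  const chunks = [
    StoredChunk('Neon is serverless Postgres.', [0.9, 0.1]),
    StoredChunk('Flutter builds UIs.', [0.1, 0.9]),
    StoredChunk('pgvector stores embeddings.', [0.8, 0.3]),
  ];
  final hits = retrieve([1.0, 0.0], chunks);
  // [Neon is serverless Postgres., pgvector stores embeddings.]
  print(hits.map((c) => c.text).toList());
  print(buildPrompt('What is Neon?', hits));
}
```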

To do this programmatically, we will head to the langchain_service.dart and in the abstraction, add this code below:

abstract class LangchainService {
  // do something

  // Answers the query using the rows most similar to it in the given table.
  Future<String> queryNeonTable(String tableName, String query);
}

The method above returns a string response by following the retrieval process above. Here is the code for the implementation below:


The code above does the following:

  • Implements a method queryNeonTable that:
    • Embeds the query using the embeddings object.
    • Executes a SQL query on the connection to get similar items from the specified table.
    • Converts the result into a list of Metadata objects.
    • If Metadata is not empty, it concatenates the page content, creates a StuffDocumentsQAChain object, and calls it with the concatenated content and the original query to get a response.
    • If Metadata is empty, it returns a default message: “Couldn’t find anything on that topic”.
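The steps above can be sketched in pure Dart as follows, with the database and LLM calls left to the caller. This is not the implementation from the repository: the helper names, the pageContent metadata key, the limit of three rows, and the askLlm callback are assumptions, and the query uses native Postgres positional parameters.

```dart
/// Default reply when nothing relevant was retrieved.
const fallbackAnswer = "Couldn't find anything on that topic";

/// Similarity query: the rows whose embeddings are closest to the query
/// vector (bound as `$1`) by cosine distance, nearest first.
String similarityQuerySql(String tableName, {int limit = 3}) {
  if (!RegExp(r'^[a-z_][a-z0-9_]*$').hasMatch(tableName)) {
    throw ArgumentError.value(tableName, 'tableName', 'unsafe identifier');
  }
  return 'SELECT metadata FROM $tableName '
      r'ORDER BY embedding <=> $1::vector '
      'LIMIT $limit;';
}

/// Joins the page content of the retrieved rows, or returns null when no
/// row carries usable text.
String? concatContext(List<Map<String, dynamic>> metadataRows) {
  final pages = metadataRows
      .map((row) => row['pageContent'])
      .whereType<String>()
      .where((text) => text.trim().isNotEmpty)
      .toList();
  return pages.isEmpty ? null : pages.join('\n');
}

/// Returns the LLM's answer, or the fallback when there is no context.
/// [askLlm] stands in for the question-answering chain.
Future<String> answer(
  String question,
  List<Map<String, dynamic>> metadataRows,
  Future<String> Function(String context, String question) askLlm,
) async {
  final context = concatContext(metadataRows);
  if (context == null) return fallbackAnswer;
  return askLlm(context, question);
}

Future<void> main() async {
  print(similarityQuerySql('report'));

  // A fake LLM that just reports how much context it received.
  Future<String> fakeLlm(String context, String question) async =>
      'Answered "$question" from ${context.length} characters of context';

  print(await answer('What is Neon?', [], fakeLlm));
  print(await answer(
    'What is Neon?',
    [
      {'pageContent': 'Neon is serverless Postgres.'},
    ],
    fakeLlm,
  ));
}
```

Passing the LLM call in as a function keeps the retrieval logic testable without network access.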

We will then create a separate ChangeNotifier class to handle the state of the query. This follows the same pattern as that of the IndexNotifier class with some slight changes. Here is the code below:

import 'package:flutter/material.dart';
import '../view_model/langchain_service.dart';

// A single chat exchange: the user's query and the generated response.
class Message {
  String? query;
  String? response;
  Message({required this.query, this.response = ""});
}

// Lifecycle of a query, used by the UI to show progress and errors.
enum QueryState {
  initial,
  loading,
  loaded,
  error,
}

class QueryNotifier extends ChangeNotifier {
  final LangchainService langchainService;
  QueryNotifier({required this.langchainService});

  final List<Message> _messages = [];

  final _messagesState = ValueNotifier<List<Message>>([]);
  ValueNotifier<List<Message>> get messageState => _messagesState;

  final _queryState = ValueNotifier<QueryState>(QueryState.initial);
  ValueNotifier<QueryState> get queryState => _queryState;

  Future<void> userqueryResponse(String tableName, String query) async {
    // Show the user's query immediately, before the response arrives.
    _messages.add(Message(query: query));
    _messagesState.value = List.from(_messages);

    try {
      _queryState.value = QueryState.loading;
      String response = await langchainService.queryNeonTable(tableName, query);
      final List<Message> updatedMessages = List.from(_messages);
      updatedMessages.last.response = response;
      _messagesState.value = updatedMessages;
      _queryState.value = QueryState.loaded;
    } catch (e) {
      // Show the error state briefly, then reset to the initial state.
      _queryState.value = QueryState.error;
      await Future.delayed(const Duration(milliseconds: 2000));
      _queryState.value = QueryState.initial;
    }
  }
}

The code above does the following:

  • Defines a Message class with query and response fields.
  • Defines an enum called QueryState with states: initial, loading, loaded, and error.
  • Creates a QueryNotifier class that extends ChangeNotifier:
    • Initializes a LangchainService object.
    • Maintains a list of Message objects.
    • Defines ValueNotifier objects for messagesState and queryState.
    • Defines a method userqueryResponse that:
      • Adds a new Message to _messages.
      • Sets the queryState to loading.
      • Calls queryNeonTable method of langchainService to get a response.
      • Updates the last message’s response and sets queryState to loaded.
      • Handles errors by setting queryState to error, then back to initial after a delay.

Afterward, we will update the getProvider method in the provider_locator.dart file by adding another ChangeNotifierProvider to the MultiProvider. Here is the updated code:

class ProviderLocator {
  // provider tree
  static Future<MultiProvider> getProvider(Widget child) async {
    // _createLangchainService is the helper described earlier.
    final langchainService = await _createLangchainService();
    return MultiProvider(
      providers: [
        Provider<LangchainService>.value(value: langchainService),
        // IndexNotifier
        ChangeNotifierProvider<IndexNotifier>(
          create: (_) => IndexNotifier(langchainService: langchainService),
        ),
        // QueryNotifier
        ChangeNotifierProvider<QueryNotifier>(
          create: (_) => QueryNotifier(langchainService: langchainService),
        ),
      ],
      child: child,
    );
  }
}

That is it — we should have the result for the application as below:


neon database result

Here is a link to the repository containing all the code.


Retrieval augmented generation (RAG) improves LLM responses by retrieving relevant facts and supplying them as context, which keeps answers factual and grounded. Combining a Postgres database with vector support like Neon, the RAG technique, and Langchain makes it practical to build virtual assistants, data analysis tools, and similar applications that answer from your own documents.

In conclusion, the integration of RAG with pgVector and Langchain is a testament to the incredible prowess of AI and its hopeful future.


Here are some resources that will guide you more in this journey:
