DEV Community

Cover image for Local Intelligence: How to set up a local GPT Chat for secure & private document analysis workflow
Aslan Vatsaev
Aslan Vatsaev

Posted on

Local Intelligence: How to set up a local GPT Chat for secure & private document analysis workflow

Intro

In this article, I'll walk you through the process of installing and configuring an Open Weights LLM (Large Language Model) locally such as Mistral or Llama3, equipped with a user-friendly interface for analysing your documents using RAG (Retrieval Augmented Generation). This setup allows you to analyse your documents without sharing your private and sensitive data with third-party AI providers such as OpenAI, Microsoft, Google, etc.

Prerequisites

  • You can use pretty much any machine you want, but it's preferable to use a machine a dedicated GPU or Apple Silicon (M1,M2,M3, etc) for faster inference.
  • Docker must be preinstalled

Installation

Ollama

Image description


Ollama is a service that allows us to easily manage and run local open weights models such as Mistral, Llama3 and more (see the full list of available models).
Ollama installation is pretty straight forward just download it from the official website and run Ollama, no need to do anything else besides the installation and starting the Ollama service.

Installing Ollama User Interface

Image description


Next step is installing the Ollama User Interface that will run on Docker, so Docker must be installed and running before installing the Ollama UI.

To install the UI simply run the following command in the terminal:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name ollama-webui --restart always -e WEBUI_AUTH=false ghcr.io/open-webui/open-webui:main
Enter fullscreen mode Exit fullscreen mode

This will install and start the Ollama UI webserver locally on address http://localhost:3000/

Download a local model

Now that everything is up and running, we need to download a model.

Good general purpose models as of today (May 2024) are Llama3 (from Meta) and Mistral, in this article, I'll show how to install Mistral Instruct.

Go to Ollama library: https://ollama.com/library and type "mistral" in the search bar, then click on the first result:


Image description


Pick the instruct variant in the dropdown menu:


Image description


And copy the name and the tag of the model from the right side (don't copy the entire command just the model_name:tag part):


Image description


In the Ollama UI, click on the username, the bottom left corner, to display the pop over menu, click on "Settings":


Image description


Then click on "Models" on the sidebar. This form below allows us to download any model that Ollama supports

Paste the model tag mistral:instruct in the text field and click download:


Image description

--

The model installation is the same for any other models in the Ollama Library

Chat with the model

Once the model is downloaded, you can select it and set it as default:


Image description


Image description


Let's see if everything works by sending a message to the model:


Image description


Great! The model is loaded and running without any issues 🎉🥳

Now we can do some interesting things with it.

Analyse documents and data - RAG (Retrieval Augmented Generation)

You can upload documents and ask questions related to these documents, not only that, you can also provide a publicly accessible Web URL and ask the model questions about the contents of the URL (an online documentation for example). All files you add to the chat will always remain on your machine and won't be sent to the cloud.

Working with a PDF document example

Click the "+" icon in the chat and pick any PDF document you want:


Image description


I've uploaded the "Attention All you need" paper as a PDF document, and asked a specific question related to this document:

"What is the purpose of multi head attention mechanism?"


Image description


Let's check if the RAG worked correctly by looking into the original PDF document:


Image description


The RAG system was able to pinpoint the relevant part of the paper in order to answer the question 🎉

Ask questions about the contents of a Web Page

The URL of the web page must be publicly accessible, if you need to authenticate in order to view the page, the RAG won't work, so if you need to analyse a web page protected by auth, a workaround would be to first download it as PDF and upload it as a simple document.

In the chat field type # followed by a URL, for this example I'll use Doctolib's FAQ about handling relatives in your Doctolib account:


Image description


Image description


Image description


Saving the documents to your Workspace

You can also save your most often used documents in your workspace so you don't have to upload them every time, for that, click on "Workspace". go "Documents" tab, and upload your files here:

Image description

Later when you want to work with your documents, just go to chat, and type # in the message fields, you'll be presented with all documents from your work space, you can chose to work with one specific document or all of them in a single chat session:

Image description


This is just scratching the surface, the Ollama UI can be configured to make the retrieval even more performant with some tricks. If you're interested in advanced configuration and usage of this workflow let me know in the comments.


Top comments (4)

Collapse
 
atsag profile image
Andreas

Hello Aslan, thank you for sharing. Your walkthrough is excellent, very descriptive. If you have found a good RAG workflow without using OpenAI tooling, please do share!

Collapse
 
avatsaev profile image
Aslan Vatsaev

I started working with tools like n8n and langflow for my RAG workflow, I could do I write up about these if interested

Collapse
 
atsag profile image
Andreas

Well, only if you think that they have great performance... otherwise I am afraid that we must wait some more time. Thank you for replying!

Collapse
 
patrikbandik profile image
patrikbandik

Great tutorial Aslan!

How would you run this on a larger scale with hundreds of company documents? Possibly host it on own server?
Tasks it would accomplish:

  1. find the most relevant documents
  2. chat with the document and stay within context
  3. if possible compare two documents