[ UPDATED 23/03/2024 ]
PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.
Running it on Windows Subsystem for Linux (WSL) with GPU support can significantly enhance its performance. In this guide, I will walk you through the step-by-step process of installing PrivateGPT on WSL with GPU acceleration.
Installing this was a pain in the a** and took me 2 days to get it to work. Hope this can help you on your own journey… Good luck !
Prerequisites
Before we begin, make sure you have a recent Ubuntu WSL distribution installed. You can choose from versions such as Ubuntu 22.04.3 LTS or Ubuntu 22.04.6 LTS, available on the Microsoft Store.
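If you don't have a distribution yet, you can also install one straight from a Windows terminal. A minimal sketch (the exact distro name to pass to -d may differ on your machine, so list the catalogue first):
wsl --list --online
wsl --install -d Ubuntu-22.04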
Updating Ubuntu
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential
ℹ️ “upgrade” is very important, as the Python build steps will explode later if you skip it
Cloning the PrivateGPT repo
git clone https://github.com/imartinez/privateGPT
Setting Up Python Environment
To manage Python versions, we’ll use pyenv. Follow the commands below to install it and set up the Python environment:
sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev zlib1g-dev libncursesw5-dev libgdbm-dev libc6-dev tk-dev libffi-dev
curl https://pyenv.run | bash
export PATH="/home/$(whoami)/.pyenv/bin:$PATH"
Add the following lines to your .bashrc file:
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
Reload your terminal
source ~/.bashrc
Install the lzma libraries that pyenv also needs:
sudo apt-get install lzma
sudo apt-get install liblzma-dev
Install Python 3.11 and set it as the global version:
pyenv install 3.11
pyenv global 3.11
pip install pip --upgrade
pyenv local 3.11
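To double-check that the pyenv-managed interpreter is the one actually in use:
pyenv versions # the 3.11 entry should be marked with an asterisk
python --version # should print Python 3.11.x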
Poetry Installation
Install poetry to manage dependencies:
curl -sSL https://install.python-poetry.org | python3 -
Add the following line to your .bashrc:
export PATH="/home/<YOUR_USERNAME>/.local/bin:$PATH"
ℹ️ Replace <YOUR_USERNAME> with your WSL username (run whoami if you're not sure)
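If you prefer not to hard-code the username, $HOME expands to the same directory for any user:
export PATH="$HOME/.local/bin:$PATH"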
Reload your configuration
source ~/.bashrc
poetry --version # should display something without errors
Installing PrivateGPT Dependencies
Navigate to the PrivateGPT directory and install dependencies:
cd privateGPT
poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
Nvidia Drivers Installation
Visit Nvidia’s official website to download and install the CUDA toolkit for WSL (the GPU driver itself stays on the Windows side). Choose Linux > x86_64 > WSL-Ubuntu > 2.0 > deb (network)
Follow the instructions provided on the page.
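For reference, at the time of writing the deb (network) instructions for WSL-Ubuntu looked roughly like this; the keyring and toolkit version numbers change over time, so copy the exact commands from the Nvidia page rather than from here:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4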
Add the following lines to your .bashrc:
export PATH="/usr/local/cuda-12.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH"
ℹ️ Maybe check the content of “/usr/local” to be sure that you do have the “cuda-12.4” folder. Yours might have a different version.
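A quick way to see which version actually landed on your system:
ls /usr/local/ | grep cuda # e.g. cuda, cuda-12, cuda-12.4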
Reload your configuration and check that all is working as expected
source ~/.bashrc
nvcc --version
nvidia-smi.exe
ℹ️ “nvidia-smi” isn’t available on WSL so just verify that the .exe one detects your hardware. Both commands should display a wall of version info but no apparent errors.
Building and Running PrivateGPT
Finally, install LLAMA CUDA libraries and Python bindings:
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
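Several readers report (see the comments below) that newer llama-cpp-python releases have deprecated the LLAMA_CUBLAS flag in favour of GGML_CUDA. If the build warns about it, or BLAS stays at 0 later on, try the equivalent:
CMAKE_ARGS='-DGGML_CUDA=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python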
Let PrivateGPT download a local LLM for you (Mistral by default):
poetry run python scripts/setup
To run PrivateGPT, use the following command:
make run
This will initialize and boot PrivateGPT with GPU support on your WSL environment.
ℹ️ You should see “blas = 1” if GPU offload is working.
...............................................................................................
llama_new_context_with_model: n_ctx = 3900
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 487.50 MiB
llama_new_context_with_model: KV self size = 487.50 MiB, K (f16): 243.75 MiB, V (f16): 243.75 MiB
llama_new_context_with_model: graph splits (measure): 3
llama_new_context_with_model: CUDA0 compute buffer size = 275.37 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 15.62 MiB
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
18:50:50.097 [INFO ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=local
ℹ️ Go to http://127.0.0.1:8001 in your browser
Uploading the Orca paper and asking random stuff about it.
Conclusion
By following these steps, you have successfully installed PrivateGPT on WSL with GPU support. Enjoy the enhanced capabilities of PrivateGPT for your natural language processing tasks.
If something went wrong then open your window and throw your computer away. Then start again at step 1.
You can also remove the WSL distro entirely with:
wsl.exe --list -v
wsl --unregister <name of the wsl to remove>
If this article helped you in any way, consider giving it a like! Thx
Troubleshooting
Having a crash when asking a question or running make run? Here are the issues I encountered and how I fixed them.
- CUDA error
CUDA error: the provided PTX was compiled with an unsupported toolchain.
current device: 0, in function ggml_cuda_op_flatten at /tmp/pip-install-3kkz0k8s/llama-cpp-python_a300768bdb3b475da1d2874192f22721/vendor/llama.cpp/ggml-cuda.cu:9119
cudaGetLastError()
GGML_ASSERT: /tmp/pip-install-3kkz0k8s/llama-cpp-python_a300768bdb3b475da1d2874192f22721/vendor/llama.cpp/ggml-cuda.cu:271: !"CUDA error"
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
make: *** [Makefile:36: run] Aborted
This one happens when the CUDA toolkit you installed in WSL is newer than the Nvidia driver on the Windows side. Open the Nvidia "GeForce Experience" app from Windows, upgrade the driver to the latest version and then reboot.
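A quick way to compare the two sides, assuming the usual output formats of both tools:
nvidia-smi.exe | grep "CUDA Version" # CUDA version supported by the Windows driver
nvcc --version | grep release # CUDA toolkit version installed in WSL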
- CPU only
If privateGPT still sets BLAS to 0 and runs on CPU only, try to close all WSL2 instances. Then reopen one and try again.
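Closing the terminal window isn't always enough; from a Windows terminal you can stop every running instance at once:
wsl --shutdown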
If it's still on CPU only then try rebooting your computer. This is not a joke… Unfortunately.
A note on using LM Studio as backend
I tried to use the LM Studio server as a fake OpenAI backend. It does work, but not very well. I need to do more tests on that and I'll update here.
For now, what I did is start the LM Studio server on port 8002 and uncheck "Apply Prompt Formatting".
On the PrivateGPT side I edited "settings-vllm.yaml", updated "openai > api_base" to "http://localhost:8002/v1" and set the model to "dolphin-2.7-mixtral-8x7b.Q5_K_M.gguf", which is the one I use in LM Studio (the exact name is displayed in LM Studio if you're wondering).
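For illustration, the relevant part of settings-vllm.yaml ended up looking roughly like this; the surrounding keys follow privateGPT's settings layout, so double-check against the file in your own checkout:
openai:
  api_base: http://localhost:8002/v1
  model: dolphin-2.7-mixtral-8x7b.Q5_K_M.gguf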
Top comments (107)
Thank you very much for this guide. I have run into some issues regarding an updated version of poetry.
They have now gotten rid of --with in favor of --extras and the group 'local' is missing.
I went through all the errors I got without installing local and came up with this command:
poetry install -E llms-llama-cpp -E ui -E vector-stores-qdrant -E embeddings-huggingface
The model runs, without GPU support for some reason, and errors out when I input something in the UI to interact with the LLM. Any thoughts?
Thx for the input. I'll try and update the tutorial as soon as possible.
In the meantime a SO post has been made. Maybe it can help you ? stackoverflow.com/questions/781499...
Hey. Updated the tutorial to the latest version of privateGPT. Works for me on a fresh install. HF.
Thanks! This worked for me. Installing the huggingface via pip didn't work.
this helped me:
stackoverflow.com/questions/781499...
Thank you very much for this information! Now it's running but without GPU support.
When you start the server it should show "BLAS=1". If not, recheck all GPU related steps. For instance, reinstall the Nvidia drivers and check that the binaries are responding accordingly. Maybe start over the whole thing... Before getting things right I had to redo the whole process a bunch of times... If you messed up a package it could have impacted GPU support...
Hey. Updated the tutorial to the latest version of privateGPT. Works for me on a fresh install. HF.
For CPU only problems a simple reboot tends to do the trick... lol.
I have used this tutorial multiple times with success. I even set up remote access despite WSL2 being a PITA to host through. I am having a recent/new issue though. The Mistral model is now gated at huggingface, so I get an error. I have my token but I am not sure how to execute the command. My error happens at nearly the last step. Any ideas? I did log into huggingface and get access as discussed here: huggingface.co/mistralai/Mistral-7... and also added my huggingface token to settings.yaml. No luck.
solved it:
Good job ;-)
In which step do i need to include it
@alex8642 - I tried installing Mistral 7.1 (I have an Nvidia 6GB GPU), but it said it needs more space as PyTorch took the majority of that. Just curious what GPU do you have and if you had a similar issue. I'm running the basic Llama2 right now.
I was able to get it working after about 6 hours of trying to follow the changes and around 9-10 tries. It does recognize my GPU drivers but it just does not use them. Maybe because the local command in poetry is not there anymore..... 1 week after you made this tutorial. :(
It now has cuda-12.4 and that's what I used in my .bashrc, but why does nvidia-smi.exe display 12.3?
sadly BLAS = 0 and not 1.
I got errors on the final page... I was just about ready to scrap it again, after getting errors about API Split and API tokens exceeded....... when it started working after a restart.
Looking forward to an update maybe? I just came to this page because of Network Chuck.. It's nice to get a taste of what AI can do..
Thx for all the investigation ! I updated the tutorial to the latest version of privateGPT. Works for me on a fresh install. HF ^^.
Nvidia-smi.exe called in WSL is actually the Windows Nvidia driver, which is currently running CUDA version 12.3. Your GPU isn't being used because you have installed the 12.4 CUDA toolkit in WSL but your Nvidia driver installed on Windows is older and still using CUDA 12.3.
I suggest you update the Nvidia driver on Windows and try again. nvidia.com/download/index.aspx
I updated my graphics driver just like you said, but I used the Nvidia Experience because it was already waiting for an update and restarted my host computer..
But when I tried to run it, the graphics card was still not being used.
BLAS =0 :(
So instead of starting from scratch, I just started at the "Building and Running PrivateGPT" section, since I noticed that there was a --force-reinstall flag already there.
Now I have the BLAS =1 flag. :)
Thanks..
That was a challenge, because I had never used WSL before, even though it was already on my computer.. I just never knew what it was. I might tempt fate and start over and make a video.. I had some DNS issues when I first started, and I did not know where the .bashrc file was... or which one to use..... I found many.... I have not messed with Linux in a hot minute, this was a good refresher.
It's about 5 times faster now.
Good to hear you got it working!
Hi,
Thank you very much for your trouble doing this guide, I found it very useful and I could follow it and end up with a perfectly working Private GPT.
In the future how could I update the LLM Private GPT uses to an updated one?
Honestly... You really should consider using Ollama to deal with LLM installation and simply plug all your software (privateGPT included) directly into Ollama. Ollama is very simple to use and is compatible with the OpenAI standards.
Hi Emilien,
Thanks for the tip, how could I do that? Should I change the /scripts/setup in any way before running:
poetry run python scripts/setup ?
I'm looking forward to changing my LLM to llama3.
There is a "settings-ollama.yaml" file at the root of the repo. Follow the documentation on how to start privateGPT with this file instead of the default "settings.yaml" and you should be okay.
About the current deprecation, which I'll address asap: someone in the Medium comments made it out alive with the command
poetry install --extras "llms-llama-cpp ui vector-stores-qdrant embeddings-huggingface"
Haven't tried it myself yet, but if you're stuck then you don't have much to lose.

Slight update to the command to run! Funny enough, I asked Phind AI to help and it gave me this:
CMAKE_ARGS='-DGGML_CUDA=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
Running this gave me no issues when building the wheel.
I honestly have almost zero knowledge about programming but I have been able to set up privateGPT using your guide. Thanks a lot!
The only problem i had was this error message "LLAMA_CUBLAS is deprecated and will be removed in the future. Use GGML_CUDA instead."
I did what it says and it worked! Just letting you know if you'd like to update this guide if necessary.
Came across this today. I had to change DLLAMA_CUBLAS to DGGML_CUDA in the CMAKE line. I also downgraded numpy after the CMAKE to address resolver errors related to numpy 2.0.0.
CMAKE_ARGS='-DGGML_CUDA=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
poetry run pip install numpy==1.23.2
Thank you for the time you put into this guide.
I just want to say thank you for the guide. I only had to go through it once and was up and running in the browser in a couple of hours after figuring out where the Nvidia drivers were actually located. I'll start digging deeper into it tomorrow, but this sure is amazingly useful.
Mind you, it wasn't without difficulty as I misunderstood some of the steps to just be commands pasted in the terminal and run when they should have been pasted into ~/.bashrc file instead. Also it might be a good idea to put in a note at the end about being in the proper ~/privateGPT directory in the section on Building and Running PrivateGPT for those of us who are moving around while editing the ~/.bashrc file. Minor detail and that command is in an earlier step, it's just easy to find yourself in a different directory if not doing this all in one single session.
One final note: a ~/.bash_profile might need to be created with the command "source ~/.bashrc" in it, so that ~/.bashrc gets sourced automatically when launching a fresh new WSL shell.
Thx for the input :-)
Hope you got it working.
For the Nvidia Drivers part, Choose Windows > x86_64 > WSL-Ubuntu > 2.0 > deb (network). I see Windows and x86_64, but then there is no WSL-Ubuntu. So I'm stuck at downloading the drivers part of the tutorial. Did Nvidia remove those drivers?
UPDATE: This was an error in the guide or the Nvidia website changed the way these options work recently. To get the correct drivers, select Linux > x86_64 > WSL-Ubuntu > 2.0 > Deb (Network). So it appears the guide needs to be updated.
The website must have changed. I'll update. Thx.
For anyone still encountering issues after the updated tutorial like I did, check the version of poetry that is installed. In my case it was using poetry 1.12 which doesn't work with the updated tutorial.
I just did this
sudo apt install pipx
pipx install poetry==1.2.0
pipx ensurepath
You will have to login and check the poetry version before proceeding
PS: Thanks a lot for this guide, had tried quite a few others before getting it right with this one.