Run Ollama on Intel Arc GPU (IPEX)
As of this writing, Ollama does not officially support Intel Arc GPUs in its releases. However, Intel provides a Docker image that includes a version of Ollama compiled with Arc GPU support enabled. This guide walks you through setting up and running Ollama on your Intel Arc GPU using the IPEX-LLM Docker image (IPEX-LLM is Intel's LLM acceleration library, built on the Intel Extension for PyTorch).
Prerequisites
Before proceeding, ensure you have the following installed and properly configured:
- Docker Desktop
- Intel Arc GPU drivers
Links to the installation guides for Docker and the Arc drivers are provided at the end of this article. Be sure to follow the appropriate guide for your operating system.
Set Up Ollama Container
- Pull the Intel Analytics IPEX-LLM Image:
Pull the image from Docker Hub:
docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest
- Start the Container with Ollama Serve:
Because the Docker command to start the container is quite long, it's convenient to save it to a script for easy adjustment and restarting.
Mac and Linux users: Create a file named start-ipex-llm.sh in your home directory and add the following content:
#!/bin/bash
docker run -d --restart=always \
--net=bridge \
--device=/dev/dri \
-p 11434:11434 \
-v ~/.ollama/models:/root/.ollama/models \
-e PATH=/llm/ollama:$PATH \
-e OLLAMA_HOST=0.0.0.0 \
-e no_proxy=localhost,127.0.0.1 \
-e ZES_ENABLE_SYSMAN=1 \
-e OLLAMA_INTEL_GPU=true \
-e ONEAPI_DEVICE_SELECTOR=level_zero:0 \
-e DEVICE=Arc \
--shm-size="16g" \
--memory="32G" \
--name=ipex-llm \
intelanalytics/ipex-llm-inference-cpp-xpu:latest \
bash -c "cd /llm/scripts/ && source ipex-llm-init --gpu --device Arc && bash start-ollama.sh && tail -f /llm/ollama/ollama.log"
Once you have the script saved, make it executable (for Mac and Linux users) and run it:
chmod +x ~/start-ipex-llm.sh
~/start-ipex-llm.sh
Windows users: Create a file named start-ipex-llm.bat and adjust the above command for the Windows terminal, modifying paths and syntax accordingly.
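Below is a rough, untested sketch of what such a batch file might look like. It assumes Docker Desktop with the WSL 2 backend and swaps the home-directory path for %USERPROFILE%; note that GPU passthrough under Docker Desktop/WSL 2 differs from native Linux and may not expose /dev/dri, so treat this as a starting point rather than a verified command:
@echo off
REM Sketch of a Windows equivalent of start-ipex-llm.sh (untested; adjust for your setup).
REM $PATH does not expand under cmd.exe, so the usual container default PATH is spelled
REM out explicitly below; adjust it if the image uses a different default.
docker run -d --restart=always ^
  --net=bridge ^
  --device=/dev/dri ^
  -p 11434:11434 ^
  -v %USERPROFILE%\.ollama\models:/root/.ollama/models ^
  -e PATH=/llm/ollama:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin ^
  -e OLLAMA_HOST=0.0.0.0 ^
  -e no_proxy=localhost,127.0.0.1 ^
  -e ZES_ENABLE_SYSMAN=1 ^
  -e OLLAMA_INTEL_GPU=true ^
  -e ONEAPI_DEVICE_SELECTOR=level_zero:0 ^
  -e DEVICE=Arc ^
  --shm-size=16g ^
  --memory=32G ^
  --name=ipex-llm ^
  intelanalytics/ipex-llm-inference-cpp-xpu:latest ^
  bash -c "cd /llm/scripts/ && source ipex-llm-init --gpu --device Arc && bash start-ollama.sh && tail -f /llm/ollama/ollama.log"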
Explanation of Flags:
- --restart=always: Ensures the container restarts automatically if it stops.
- --device=/dev/dri: Grants the container access to the GPU device.
- --net=bridge: Uses the bridge networking driver.
- -p 11434:11434: Maps port 11434 of the container to port 11434 on the host.
- -e OLLAMA_HOST=0.0.0.0: Makes Ollama listen on all network interfaces, so other systems can call the Ollama API.
- -e no_proxy=localhost,127.0.0.1: Prevents the container from routing requests to localhost or 127.0.0.1 through a proxy server.
- -e ONEAPI_DEVICE_SELECTOR=level_zero:0: Tells Ollama which GPU device to use. You may need to adjust the index if your system also has an integrated GPU.
- -e PATH=/llm/ollama:$PATH: Adds the Ollama binary directory (/llm/ollama) to the PATH, so Ollama commands can be run easily with docker exec.
- -v ~/.ollama/models:/root/.ollama/models: Mounts the host's Ollama models directory into the container, so downloaded models persist across container restarts.
- --shm-size="16g": Sets the shared memory size to 16 GB. Adjust this for your system; see the Docker documentation for more information on shared memory.
- --memory="32G": Limits the container's memory usage to 32 GB. Adjust this for your system; see the Docker documentation for more information on memory limits.
- --name=ipex-llm: Names the container ipex-llm. This name is used to reference the container in other commands.
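After starting the container, you can quickly confirm that it is up and that Ollama is reachable on the mapped port; the root endpoint should respond with a short "Ollama is running" message:
# Check that the container is up
docker ps --filter "name=ipex-llm"
# Check that the Ollama API is reachable from the host
curl http://localhost:11434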
- Download a Model:
Once the container is up, you can pull a model from the Ollama library. Replace <MODEL_ID> with the specific model ID you wish to download (e.g., qwen2.5-coder:0.5b, llama3.2).
docker exec ipex-llm ollama pull <MODEL_ID>
You can browse the Ollama model library for more options here.
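For example, to download the small Qwen coder model mentioned above:
docker exec ipex-llm ollama pull qwen2.5-coder:0.5b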
Using Ollama
With your desired model(s) downloaded, you can interact with them directly using the Ollama CLI, make API calls, or integrate with various tools. Below are some ways to get started.
Using the Ollama CLI
The Ollama CLI allows you to interact with models directly from your terminal. Any ollama command that you would typically run locally can now be executed within your container by prefixing it with docker exec -it ipex-llm.
For example, to interact with the model you downloaded earlier:
docker exec -it ipex-llm ollama run <MODEL_ID>
Check the Ollama CLI Reference for more information about available commands.
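A couple of everyday commands, run through the container:
# List the models that have been downloaded
docker exec -it ipex-llm ollama list
# Show which models are currently loaded into memory
docker exec -it ipex-llm ollama ps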
Making API Calls
You can make API requests to the Ollama model endpoint:
curl http://localhost:11434/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL_ID>",
"prompt": "Write a JavaScript function that takes an array of numbers and returns the sum of all elements in the array."
}'
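Ollama also exposes an OpenAI-compatible chat endpoint, which is what most client libraries and integrations expect. A minimal example:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "<MODEL_ID>",
"messages": [
{"role": "user", "content": "Explain the difference between let and const in JavaScript."}
]
}'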
Additional Tools
Once Ollama is running, you can leverage it with a variety of AI tools. Here are a few of my favorites:
- Open WebUI: A user-friendly interface for interacting with AI models, offering many features similar to ChatGPT.
- Continue.dev: An extension for VSCode and JetBrains that provides "Copilot"-style capabilities.
- Aider: One of the first AI coding assistants, and still one of the best.
- CrewAI: An easy-to-use AI agent framework that works with Ollama models to run AI agents locally.
Feel free to suggest others that should be added to this list.
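As an example of wiring one of these up, Open WebUI can run as a second container pointed at the Ollama endpoint. This sketch is based on the Open WebUI Docker instructions at the time of writing; check their documentation for the current image name and options:
# Run Open WebUI and point it at the Ollama API exposed by the ipex-llm container
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Once it is up, Open WebUI should be available at http://localhost:3000 and list the models you pulled into the ipex-llm container.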
Troubleshooting
If you encounter issues, consider the following steps:
- Verify GPU Access:
Use sycl-ls within the container to check whether the Arc GPU is recognized.
To start an interactive shell within the container:
docker exec -it ipex-llm /bin/bash
Then run:
sycl-ls
You can find helpful tips here.
- Check Ollama Logs:
Monitor the logs for any errors:
docker logs ipex-llm -f
- Update Docker and Drivers:
Ensure that both Docker and your GPU drivers are up to date.
- Consult Community Resources:
Refer to Intel's GitHub repositories and community forums for additional support.
Conclusion
Running Ollama on your Intel Arc GPU is straightforward once you have the proper drivers installed and Docker running. With your system set up, it's as simple as running any other Docker container with a few extra arguments.
Keep an eye on the Ollama GitHub repository for updates, and consider contributing to pull requests to bring Intel Arc support to the official Ollama builds.
Additional Resources
Below is a running list of related links. Please feel free to comment with others that you think should be added to the list.
- Intel Arc Driver Installation:
  - Ubuntu: Follow this guide
  - Windows: Find the latest drivers here
- Docker Installation Guides:
  - Ubuntu: Follow this guide
  - Windows: Follow this guide
  - macOS: Follow this guide