Part 1 covered setting up PostgreSQL with pgvector, and Part 2 implemented vector search using OpenAI embeddings. This final part demonstrates how to run vector search fully locally using Ollama!
Contents
- Why Ollama?
- Setting Up Ollama with Docker
- Database Updates
- Implementation
- Search Queries
- Performance Tips
- Troubleshooting
- OpenAI vs. Ollama
- Wrap Up
Why Ollama?
Ollama allows you to run AI models locally with:
- Offline operation for better data privacy
- No API costs
- Fast response times
We'll use the nomic-embed-text model in Ollama, which creates 768-dimensional vectors (compared to OpenAI's 1536 dimensions).
Setting Up Ollama with Docker
To add Ollama to your Docker setup, add an ollama service to compose.yml and point the data_loader service at it:
services:
  db:
    # ... (existing db service)

  ollama:
    image: ollama/ollama
    container_name: ollama-service
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  data_loader:
    # ... (existing data_loader service)
    environment:
      - OLLAMA_HOST=ollama
    depends_on:
      - db
      - ollama

volumes:
  pgdata:
  ollama_data:
Then, start the services and pull the model:
docker compose up -d
# Pull the embedding model
docker compose exec ollama ollama pull nomic-embed-text
# Test embedding generation
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "Hello World"
}'
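If you prefer checking from Python, here is a minimal sketch that calls the same endpoint and confirms the 768-dimensional output. It assumes the requests library is available on your machine; it is not part of the tutorial's requirements.

import requests

# Ask the local Ollama service for an embedding (same call as the curl test above).
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "nomic-embed-text", "input": "Hello World"},
)
embedding = resp.json()["embeddings"][0]
print(len(embedding))  # expected: 768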
Database Updates
Update the database to store Ollama embeddings:
# Connect to the database
docker compose exec db psql -U postgres -d example_db
-- Add a column for Ollama embeddings
ALTER TABLE items
ADD COLUMN embedding_ollama vector(768);
For fresh installations, update postgres/schema.sql:
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    item_data JSONB,
    embedding vector(1536),       -- OpenAI
    embedding_ollama vector(768)  -- Ollama
);
Implementation
Update requirements.txt to install the Ollama Python library:
ollama==0.3.3
Here's an example update for load_data.py to add Ollama embeddings:
import ollama  # New import

def get_embedding_ollama(text: str):
    """Generate embedding using Ollama API"""
    response = ollama.embed(
        model='nomic-embed-text',
        input=text
    )
    return response["embeddings"][0]

def load_books_to_db():
    """Load books with embeddings into PostgreSQL"""
    books = fetch_books()

    for book in books:
        description = (
            f"Book titled '{book['title']}' by {', '.join(book['authors'])}. "
            f"Published in {book['first_publish_year']}. "
            f"This is a book about {book['subject']}."
        )

        # Generate embeddings with both OpenAI and Ollama
        embedding = get_embedding(description)  # OpenAI
        embedding_ollama = get_embedding_ollama(description)  # Ollama

        # Store in the database
        store_book(book["title"], json.dumps(book), embedding, embedding_ollama)
Note that this is a simplified version for clarity; the full source code is in the GitHub repository linked in the wrap-up. As you can see, the Ollama API structure is very similar to OpenAI's!
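For comparison, here is roughly how the two helpers sit side by side. This is a sketch, not the exact code from Part 2: the OpenAI usage follows the current openai-python client, and text-embedding-3-small is an assumed model name (any 1536-dimensional OpenAI embedding model fits the schema above).

import ollama
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_embedding(text: str):
    """OpenAI: remote API call, 1536-dimensional vector."""
    response = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def get_embedding_ollama(text: str):
    """Ollama: local service, 768-dimensional vector."""
    response = ollama.embed(model='nomic-embed-text', input=text)
    return response["embeddings"][0]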
Search Queries
Here are a couple of queries for inspecting the Ollama embeddings and retrieving similar items:
-- View first 5 dimensions of an embedding
SELECT
    name,
    (replace(replace(embedding_ollama::text, '[', '{'), ']', '}')::float[])[1:5] AS first_dimensions
FROM items;
-- Search for books about web development
-- (<=> is pgvector's cosine distance operator: smaller values mean more similar items)
WITH web_book AS (
    SELECT embedding_ollama FROM items WHERE name LIKE '%Web%' LIMIT 1
)
SELECT
    item_data->>'title' AS title,
    item_data->>'authors' AS authors,
    embedding_ollama <=> (SELECT embedding_ollama FROM web_book) AS similarity
FROM items
ORDER BY similarity
LIMIT 3;
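To run the same kind of search from application code, you can embed a free-text query with Ollama and reuse the <=> operator. A minimal sketch, assuming psycopg2 and connection settings along the lines of the earlier parts (adjust credentials to your setup):

import ollama
import psycopg2

def search_books(query: str, limit: int = 3):
    """Embed the query locally, then rank items by cosine distance (smaller = closer)."""
    query_embedding = ollama.embed(model='nomic-embed-text', input=query)["embeddings"][0]
    # pgvector accepts a '[v1,v2,...]' literal, so build one from the Python list
    vector_literal = "[" + ",".join(str(v) for v in query_embedding) + "]"

    conn = psycopg2.connect(host="localhost", dbname="example_db", user="postgres", password="password")
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT item_data->>'title', embedding_ollama <=> %s::vector AS distance
            FROM items
            ORDER BY distance
            LIMIT %s;
            """,
            (vector_literal, limit),
        )
        return cur.fetchall()

print(search_books("web development with JavaScript"))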
Performance Tips
Add an Index
CREATE INDEX ON items
USING ivfflat (embedding_ollama vector_cosine_ops)
WITH (lists = 100);
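IVFFlat is an approximate index, so results can differ slightly from a full scan. pgvector's ivfflat.probes setting lets you trade speed for recall per connection; a small sketch, assuming psycopg2 (the value 10 is only an illustrative starting point):

import psycopg2

# Connection settings are assumptions; adjust to your compose setup.
conn = psycopg2.connect(host="localhost", dbname="example_db", user="postgres", password="password")
with conn.cursor() as cur:
    # Default is 1; more probes scan more IVFFlat lists -> better recall, slower queries.
    cur.execute("SET ivfflat.probes = 10;")
    # ... run the similarity queries from the previous section on this connection ...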
Resource Requirements
- RAM: ~2GB for the model
- First query: expect a short delay while the model loads into memory
- Subsequent queries: ~50ms response time (you can verify this with the sketch below)
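A quick way to check those numbers on your own hardware; a sketch using the Ollama Python client (timings will vary by machine):

import time
import ollama

# Warm-up call: the first request triggers model loading, so don't count it.
ollama.embed(model='nomic-embed-text', input="warm-up")

start = time.perf_counter()
ollama.embed(model='nomic-embed-text', input="How fast is a single embedding?")
print(f"{(time.perf_counter() - start) * 1000:.1f} ms")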
GPU Support
If you're processing large datasets, GPU support can greatly speed up embedding generation. For details, refer to the Ollama Docker image documentation.
Troubleshooting
Connection Refused Error
The Ollama library needs to know where to find the Ollama service. Set the OLLAMA_HOST environment variable in the data_loader service:
data_loader:
  environment:
    - OLLAMA_HOST=ollama
Model Not Found Error
Pull the model manually:
docker compose exec ollama ollama pull nomic-embed-text
Alternatively, you can pull the model automatically from your Python code using the ollama.pull() function, as sketched below.
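A minimal sketch of such a helper; ensure_model is a hypothetical name, not part of the tutorial's code:

import ollama

def ensure_model(model: str = 'nomic-embed-text'):
    """Pull the embedding model through the Ollama service if it isn't available yet."""
    try:
        ollama.embed(model=model, input="ping")  # cheap call that fails if the model is missing
    except ollama.ResponseError:
        ollama.pull(model)  # downloads the model (may take a while on first run)

ensure_model()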
High Memory Usage
- Restart Ollama service
- Consider using a smaller model
OpenAI vs. Ollama
| Feature | OpenAI | Ollama |
| --- | --- | --- |
| Vector Dimensions | 1536 | 768 |
| Privacy | Requires API calls | Fully local |
| Cost | Pay per API call | Free |
| Speed | Network dependent | ~50ms/query |
| Setup | API key needed | Docker only |
Wrap Up
This tutorial covered the basics of setting up local vector search with Ollama. Real-world applications often include additional features such as:
- Query optimization and preprocessing
- Hybrid search (combining with full-text search)
- Integration with web interfaces
- Security and performance considerations
The full source code, including a simple API built with FastAPI, is available on GitHub. PRs and feedback are welcome!
Questions or feedback? Leave a comment below!