DEV Community

yukaty

Part 2: Implementing Vector Search with OpenAI

In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works. 🧠


Prerequisites 📋

  • Completed Part 1 setup for pgvector
  • OpenAI API key

Understanding Vector Search 🌐

A vector is a list of numbers that represents position or direction:

2D Vector: [x, y]     📍 Like coordinates on a map
3D Vector: [x, y, z]  🎲 Like a point in 3D space

When AI processes content, it creates special vectors called "embeddings" to represent meaning (1536 dimensions for OpenAI's text-embedding-3-small model, used below). These embeddings are stored in the database, allowing us to perform similarity search:

📘 "How to use Docker"
    [0.23, 0.45, 0.12, ...]  # 1536-dimensional vector
📗 "Docker tutorial"          
    [0.24, 0.44, 0.11, ...]  🤝 Very Similar! (Distance: 0.2)
📕 "Chocolate cake recipe"    
    [0.89, 0.12, 0.67, ...]  🚫 Not Related! (Distance: 0.9)
  • Vectors let AI understand similarity mathematically
  • Vector search finds similar content by comparing distances
  • pgvector stores embeddings efficiently
  • Works across any language (it's all just numbers!)
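
The toy distances above can be reproduced in a few lines of Python. This is a sketch using 3-dimensional vectors for readability (real embeddings have 1536 dimensions, and the numbers are illustrative, not actual model output):

```python
import math

def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

docker_howto = [0.23, 0.45, 0.12]  # "How to use Docker"
docker_guide = [0.24, 0.44, 0.11]  # "Docker tutorial"
cake_recipe  = [0.89, 0.12, 0.67]  # "Chocolate cake recipe"

print(euclidean_distance(docker_howto, docker_guide))  # small -> similar
print(euclidean_distance(docker_howto, cake_recipe))   # large -> not related
```

The two Docker vectors end up far closer to each other than either is to the cake recipe, which is exactly what the database will compute for us at scale.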

Project Setup ⚙️

Updated Project Structure

vector-search/
├── .env
├── compose.yml
├── requirements.txt
├── postgres/            # Part 1: Database setup
│   └── schema.sql
└── scripts/             # New: Data loading
    ├── Dockerfile
    └── load_data.py

1. Set Up OpenAI API

Create .env:

OPENAI_API_KEY=your_api_key  # Get from platform.openai.com
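
The OpenAI client reads OPENAI_API_KEY from the environment automatically, so a missing key only surfaces as an error deep inside the first API call. A quick sanity check before running the loader can save a confusing stack trace (api_key_present is just an illustrative helper, not part of the OpenAI SDK):

```python
import os

def api_key_present() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))
```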

2. Create Data Loading Script

Create scripts/load_data.py to fetch books and generate embeddings:

import json

import openai

client = openai.OpenAI()

def get_embedding(text: str):
    """Generate embedding using OpenAI API"""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def load_books_to_db():
    """Load books with embeddings into PostgreSQL"""

    # 1. Fetch books from the Open Library API
    #    (fetch_books and store_book are defined in the full script)
    books = fetch_books()

    for book in books:
        # 2. Create a text description for embedding
        description = f"Book titled '{book['title']}' by {', '.join(book['authors'])}. "
        description += f"Published in {book['publish_year']}. "
        description += f"This is a book about {', '.join(book['subjects'])}."

        # 3. Generate an embedding using OpenAI
        embedding = get_embedding(description)

        # 4. Store the book and embedding in PostgreSQL
        store_book(book["title"], json.dumps(book), embedding)

Full source code is available on GitHub 🐙

Also create requirements.txt and scripts/Dockerfile.
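
Based on the imports the loading script uses, a minimal requirements.txt could look like this (versions are left unpinned here as an assumption; psycopg2-binary ships prebuilt wheels, while plain psycopg2 also works once the Dockerfile installs gcc and libpq-dev):

```
openai
requests
psycopg2-binary
```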

3. Update Docker Compose

Update compose.yml to add the data loader:

services:
  # ... existing db service from Part 1

  data_loader:
    build:
      context: .
      dockerfile: scripts/Dockerfile
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/example_db
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db
    command: python load_data.py

4. Load Sample Data

docker compose up --build

Exploring Vector Search 🔦

First, connect to the database:

docker exec -it pgvector-db psql -U postgres -d example_db

Inspecting Embeddings

Check what the vectors look like:

-- View first 5 dimensions of an embedding
SELECT
    name,
    (replace(replace(embedding::text, '[', '{'), ']', '}')::float[])[1:5] as first_dimensions
FROM items
LIMIT 1;

💡 Each embedding from OpenAI's model:

  • Has 1536 dimensions
  • Contains values between -1 and 1
  • Represents text meaning mathematically
  • Outputs in [...] format, which needs to be converted to PostgreSQL's {...} array format for array operations
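
On the client side that [...] text form is easy to parse directly, without converting to a PostgreSQL array first. A minimal sketch (parse_vector_text is an illustrative helper with no input validation):

```python
def parse_vector_text(s: str) -> list[float]:
    """Parse pgvector's text representation, e.g. '[0.23,0.45,0.12]'."""
    return [float(x) for x in s.strip().strip("[]").split(",")]

print(parse_vector_text("[0.23,0.45,0.12]")[:2])  # [0.23, 0.45]
```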

Finding Similar Books

Search for books about web development:

WITH web_book AS (
    SELECT embedding FROM items WHERE name LIKE '%Web%' LIMIT 1
)
SELECT
    item_data->>'title' as title,
    item_data->>'authors' as authors,
    embedding <=> (SELECT embedding FROM web_book) as distance
FROM items
ORDER BY distance
LIMIT 3;  -- Returns the 3 most similar books (smallest distance first)

Working with JSON and Vectors ⚡️

JSON Operators

Use ->> to extract text value from a JSON field:

-- Get title from the 'item_data' JSON column
SELECT item_data->>'title' FROM items;

Vector Search Operators

pgvector supports multiple distance functions. Here are the two most commonly used operators.

L2 Distance: <->

Measures straight-line (Euclidean) distance between vectors:

-- Find similar books using L2 distance
SELECT 
    name,
    embedding <-> (
        SELECT embedding FROM items WHERE name LIKE '%Web%' LIMIT 1
    ) as distance
FROM items
ORDER BY distance
LIMIT 3;

Cosine Distance: <=>

Measures angle-based (cosine) distance between vectors:

-- Find similar books using Cosine distance
SELECT 
    name,
    embedding <=> (
        SELECT embedding FROM items WHERE name LIKE '%Web%' LIMIT 1
    ) as distance
FROM items
ORDER BY distance
LIMIT 3;

💡 Tips

  • OpenAI recommends <=> (Cosine distance) for their embeddings.
  • Smaller distance means higher similarity.
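
What <=> computes can be written out in plain Python (pgvector does this in optimized C). Cosine distance is 1 minus cosine similarity, so vectors pointing in the same direction score 0 and orthogonal ones score 1; this toy sketch shows both cases:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0 -> same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 -> orthogonal
```

Note that magnitude is ignored: [1, 0] and [2, 0] are at distance 0, which is why cosine distance suits normalized embeddings like OpenAI's.
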

Performance Tips 🚀

Query Optimization

Cache query vectors instead of subquerying:

-- ❌ Inefficient: Subquery runs for every row
SELECT name, embedding <=> (
        SELECT embedding FROM items WHERE name LIKE '%Web%' LIMIT 1
) as distance
FROM items
ORDER BY distance
LIMIT 3;

-- ✅ Better: Query vector calculated once
WITH query_embedding AS (
        SELECT embedding FROM items WHERE name LIKE '%Web%' LIMIT 1
)
SELECT 
    name,
    embedding <=> (SELECT embedding FROM query_embedding) as distance
FROM items
ORDER BY distance
LIMIT 3;

Indexing

Choose an index based on your needs:

-- Option 1: IVFFlat (Less memory, good for development)
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Option 2: HNSW (Faster searches, more memory)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
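
To run the same cosine search from application code, the query vector can be passed as a pgvector text literal and cast in SQL. This is a sketch assuming the items schema from Part 1 and a psycopg2 connection; to_vector_literal and search_similar are illustrative names, not library functions:

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

def search_similar(conn, query_embedding: list[float], limit: int = 3):
    """Return the `limit` nearest items by cosine distance."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT item_data->>'title' AS title,
                   embedding <=> %s::vector AS distance
            FROM items
            ORDER BY distance
            LIMIT %s
            """,
            (to_vector_literal(query_embedding), limit),
        )
        return cur.fetchall()
```

Usage would look like search_similar(conn, get_embedding("docker basics")), reusing get_embedding from the loading script so queries and stored rows share the same embedding model.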



Hope this helps you build something cool. Feel free to drop a comment below! 💬

Top comments (1)

Maksim Sheptiakov

Hey
Nice tutorial, but you missed some parts/files. So if someone wants to run it:

Full scripts/load_data.py file:

import openai
import requests
import json
import psycopg2
import os
from typing import List, Dict, Any

# https://dev.to/yukaty/part-3-implementing-vector-search-with-ollama-1dop
# shows how to do this locally with Ollama
client = openai.OpenAI()

def get_embedding(text: str):
    """Generate embedding using OpenAI API"""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
def store_book(title: str, book_json: str, embedding: List[float]) -> None:
    """
    Store a book with its embedding in the PostgreSQL database.
    Args:
        title: The title of the book
        book_json: JSON string containing book data
        embedding: Vector embedding of the book description
    Returns:
        None
    """
    # Get database connection string from environment variable
    db_url = os.environ.get("DATABASE_URL", "postgresql://postgres:password@localhost:5432/example_db")
    conn = None
    cursor = None
    try:
        # Connect to the PostgreSQL database
        conn = psycopg2.connect(db_url)
        cursor = conn.cursor()
        # Insert the book data and embedding into the database
        cursor.execute(
            """
            INSERT INTO items (name, item_data, embedding)
            VALUES (%s, %s, %s)
            RETURNING id
            """,
            (title, book_json, embedding)
        )
        # Get the ID of the newly inserted book
        book_id = cursor.fetchone()[0]
        # Commit the transaction
        conn.commit()
        print(f"Successfully stored '{title}' with ID {book_id} in the database")
    except Exception as e:
        print(f"Error storing book in database: {e}")
        # Roll back the transaction in case of error
        if conn:
            conn.rollback()
    finally:
        # Close the cursor and connection
        if cursor:
            cursor.close()
        if conn:
            conn.close()
def fetch_books(search_query: str = "python programming",
                limit: int = 10) -> List[Dict[str, Any]]:
    """
    Fetch books from the Open Library API.
    Args:
        search_query: The search term for finding books (default: "python programming")
        limit: Maximum number of books to return (default: 10)
    Returns:
        A list of dictionaries containing book information
    """
    base_url = "https://openlibrary.org/search.json"
    # Define parameters for the API request
    params = {
        "q": search_query,
        "limit": limit,
        "fields": "key,title,author_name,first_publish_year,cover_i,edition_count,subject"
    }
    try:
        # Make the API request
        response = requests.get(base_url, params=params)
        response.raise_for_status()  # Raise exception for HTTP errors
        data = response.json()
        books = data.get("docs", [])
        # Process and format the book data
        formatted_books = []
        for book in books:
            # Create a cleaner book object with consistent fields
            formatted_book = {
                "title": book.get("title", "Unknown Title"),
                "authors": book.get("author_name", ["Unknown Author"]),
                "publish_year": book.get("first_publish_year", "Unknown"),
                "cover_id": book.get("cover_i"),
                "edition_count": book.get("edition_count", 0),
                "subjects": book.get("subject", []),
                "key": book.get("key", "")
            }
            formatted_books.append(formatted_book)
        print(f"Successfully fetched {len(formatted_books)} books about '{search_query}'")
        return formatted_books
    except requests.exceptions.RequestException as e:
        print(f"Error fetching books from Open Library API: {e}")
        return []
def load_books_to_db():
    """Load books with embeddings into PostgreSQL"""
    # 1. Fetches books from Open Library API
    books = fetch_books()
    for book in books:
        # 2. Create a text description for embedding
        description = f"Book titled '{book['title']}' by {', '.join(book['authors'])}. "
        description += f"Published in {book['publish_year']}. "
        description += f"This is a book about {', '.join(book['subjects'])}."
        # 3. Generate an embedding using OpenAI
        embedding = get_embedding(description)
        # 4. Store the book and embedding in PostgreSQL
        store_book(book["title"], json.dumps(book), embedding)

if __name__ == '__main__':
    load_books_to_db()

scripts/Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    gcc \
    libpq-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY scripts/load_data.py /app/load_data.py

# Set environment variables (these will be overridden by docker-compose)
ENV DATABASE_URL=postgresql://postgres:password@db:5432/example_db
ENV OPENAI_API_KEY=""