StarSense - a new way of interacting with repos

This is a submission for the Open Source AI Challenge with pgai and Ollama

What I Built

I built StarSense, an intelligent chat interface that helps developers easily search and discover their starred GitHub repositories using natural language. The project leverages RAG (Retrieval-Augmented Generation) technology to create a seamless conversation experience with your GitHub stars.

StarSense automatically processes your starred repositories by:

  1. Authenticating with GitHub via OAuth
  2. Fetching all starred repositories
  3. Extracting and processing README content
  4. Storing repository information in PostgreSQL
  5. Generating embeddings using pgai vectorizer
  6. Enabling natural language queries using vector similarity search
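
For illustration, here is a minimal sketch of the ingestion step in Python, assuming a GitHub OAuth token and a plain psycopg connection. The repositories table mirrors the columns the vectorizer configuration below expects (name, url, readme); the function names are illustrative rather than the actual code from the repo.

import base64

import psycopg
import requests

GITHUB_API = "https://api.github.com"

def fetch_starred(token: str) -> list[dict]:
    """Page through the authenticated user's starred repositories."""
    repos, page = [], 1
    while True:
        resp = requests.get(
            f"{GITHUB_API}/user/starred",
            headers={"Authorization": f"Bearer {token}"},
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return repos
        repos.extend(batch)
        page += 1

def fetch_readme(token: str, full_name: str) -> str:
    """Fetch a repo's README; the GitHub API returns it base64-encoded."""
    resp = requests.get(
        f"{GITHUB_API}/repos/{full_name}/readme",
        headers={"Authorization": f"Bearer {token}"},
    )
    if resp.status_code == 404:  # some repos have no README
        return ""
    resp.raise_for_status()
    return base64.b64decode(resp.json()["content"]).decode("utf-8", "replace")

def ingest(token: str, dsn: str) -> None:
    """Store repos in Postgres; the pgai vectorizer embeds them from there."""
    with psycopg.connect(dsn) as conn:
        for repo in fetch_starred(token):
            conn.execute(
                "INSERT INTO repositories (name, url, readme) VALUES (%s, %s, %s)",
                (repo["full_name"], repo["html_url"],
                 fetch_readme(token, repo["full_name"])),
            )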

Demo/Repo

Repo: https://github.com/XamHans/starsense
Video: https://youtu.be/Uf1uzI0e3jM

The application features a clean chat interface where users can interact with their starred repositories naturally:

[Screenshot: repository management interface]

The project utilizes a robust architecture integrating Timescale, pgai, and Ollama:

[Architecture diagram]

Tools Used

Frontend

  • Next.js 14: Latest version of the React framework for building the web interface
  • TypeScript: For type-safe code
  • TailwindCSS: For styling and responsive design
  • NextAuth.js: Handling GitHub OAuth authentication
  • WebSocket Client: Real-time updates during repository ingestion

Backend

  • FastAPI: Modern Python web framework for building the API
  • WebSocket: Real-time connection for providing ingest phase status updates
  • Poetry: Python dependency management
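
As a rough illustration of that status channel, a FastAPI WebSocket endpoint can push one message per ingestion phase. The route, phase names, and message shape here are assumptions, not the repo's exact protocol.

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/ingest")
async def ingest_status(websocket: WebSocket):
    """Stream ingestion progress to the Next.js client."""
    await websocket.accept()
    # In the real flow each phase runs its work before reporting completion;
    # this sketch only shows the reporting side of the protocol.
    for phase in ["fetching_stars", "extracting_readmes", "storing", "embedding"]:
        await websocket.send_json({"phase": phase, "status": "done"})
    await websocket.close()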

AI and Vector Search

  1. pgai Vectorizer: Implemented to generate embeddings for repository content using the following configuration:
SELECT ai.create_vectorizer(
  'public.repositories'::regclass,
  embedding=>ai.embedding_openai('text-embedding-3-small', 1536, api_key_name=>'OPENAI_API_KEY'),
  chunking=>ai.chunking_recursive_character_text_splitter('readme'),
  formatting=>ai.formatting_python_template('name: $name url: $url content: $chunk')
);
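
The Python-template formatting inlines each repository's name and URL alongside every README chunk, so any retrieved chunk can be traced back to the repository it came from.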

  2. AI Extensions: The project utilizes multiple Timescale extensions:

    • ai extension for core AI functionality
    • vector extension for similarity search
    • vectorscale for scalable vector operations
  3. Ollama: Used for generating natural language responses based on retrieved repository content, specifically utilizing the llama3 model.
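
Putting retrieval and generation together, a minimal sketch of the answer path could look like this. It assumes the vectorizer's default embedding view name (repositories_embedding), pgai's ai.openai_embed for embedding the question in-database, and Ollama's local HTTP API; the prompt and result limit are illustrative.

import psycopg
import requests

def answer(question: str, dsn: str) -> str:
    # 1) Retrieve the most similar README chunks via pgvector's cosine
    #    distance operator (<=>) over the vectorizer's embedding view.
    with psycopg.connect(dsn) as conn:
        rows = conn.execute(
            """
            SELECT chunk
            FROM repositories_embedding
            ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', %s)
            LIMIT 5
            """,
            (question,),
        ).fetchall()
    context = "\n---\n".join(r[0] for r in rows)

    # 2) Generate the reply with llama3 through Ollama's local HTTP API.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": f"Answer using these starred repos:\n{context}\n\nQuestion: {question}",
            "stream": False,
        },
    )
    resp.raise_for_status()
    return resp.json()["response"]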

Database & Infrastructure

  • TimescaleDB: PostgreSQL-based database with vector search capabilities
  • GitHub API: For fetching starred repositories and README content

Final Thoughts

Building StarSense has been an exciting journey in combining modern AI technologies with practical developer tools. The integration of pgai's vectorizer with Ollama's language models creates a powerful synergy that makes repository discovery feel natural and intuitive.

Some key learnings and highlights:

  • The pgai vectorizer dramatically simplified the embedding process by:
    • Automatically handling document chunking and preprocessing
    • Managing embedding generation and storage
    • Eliminating the need for separate embedding infrastructure
    • Seamlessly integrating with existing PostgreSQL workflows
  • Timescale's AI extensions provided a robust foundation for vector operations
  • Ollama's open-source models offered great performance for natural language generation
  • The WebSocket implementation enabled real-time feedback during the repository ingestion process
  • The combination of Next.js 14 and FastAPI created a performant and developer-friendly stack

This submission qualifies for the following prize categories:

  • Open-source Models from Ollama (utilizing llama3)
  • Vectorizer Vibe (implementing pgai vectorizer)
  • All the Extensions (using ai, vector, and vectorscale extensions)
