🦉 AthenaDB: Distributed Vector Database Powered by Cloudflare 🌩️

#ai #database #cloud #serverless

What is AthenaDB?

AthenaDB is a serverless vector database designed to be highly distributed and easily accessible as an API. It leverages Cloudflare’s Workers AI platform to create the vectors, Cloudflare Vectorize for handling vector querying, and Cloudflare D1 as its database for storing text. This combination allows AthenaDB to offer a simple yet powerful set of API endpoints for inserting, querying, retrieving, and deleting vector text data.
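To make the division of labor concrete, here is a minimal sketch of what an insert could look like inside a Worker. It is not AthenaDB's actual source: the binding names (`AI`, `VECTORIZE`, `DB`), the embedding model, the `entries` table, and the `insertText` function are all assumptions chosen for illustration.

```
// Sketch only: binding names, model name, and table schema are assumptions,
// not taken from the AthenaDB code base.
async function insertText(env, namespace, input, id = globalThis.crypto.randomUUID()) {
  // 1. Workers AI turns the text into an embedding vector.
  const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [input] });
  const values = embedding.data[0];

  // 2. Vectorize stores the vector under the entry's ID.
  await env.VECTORIZE.upsert([{ id, values, metadata: { namespace } }]);

  // 3. D1 stores the original text so it can be returned later.
  await env.DB
    .prepare('INSERT INTO entries (id, namespace, text) VALUES (?, ?, ?)')
    .bind(id, namespace, input)
    .run();

  return { id };
}
```

The same ID ties the vector in Vectorize to the text row in D1, which is what lets a later query map a nearest vector back to readable text.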

Key Features of AthenaDB

Simple API Endpoints: AthenaDB provides straightforward endpoints for various database operations, making it accessible for developers of all skill levels.
Distributed Nature: With data replication across multiple data centers, AthenaDB ensures high availability and resilience.
Built-In Data Replication: Due to Cloudflare Workers’ underlying architecture, data is replicated across data centers automatically.
Scalability: AthenaDB is designed to handle large amounts of vector text data, making it suitable for projects with high data volumes.
Serverless Architecture: With AthenaDB being serverless, you don't have to worry about managing infrastructure, allowing for more focus on development.

What Are Vector Databases?

Vector databases are a special kind of computer storage that helps artificial intelligence (AI) programs quickly understand and use information. They work by turning data into numbers (called vectors) that the AI can easily compare to find similarities. This is really useful for things like online searches, suggesting products you might like, or creating smart chatbots.

For example, suppose a vector database holds three items: “Python is cool”, “Java is cool”, and “C is statically typed”. If the user searches for “coffee”, it would return “Java is cool”. Why? Even though the user was probably not thinking of the programming language Java, the words “java” and “coffee” are close in meaning, so the neural network that created the vectors places them near each other.
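The comparison itself is often done with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds of dimensions, and the numbers here are assumptions picked so that the "coffee" query lands closest to "Java is cool".

```
// Cosine similarity: close to 1 means the vectors point the same way,
// close to 0 means they are unrelated.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up "embeddings" for illustration only.
const items = [
  { text: 'Python is cool', vector: [0.9, 0.1, 0.1] },
  { text: 'Java is cool', vector: [0.6, 0.7, 0.1] },
  { text: 'C is statically typed', vector: [0.5, 0.0, 0.8] },
];
const queryVector = [0.1, 0.9, 0.0]; // pretend embedding of "coffee"

// Score every item against the query and sort from most to least similar.
function rankBySimilarity(query, candidates) {
  return candidates
    .map((item) => ({ text: item.text, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity(queryVector, items);
console.log(ranked[0].text); // prints "Java is cool"
```

A vector database does the same ranking, but uses an index so it does not have to compare the query against every stored vector.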

You can learn more about them in this article.

Why Cloudflare?

Cloudflare has a serverless compute platform called Workers. Workers are automatically deployed to all Cloudflare data centers, so a developer can build an API or other application that scales automatically without managing any infrastructure. Workers also automatically route each user request to the nearest data center, which reduces latency.

Building on Workers gives AthenaDB many of the features of Cloudflare’s platform (data replication, distribution across data centers, and a serverless architecture that scales automatically) without a complicated code base or infrastructure to manage.

Getting Started with AthenaDB

Deploying and using AthenaDB involves a few steps, starting from setting up your environment to deploying your instance of AthenaDB.

Prerequisites

Before you begin, make sure you have the following:

Cloudflare account
Node.js and npm installed
Wrangler CLI installed (npm install -g wrangler)

Deployment Steps

Clone the Repository: Start by cloning the AthenaDB repository to your local machine, installing dependencies, and logging in to Wrangler.
```
git clone https://github.com/TimeSurgeLabs/athenadb.git
cd athenadb
npm i
npx wrangler login
```
Create a Vector Index and Database: Use the provided npm scripts to create the Vectorize index and the D1 database for your AthenaDB instance.
```
npm run create-vector
npm run create-db
```
After running these commands, copy the Database ID from the output and set it as the database_id value in the wrangler.toml file.
Initialize the Database: Run the initialization script to set up the database schema.
```
npm run init-db
```
Deploy AthenaDB: Finally, deploy your instance of AthenaDB using Wrangler.
```
npm run deploy
```
Upon successful deployment, you will receive an output with your API URL, which indicates that AthenaDB is now ready for use.

Using AthenaDB

With AthenaDB deployed, you can start interacting with the database through its API endpoints. Here are some examples of how you can use AthenaDB:

Inserting Text Data: Use the /insert endpoint to add text data into the database.

```
fetch('https://athenadb.yourusername.workers.dev/your-namespace/insert', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ input: 'Your text here' })
})
```

Querying the Database: To find stored text that is similar to a query, use the /query endpoint.

```
fetch('https://athenadb.yourusername.workers.dev/your-namespace/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ input: 'Query text' })
})
```
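For intuition about what a query involves on the server side, here is a rough sketch under stated assumptions. It is not AthenaDB's actual implementation: the binding names (`AI`, `VECTORIZE`, `DB`), the embedding model, the `entries` table, and the `queryText` function are hypothetical.

```
// Sketch only: binding names, model name, and table schema are assumptions.
async function queryText(env, namespace, input, topK = 5) {
  // Embed the query text with the same model used when inserting.
  const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [input] });

  // Ask Vectorize for the nearest stored vectors.
  const { matches } = await env.VECTORIZE.query(embedding.data[0], { topK });

  // Look up the original text for each match in D1.
  const results = [];
  for (const match of matches) {
    const row = await env.DB
      .prepare('SELECT text FROM entries WHERE id = ? AND namespace = ?')
      .bind(match.id, namespace)
      .first();
    if (row) results.push({ id: match.id, score: match.score, text: row.text });
  }
  return results;
}
```

Matches with no text row in the given namespace are skipped, so the caller only sees entries that can be returned as readable text.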

Retrieving an Entry: Retrieve specific entries using their UUID with the GET endpoint.

```
fetch('https://athenadb.yourusername.workers.dev/your-namespace/your-uuid', {
  method: 'GET'
})
```

Deleting Data: To remove an entry from the database, send a DELETE request to the same URL you would use to retrieve it.

```
fetch('https://athenadb.yourusername.workers.dev/your-namespace/your-uuid', {
  method: 'DELETE'
})
```
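The four calls above share a base URL and namespace, so it can be convenient to wrap them in one small helper. The sketch below is a hypothetical convenience wrapper, not something AthenaDB ships: `createAthenaClient` and its method names are made up for illustration, and it assumes every endpoint answers with JSON.

```
// Hypothetical helper: not part of AthenaDB. Assumes JSON responses.
function createAthenaClient(baseUrl, namespace, fetchImpl = fetch) {
  // Strip trailing slashes so paths join cleanly.
  const root = `${baseUrl.replace(/\/+$/, '')}/${encodeURIComponent(namespace)}`;

  async function request(path, options) {
    const response = await fetchImpl(`${root}/${path}`, options);
    if (!response.ok) {
      throw new Error(`AthenaDB request failed with status ${response.status}`);
    }
    return response.json();
  }

  // Both /insert and /query take the same { input } JSON body.
  const post = (path, input) =>
    request(path, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input }),
    });

  return {
    insert: (text) => post('insert', text),
    query: (text) => post('query', text),
    get: (uuid) => request(encodeURIComponent(uuid), { method: 'GET' }),
    remove: (uuid) => request(encodeURIComponent(uuid), { method: 'DELETE' }),
  };
}
```

Usage would look like `const db = createAthenaClient('https://athenadb.yourusername.workers.dev', 'your-namespace')` followed by `await db.insert('Your text here')`. Passing `fetchImpl` explicitly is optional and mainly useful for testing without a network.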

Conclusion

AthenaDB stands out as a powerful tool for developers needing a scalable, serverless database solution for managing vector text data. By following the steps outlined in this blog post, you can deploy your own instance of AthenaDB and begin leveraging its capabilities for your projects. Whether you're building search engines, recommendation systems, or any application that requires efficient handling of vector data, AthenaDB provides a robust, easy-to-use solution.

If you’re looking to integrate AI into your existing workflow or products, TimeSurge Labs is here to help. Specializing in AI consulting, development, internal tooling, and LLM hosting, our team of passionate AI experts is dedicated to building the future of AI and helping your business thrive in this rapidly changing industry. Contact us today!