Introduction
Welcome to my latest adventure: creating an AI "second brain" that's more than just a digital notebook — it's like a smart personal assistant that knows what I need before I do. This journey involves using Cloudflare Vectorize for storing snippets of my life and then retrieving relevant entries for each query I have. The goal is a tool that makes sense of all my notes and reminders.
In this post, I'll explain how I pieced this together, the tech behind it, and why I think this will be useful in my daily life as a programmer (and a person). It won't be a tutorial as such, but I'll provide snippets throughout to explain concepts, and then the full code at the end.
Use Cases
I haven't explored all of these fully, but here are some use cases I have thought of so far:
- daily briefings. Could be a cronjob that sends an email summarising everything I need to know that day: any urgent action items, etc. (there's a rough sketch of this just after the list).
- automated meeting summaries. I keep meeting notes in Obsidian, and this could automatically detect these, and summarise in email (or other) form.
- project reports. Depending on how well I'm keeping notes on my project (I keep daily notes on all sorts of things), it could summarise work done in the last days/weeks/etc.
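To make the first idea a little more concrete, here's a rough, untested sketch of what that daily-briefing cronjob could look like: a separate scheduled Worker calling the query endpoint described later in this post. The WORKER_URL, key, and query below are placeholders, and the actual email delivery is left as a comment, since that depends on whichever email service you plug in:
// Hypothetical daily-briefing Worker (runs on a cron trigger configured in wrangler.toml).
interface BriefingEnv {
  WORKER_URL: string;       // e.g. the deployed notes worker URL (placeholder)
  NOTES_AI_API_KEY: string; // the same shared key the notes worker checks
}

export default {
  async scheduled(_event: unknown, env: BriefingEnv): Promise<void> {
    const res = await fetch(`${env.WORKER_URL}/vectors/query`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        NOTES_AI_API_KEY: env.NOTES_AI_API_KEY,
      },
      body: JSON.stringify({
        query: 'Summarise everything I need to know today, including any urgent action items.',
      }),
    });
    const { response } = await res.json() as { response: { content: string } };
    // Hand response.content to whatever email/notification service you prefer.
    console.log(response.content);
  },
};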
Vector Databases, Embeddings, and Similarity Search: The Tech Behind the AI Second Brain
Let's delve into the technical aspects of vector databases and embeddings, and how they relate to my second brain. These are the core components that make this system not just a storage unit, but an intelligent assistant.
Vector Databases: Efficient Storage and Retrieval
A vector database, like Cloudflare Vectorize, is specialised for storing and querying vectors. In this context, vectors are essentially high-dimensional data points. Each piece of information - a note, a calendar entry, or an email - is converted into a vector. This vector represents the essence of the text in a mathematical form. The beauty of a vector database lies in its ability to compare these vectors "spatially", where those vectors that are "close" together are also similar semantically. So, how do we create the vectors?
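Before we get to creating those vectors, it helps to make "close" a little more concrete. Closeness is measured with a distance metric such as cosine similarity (one of the metrics Vectorize supports); here's a toy illustration, separate from the worker itself:
// Toy illustration only: cosine similarity between two embedding vectors.
// A score near 1 means the texts point the same way semantically; near 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
So the vectors for "weekly team standup notes" and "meeting summary" would score far higher against each other than either would against "pasta recipe", even though they share no keywords.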
Embeddings: Translating Text into Vectors
Embeddings are where the transformation of text into vectors happens. This process involves using algorithms (like those in OpenAI's models) to analyse the text and encode it into a numerical form that captures its semantic meaning. Cloudflare has its own tool for creating embeddings, but I chose to use OpenAI because of its slightly better quality and higher-dimensional vectors (1536 dimensions for OpenAI's model).
Here's how to get a vector from text using OpenAI:
const embedding = await openai.embeddings.create({
  encoding_format: 'float',
  input: text,
  model: 'text-embedding-ada-002',
});
const vector = embedding?.data?.[0].embedding;
The vector here is of type number[], and can be inserted directly into the vector DB like this:
await env.VECTORIZE_INDEX.insert([{
  id: someId, // Very useful because you need this if you want to be able to delete it later, which we do.
  values: vector,
  metadata: {
    // Any metadata you want to store, usually you will store the content so that when you query the DB, you can grab the original text too:
    text,
  },
}]);
Splitting the Text up into Manageable Chunks
Something I only thought about after starting on this journey: imagine you have a huge document and you create a single embedding for it, and then you have five smaller texts too. Your query might retrieve all six, but in terms of word count, the results could be completely dominated by the larger text.
We don't want that. We need something more balanced, so that our LLM can draw context from a wide variety of sources for its responses.
Therefore we'll use a text splitter. Here I'm using the one from Langchain, which splits the text into chunks of 1536 characters, with a 200-character overlap to preserve semantic continuity between the sections. So when we post a note, it will be split into smaller documents, and each of those is embedded. Luckily embeddings are cheap, so this isn't a costly process, even for large files.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkOverlap: 200,
  chunkSize: 1536,
});
const documents = await splitter.createDocuments(
  [content],
  [
    {
      fileName,
      timestamp: Date.now(),
    },
  ],
  {
    appendChunkOverlapHeader: true,
    chunkHeader: `FILE NAME: ${fileName || 'None.'}\n\n---\n\n`,
  },
);
Then, we loop over the documents and embed them individually:
for (const [i, document] of documents.entries()) {
  try {
    const embedding = await openai.embeddings.create({
      encoding_format: 'float',
      input: document.pageContent,
      model: 'text-embedding-ada-002',
    });
    const vector = embedding?.data?.[0].embedding;
    // ...collect the vector (and an id) for insertion; full error handling is in the complete code below.
  } catch (e) {
    // Bail out if any chunk fails.
  }
}
Considerations when Inserting Documents into the Vector DB
One issue I thought of: what if, when I post an update to a note, some of the split chunks are different? If I just "upserted" the docs by their id, you could quickly lose the context of your original text.
I therefore decided to keep track of any files I've added and, when posting an update, delete the existing entries before recalculating and adding in the new vectors.
I found this difficult and messy to do with just Vectorize. Some other vector databases have the required functionality, e.g. Pinecone - but it's ludicrously expensive for hobbyists. Luckily Cloudflare gives you easy access to a KV store that is both rock-solid and extremely cheap (never been charged for it in any of my toy projects to date).
So, after creating the embeddings, we add the ids of the embeddings under the key of the file name. In reverse, therefore, when we post a note we can search for ids by file name and delete by ids (which is a Vectorize function).
Adding the ids to KV:
await env.NOTES_AI_KV.put(filename, JSON.stringify(embeddingsArray.map(embedding => embedding.id)));
Deleting the entries by file name:
async function deleteByFilename(filename: string, env: Env) {
  // If there are existing embeddings for this file, delete them.
  // Note the '[]' fallback: JSON.parse('') would throw if the key doesn't exist yet.
  const existingIds: string[] = JSON.parse((await env.NOTES_AI_KV.get(filename)) ?? '[]') ?? [];
  if (existingIds.length) {
    await env.VECTORIZE_INDEX.deleteByIds(existingIds);
  }
  return existingIds;
}
Similarity Search: Finding Relevant Connections
So what happens when we query? First we convert that query itself into a vector using the same embedding tool and model (i.e. OpenAI's text-embedding-ada-002). Then we use that vector to do a similarity search on our vector database.
This similarity is determined based on how close or far apart vectors are in the high-dimensional space. The result is a set of data points that are contextually similar to the query, not just textually.
const embedding = await openai.embeddings.create({
  encoding_format: 'float',
  input: query,
  model: 'text-embedding-ada-002',
});
const vector = embedding?.data?.[0].embedding;
const similar = await env.VECTORIZE_INDEX.query(vector, {
  topK: 10,
  returnMetadata: true,
});
Second Brain Endpoints
I want to call these functions from all sorts of places (which I'll talk about in a subsequent post), which is why I chose a Cloudflare Worker to host it. Endpoints are exposed to allow me to post notes, query notes, and delete notes by their file name.
/vectors (POST)
Functionality: Handles the addition of new data.
Process: Receives content and a filename, splits the content into manageable chunks, converts these chunks into embeddings, and stores them in the vector database.
/vectors/delete_by_filename (POST)
Functionality: Allows for deletion of data based on filename.
Process: When provided with a filename, it removes all associated embeddings from the vector database, ensuring that outdated or unwanted data is not retained.
/vectors/query (POST)
Functionality: Handles querying for information.
Process: Accepts a query, converts it into an embedding, and performs a similarity search in the vector database. It retrieves the most contextually relevant information based on the query.
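To make these concrete, here is roughly how they can be called from a script. The URL, key, file name, and query are placeholders, and the NOTES_AI_API_KEY header matches the check in the worker code further down:
// Illustrative client calls (all names and values here are placeholders).
const WORKER_URL = 'https://notes-ai.example.workers.dev';
const headers = {
  'Content-Type': 'application/json',
  NOTES_AI_API_KEY: 'my-secret-key', // must match the key the worker expects
};

// Add (or update) a note
await fetch(`${WORKER_URL}/vectors`, {
  method: 'POST',
  headers,
  body: JSON.stringify({ filename: 'projects/second-brain.md', content: 'Today I set up Vectorize...' }),
});

// Ask a question
const queryRes = await fetch(`${WORKER_URL}/vectors/query`, {
  method: 'POST',
  headers,
  body: JSON.stringify({ query: 'What deadlines do I have coming up this week?' }),
});
const { response } = await queryRes.json() as { prompt: string; response: { content: string } };

// Remove a note's vectors entirely
await fetch(`${WORKER_URL}/vectors/delete_by_filename`, {
  method: 'POST',
  headers,
  body: JSON.stringify({ filename: 'projects/second-brain.md' }),
});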
Application in the Second Brain
In the context of the second brain, this technology astounds me with its capability. When you ask a question or make a query, the system doesn’t just retrieve direct matches - it understands the context and essence of your query. It then uses similarity search to find and provide information that's contextually relevant. This means your interactions with the AI are more intuitive and insightful, as it brings forward information based on semantic understanding, not just keyword matching.
It's like ChatGPT, but tailored to you.
I'm finding it useful for all sorts of things, like upcoming deadlines, planning weekly tasks, etc. And I'm sure I'll find a lot more as it grows.
To get set up with Cloudflare Vectorize, follow their docs here: https://developers.cloudflare.com/vectorize/
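For reference, the bindings the worker below relies on correspond to wrangler configuration along these lines. The index and namespace names here are only examples, and the index needs 1536 dimensions to match the ada-002 embeddings:
# Example wrangler.toml bindings (names are illustrative, not my real project's).
# The index can be created with something like:
#   npx wrangler vectorize create notes-ai-index --dimensions=1536 --metric=cosine
# OPENAI_API_KEY and NOTES_AI_API_KEY are set as secrets via `wrangler secret put`.

[[vectorize]]
binding = "VECTORIZE_INDEX"
index_name = "notes-ai-index"

[[kv_namespaces]]
binding = "NOTES_AI_KV"
id = "<your-kv-namespace-id>"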
Code
Here's the full worker code. I've deliberately kept it quite raw - not done any heavy refactoring - because I wanted to show exactly what is required without hiding any details. I think you'll be surprised at how simple it is.
Note that I added a little security by simply checking for a locally-defined key in the request headers. This is only a personal project, and it will keep the hordes out for now while I work on it.
import OpenAI from "openai";
import { splitFileIntoDocuments } from "./text-splitter";
export interface Env {
NOTES_AI_KV: KVNamespace;
NOTES_AI_API_KEY: string;
OPENAI_API_KEY: string;
VECTORIZE_INDEX: VectorizeIndex;
}
const DEFAULT_MODEL = 'gpt-3.5-turbo-1106';
export default {
async fetch(request: Request, env: Env): Promise<Response> {
if (request.headers.get('NOTES_AI_API_KEY') !== env.NOTES_AI_API_KEY) {
return new Response('Unauthorized', { status: 401 });
}
const openai = new OpenAI({
apiKey: env.OPENAI_API_KEY,
});
if (request.url.endsWith('/vectors') && request.method === 'POST') {
const body = (await request.json() as { content: string; filename: string; });
if (!body?.content || !body?.filename) {
return new Response('Missing content or filename', { status: 400 });
}
const { content, filename } = body;
const documents = await splitFileIntoDocuments(content, filename);
if (!documents.length) {
return new Response('No content found', { status: 400 });
}
const timestamp = Date.now();
let successful = true;
const embeddings = new Set<{ content: string, id: string, vector: number[] }>();
for (const [i, document] of documents.entries()) {
try {
const embedding = await openai.embeddings.create({
encoding_format: 'float',
input: document.pageContent,
model: 'text-embedding-ada-002',
});
const vector = embedding?.data?.[0].embedding;
if (!vector?.length) {
successful = false;
break;
}
embeddings.add({
content: document.pageContent,
id: `${filename}-${i}`,
vector,
});
} catch (e) {
successful = false;
break;
}
}
if (successful === false) {
return new Response('Could not create embeddings', { status: 500 });
}
// If there are existing embeddings for this file, delete them
await deleteByFilename(filename, env);
for (const embedding of embeddings) {
await env.VECTORIZE_INDEX.insert([{
id: embedding.id,
values: embedding.vector,
metadata: {
filename,
timestamp,
content: embedding.content,
},
}]);
}
const embeddingsArray = [...embeddings];
await env.NOTES_AI_KV.put(filename, JSON.stringify(embeddingsArray.map(embedding => embedding.id)));
return new Response(JSON.stringify({
embeddings: embeddingsArray.map(embedding => ({
filename,
timestamp,
id: embedding.id,
})),
}), { status: 200 });
}
if (request.url.endsWith('/vectors/delete_by_filename') && request.method === 'POST') {
const body = (await request.json() as { filename: string });
if (!body?.filename) {
return new Response('Missing filename', { status: 400 });
}
const { filename } = body;
const deleted = await deleteByFilename(filename, env);
return new Response(JSON.stringify({
deleted,
}), { status: 200 });
}
if (request.url.endsWith('/vectors/query') && request.method === 'POST') {
const body = (await request.json() as { model: string; query: string });
if (!body?.query) {
return new Response('Missing query', { status: 400 });
}
const { model = DEFAULT_MODEL, query } = body;
const embedding = await openai.embeddings.create({
encoding_format: 'float',
input: query,
model: 'text-embedding-ada-002',
});
const vector = embedding?.data?.[0].embedding;
if (!vector?.length) {
return new Response('Could not create embedding', { status: 500 });
}
const similar = await env.VECTORIZE_INDEX.query(vector, {
topK: 10,
returnMetadata: true,
});
const context = similar.matches.map((match) => `
Similarity: ${match.score}
Content:\n${(match as any).vector.metadata.content as string}
`).join('\n\n');
const prompt = `You are my second brain. You have access to things like my notes, meeting notes, some appointments.
In fact you're like a CEO's personal assistant (to me), who also happens to know everything that goes on inside their head.
Your job is to help me be more productive, and to help me make better decisions.
Use the following pieces of context to answer the question at the end.
If you really don't know the answer, just say that you don't know, don't try to make up an answer. But do try to give any
information that you think might be relevant.
----------------
${context}
----------------
Question:
${query}`;
try {
const chatCompletion = await openai.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
});
const response = chatCompletion.choices[0].message;
return new Response(JSON.stringify({
prompt,
response,
}), { status: 200 });
} catch (e) {
return new Response('Could not create completion', { status: 500 });
}
}
return new Response('Not found', { status: 404 });
},
};
async function deleteByFilename(filename: string, env: Env) {
// If there are existing embeddings for this file, delete them
const existingIds: string[] = JSON.parse((await env.NOTES_AI_KV.get(filename)) ?? '[]') ?? [];
if (existingIds.length) {
await env.VECTORIZE_INDEX.deleteByIds(existingIds);
}
return existingIds;
}
Top comments (6)
Could you please share some examples and, especially, the use cases? The latter are the most interesting ones, even if some of them could be so important only to you 😊
Sure thing! One example here is I'm starting a new job, and I wanted it to give me some book recommendations. I have a large repository of book titles that I've collected over the last few years.
Prompt:
Response:
I can also prompt it to summarise my upcoming meetings, and do things like give me checklists of things I need to do to prepare for the week.
One interesting thing that I'm working on is running a prompt on a daily cronjob to summarise what I need to be doing that day, and have it email me with the response.
I'm honestly only scratching the surface of this, and as my notes repository (Obsidian) grows, it will become more and more useful.
Any suggestions for ways it could help me are most welcome!
Thanks again for the input, I've added a "Use Cases" section at the top.
I like the direction! I will follow how you evolve your usages with care!!!
And thanks for sharing, of course ♥️
Nice article, touches on some of the things I've been working on, and more importantly, the second brain I've been planning on setting up.
I've been struggling with some other aspects though, centered around the actual capturing of data/notes. I've been out of the habit of keeping notes regularly for some years now, and even when I did they were never well organized and just ... a hot mess. While one of the points and strengths of relative distance in latent space is not having to worry about finding things or a poor organizational structure, I find myself obsessed with the idea of leveraging AI to also help me organize my personal knowledge base into a nice, human-digestible collection. Perhaps if nothing more than to look pretty or to uphold tradition, I can't seem to get beyond this idea.
Basically, being able to lazily jot down a thought and then have a system where the LLM would tag and file away the thought in the appropriate place, or even if it has to be periodically done on a collection wide basis ... this idea is super appealing to me. Let me know if you have any thoughts.
I guess my struggle comes from not being able to organize and structure my notes manually... how can I expect to instruct an LLM to do it if I can't? The actual process, well at least one or two approaches (one being an outline + divide and conquer using some sort of agent orchestrator over the corpus ... similar to @swyx's approach with smol developer), are straightforward if I can manage to put the task into words better. I fear "organize this shit" will not suffice.
Back to the article, I was surprised you didn't mention Cloudflare Workers inference (or whatever they are calling it) for the embeddings though. Perhaps a version 2 utilizing AI workers for the embeddings (the bge-base model performs better in my experience than OpenAI's ada anyway) could be a fun follow-up. I'm sure @cloudflaredev would appreciate the content :)
Cheers,
Fielding
Hey Fielding, thanks very much for your thoughts. To address your last point first, yes I could have used Cloudflare AI - the main reason I didn't is that I thought ada was a little better than the Cloudflare model, and since I'm already using OpenAI for the main prompt, it was just easier. However, it would be nice to do an all-Cloudflare version!
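For anyone curious, a minimal, untested sketch of what the embedding step might look like with Workers AI (assuming an AI binding on the worker, and a Vectorize index created with 768 dimensions to match bge-base's output):
// Rough sketch only: Workers AI embeddings in place of OpenAI's.
// Assumes an `AI` binding and a 768-dimension index (bge-base outputs 768-dimensional vectors).
const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [document.pageContent],
});
const vector: number[] = result.data[0]; // one embedding per input text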
For the note-taking, I use Obsidian. I struggle with the same thing as you and I've found the best way is simply to avoid trying. I just write notes and give them some sort of title, then send it to the AI.
I have also recently made an addition to this system where I get the AI to tag it (e.g. up to 5 tags), which kind of works, but I'm considering this upgrade:
What do you think of such a system?