OpenAI released a new version of their Assistants API and made some really neat upgrades to retrieval.
https://platform.openai.com/docs/assistants/whats-new
Now we can ingest up to 10,000 files per assistant. Is that enough for you?
My favorite part is where they explain how the new file_search tool works:
How File Search Works
- Rewrites user queries to optimize them for search.
- Breaks down complex user queries into multiple searches it can run in parallel.
- Runs both keyword and semantic searches across both assistant and thread vector stores.
- Reranks search results to pick the most relevant ones before generating the final response.
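If you were to hand-roll that pipeline, it would look roughly like the sketch below. This is my own illustration, not OpenAI's implementation: `keyword_search` and `semantic_search` are stand-ins for real retrieval backends, and the query-rewrite step would be an LLM call rather than a hardcoded list.

```python
# Hand-rolled sketch of the four file_search steps described above.
# keyword_search / semantic_search are stand-ins for real retrieval backends.
from concurrent.futures import ThreadPoolExecutor

def rewrite_query(user_query: str) -> list[str]:
    # Steps 1-2: rewrite the query and break it into sub-queries.
    # In practice you'd ask an LLM to do this; hardcoded for illustration.
    return [user_query, f"background on {user_query}"]

def keyword_search(query: str) -> list[tuple[str, float]]:
    # Stand-in for a BM25-style keyword index returning (chunk, score) pairs.
    return [(f"keyword hit for '{query}'", 0.6)]

def semantic_search(query: str) -> list[tuple[str, float]]:
    # Stand-in for a vector-store similarity search.
    return [(f"semantic hit for '{query}'", 0.8)]

def file_search(user_query: str, max_chunks: int = 20) -> list[str]:
    sub_queries = rewrite_query(user_query)
    # Step 3: run keyword and semantic searches for each sub-query in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search, q)
                   for q in sub_queries
                   for search in (keyword_search, semantic_search)]
        hits = [hit for future in futures for hit in future.result()]
    # Step 4: rerank and keep the best chunks. Sorting by score here stands
    # in for whatever reranking model OpenAI actually uses.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in hits[:max_chunks]]

print(file_search("vector store limits"))
```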
By default, the file_search tool uses the following settings:
- Chunk size: 800 tokens
- Chunk overlap: 400 tokens
- Embedding model: text-embedding-3-large at 256 dimensions
- Maximum number of chunks added to context: 20 (could be fewer)
https://platform.openai.com/docs/assistants/tools/file-search/how-it-works
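Those defaults are what you get when you create a vector store without any configuration. Here is a minimal sketch of the setup using the OpenAI Python SDK; the model name and file name are placeholders, and the SDK surface has shifted between versions, so verify the exact calls against the current docs.

```python
# Minimal sketch: create a vector store, ingest a file, and wire it to an
# assistant via the file_search tool. Model and file names are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vector_store = client.beta.vector_stores.create(name="client-docs")

# Upload and block until processing finishes; with no overrides, files are
# chunked and embedded using the defaults listed above (800/400 tokens).
with open("handbook.pdf", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id,
        files=[f],
    )

assistant = client.beta.assistants.create(
    model="gpt-4o",  # placeholder; any file_search-capable model works
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```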
This is stinking awesome, because the AI agent I architected for a client a month ago is satisfyingly similar. What took me over 120 hours and 600+ lines of LangChain, you get for free when you build your solution with the Assistants API.
A few differences I find interesting: the maximum number of chunks added to context is a lot higher than mine, and so is the overlap. I was targeting around 5 chunks and only a 20% overlap. I suspect this is for two reasons: 1) I have not added reranking yet, and 2) if I add up the chunks across my parallel RAG searches, I get close to 20 total chunks. OpenAI did not specify whether the 20-chunk limit applies per search or in total.
The large chunk overlap likely helps this solution work for a broader set of use cases, which is exactly what OpenAI is targeting here. If you are building it all custom, you can tune each of these pieces to your specific use case.
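To make those knobs concrete, here is a minimal sliding-window chunker. Using tiktoken for tokenization is my choice for the sketch, not something OpenAI specifies, and the numbers are just the defaults from the list above.

```python
# Token-based chunking with overlap: each chunk starts (chunk_size - overlap)
# tokens after the previous one, so 800/400 means 50% overlap.
import tiktoken

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 400) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [enc.decode(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), step)]

# A 20% overlap like mine would be overlap = chunk_size // 5 instead.
chunks = chunk_text("some long document text ...", chunk_size=800, overlap=400)
```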
One clear limitation: I chose to create separate vector databases because my client had two sets of data that served very different purposes, but you can only configure an assistant with a single vector store.
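One possible workaround, based on OpenAI's note above that file_search queries both assistant and thread vector stores: attach the second store at the thread level. A sketch, with a placeholder vector store ID:

```python
# Sketch: the assistant carries one vector store; a second one is attached
# per-thread. The ID below is a placeholder.
from openai import OpenAI

client = OpenAI()

thread = client.beta.threads.create(
    tool_resources={"file_search": {"vector_store_ids": ["vs_second_dataset"]}},
)
```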
Custom LangChain or Assistants API?
One aspect of building with OpenAI tools that fascinates me is the tradeoff between:
Custom LangChain: customization, dependability, privacy
Assistants API: limited customization, faster development, and rising with the tide
What do I mean by rising with the tide? As OpenAI improves the Assistants API, your solution gets better for free. You might need to update a few things to take advantage of the latest and greatest, but man, is that a lot easier than trying to replicate the new feature in your custom solution. When and for whom does this tradeoff make sense?