What does it take to go from an idea in a notebook to an application handling real-world traffic?
The Pulumi and Pinecone teams worked together to build a reference architecture that teaches you how to scale AI apps in production; we tested it with batches of 10,000 to 1,000,000 records. The reference architecture demonstrates microservices scaling, data processing pipelines, infrastructure segmentation through networking and security groups, and keeping the UI and database in sync.
On the infrastructure side, the reference architecture uses a Pinecone index as a vector store, a queue to fan out work, networking and security groups to segment the infrastructure, ECS services for the frontend and backend microservices, and autoscaling that grows and shrinks the worker pool elastically in response to system load.
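As a rough sketch of how those pieces fit together, the core resources can be declared in a few lines of Pulumi Python. Every name, CIDR block, and capacity value below is illustrative, not the reference architecture's actual code:

```python
import pulumi
import pulumi_aws as aws

# Queue that fans work out to the worker pool (name illustrative).
jobs = aws.sqs.Queue("jobs-queue", visibility_timeout_seconds=300)

# Dedicated network plus a security group to segment the infrastructure.
vpc = aws.ec2.Vpc("app-vpc", cidr_block="10.0.0.0/16")
backend_sg = aws.ec2.SecurityGroup(
    "backend-sg",
    vpc_id=vpc.id,
    ingress=[aws.ec2.SecurityGroupIngressArgs(
        protocol="tcp", from_port=443, to_port=443,
        cidr_blocks=["10.0.0.0/16"],  # only in-VPC traffic reaches the backend
    )],
)

# ECS cluster that hosts the frontend and backend microservices.
cluster = aws.ecs.Cluster("app-cluster")

# Autoscaling target letting a worker service scale between 1 and 20 tasks
# in response to load (the service named "worker" is a placeholder).
scaling_target = aws.appautoscaling.Target(
    "worker-scaling",
    min_capacity=1,
    max_capacity=20,
    resource_id=cluster.name.apply(lambda name: f"service/{name}/worker"),
    scalable_dimension="ecs:service:DesiredCount",
    service_namespace="ecs",
)
```

A scaling policy (for example, tracking SQS queue depth or CPU utilization) would then attach to the target to drive the actual scale-out and scale-in decisions.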
On the application side, the reference architecture launches a semantic search interface over a Postgres database of product records, leveraging Pinecone’s vector database for queries and instant index updates.
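The query path follows the usual Pinecone upsert-then-query pattern. The sketch below uses the Pinecone Python client, but the index name, metadata fields, and the `embed()` helper are placeholders you would supply yourself (for example, a call to your embedding model), not the reference architecture's actual code:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder credential
index = pc.Index("products")           # index name is illustrative

# When a product record changes in Postgres, upsert its embedding so the
# search index reflects the change immediately.
index.upsert(vectors=[{
    "id": "sku-123",
    "values": embed("Red waterproof hiking jacket"),  # embed() is a stand-in
    "metadata": {"name": "Red waterproof hiking jacket"},
}])

# Semantic search: embed the user's query and fetch the nearest products.
results = index.query(
    vector=embed("rain jacket for trail running"),
    top_k=5,
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score, match.metadata["name"])
```

Because reads and writes go through the same index, newly ingested or updated products become searchable without a separate reindexing step.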
The reference architecture follows best practices for AWS and Pinecone. It is designed for production use and is easily modified to fit most use cases.