Scaling Cheer: A Deep Dive into the Architecture of Red Jingles AI Chatbot

A state-of-the-art holiday theme-based AI chatbot focusing on scalability, extensibility and real world applicability.

Official submission to Serverless Holiday Hackathon 2023 hosted by Serverless Guru

Red Jingles Logo

Check out the hosted app; Santa and the team are waiting for you here!

Also check out the GitHub repo.

Submission by:

Jay Bhavsar and Riya Bhavsar

1. Purpose and Objectives

The world has witnessed AI's potential thanks to platforms like OpenAI. It's crucial that AI isn't just limited to tech experts but accessible to everyone.

Red Jingles is our attempt to create a chatbot backed by LLM with a Christmas theme featuring Santa Claus, Snowman, and Elf.

Beyond the initial fun, this interaction is more than just choosing a character. Users feel engaged, empowered, and heard when their preferences are considered instantly. This turns the exchange from talking to a machine into talking to a virtual character, almost like chatting with a friend.

Red Jingles offers a unique, joyful AI conversation experience that doesn't feel like traditional AI. It makes interactions more personal, engaging, and fun, creating a deeper connection.

2. Functionalities and features

  1. Authentication: Users sign up using their email, which undergoes verification. They can then log in with their verified email and password to access the application. We enforce a strong password policy: a minimum of 8 characters with a mix of uppercase, lowercase, symbols, and numbers.

  2. Selecting Christmas themed characters: Users enjoy a unique and fun aspect by selecting one of three Christmas characters — Santa Claus, Snowman or Elf. Each character has a unique personality and charm.

  3. Have a conversation with AI, ask anything: Centered around the holiday theme, common inquiries may revolve around gift suggestions, recipes, dinner plans, home decor, travel tips, and more, though conversations are not limited to any genre. The AI can access the internet to retrieve up-to-date, accurate information.

  4. Contextual memory: Temporary memory assists the AI in quicker responses and better understanding during extended and deeper conversations.

  5. Persistent storage: Users benefit from permanent storage, enabling them to refer back to previous conversations with the AI for valuable insights at any time. Users can also resume any previous conversation.

  6. API for developers: We expose REST APIs as a separate service, secured with the same auth mechanism as the chat app. It follows the OpenAPI standard and ships with a well-documented, interactive UI.

3. High Level Design

3.1 Architecture Diagram

Red Jingles Architecture

NOTE: Chainlit Cloud has since been migrated to Literal AI

3.2 Technical Components in Detail

3.2.1 AWS Cognito

Constructing a robust authentication system is a huge and challenging task. Recognizing that, we opted for a fully managed, serverless authentication service: Amazon Cognito.

Configuration:

  • Authentication with OAuth2 protocol
  • Email as primary attribute
  • Required email verification for successful signup
  • Strong password policy requiring a minimum of 8 characters, including lowercase, uppercase, numeric, and special characters
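
Cognito enforces this policy server-side; a minimal sketch of a client-side mirror of it, useful for faster signup feedback (the function name is our own, not part of the app's actual code):

```python
import re

def meets_password_policy(password: str) -> bool:
    """Mirror of the Cognito policy described above:
    8+ characters with lowercase, uppercase, digit, and symbol."""
    return (
        len(password) >= 8
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and re.search(r"[^A-Za-z0-9]", password) is not None
    )
```

A check like this only improves UX; the authoritative validation still happens in Cognito during sign-up.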

3.2.2 Google App Engine

We have deployed two GAE services:

  • The Chainlit + LangChain application is containerized and runs on a custom runtime in the flexible environment.
  • The other service runs on the standard environment with the Python runtime and hosts the LangServe API and its documentation.

3.2.3 Memorystore (Redis)

Memorystore offers a fully managed Redis database. Since Redis is a fast, in-memory store, we use it to hold conversation context for active chats.

3.2.4 Chainlit Cloud

Chainlit Cloud is a fully managed service for storing users' chat history. Behind the scenes, it uses SQL databases.

3.2.5 OpenAI

OpenAI hosts various LLMs and offers a robust API for their models. We use these LLMs to power our app.

3.3 Scalability

Amazon Cognito

By default, Cognito accommodates up to 40 million users. Specifically, it enables 120 Requests per Second (RPS) for user sign-ins and 50 RPS for user sign-ups (Source: Quotas in Amazon Cognito - Amazon Cognito).

Google App Engine (GAE)

GAE is capable of automatically scaling up to an impressive 2,147,483,647 instances. Its most potent instance type, F4_1G, boasts 3072 MB of memory and a CPU clocked at 2.4 GHz. This scalability equates to serving millions of active users in real-world scenarios.

Google Memorystore

Memorystore for Redis offers a default quota of 1 terabyte (TB) of storage per region. Assuming an average conversation occupies 1 megabyte (MB), this allows support for up to 1 million active conversations per region (Source: Quotas and limits | Memorystore for Redis | Google Cloud).
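
The capacity estimate above is simple arithmetic; a quick sanity check, assuming decimal units:

```python
# Back-of-envelope check of the Memorystore capacity estimate,
# using decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
REGION_QUOTA_BYTES = 10**12       # 1 TB default quota per region
AVG_CONVERSATION_BYTES = 10**6    # assumed 1 MB per conversation

max_conversations = REGION_QUOTA_BYTES // AVG_CONVERSATION_BYTES
print(max_conversations)  # 1000000, i.e. ~1 million active conversations
```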

OpenAI API

With its default quota, the OpenAI API facilitates up to 10,000 Requests per Minute (RPM) (Source: Rate limits - OpenAI API).

Chainlit Cloud

Although their website does not explicitly list quotas and limits, Chainlit Cloud is a managed enterprise solution; rough capacity estimates can be derived from their self-hosting guide.

3.4 CI/CD Diagram

Red Jingles CI/CD Diagram

3.5 CI/CD components in detail

Git

Git serves as a robust version control system, enabling effective codebase management. The code is open source and hosted on the widely used platform GitHub.

GitHub Actions

This feature automates building and deploying the code. When code is pushed, GitHub Actions triggers a sequence of actions: first, it builds the Docker image and pushes it to Docker Hub. Once that completes successfully, a second action runs "terraform apply", applying any changes made to the infrastructure.

Docker Hub

The chatbot application is containerized and publicly hosted on Docker Hub, a repository for Docker container images.

Terraform

Terraform operates as an infrastructure-as-code tool, facilitating seamless integration across various platforms, including AWS Cognito, Google App Engine, and Google Memorystore Redis, among others. Google Cloud Storage buckets are utilized to store Terraform state files, ensuring efficient management and tracking of infrastructure changes.

3.6 Design choices summary

Ideation thoughts for choosing a particular service:

AWS Cognito

Chosen over alternatives like Auth0, Okta, and Firebase due to its customizable hosted UI, which provided a straightforward and seamless configuration process.

Google App Engine (GAE)

Selected because of its minimal configuration requirements and excellent scalability. While other serverless solutions like ECS and Cloud Run could be viable, GAE's simplicity and scalability were determining factors.

LangChain

Its open-source nature, large active community, well-maintained documentation, and comprehensive range of tools available out of the box made it extremely user-friendly, especially for newcomers.

Chainlit

Stood out for providing a ready-to-use solution along with infrastructure support, making it a convenient choice for the project's needs.

Google Memorystore Redis

Chosen for its seamless compatibility with applications running on Google Cloud, full management, and serverless functionality. Although alternatives like Redis Cloud and ElastiCache were considered, its integration with GAE was a decisive factor.

OpenAI

Previous positive experience with their APIs led to its selection. While alternatives like AWS Bedrock and Huggingface were viable, the ease of interchangeability in the project's design made it a suitable choice.

Docker

Its ability to establish loose coupling and isolation between application and infrastructure requirements, along with its containerization capabilities facilitating running applications on any host, made it a preferred choice.

Terraform

Highly efficient in supporting Infrastructure as Code (IAC), enabling streamlined infrastructure management and deployment processes.

Acknowledging the evolving nature of technology, we as a team remain always open to future learnings and potential adaptations to emerging tools and services.

3.7 Design Challenges

3.7.1 Abrupt Disconnection of Service

  • Scenario: During chat interactions, users encountered an abrupt "Service not available" error after initiating conversation, disrupting the flow of dialogue.
  • Analysis: The application deployed on Google App Engine's flexible environment runs on VM instances behind a front-end load balancer responsible for load distribution. When a user initiates a chat, the request is directed to a specific instance (I1). Subsequent queries might be routed to a different instance (I2), losing the previously established websocket session.
  • Fix: Set session affinity to true to ensure the load balancer consistently directs requests from the same session to the same origin server.
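
For the App Engine flexible environment, session affinity is set in the service's `app.yaml` network settings. A minimal sketch (the surrounding file contents are omitted; only the `network` block shown here is relevant to the fix):

```yaml
# app.yaml (flexible environment): keep each websocket session
# pinned to the same instance behind the load balancer
network:
  session_affinity: true
```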
3.7.2 Configuring AWS Cognito for Chainlit Authentication
  • Scenario: Chainlit’s limited support for OAuth platforms lacked compatibility with AWS Cognito, creating authentication challenges.
  • Fix: We duplicated and customized files from the Chainlit library, then applied those modifications during the Docker image build. We intend to open a Pull Request (PR) on the Chainlit repository to add AWS Cognito support. This implementation serves as a reference for anyone aiming to integrate AWS Cognito with Chainlit.

NOTE: AWS Cognito is now officially supported in Chainlit, and this workaround is not needed anymore

4. Low Level Design

4.1 Directory Structure

There are two primary directories within the project: 'app' and 'terraform'.

The 'app' directory houses all the code and configuration required to run the application, while the 'terraform' directory contains Terraform code for creating the infrastructure components on the cloud platforms.

4.2 Python files

  1. constants.py: This file serves as a central repository defining the AI characters, Santa, Snowman, and Elf. It includes their names, display names, image thumbnails, system prompts, welcome messages, and other attributes essential for their representation within the application.
  2. agent.py: Responsible for defining and configuring the LangChain agent. It initializes the LLM (Large Language Model), allows the model to be swapped if needed, and provides the capability to configure and add extra tools to the agent.
  3. app.py: Functions as the primary entry point for the Chainlit application. It relies on 'constants.py' and 'agent.py' as its core dependencies. Within this file, all the callbacks for managing conversation handling are defined and implemented.
  4. api_server.py: Serves as the main entry point for LangServe. This file is built with FastAPI and exposes a REST API for the agent, enabling communication and interaction with the AI functionality the system provides.
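
To illustrate item 1, the character definitions in `constants.py` might look roughly like the following. The field names and prompt text here are assumptions based on the description above, not copied from the actual source:

```python
# Illustrative shape of the character registry kept in constants.py.
# Keys, field names, and prompt wording are hypothetical.
CHARACTERS = {
    "santa": {
        "display_name": "Santa Claus",
        "thumbnail": "images/santa.png",
        "system_prompt": "You are Santa Claus: jolly, warm, and generous.",
        "welcome_message": "Ho ho ho! What can I do for you today?",
    },
    "snowman": {
        "display_name": "Snowman",
        "thumbnail": "images/snowman.png",
        "system_prompt": "You are a cheerful Snowman who loves winter fun.",
        "welcome_message": "Brrr-illiant to meet you! Ask me anything.",
    },
    "elf": {
        "display_name": "Elf",
        "thumbnail": "images/elf.png",
        "system_prompt": "You are a playful Elf from Santa's workshop.",
        "welcome_message": "Hi there! Fresh from the workshop, ready to help!",
    },
}
```

Centralizing the characters this way means `app.py` only needs to look up the selected key to get the right system prompt and welcome message.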

4.3 Code Quality

  • All Python code follows the PEP 8 style guide.
  • Terraform code follows the HCL style conventions.

5. Extensions and Real World Applicability

  • Enable Multi-Modal Support: Enhance the capabilities of the system to incorporate a wider range of inputs and outputs, expanding beyond text-only interactions. This enhancement aims to enable the AI agent to process and generate responses using various modalities, such as images and documents, in addition to textual inputs and outputs.
  • Expand Agent Functionality with Additional Tools: Extend the functionality of the AI agent by integrating supplementary tools, including but not limited to email, Slack, Jira, etc. This enhancement aims to configure the agent to access and utilize various personal and productivity tools, allowing for more comprehensive and versatile interactions and task execution within the system.

6. Running on Local

See instructions on GitHub

7. Links and Contact Details

Developers Contact Details:

Jay Bhavsar

This submission got me 2nd place in the hackathon!