Introduction
This was my first rodeo in AI model development, and it has been an incredible learning journey. As part of Blackthorn’s recently concluded company-wide hackathon on AI and Agent Force, I took on a project that involved deploying an AI model to generate niche (event-related) images. The project offered an opportunity to gain in-depth knowledge about AI development, models, datasets, and the infrastructure required to support them. The results were both enlightening and rewarding, showcasing the power of modern AI and cloud technologies.
If you’re here for the source code, it’s available on GitHub: GitHub Repo
Why Control Your AI Infrastructure?
One of the core advantages of deploying our AI model on AWS was gaining complete control over data handling and retention. By hosting the model on our infrastructure, we were able to:
- Maintain strict control over sensitive data, ensuring secure storage and retention policies.
- Implement data TTL (time-to-live) mechanisms to meet retention and compliance requirements (a small sketch follows at the end of this section).
- Tailor the environment for optimal performance, resource allocation, and cost efficiency.
This approach highlighted the importance of balancing privacy, performance, and scalability in AI solutions.
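As an illustration of the TTL point above, here is a minimal boto3 sketch of an S3 lifecycle rule that expires generated images after a fixed retention window. The bucket name, prefix, and 30-day window are assumptions for illustration, not values from our deployment.

```python
import boto3

# Sketch: apply a lifecycle rule so generated artifacts expire automatically.
# Bucket name, prefix, and the 30-day window are illustrative assumptions.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-generated-images",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-generated-images",
                "Filter": {"Prefix": "generated/"},  # hypothetical key prefix
                "Status": "Enabled",
                "Expiration": {"Days": 30},  # example retention window
            }
        ]
    },
)
```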
Choosing an AI Model
We are all familiar with popular AI tools like ChatGPT, Gemini, and Claude, which showcase the power of conversational AI. While it was tempting to get lost browsing the vast ocean of datasets and models available on Hugging Face, we decided to focus on a single open-source model for our hackathon project. This led us to Stable Diffusion, a remarkable latent text-to-image diffusion model.
- Stable Diffusion (GitHub Repository)
Stable Diffusion stood out for its versatility as a latent text-to-image diffusion model pre-trained on a subset of the LAION-5B dataset. Some key features include:
- Text Encoder: It uses a text encoder to condition the model on text prompts, enabling intuitive image generation from descriptions.
- Resource Efficiency: Lightweight enough to run on GPUs with at least 10GB VRAM, making it accessible for medium-scale deployments.
- Default Model: The model "CompVis/stable-diffusion-v1-4" is pre-trained and ready for adaptation, although other versions offer varying trade-offs in terms of fidelity and inference time. A minimal loading sketch follows right after this list.
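To make the default-model point concrete, here is a minimal sketch of loading CompVis/stable-diffusion-v1-4 with the Hugging Face diffusers library and generating an image from a text prompt. The prompt, half-precision setting, and step/guidance values are illustrative defaults, not the exact configuration we deployed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Sketch: load the default model and generate one image from a text prompt.
# Half precision and attention slicing help fit GPUs in the ~10GB VRAM class.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

prompt = "A wide banner for an annual tech conference, flat illustration style"  # illustrative prompt
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("event_banner.png")
```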
Hugging Face (Hugging Face Hub) played a significant role in this journey. As a leading platform for sharing pre-trained AI models and datasets, Hugging Face provided access to a wide range of resources. From discovering datasets to fine-tuning models, the platform proved invaluable for quickly iterating and adapting Stable Diffusion to our project’s needs.
Infrastructure on AWS
To host the AI model, we chose the Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2) with the AMI ID ami-002a53be89c7bb5de. This decision was driven by the need for:
- High GPU Performance: The AMI’s compatibility with Nvidia drivers ensures efficient usage of GPUs for model inference.
- Flexibility with Docker: Using the stable-diffusion-docker repository (GitHub Repository), we adapted the model for containerized deployment.
- Cost Efficiency: EC2’s on-demand pricing allowed us to scale resources as needed.
Additionally, we explored Amazon SageMaker for training and deploying models directly within the AWS ecosystem; it provides seamless integration for training and inference on AWS’s robust infrastructure. We also looked at AWS Batch for running AI tasks as batch jobs, which is invaluable for handling workloads at scale. A minimal sketch of launching a GPU instance from the AMI above follows below.
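Our actual provisioning was done with Terraform (linked later in this post), but for orientation, here is a rough boto3 equivalent of launching a GPU instance from the Deep Learning AMI mentioned above. The region, instance type, and key pair are placeholders, not the values we used.

```python
import boto3

# Sketch: launch a GPU instance from the Deep Learning AMI referenced above.
# Region, instance type, and key pair are placeholders, not our actual values.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-002a53be89c7bb5de",  # Deep Learning OSS Nvidia Driver AMI (Amazon Linux 2)
    InstanceType="g4dn.xlarge",       # hypothetical GPU instance type
    KeyName="my-key-pair",            # hypothetical key pair
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "stable-diffusion-inference"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```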
Diving into Hugging Face
Hugging Face is a platform that provides a repository of pre-trained models, datasets, and tools for AI development. We used it to:
- Discover Datasets: Identify relevant datasets for fine-tuning Stable Diffusion.
- Create Custom Datasets: Curate and upload datasets with selective questions and answers, tailored to our project needs (see the sketch after this list).
- Train the Model: Fine-tune Stable Diffusion to align more closely with our domain-specific requirements.
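As a rough sketch of the dataset-curation step, the snippet below builds a tiny prompt dataset with the Hugging Face datasets library and pushes it to the Hub. The records and repository ID are made up for illustration; our actual dataset and fine-tuning configuration differed.

```python
from datasets import Dataset

# Sketch: build a tiny prompt dataset and push it to the Hugging Face Hub.
# Records and repo id are made up; pushing requires `huggingface-cli login`.
records = {
    "text": [
        "A banner for an annual charity gala, elegant gold and navy theme",
        "A banner for a summer music festival, bright colors, crowd silhouette",
    ],
}
dataset = Dataset.from_dict(records)
dataset.push_to_hub("your-org/event-banner-prompts")  # hypothetical repo id
```

A dataset along these lines can then feed a standard text-to-image fine-tuning workflow, such as the text-to-image example script in the diffusers repository.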
Challenges and Solutions
The project wasn’t without hurdles. Some notable challenges and how we addressed them include:
- API Gateway Timeout:
  - Problem: The default API Gateway integration timeout (29 seconds) caused issues when image generation on EC2 took longer than that.
  - Solution: We implemented an S3-based placeholder system (see the sketch after this list) where:
    - The AI-generated image was stored in an S3 bucket.
    - A response was sent back to the client with a reference to the S3 location.
  - Alternative Approaches: Bidirectional communication with WebSockets, queues like SQS, or real-time protocols could have mitigated this issue further.
- Fine-Tuning Stable Diffusion:
  - Problem: Achieving accurate and domain-specific image generation required additional fine-tuning.
  - Solution: Leveraged Hugging Face datasets to train the model with targeted data, iterating to improve outcomes.
- Latency Optimization:
  - Problem: Initial inference times averaged 32 seconds per banner, which may not scale well for high-volume usage.
  - Solution: Optimized Docker configurations, utilized larger GPU instances during high-load periods, and explored model quantization.
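Here is a minimal sketch of the S3-based placeholder idea from the first challenge: the generated image is uploaded to S3 and the API responds with a lightweight reference (here, a presigned URL). The bucket name, key layout, and expiry are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-generated-images"  # hypothetical bucket name


def store_and_reference(image_bytes: bytes, key: str) -> dict:
    """Upload a generated image and return a lightweight reference for the client."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes, ContentType="image/png")
    # A presigned URL lets the client fetch the image later without AWS credentials.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=3600,  # example expiry: one hour
    )
    return {"bucket": BUCKET, "key": key, "url": url}
```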
Open Source Contribution
The entire infrastructure-as-code for this project has been made open source. The Terraform scripts used to create necessary AWS resources, pull the model, and set up datasets are available at the following repository: GitHub Repo
Lessons Learned
The project was a crash course in AI and cloud engineering. Key takeaways include:
- Model Choice Matters: Different versions of Stable Diffusion offer varying benefits; understanding these trade-offs is essential.
- Infrastructure Optimization: Balancing cost and performance is critical when scaling AI workloads.
- System Design: Asynchronous processing with S3 helped circumvent API limitations, emphasizing the need for resilient architectures.
- Collaboration Tools: Platforms like Hugging Face streamline model development and dataset curation.
Future Directions
Beyond this POC, additional considerations include:
- Scaling Infrastructure: Implement autoscaling to handle varying demand.
- Real-Time Communication: Explore WebSocket-based communication for live updates.
- Monitoring and Observability: Integrate CloudWatch to monitor GPU usage, latency, and system health (a brief sketch follows this list).
- Enhanced Security: Implement stricter IAM roles and encryption mechanisms for data in transit and at rest.
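For the monitoring item above, a simple starting point is publishing custom metrics from the inference host; the sketch below records generation latency with boto3. The namespace and metric name are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")


def record_inference_latency(seconds: float) -> None:
    """Publish a custom latency metric that dashboards and alarms can track."""
    cloudwatch.put_metric_data(
        Namespace="StableDiffusion/Inference",  # hypothetical namespace
        MetricData=[{
            "MetricName": "GenerationLatency",  # hypothetical metric name
            "Value": seconds,
            "Unit": "Seconds",
        }],
    )
```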
Conclusion
Deploying AI models with AWS provides unparalleled flexibility and control, making it an ideal choice for custom AI projects. This journey, from Stable Diffusion exploration to creating an optimized cloud-based infrastructure, has been both challenging and rewarding. The experience has laid a strong foundation for tackling future AI endeavors and scaling them to production-ready solutions.
As I look forward, I’m excited to continue exploring AI models, refining cloud-based architectures, and driving innovation in AI-powered solutions.