End to end LLMOps Pipeline - Part 2 - FastAPI

Prashant Lakhera

Welcome to Day 2. Yesterday, we explored Hugging Face, a leading platform in Natural Language Processing (NLP), which simplifies the process of building and deploying state-of-the-art machine learning models. Today, we will build on the code we wrote yesterday and integrate it with FastAPI.

✅ What is FastAPI?
FastAPI is designed to create robust, fast, and secure APIs with minimal effort. It leverages Python type hints to enable features like automatic generation of interactive API documentation, which is a significant advantage for both development and user experience. Whether you're a beginner or an experienced developer, FastAPI offers tools that streamline API development, from easy parameter validation to detailed error messages.

✅ Key Features:
✔️ Ease of Use: FastAPI simplifies API development by providing automatic interactive documentation via Swagger UI and ReDoc. This interactive interface not only makes it easier to understand and test your API but also enhances collaboration between developers and users.
✔️ Type Hints: FastAPI relies heavily on Python type hints, which improve code quality, readability, and development speed. Type hints also enable powerful IDE support, such as inline errors and code completion, making the coding process smoother and less error-prone (see the standalone sketch after this list).
✔️ Performance: Known for being one of the fastest Python frameworks, FastAPI achieves remarkable performance by using Starlette and Pydantic, which ensure that your web applications are both scalable and efficient.
✔️ Async Support: FastAPI natively supports asynchronous programming, making it ideal for building high-performance applications that can handle numerous simultaneous users. This is a crucial feature for modern, scalable web applications.
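
To make the type-hints point concrete, here is a minimal, standalone sketch. It is separate from the chat app we build below, and the route and parameter names are illustrative:

from typing import Optional
from fastapi import FastAPI

demo_app = FastAPI()

@demo_app.get("/items/{item_id}")
async def read_item(item_id: int, q: Optional[str] = None):
    # FastAPI uses the type hints to validate input: a non-integer item_id
    # is rejected with a 422 response, and both parameters show up in the
    # auto-generated Swagger UI at /docs.
    return {"item_id": item_id, "q": q}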

✅ Getting Started with FastAPI
To begin using FastAPI, you need to set up a few prerequisites:

Prerequisites:

Python 3.8 or higher (recent releases of FastAPI and Transformers have dropped support for older versions)
FastAPI library
Uvicorn for running the server
Transformers library by Hugging Face
Pydantic for data validation

✅ Installation
You can install the necessary libraries using pip:

pip install fastapi uvicorn transformers pydantic


Note: While manual installation via pip is demonstrated here, it's better practice to list these dependencies in a requirements.txt file and install them automatically (for example, in a GitHub Actions workflow) for a more streamlined and reproducible setup.
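
For reference, a minimal requirements.txt for this project could look like the file below. The entries mirror the pip command above, with one addition: the Transformers pipeline needs a deep-learning backend such as PyTorch, so torch is included here. Leaving versions unpinned is an illustrative choice, not a recommendation:

fastapi
uvicorn
transformers
pydantic
torch

You would then install everything with pip install -r requirements.txt.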

✅ Importing Necessary Libraries
Let's start by importing the required libraries:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline
import uvicorn

FastAPI and HTTPException from FastAPI are used to build the API and handle exceptions.
BaseModel from Pydantic is used to define request and response data models.
pipeline from Transformers initializes the Hugging Face question-answering model.
uvicorn is used to run the FastAPI server.

✅ Creating the FastAPI Application
The core of your API is initialized as follows:

app = FastAPI()

✅ Initializing the Question-Answering Pipeline
We use the Hugging Face Transformers library to initialize a question-answering model:

qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

This pipeline will handle the question-answering task by leveraging the distilbert-base-uncased-distilled-squad model from Hugging Face's model hub.
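
If you'd like to sanity-check the pipeline before wiring it into the API, you can call it directly. The question and context below are made-up examples; the pipeline returns a dict containing the answer along with a confidence score and character offsets:

result = qa_pipeline(
    question="What does FastAPI use for data validation?",
    context="FastAPI uses Pydantic for data validation and Starlette for the web parts.",
)
print(result)
# Shape of the output (values illustrative):
# {'score': 0.97, 'start': ..., 'end': ..., 'answer': 'Pydantic'}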

✅ Defining Data Models
Pydantic models are defined to structure the request and response data:

class ChatRequest(BaseModel):
    question: str
    context: str

class ChatResponse(BaseModel):
    answer: str

ChatRequest expects two fields: question (the question to be answered) and context (the context in which to search for the answer).
ChatResponse contains a single field: answer, which holds the model's answer.
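
Because ChatRequest is a Pydantic model, FastAPI validates incoming JSON against it before your handler ever runs; a request missing a field is rejected with a 422 response. You can see the same validation standalone with a quick sketch, assuming the ChatRequest model defined above is in scope:

from pydantic import ValidationError

try:
    ChatRequest(question="What is FastAPI?")  # 'context' is missing
except ValidationError as e:
    print(e)  # reports that the required 'context' field was not supplied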

✅ Creating the /chat Endpoint
Here's how to define an endpoint for the chat functionality:

@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        result = qa_pipeline(question=request.question, context=request.context)
        return ChatResponse(answer=result['answer'])
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

The @app.post("/chat") decorator creates a POST endpoint at /chat.
The chat function takes a ChatRequest object, uses the qa_pipeline to find the answer, and returns it as a ChatResponse.
If any error occurs, an HTTP 500 error is raised with the corresponding exception message.
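
With the server running (see the next section), you can call the endpoint from any HTTP client. Here is a sketch using the requests library, assuming the server is on localhost:8000; the question, context, and printed answer are illustrative:

import requests

payload = {
    "question": "What is Hugging Face known for?",
    "context": "Hugging Face is a platform that simplifies building and deploying NLP models.",
}
response = requests.post("http://localhost:8000/chat", json=payload)
print(response.json())  # e.g. {'answer': 'building and deploying NLP models'}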

✅ Running the Server
To start the FastAPI server, use the following script:

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
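
Once the server is up, FastAPI serves the interactive Swagger UI at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc, so you can try the /chat endpoint straight from the browser. Alternatively, assuming the code lives in a file named main.py, you can start the server from the shell with uvicorn main:app --host 0.0.0.0 --port 8000.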

✅ Conclusion
FastAPI is an excellent choice for developers looking to build fast, modern, and efficient APIs with Python. Its native support for asynchronous programming, automatic documentation generation, and ease of use make it a standout framework. Whether you're building a small application or a large-scale project, FastAPI provides the tools and features needed to create a robust API effortlessly.


📚 If you'd like to learn more about this topic, please check out my book, Building an LLMOps Pipeline Using Hugging Face.
