
Emmanuel Onwuegbusi

Build your own toxic comment detector API

In this article, I will show you how to build a toxic comment detector API using FastAPI.

From the image below, you can see that the API responded to the text "trash stuff" with "toxic comment" along with the probability that the comment is toxic.

how the api works

Table of Contents

  • Introduction
  • Get the model and vectorizer
  • Set up the FastAPI app
  • Add the model and vectorizer
  • Test the app
  • Conclusion

Introduction

Toxicity is anything rude, disrespectful, or otherwise likely to make someone leave a discussion. We will use FastAPI to build the API.

See how the API works:

When I enter a comment, for example "trash stuff", as seen below:

api comment

The API will detect whether "trash stuff" is a toxic comment. From the prediction below, "trash stuff" is definitely a toxic comment.

trash stuff prediction

Get the model and vectorizer

Download the trained model and the vectorizer (which transforms words into numerical vectors) from here:
https://www.kaggle.com/code/emmamichael101/toxicity-bias-logistic-regression-tfidfvectorizer/data?scriptVersionId=113780035

You can also get it from the project repo: https://github.com/emmakodes/fastapi-toxic-comment-detector

We are using a Logistic Regression model.
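Once downloaded, you can sanity-check the two files with a quick standalone script. This is a minimal sketch, assuming Vectorize.pickle and Pickle_LR_Model.pkl sit in your current working directory (the same place the API code below expects them):

import pickle

# load the TF-IDF vectorizer and the Logistic Regression model
with open("Vectorize.pickle", "rb") as f:
    vectorizer = pickle.load(f)
with open("Pickle_LR_Model.pkl", "rb") as f:
    lr_model = pickle.load(f)

# vectorize a sample comment and print the probability that it is toxic
features = vectorizer.transform(["trash stuff"])
print(lr_model.predict_proba(features)[:, 1])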

Set up the FastAPI app

  • Create a new folder and open it with your code editor

  • Run the following command on your terminal to create a new virtual environment called .venv

python -m venv .venv
  • Activate the virtual environment (the command below is for Windows; on macOS/Linux, run source .venv/bin/activate instead)
.venv\Scripts\activate
  • Run the following command to install FastAPI, pytest, and scikit-learn
pip install "fastapi[all]" pytest scikit-learn
  • Create a new directory called app. Inside the app directory, create __init__.py, main.py, schemas.py, test_main.py, and dependencies.py files, and a routers directory. Your directory should look as shown below:

project directory 1
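In plain text, the layout at this point looks roughly like this (the routers directory is filled in, and the model files are added, in later steps):

.
├── .venv/
└── app/
    ├── __init__.py
    ├── main.py
    ├── schemas.py
    ├── test_main.py
    ├── dependencies.py
    └── routers/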

  • main.py: add the following code to main.py:
from fastapi import FastAPI

from .routers import comments

app = FastAPI()


app.include_router(comments.router)

The above code sets up a FastAPI web application with some initial configurations. It creates an instance of the FastAPI class and includes a router that defines routes related to comments.

  • schemas.py: add the following code to schemas.py file:
from pydantic import BaseModel


class Comment(BaseModel):
    comments: str

The Comment model above defines the structure of the request body: a single comments field that must be a string.
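To see the validation in action, you can try the model directly in a Python shell. This is a small sketch; the import assumes you run it from the project root so the app package is importable:

from pydantic import ValidationError
from app.schemas import Comment

# a valid payload parses into the model
print(Comment(comments="trash stuff"))   # comments='trash stuff'

# a non-string value is rejected; the 422 test later in the article relies on this
try:
    Comment(comments=20)
except ValidationError as exc:
    print(exc)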

  • dependencies.py: add the following code to the dependencies.py file:
import pickle
from . import schemas


def check_toxicity(comment: schemas.Comment):
    comment_dictionary = {}
    comment_dictionary['comment_key'] = [comment.comments]

    # load the vectorizer and vectorize the comment
    with open("./Vectorize.pickle", "rb") as file:
        vectorizer = pickle.load(file)
    testing = vectorizer.transform(comment_dictionary['comment_key'])

    # load the model
    with open('./Pickle_LR_Model.pkl', 'rb') as file:
        lr_model = pickle.load(file)

    # predict toxicity. predictions range from 0.0 (non-toxic) to 1.0 (toxic)
    prediction = lr_model.predict_proba(testing)[:, 1]
    prediction = float(prediction[0])

    if prediction >= 0.9 and prediction <= 1.0:
        response_list = ["toxic comment", prediction]
        return {"response": response_list}

    elif prediction >= 0.0 and prediction <= 0.1:
        response_list = ["non toxic comment", prediction]
        return {"response": response_list}

    else:
        response_list = ["Manually check this", prediction]
        return {"response": response_list}

The check_toxicity function receives a comment, loads the vectorizer to transform the comment into numerical features, and then passes those features to the model to predict the comment's toxicity. The model returns a prediction value between 0.0 and 1.0.

A prediction of exactly 0.0 signifies a strongly non-toxic comment, while a prediction of 1.0 signifies a strongly toxic comment.

For the API, I decided to label predictions from 0.9 to 1.0 as toxic comments (the model is quite confident in that range), predictions from 0.0 to 0.1 as non-toxic comments, and everything in between as neither toxic nor non-toxic, to be checked manually.
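As a standalone illustration of that bucketing, here is a small sketch that mirrors the thresholds used in check_toxicity (not extra code you need to add to the app):

def label(prediction: float) -> str:
    # same thresholds as check_toxicity
    if 0.9 <= prediction <= 1.0:
        return "toxic comment"
    elif 0.0 <= prediction <= 0.1:
        return "non toxic comment"
    return "Manually check this"


print(label(0.96))  # toxic comment
print(label(0.03))  # non toxic comment
print(label(0.55))  # Manually check this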

  • test_main.py: add the following code to the test_main.py file:

from fastapi.testclient import TestClient

from .main import app

client = TestClient(app)


def test_check_toxicity():
    # a request without a body should be rejected
    response = client.post("/comments/")
    assert response.status_code == 422

    # a valid comment should return a prediction
    response = client.post(
        "/comments/",
        json={"comments": "string"},
    )
    assert response.status_code == 200


def test_check_toxicity_wrong_datatype():
    response = client.post(
        "/comments/",
        json={"comments": 20},
    )
    assert response.status_code == 422
    assert response.json() == {
                                "detail": [
                                    {
                                        "type": "string_type",
                                        "loc": [
                                            "body",
                                            "comments"
                                        ],
                                        "msg": "Input should be a valid string",
                                        "input": 20,
                                        "url": "https://errors.pydantic.dev/2.3/v/string_type"
                                    }
                                ]
                            }

These test cases use the TestClient to simulate HTTP requests to the FastAPI application and then assert that the responses match the expected behavior. They help ensure that the endpoint for checking the toxicity of comments handles various scenarios correctly, including incorrect data types and missing data.
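Once the routers module (next step) and the downloaded model files (later section) are in place, you can run these tests from the project root, so that the relative imports and the pickle file paths resolve:

python -m pytest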

  • In the routers directory, create an __init__.py file and a comments.py file. Inside the comments.py file, add the following code:
from typing import Annotated
from fastapi import APIRouter, Depends
from ..dependencies import check_toxicity



router = APIRouter(
    prefix="/comments",
    tags=["comments"],
)


@router.post('/')
async def check_toxic_comment(toxicity_check: Annotated[dict, Depends(check_toxicity)]):
    return toxicity_check

The code above defines a FastAPI router (router) that handles HTTP POST requests to "/comments/" by using the check_toxicity dependency to validate the incoming data and check its toxicity. The endpoint itself simply returns the dependency's result, i.e. whether the comment is toxic or not. The use of annotations and dependencies keeps the code modular and reusable, with a clear separation of concerns.

Add the model and vectorizer

Add the downloaded model and vectorizer to the project root directory.

  • Also, you can create a .gitignore file and add the following:
__pycache__/
.venv/
.pytest_cache
  • Generate a requirements.txt file using the following command:
pip freeze > requirements.txt

The final project directory should be as follows:

final project directory

Test the app

Now, you can test the project.
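First, start the development server from the project root. uvicorn is installed as part of fastapi[all], and the module path below assumes the app directory layout above:

uvicorn app.main:app --reload

By default, the app will be served at http://127.0.0.1:8000.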

  • Open Postman

  • Add a new request

  • Change the method to POST

  • Enter the following URL: http://127.0.0.1:8000/comments/

  • Select Body, then raw and JSON
    Enter the following example JSON comment and click Send:

{
  "comments": "trash stuff"
}

The API should respond with the following, showing that "trash stuff" is a toxic comment:

{
    "response": [
        "toxic comment",
        0.9616073904446953
    ]
}

fastapi app postman
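If you prefer the command line to Postman, an equivalent request with curl looks like this (the exact probability in the response depends on the model you downloaded):

curl -X POST http://127.0.0.1:8000/comments/ -H "Content-Type: application/json" -d '{"comments": "trash stuff"}'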

Conclusion

Basically, when you send a JSON comment, the FastAPI application passes it to the check_toxic_comment endpoint, whose check_toxicity dependency predicts whether the comment is toxic and returns the result.

You can get the complete code here: https://github.com/emmakodes/fastapi-toxic-comment-detector

Top comments (2)

Jon Randy 🎖️

Why exactly is 'trash stuff' considered a toxic comment? Devoid of context, it's impossible to make that judgement.

> Alan: My job is 'refuse collector'
    |
    > Barbara: Sounds interesting, what kind of thing does that entail?
        |
        > Alan: Trash stuff

This kind of detector should only really be used to flag comments that would then be passed to a human for review... but even a system like that would not be foolproof due to the fact that the 'toxicity' of any comment is subjective.

A reliable toxic comment detector would be a very difficult thing to build indeed. It would need to take into account: context, regional dialects/slang/idioms, sarcasm, jokes, etc.

Emmanuel Onwuegbusi

Yes, @jonrandy, you are right. It makes sense to have a human act as a second reviewer for this system so as to improve it over time. We can think of it as a system that helps owners know which comments to pay more attention to.