
Emmanuel Onwuegbusi

Build your own toxic comment detector API

In this article, I will show you how to build a toxic comment detector API using FastAPI.

From the image below, you can see that the API responded to the text "trash stuff" with "toxic comment" along with the probability that the comment is toxic.

how the api works

Table of Contents

  • Introduction
  • Get the model and vectorizer
  • Set up the FastAPI app
  • Add the model and vectorizer
  • Test the app
  • Conclusion

Introduction

Toxicity is anything rude, disrespectful, or otherwise likely to make someone leave a discussion. We will use FastAPI to build the API.

See how the API works:

When I enter a comment, for example "trash stuff", as seen below:

api comment

The API will detect whether "trash stuff" is a toxic comment. From the prediction below, "trash stuff" is definitely a toxic comment.

trash stuff prediction

Get the model and vectorizer

Download the trained model and the vectorizer (which transforms words into numerical vectors) from here:
https://www.kaggle.com/code/emmamichael101/toxicity-bias-logistic-regression-tfidfvectorizer/data?scriptVersionId=113780035

You can also get it from the project repo: https://github.com/emmakodes/fastapi-toxic-comment-detector

We are using a Logistic Regression model.
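Once downloaded, you can sanity-check the two files with a quick standalone script. This is a minimal sketch, assuming Vectorize.pickle and Pickle_LR_Model.pkl sit in your current working directory (the same place the API code below expects them):

import pickle

# load the TF-IDF vectorizer and the Logistic Regression model
with open("Vectorize.pickle", "rb") as f:
    vectorizer = pickle.load(f)
with open("Pickle_LR_Model.pkl", "rb") as f:
    lr_model = pickle.load(f)

# vectorize a sample comment and print the probability that it is toxic
features = vectorizer.transform(["trash stuff"])
print(lr_model.predict_proba(features)[:, 1])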

Set up the FastAPI app

  • Create a new folder and open it with your code editor

  • Run the following command on your terminal to create a new virtual environment called .venv

python -m venv .venv
  • Activate the virtual environment (the command below is for Windows; on macOS/Linux, run source .venv/bin/activate instead)
.venv\Scripts\activate
  • Run the following command to install FastAPI, pytest, and scikit-learn
pip install "fastapi[all]" pytest scikit-learn
  • Create a new directory called app. Inside the app directory, create __init__.py, main.py, schemas.py, test_main.py, and dependencies.py files, and a routers directory. Your directory should look as shown below:

project directory 1
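In plain text, the layout at this point looks roughly like this (the routers directory is filled in, and the model files are added, in later steps):

.
├── .venv/
└── app/
    ├── __init__.py
    ├── main.py
    ├── schemas.py
    ├── test_main.py
    ├── dependencies.py
    └── routers/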

  • main.py: add the following code to main.py:
from fastapi import FastAPI

from .routers import comments

app = FastAPI()


app.include_router(comments.router)

The above code sets up a FastAPI web application with some initial configurations. It creates an instance of the FastAPI class and includes a router that defines routes related to comments.

  • schemas.py: add the following code to schemas.py file:
from pydantic import BaseModel


class Comment(BaseModel):
    comments: str

The Comment model above defines the structure of the request body: a single comments field that must be a string.
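To see the validation in action, you can try the model directly in a Python shell. This is a small sketch; the import assumes you run it from the project root so the app package is importable:

from pydantic import ValidationError
from app.schemas import Comment

# a valid payload parses into the model
print(Comment(comments="trash stuff"))   # comments='trash stuff'

# a non-string value is rejected; the 422 test later in the article relies on this
try:
    Comment(comments=20)
except ValidationError as exc:
    print(exc)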

  • dependencies.py: add the following code to the dependencies.py file:
import pickle
from . import schemas


def check_toxicity(comment: schemas.Comment):
    comment_dictionary = {}
    comment_dictionary['comment_key'] = [comment.comments]

    # load the vectorizer and vectorize the comment
    with open("./Vectorize.pickle", "rb") as file:
        vectorizer = pickle.load(file)
    testing = vectorizer.transform(comment_dictionary['comment_key'])

    # load the model
    with open('./Pickle_LR_Model.pkl', 'rb') as file:
        lr_model = pickle.load(file)

    # predict toxicity. predictions range from 0.0 (non-toxic) to 1.0 (toxic)
    prediction = lr_model.predict_proba(testing)[:, 1]
    prediction = float(prediction[0])

    if prediction >= 0.9 and prediction <= 1.0:
        response_list = ["toxic comment", prediction]
        return {"response": response_list}

    elif prediction >= 0.0 and prediction <= 0.1:
        response_list = ["non toxic comment", prediction]
        return {"response": response_list}

    else:
        response_list = ["Manually check this", prediction]
        return {"response": response_list}

The check_toxicity function receives a comment, loads the vectorizer to transform the comment into numerical features, and then passes those features to the model to predict the comment's toxicity. The model returns a prediction value between 0.0 and 1.0.

A prediction of exactly 0.0 signifies a strongly non-toxic comment, while a prediction of 1.0 signifies a strongly toxic comment.

For the API, I decided to label predictions from 0.9 to 1.0 as toxic comments (the model is quite confident in that range), predictions from 0.0 to 0.1 as non-toxic comments, and everything in between as neither toxic nor non-toxic, to be checked manually.
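As a standalone illustration of that bucketing, here is a small sketch that mirrors the thresholds used in check_toxicity (not extra code you need to add to the app):

def label(prediction: float) -> str:
    # same thresholds as check_toxicity
    if 0.9 <= prediction <= 1.0:
        return "toxic comment"
    elif 0.0 <= prediction <= 0.1:
        return "non toxic comment"
    return "Manually check this"


print(label(0.96))  # toxic comment
print(label(0.03))  # non toxic comment
print(label(0.55))  # Manually check this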

  • test_main.py: add the following code to the test_main.py file:

from fastapi.testclient import TestClient

from .main import app

client = TestClient(app)


def test_check_toxicity():
    # a request without a body should be rejected
    response = client.post("/comments/")
    assert response.status_code == 422

    # a valid comment should return a prediction
    response = client.post(
        "/comments/",
        json={"comments": "string"},
    )
    assert response.status_code == 200


def test_check_toxicity_wrong_datatype():
    response = client.post(
        "/comments/",
        json={"comments": 20},
    )
    assert response.status_code == 422
    assert response.json() == {
                                "detail": [
                                    {
                                        "type": "string_type",
                                        "loc": [
                                            "body",
                                            "comments"
                                        ],
                                        "msg": "Input should be a valid string",
                                        "input": 20,
                                        "url": "https://errors.pydantic.dev/2.3/v/string_type"
                                    }
                                ]
                            }

These test cases use the TestClient to simulate HTTP requests to the FastAPI application and then assert that the responses match the expected behavior. They help ensure that the endpoint for checking the toxicity of comments handles various scenarios correctly, including incorrect data types and missing data.
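Once the routers module (next step) and the downloaded model files (later section) are in place, you can run these tests from the project root, so that the relative imports and the pickle file paths resolve:

python -m pytest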

  • In the routers directory, create an __init__.py file and a comments.py file. Inside the comments.py file, add the following code:
from typing import Annotated
from fastapi import APIRouter, Depends
from ..dependencies import check_toxicity



router = APIRouter(
    prefix="/comments",
    tags=["comments"],
)


@router.post('/')
async def check_toxic_comment(toxicity_check: Annotated[dict, Depends(check_toxicity)]):
    return toxicity_check

The code above defines a FastAPI router (router) that handles HTTP POST requests to "/comments/" by using the check_toxicity dependency to validate the incoming data and check its toxicity. The endpoint itself simply returns the dependency's result, i.e. whether the comment is toxic or not. The use of annotations and dependencies keeps the code modular and reusable, with a clear separation of concerns.

Add the model and vectorizer

Add the downloaded model and vectorizer to the project root directory.

  • Also, you can create a .gitignore file and add the following:
__pycache__/
.venv/
.pytest_cache
  • Generate a requirements.txt file using the following command:
pip freeze > requirements.txt

The final project directory should be as follows:

final project directory

Test the app

Now, you can test the project.
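First, start the development server from the project root. uvicorn is installed as part of fastapi[all], and the module path below assumes the app directory layout above:

uvicorn app.main:app --reload

By default, the app will be served at http://127.0.0.1:8000.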

  • Open Postman

  • Add a new request

  • Change the method to POST

  • Enter the following URL: http://127.0.0.1:8000/comments/

  • Select Body, then raw and JSON
    Enter the following example JSON comment and click Send:

{
  "comments": "trash stuff"
}

The API should respond with the following, showing that "trash stuff" is a toxic comment:

{
    "response": [
        "toxic comment",
        0.9616073904446953
    ]
}

fastapi app postman
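If you prefer the command line to Postman, an equivalent request with curl looks like this (the exact probability in the response depends on the model you downloaded):

curl -X POST http://127.0.0.1:8000/comments/ -H "Content-Type: application/json" -d '{"comments": "trash stuff"}'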

Conclusion

Basically, when you send a JSON comment, the FastAPI application passes it to the check_toxic_comment endpoint, whose check_toxicity dependency predicts whether the comment is toxic and returns the result.

You can get the complete code here: https://github.com/emmakodes/fastapi-toxic-comment-detector

Top comments (2)

Jon Randy 🎖️

Why exactly is 'trash stuff' considered a toxic comment? Devoid of context, it's impossible to make that judgement.

> Alan: My job is 'refuse collector'
    |
    > Barbara: Sounds interesting, what kind of thing does that entail?
        |
        > Alan: Trash stuff

This kind of detector should only really be used to flag comments that would then be passed to a human for review... but even a system like that would not be foolproof due to the fact that the 'toxicity' of any comment is subjective.

A reliable toxic comment detector would be a very difficult thing to build indeed. It would need to take into account: context, regional dialects/slang/idioms, sarcasm, jokes, etc.

Emmanuel Onwuegbusi

Yes, @jonrandy, you are right. It makes sense to have a human act as a second reviewer for this system so as to improve it over time. We can think of it as a system that helps owners know which comments to pay more attention to.