Nirmol is an open-source API, available on GitHub, for detecting offensive, bad, and slang words in Bangla (Bengali) and Banglish (Bangla written in the Latin alphabet) text. Before Nirmol was created, there was no good API or other solution for detecting swearing or bad words in Bengali.
Motivation
Although such APIs exist for English and other widely spoken languages, I could not find any solution for Bengali, even after searching extensively. I needed one mainly for a platform I am building with other developers.
On that platform, Bengali-speaking users can fill in their details, and once published, that information is visible to anyone on the internet. Many mischievous users post profanity, which can cause considerable damage.
My job was to design the overall system and find microservice-based solutions for smaller problems. In the end, I built a solution to this problem myself.
Design and Development
There are several possible approaches to this problem: we could detect bad Bengali words using artificial intelligence, or we could build a curated collection of bad words and filter text against it.
Because our platform is already very complex and we have limited computing power at this early stage, we decided against using artificial intelligence. Besides, none of the existing Bengali AI models can reliably detect Bangla bad words or swearing.
I created a form, shared it with friends and acquaintances on social media, and collected a list of commonly used Bengali bad words from their responses. I also built my own dataset by combining several previously published datasets. Building the dataset was not easy.
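Combining form responses with previously published lists mostly comes down to cleaning and deduplicating words. The sketch below shows one way to do that in Node.js; the function name `mergeWordLists`, the NFC normalization step, the Banglish spelling `kutta`, and the file name mentioned in a comment are illustrative assumptions, not taken from the Nirmol repository.

```javascript
// Hypothetical helper (not from the Nirmol repo): merge several word lists,
// e.g. form responses and previously published datasets, into one
// deduplicated, sorted array.
function mergeWordLists(...lists) {
  const seen = new Set();
  for (const list of lists) {
    for (const raw of list) {
      // Trim stray whitespace and normalize to NFC so that visually identical
      // strings with different code-point sequences count as one entry.
      const word = raw.trim().normalize("NFC");
      if (word.length > 0) seen.add(word);
    }
  }
  // Default sort compares UTF-16 code units, which is enough for a stable file.
  return [...seen].sort();
}

// Illustrative input: duplicates, stray whitespace, and an empty entry.
const fromForm = ["কুত্তা ", "   ", "কুত্তা"];
const fromPublished = ["কুত্তা", "kutta"];
const merged = mergeWordLists(fromForm, fromPublished);

// Serialize for a JSON file. In Node this string could then be saved with
// fs.writeFileSync("bad-words.json", json) (file name is an assumption).
const json = JSON.stringify(merged, null, 2);
console.log(json);
```

Normalizing before deduplicating matters for Bangla because the same word can arrive from different keyboards or sources with different underlying code points.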
I then compiled the words into a JSON file and wrote an Express.js app that receives a word or sentence and checks whether its words appear in that JSON file.
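At its core, this check is a membership test: split the input into words and look each one up in the list loaded from the JSON file. The following is a minimal sketch of that idea, not Nirmol's actual code; it assumes the JSON file is a flat array of words, and `classifyWords` is a hypothetical name.

```javascript
// Minimal sketch of a word-list check (not Nirmol's actual code).
// ASSUMPTION: the JSON file holds a flat array of words. In a Node app it
// could be loaded with JSON.parse(fs.readFileSync(path, "utf8")); an inline
// string stands in for the file here.
const badWordsJson = '["কুত্তা"]';
const badWords = new Set(JSON.parse(badWordsJson));

// Hypothetical function: split the input on whitespace and put each word
// into the "bad" or "normal" bucket by exact Set membership.
function classifyWords(sentence, badWordSet) {
  const words = sentence.trim().split(/\s+/).filter((w) => w.length > 0);
  const bad = [];
  const normal = [];
  for (const word of words) {
    if (badWordSet.has(word)) bad.push(word);
    else normal.push(word);
  }
  return { bad, normal };
}

// Illustrative sentence built from the words in the example response
// shown in the API Response section.
const result = classifyWords("কুত্তা একটি গালি বা খারাপ শব্দ", badWords);
console.log(result.bad, result.normal);
```

A `Set` keeps each lookup at average constant time however large the word list grows. Because the match is exact, punctuation attached to a word (for example the Bangla full stop ।) would hide it from this check unless it is stripped first.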
Installation
You can download the dataset from the GitHub repository or via the direct dataset link. You can also download and use it for training ML and AI models.
Nirmol API is based on:
- Node.js
- Express.js
npm packages used:
- body-parser
- cors
- fs (built into Node.js rather than installed from npm)
- nodemon
Run Nirmol locally
Step 1: Clone the Nirmol repository
git clone https://github.com/Sigmakib2/Nirmol.git
Step 2: Go to the Nirmol directory
cd Nirmol
Step 3: Install node modules
npm install
Step 4: Start the project
npm start
Then open your web browser and navigate to http://localhost:3000. You should see "Cannot GET /" on the page, because no GET route handles the bare root path. To test the API, add a word or sentence after the '/', for example http://localhost:3000/hello world (the browser percent-encodes the space as %20).
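Typing a sentence into the address bar works for quick tests; from code, the sentence should be percent-encoded before it goes into the path. A small client sketch follows; `buildGetUrl` and `checkSentenceGet` are hypothetical names, and it assumes Node.js 18 or later (or a browser) for the global `fetch`.

```javascript
// Hypothetical client helper: build the GET URL for a sentence.
// encodeURIComponent percent-encodes spaces and non-ASCII (e.g. Bangla)
// characters so the sentence can travel safely in the URL path.
function buildGetUrl(baseUrl, sentence) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/${encodeURIComponent(sentence)}`;
}

// Sends the request with the global fetch (Node.js 18+ or a browser).
// ASSUMPTION: the server answers with a JSON body.
async function checkSentenceGet(baseUrl, sentence) {
  const response = await fetch(buildGetUrl(baseUrl, sentence));
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

console.log(buildGetUrl("http://localhost:3000", "hello world"));
// http://localhost:3000/hello%20world
```

With the server running, `checkSentenceGet("http://localhost:3000", "hello world").then(console.log)` should print the JSON response.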
API Response
The API endpoint analyzes a sentence for offensive/slang words and provides additional information about the sentence.
For example, here is the response to a GET request for a six-word Bangla sentence:
{
  "bad_sentence": true,
  "bad_word_list": [
    "কুত্তা"
  ],
  "normal_words": [
    "একটি",
    "গালি",
    "বা",
    "খারাপ",
    "শব্দ"
  ],
  "badness": "16.67%"
}
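The `badness` value in this example is consistent with a simple ratio: one bad word out of six words in total, so 1 / 6 × 100 ≈ 16.67%. The sketch below reproduces that arithmetic under the assumption that badness is the bad-word share of all words, rounded to two decimals; `badnessPercent` is a hypothetical name, and Nirmol's own formula is not confirmed here.

```javascript
// Sketch of the badness arithmetic. ASSUMPTION: badness is the share of bad
// words among all words, shown as a percentage with two decimals. The example
// response (1 bad word out of 6) is consistent with this, but Nirmol's own
// formula is not confirmed here.
function badnessPercent(badWords, normalWords) {
  const total = badWords.length + normalWords.length;
  // Guard against dividing by zero on empty input (this sketch's own choice).
  if (total === 0) return "0.00%";
  return `${((badWords.length / total) * 100).toFixed(2)}%`;
}

const exampleBad = ["কুত্তা"];
const exampleNormal = ["একটি", "গালি", "বা", "খারাপ", "শব্দ"];
console.log(badnessPercent(exampleBad, exampleNormal)); // 16.67%
```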
You can also use the POST method to get a response. This feature was added by Tasnim Anas.
For a POST request, use the endpoint http://localhost:3000/ and send the payload in the request body like this:
{
  "sentence": "Your sentence here..."
}
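The same request can be sent from JavaScript. This sketch assumes the server accepts a JSON body (body-parser is among the listed packages) and that Node.js 18+ or a browser provides `fetch`; `buildPostRequest` and `checkSentencePost` are hypothetical helpers.

```javascript
// Hypothetical helper that builds the POST request described above.
// ASSUMPTION: the server parses JSON bodies (body-parser is a listed
// dependency), so the payload is sent with a JSON Content-Type header.
function buildPostRequest(baseUrl, sentence) {
  return {
    url: baseUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ sentence }),
    },
  };
}

// Sends the request with the global fetch (Node.js 18+ or a browser).
async function checkSentencePost(baseUrl, sentence) {
  const { url, init } = buildPostRequest(baseUrl, sentence);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

const request = buildPostRequest("http://localhost:3000/", "Your sentence here...");
console.log(request.init.body); // {"sentence":"Your sentence here..."}
```

With the server running, `checkSentencePost("http://localhost:3000/", "hello world")` resolves to the parsed JSON response. Sending the sentence in the body also avoids URL-encoding long or punctuation-heavy text.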
Here's what the response means:
- bad_sentence: A boolean indicating whether the sentence contains any offensive, bad, or slang words.
- bad_word_list: The offensive, bad, or slang words found in the sentence.
- normal_words: The words in the sentence that are not considered offensive, bad, or slang.
- badness: The proportion of offensive, bad, or slang words in the sentence, given as a percentage string (in the example above, 1 of 6 words gives "16.67%").
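A client that consumes these fields may want to validate them and turn `badness` into a number before comparing it against a threshold. Here is a hedged sketch; `parseNirmolResponse` and its returned property names are made up for illustration, and the checks only follow the field descriptions above.

```javascript
// Hypothetical client-side parser (names are illustrative, not Nirmol's).
// It checks the field types described in the article and converts the
// badness string, e.g. "16.67%", into a number such as 16.67.
function parseNirmolResponse(data) {
  const isStringArray = (v) =>
    Array.isArray(v) && v.every((x) => typeof x === "string");

  if (typeof data?.bad_sentence !== "boolean") {
    throw new TypeError("bad_sentence must be a boolean");
  }
  if (!isStringArray(data.bad_word_list) || !isStringArray(data.normal_words)) {
    throw new TypeError("bad_word_list and normal_words must be string arrays");
  }
  // parseFloat reads the leading number and ignores the trailing "%".
  const badness = Number.parseFloat(data.badness);
  if (Number.isNaN(badness)) {
    throw new TypeError('badness must look like "16.67%"');
  }
  return {
    isBad: data.bad_sentence,
    badWords: data.bad_word_list,
    normalWords: data.normal_words,
    badnessPercent: badness,
  };
}

// The example response from the API Response section.
const parsed = parseNirmolResponse({
  bad_sentence: true,
  bad_word_list: ["কুত্তা"],
  normal_words: ["একটি", "গালি", "বা", "খারাপ", "শব্দ"],
  badness: "16.67%",
});
console.log(parsed.isBad, parsed.badnessPercent); // true 16.67
```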
Use Cases
Here are some use cases for this API:
- Content moderation: Bangla websites often host user-generated content such as comments, forum posts, or user profiles. This API can be integrated into these platforms to automatically detect and filter out inappropriate language, thus maintaining a clean and respectful environment for users.
- Social media platforms: Social media platforms that support Bangla-language content can use this API to automatically flag or filter out offensive or inappropriate content in user posts, comments, and messages, helping to maintain a positive and safe community for users.
- E-commerce platforms: E-commerce websites serving the Bangla-speaking community can use this API to keep product reviews and comments free from offensive language, providing a positive shopping experience for customers.
- Educational platforms: Educational websites and software applications targeting Bangla-speaking users can use this API to monitor and filter user-generated content in discussion forums, chatrooms, or collaborative projects, promoting a respectful and constructive learning environment.
- Parental control software: Parental control software can leverage this API to monitor and filter out inappropriate content on Bangla-language websites and applications, helping parents protect their children from exposure to harmful or offensive material online.
- Chat applications: Bangla-language chat applications can integrate this API to automatically detect and filter out offensive language in user messages, helping to maintain a friendly and respectful communication environment among users.
- Customer support platforms: Customer support platforms serving Bangla-speaking customers can use this API to monitor and filter out abusive or inappropriate language in customer inquiries and support tickets, ensuring a professional and respectful interaction with users.
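Most of these use cases share one step: deciding what to do with a post once the API has flagged it. The sketch below is one hypothetical policy, not part of Nirmol: reject text whose badness exceeds a threshold, and otherwise mask each word from `bad_word_list`. The `moderate` function, the 50% default threshold, and the `***` mask are arbitrary assumptions.

```javascript
// Hypothetical moderation policy (not part of Nirmol). Inputs mirror the
// API response: the bad_word_list array and the badness value as a number.
// The 50% threshold and the "***" mask are arbitrary illustration choices.
function moderate(text, badWordList, badnessPercent, rejectAbove = 50) {
  if (badnessPercent > rejectAbove) {
    return { action: "reject", text: null };
  }
  const flagged = new Set(badWordList);
  let maskedCount = 0;
  const masked = text
    .split(/(\s+)/) // the capture group keeps whitespace, so spacing survives
    .map((part) => {
      if (!flagged.has(part)) return part;
      maskedCount += 1;
      return "***";
    })
    .join("");
  return { action: maskedCount > 0 ? "mask" : "allow", text: masked };
}

// Using the example response from the API Response section.
const outcome = moderate("কুত্তা একটি গালি বা খারাপ শব্দ", ["কুত্তা"], 16.67);
console.log(outcome.action, outcome.text);
```

Masking keeps mildly offensive posts visible while hiding the flagged words; a comment section might prefer this, whereas a children's platform might lower the threshold so that any flagged post is rejected.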