Best Codes

Posted on Feb 8

⚡ I Made a JavaScript Library that Leaves Profanity Speechless! 🤬

#javascript #webdev #programming #opensource

How I Built a Profanity Blocking JavaScript Library

Introduction

As developers, we often come across situations where we need to filter and sanitize text to block or remove profanity. To tackle this problem, I decided to create a JavaScript library called bc-ProfanityBlock. In this article, I will walk you through the steps I took to build this library and explain how it can be used to effectively block profanity in your applications.

Step 1: Defining the Problem

The first step in building any library is to clearly define the problem it solves. Here, the problem was detecting and handling profanity in text. Profanity appears in many forms: common words, spelling variations, and words disguised with evasion characters (for example, "@" in place of "a"). The goal was to build a library that could efficiently detect and sanitize such content.

Step 2: Research and Planning

Before diving into the implementation, I conducted thorough research on existing profanity blocking techniques and libraries. Among these were https://github.com/2Toad/Profanity and https://www.npmjs.com/package/bad-words (both fantastic libraries!). This helped me understand the different approaches and challenges involved. Based on my research, I decided to use a combination of encoded bad words and evasion pattern detection to build an effective solution.

Step 3: Designing the Architecture

Next, I designed the architecture of the library. I created a ContentFilterBadWord class that would encapsulate all the necessary methods and properties for filtering and cleaning text. The class would have functions for decoding encoded bad words, normalizing text with evasion patterns, checking if text contains bad words, and cleaning text by replacing or removing bad words.

Step 4: Implementing the Functionality

With the architecture in place, I started implementing the functionality of the library. I created methods to decode base64 encoded bad words, normalize text with evasion patterns, and check if text contains bad words. I also added options to match bad words as whole words and detect evasion characters and separators. Lastly, I implemented functions to clean text by replacing or removing bad words.

Step 5: Testing and Optimization

Once the functionality was implemented, I conducted extensive testing to ensure the library was working as expected. I created test cases with different scenarios, including common bad words, variations, and evasion techniques. I also tested the library's performance with large volumes of text. Based on the test results, I made optimizations to improve the speed and accuracy of the library. (There are some very minor bugs I am still working on.)
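
Here is a minimal sketch of what scenario-based test cases like these might look like. The containsPlaceholderWord helper and the placeholder word "darn" are hypothetical stand-ins, not part of bc-ProfanityBlock; the point is only the table-driven shape.

```javascript
// Hypothetical stand-in for the real filter: flags the placeholder word "darn"
// after undoing two simple character substitutions and stripping separators.
function containsPlaceholderWord(text) {
  const normalized = text
    .replace(/4/g, "a")
    .replace(/@/g, "a")
    .replace(/[-_.\s]/g, "");
  return /darn/i.test(normalized);
}

// Each case pairs an input with the expected verdict.
const cases = [
  { input: "well darn", expected: true }, // plain word
  { input: "DARN it", expected: true }, // different casing
  { input: "d4rn", expected: true }, // character evasion
  { input: "d-a-r-n", expected: true }, // separator evasion
  { input: "d a r n", expected: true }, // spaced letters
  { input: "a perfectly clean sentence", expected: false },
];

// Collect every case whose actual verdict differs from the expected one.
const failures = cases.filter(
  ({ input, expected }) => containsPlaceholderWord(input) !== expected
);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

Keeping the scenarios in a plain array makes it cheap to add a new evasion trick as one more row.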

Now, let's take a look at the code

Step 1: Define a class (`ContentFilterBadWord`)

In JavaScript, a class is a blueprint or template for creating objects that share similar properties and behaviors. It provides a way to define the structure and behavior of an object.

To create a class in JavaScript, you can use the class keyword followed by the name of the class. Here's an example:

class Person {
  constructor(name, age) {
    this.name = name;
    this.age = age;
  }

  greet() {
    console.log(`Hello, my name is ${this.name} and I'm ${this.age} years old.`);
  }
}

In the above example, we define a Person class with a constructor and a greet method. The constructor is a special method that gets called when a new object is created from the class. It is used to initialize the object's properties.

To create an instance of a class, you can use the new keyword followed by the class name with parentheses. Here's an example:

const person1 = new Person("John", 25);
const person2 = new Person("Jane", 30);

person1.greet(); // Output: Hello, my name is John and I'm 25 years old.
person2.greet(); // Output: Hello, my name is Jane and I'm 30 years old.

In the above example, we create two instances of the Person class and call the greet method on each instance.

Using classes in JavaScript allows you to create reusable and organized code by encapsulating related properties and behaviors within a single class.

In this case, we define a class called ContentFilterBadWord. In our constructor, we put our bad word list (Base64 encoded, so we can't just read them) and our evasion patterns. Now, we need to add some functions to our class. See below:

class ContentFilterBadWord {
  constructor() {
    // Base64 encoded bad words
    this.encodedCussWords = [
      "AAAAAA",
      "BBBBBB",
      "CCCCCC",
    ];

    this.evasionPatterns = [
      { pattern: /4/gi, replacement: "a" },
      { pattern: /\$/gi, replacement: "s" },
      { pattern: /5/gi, replacement: "s" },
      { pattern: /0/gi, replacement: "o" },
      { pattern: /1/gi, replacement: "i" },
      { pattern: /!/gi, replacement: "i" },
      { pattern: /@/gi, replacement: "a" },
    ];
  }
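
The Base64 entries in a list like this can be generated once, ahead of time. Here is a small sketch, assuming ASCII-only placeholder words ("darn" and "heck" are my examples, not entries from the real list) and the global btoa and atob functions available in browsers and recent Node.js versions.

```javascript
// Placeholder words for illustration only; the real list lives in the library.
const plainWords = ["darn", "heck"];

// btoa encodes a string of Latin-1 characters to Base64.
const encodedWords = plainWords.map((word) => btoa(word));
console.log(encodedWords); // [ 'ZGFybg==', 'aGVjaw==' ]

// atob reverses the encoding, which is what decodeBase64 relies on.
const roundTripped = encodedWords.map((encoded) => atob(encoded));
console.log(roundTripped); // [ 'darn', 'heck' ]
```

Keep in mind that Base64 is an encoding, not encryption: it keeps the words out of plain sight in the source, but anyone can decode them.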

Step 2: `decodeBase64`

This one is pretty simple: it wraps the built-in atob function, which decodes a Base64-encoded string.

  decodeBase64(encodedString) {
    return atob(encodedString);
  }
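
One caveat worth sketching: atob returns one character per byte, so a word containing non-ASCII characters comes back garbled if it was encoded from UTF-8 bytes. The helpers below (decodeBase64Utf8 and encodeBase64Utf8) are hypothetical and not part of the library; they assume TextEncoder and TextDecoder are available, as they are in browsers and modern Node.js.

```javascript
// Hypothetical helper: decode Base64 that was produced from UTF-8 bytes.
function decodeBase64Utf8(encodedString) {
  const binary = atob(encodedString);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// Matching encoder, used here only to build a sample value.
function encodeBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

const sample = encodeBase64Utf8("naïve");
console.log(decodeBase64Utf8(sample)); // naïve
console.log(atob(sample) === "naïve"); // false: atob alone yields one char per byte
```

For an ASCII-only word list, plain atob as used above is enough.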

Step 3: `normalizeText`

This one is also pretty simple. The code defines a normalizeText function that takes a text parameter and applies evasion patterns to normalize the text by replacing specified patterns with their replacements. It uses the evasionPatterns array to iterate through each pattern and replacement and apply the replacements to the text.

  normalizeText(text) {
    // Apply evasion patterns to normalize text
    this.evasionPatterns.forEach(({ pattern, replacement }) => {
      text = text.replace(pattern, replacement);
    });
    return text;
  }
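
To see what this normalization does in practice, here is a standalone sketch that reuses the same substitution table outside the class. The sample inputs are my own.

```javascript
// Same substitution table as in the constructor above.
const evasionPatterns = [
  { pattern: /4/gi, replacement: "a" },
  { pattern: /\$/gi, replacement: "s" },
  { pattern: /5/gi, replacement: "s" },
  { pattern: /0/gi, replacement: "o" },
  { pattern: /1/gi, replacement: "i" },
  { pattern: /!/gi, replacement: "i" },
  { pattern: /@/gi, replacement: "a" },
];

// Apply each substitution in order, feeding the result into the next one.
function normalizeText(text) {
  return evasionPatterns.reduce(
    (current, { pattern, replacement }) => current.replace(pattern, replacement),
    text
  );
}

console.log(normalizeText("b@n4n4")); // banana
console.log(normalizeText("cl4$$")); // class
console.log(normalizeText("Room 101!")); // Room ioii
```

The last line shows the trade-off: legitimate digits and exclamation marks are rewritten too, so the normalized string works best as a detection aid rather than as text to show to users.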

Step 4: `containsBadWords`

The function containsBadWords accepts four parameters:

text: The string to be checked for bad words.
matchWord (default false): A boolean indicating whether to match only whole words.
detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
detectEvasionSeperators (default true): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.

The function begins by decoding an array of base64-encoded bad words (this.encodedCussWords) to their original form for comparison.

If detectEvasionCharacters is true, the function applies a series of patterns (defined in this.evasionPatterns) to replace evasion characters in text with their normal counterparts.

If detectEvasionSeperators is true, the function removes common separators (like hyphens, underscores, and periods) from the text. It then goes further to remove spaces between the letters of each bad word within the text, to catch cases where spaces are used to evade detection.

After normalization, the function logs the normalized text to the console.

Finally, it uses the Array.prototype.some method to check if any bad words are present in the normalized text. It does this by creating a regular expression for each bad word. If matchWord is true, it ensures that only whole words are matched by using word boundaries (\b). Otherwise, it matches the bad word as a substring anywhere in the text. The function returns true if any bad word is detected, and false otherwise.
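
The difference between whole-word and substring matching can be sketched in isolation. The hasBadWord function and the one-word list are hypothetical and use "cat" purely as a harmless placeholder.

```javascript
// Placeholder word list for illustration.
const words = ["cat"];

// Returns true if any listed word occurs in the text.
// With matchWord set, the word must stand alone between word boundaries (\b).
function hasBadWord(text, matchWord = false) {
  return words.some((word) => {
    const source = matchWord ? `\\b${word}\\b` : word;
    return new RegExp(source, "i").test(text);
  });
}

console.log(hasBadWord("What a catastrophe")); // true (substring match)
console.log(hasBadWord("What a catastrophe", true)); // false (whole words only)
console.log(hasBadWord("The cat sat down", true)); // true
```

Whole-word matching avoids flagging innocent words that merely contain a listed word, at the cost of missing listed words glued onto other text.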

  containsBadWords(
    text,
    matchWord = false,
    detectEvasionCharacters = true,
    detectEvasionSeperators = true
  ) {
    // Decode bad words for comparison
    const cussWords = this.encodedCussWords.map((encodedWord) =>
      this.decodeBase64(encodedWord)
    );

    // Normalize text to catch evasion attempts
    let normalizedText = text;
    if (detectEvasionCharacters) {
      // Apply evasion patterns to normalize text
      this.evasionPatterns.forEach(({ pattern, replacement }) => {
        normalizedText = normalizedText.replace(pattern, replacement);
      });
    }

    if (detectEvasionSeperators) {
      // Remove common separators between letters
      normalizedText = normalizedText.replace(/[-_.]/g, "");
      // Remove spaces between letters only for bad words
      cussWords.forEach((cussWord) => {
        // Create a dynamic regular expression that matches the bad word with any spaces between the letters
        let wordRegex = new RegExp(cussWord.split("").join("\\s*"), "gi");
        // Replace the matched substring with the bad word without spaces
        normalizedText = normalizedText.replace(wordRegex, (match) => {
          return match.replace(/\s/g, "");
        });
      });
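    }

    // NOTE: the snippet in this post stops here. The rest of this method is
    // reconstructed from the description above and may differ from the
    // library's actual source (the case-insensitive flag is an assumption).
    console.log(normalizedText);

    // Check each bad word; use word boundaries (\b) when matchWord is true
    return cussWords.some((cussWord) => {
      const source = matchWord ? `\\b${cussWord}\\b` : cussWord;
      return new RegExp(source, "i").test(normalizedText);
    });
  }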
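
The spaced-letter trick in the code above can be tried on its own. In this sketch, escapeRegExp is a hypothetical addition (the code shown in this post does not escape the words), which matters only if a listed word contains characters that are special in regular expressions.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches the word with optional whitespace between letters.
function spacedWordRegex(word) {
  const source = word
    .split("")
    .map((letter) => escapeRegExp(letter))
    .join("\\s*");
  return new RegExp(source, "gi");
}

// Collapse spaced-out occurrences of the word back into the plain word.
function collapseSpacedWord(text, word) {
  return text.replace(spacedWordRegex(word), (match) => match.replace(/\s/g, ""));
}

console.log(spacedWordRegex("darn").source); // d\s*a\s*r\s*n
console.log(collapseSpacedWord("oh d a r n it", "darn")); // oh darn it
```

Only the spaces inside a matched word are removed, so the rest of the sentence keeps its spacing.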

Step 5: `cleanText`

Here is a breakdown of the function cleanText:

Parameters:

text: The text to be cleaned.
method (default "replace"): The method to use for cleaning the text. Can be either "replace" or "remove".
detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
detectEvasionSeparators (default true): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.

Function Body:

Initialization:
- The function creates a new variable cleanedText and assigns it the value of the input text.
Evasion Character Detection (if enabled):
- If detectEvasionCharacters is true, the function calls normalizeText (Step 3) to replace evasion characters in cleanedText with their normal counterparts.
Evasion Separator Detection (if enabled):
- If detectEvasionSeparators is true:
  - The function removes common separators (hyphens, underscores, and periods) from cleanedText using the regular expression /[-_.]/g.
  - It iterates over the array of base64-encoded bad words (this.encodedCussWords).
  - For each encoded word:
    - It decodes the word using this.decodeBase64 (Step 2).
    - It creates a regular expression object wordRegex for the decoded word, with the flags g (global) and i (case-insensitive).
    - Based on the method value:
      - If method is "replace", the function replaces every occurrence of the bad word in cleanedText with the same number of asterisks, using a callback function.
      - If method is "remove", the function replaces every occurrence of the bad word in cleanedText with an empty string.
  - Because this loop sits inside the detectEvasionSeparators branch, no words are replaced or removed when detectEvasionSeparators is false.
Return:
- The function returns the cleaned text cleanedText.

Overall, this function takes text as input and cleans it by removing or replacing bad words. It can optionally handle evasion attempts by normalizing characters and separators.

  cleanText(
    text,
    method = "replace",
    detectEvasionCharacters = true,
    detectEvasionSeparators = true
  ) {
    let cleanedText = text;

    if (detectEvasionCharacters) {
      cleanedText = this.normalizeText(cleanedText);
    }

    if (detectEvasionSeparators) {
      cleanedText = cleanedText.replace(/[-_.]/g, "");
      this.encodedCussWords.forEach((encodedWord) => {
        const cussWord = this.decodeBase64(encodedWord);

        const wordRegex = new RegExp(cussWord, "gi");

        if (method === "replace") {
          cleanedText = cleanedText.replace(wordRegex, (match) => {
            return match.replace(/\S/g, "*");
          });
        } else if (method === "remove") {
          cleanedText = cleanedText.replace(wordRegex, "");
        }
      });
    }

    return cleanedText;
  }
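
Here is a standalone sketch of the two cleaning methods, using a hypothetical maskWords function and the placeholder word "darn" rather than the library's real list.

```javascript
// Placeholder word list for illustration.
const words = ["darn"];

// "replace" masks each match with asterisks of the same length;
// "remove" deletes the match entirely.
function maskWords(text, method = "replace") {
  return words.reduce((current, word) => {
    const wordRegex = new RegExp(word, "gi");
    return method === "remove"
      ? current.replace(wordRegex, "")
      : current.replace(wordRegex, (match) => "*".repeat(match.length));
  }, text);
}

console.log(maskWords("Well, DARN it")); // Well, **** it
console.log(JSON.stringify(maskWords("Well, darn it", "remove"))); // "Well,  it" (doubled space)
```

Masking keeps the text length and layout intact, while removing can leave doubled spaces behind that you may want to collapse afterwards.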

That's it as far as code!

How to use...

If you are interested in the Usage docs, see https://github.com/The-Best-Codes/bc-ProfanityBlock.

Conclusion

In this article, I shared the process of building the bc-ProfanityBlock JavaScript library for blocking profanity in text. By combining encoded bad words and evasion pattern detection, the library provides an efficient and effective solution for filtering and sanitizing content. Whether you are building a social media platform, chat application, or any other system where content moderation is important, this library can be a valuable addition to your toolkit.

You can find the complete source code and documentation for the bc-ProfanityBlock library (including the ContentFilterBadWord class) on GitHub. I hope this article has been informative and encourages you to explore the world of content moderation in your applications.

If you have any questions or feedback, please feel free to reach out to me via email at best-codes@proton.me.
Happy coding!

Some content in this article is generated by the BestCodes AI.
Article by Best_codes.

Top comments (12)

Michael Tharrington • Feb 8 • Edited

Rock on! Such a cool idea and nice to give folks the option to have this filter.

Also, I appreciate ya labeling that you used AI to assist with the writing... annnnd, particularly cool that it's your own flavor of AI, too!

Good stuff all around, Best Codes. 🙌

Best Codes • Feb 8

Thank you! I'm glad you enjoyed my post.

I believe that I read the guidelines here:
dev.to/devteam/guidelines-for-ai-a...
And they said that I could just put a disclaimer at the end.

I don't like to put it there for some of my posts, because I mostly use AI to rephrase, correct grammar, etc. and not to create content (AI content is quite shallow or uncreative, usually). But, I figure that it's good to be on the safe side and put it there anyway.

Such a cool idea and a good cause.

I originally made this because I made a simple home and weather monitor for myself, which included a home assistant I made called 'Heather'. I could ask it things, and it would search the Internet and use NLP to generate a result. Obviously, I didn't feel like having my AI spout profanity at me, so I created this filter for incoming commands (in case it misheard) and spoken text (in case the result was unsafe).

To others reading this comment, I will warn you that the word list itself is quite strict (it contains about 400 words, if that gives you an idea), so please feel free to customize it for your needs.

Thanks again for the feedback!

Michael Tharrington • Feb 9

That's exactly right! Putting the disclaimer at the end is totally acceptable.

Plus, it really does sound like you are using it lightly and tactfully. Your post sounds very human to me, it's clear that you're putting yourself into it and that's the main thing as far as we're concerned. 🙂

Obviously, I didn't feel like having my AI spout profanity at me

Oh yeah, good call! That would be an odd and unfortunate experience.

Sounds like you've made a cool bot, and I like that ya went with "Heather" when its primary task was for monitoring the weather. "Heather, how's the weather?" is satisfying. 😀

Best Codes • Feb 9 • Edited

OK, great! I'm glad I did everything correctly. 🙂

"Heather, how's the weather?" is satisfying. 😀

Actually, one of my friends came up with the name. I told them I was working on a home and weather monitor, and I needed a name for the assistant. They suggested that I combine the H in home with Weather. So, I did. 😀

As you can see, the AI does still need some work, but it described you fairly well, I think:
("Who is Michael Tharrington?")

Anyway, thanks for chatting, and have a great weekend!

James Livesey • Feb 9

Very cool! But is it subject to the Scunthorpe problem?

(Hopefully this comment doesn't trigger the dev.to profanity filters 😅)

Best Codes • Feb 9

I'm glad you brought this up! That's exactly why I included the matchWord parameter in my library. That way, if cat was a bad word, then catastrophe would not trigger it. Plus, many profanity filters (and hopefully mine, here soon) are beginning to use NLP or brain.js filters that analyze sentiment or intent rather than the words themselves (AI, in a sense).
Thanks for the feedback!

Vitus • Feb 12

nice one ;))) nice project

Best Codes • Feb 12

Thanks! I'm glad you liked it.

Jim Pincer • Feb 8

Interesting though

Best Codes • Feb 8

Hi Jim, thanks for the feedback! It looks like you are using GPT Zero to see if my content is AI. While some of it is (as stated at the bottom of the post), certainly less than 60 sentences are!
I would encourage you to read this article about using GPT Zero here:
gptzero.me/news/5-steps-towards-re...
Thanks again for the feedback and happy coding!

Bcfiend(dad • Feb 8

This is so cool
I like the Base64 encoded part

Best Codes • Feb 8

Thanks! I'm glad you enjoyed.
