The Dev Drawer

How to Count Words and Characters in JavaScript

Most developers have used an online character counter at some point, whether to check text length for SEO or just to see how many characters a string has. You can build the same thing into your own website or internal tooling. It takes only a little JavaScript, and the technique is useful in many other places.

In this tutorial, we'll count words and characters in JavaScript. We'll use the string methods trim() and split() to get the word count and the length property to get the character count.

File Structure

index.html
/sass
     style.scss
/js
     Count.js
/css (generated by Sass)
   style.css
   style.min.css

Our HTML

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Character and Word Counter</title>
    <link rel="stylesheet" href="/css/style.min.css">
</head>
<body>
    <div class="main">
        <h1>Count Your</h1>
        <h2>Words and Characters</h2>
        <div class="card">
            Words
            <span class="word-count">0</span>
        </div>
        <div class="card">
            Characters
            <span class="character-count">0</span>
        </div>
        <textarea class="count__textarea">This is sample text that should be counted on load.</textarea>
        You've written <span class="word-count">0</span> words and <span class="character-count">0</span> characters.
    </div>
    <script src="/js/Count.js"></script>
</body>
</html>

The HTML is simple: one textarea and four spans (two displayed as cards and two inline in a sentence) that show the counts.

Our Count JS Class

class Count {
    constructor() {
        this.textArea = document.querySelector(".count__textarea")
        this.wordCount = document.querySelectorAll(".word-count")
        this.charCount = document.querySelectorAll(".character-count")
        /*
        bind(this) makes sure that the this keyword inside the updateCount method refers to this Count instance, not to the event target
        */
        window.addEventListener("load", this.updateCount.bind(this))
        this.textArea.addEventListener("input", this.updateCount.bind(this))
    }

    /**
     * trim() removes whitespace from both sides of a string
     * if the value is empty, return 0
     * split() splits a string into an array of substrings, and returns the new array
     * @returns {number} the number of words in the textarea
     */
    countWords() {
        const value = this.textArea.value.trim()
        if(!value) return 0
        return value.split(/\s+/).length
    }

    /**
     * length returns the length of a string
     * @returns {number} the number of characters in the textarea
     */
    countChars() {
        return this.textArea.value.length 
    }

    /**
     * update the word and character count
     * forEach() calls a function once for each element in the NodeList, in order
     * toString(10) converts a number to a base-10 string
     * @returns {void}
     */
    updateCount() {
        const numWords = this.countWords()
        const numChars = this.countChars()

        this.wordCount.forEach((wordCount) => {
            wordCount.textContent = numWords.toString(10)
        })

        this.charCount.forEach((charCount) => {
            charCount.textContent = numChars.toString(10)
        })
    }
}

new Count() // create a new instance of the Count class

This is a simple class that grabs the textarea and the spans. It then updates the counts on page load and whenever the content of the textarea changes.
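
The bind(this) calls in the constructor matter because a method passed as a callback loses the object it belongs to. The sketch below shows that in isolation. The Tally class and every name in it are made up for illustration and are not part of the tutorial's code. Here the detached method is called directly, so this is undefined. Inside addEventListener, this would instead be the element the listener is attached to, which has none of the class's methods either.

```javascript
// Hypothetical class, only to illustrate how `this` is lost in callbacks.
class Tally {
    constructor(text) {
        this.text = text
    }

    size() {
        return this.text.length
    }
}

const tally = new Tally("abc")

// Detached: the function no longer knows which object it belongs to.
const detached = tally.size
let detachedError = null
try {
    detached()
} catch (err) {
    detachedError = err // TypeError, because `this` is undefined here
}

// Bound: bind() returns a new function with `this` fixed to the instance.
const bound = tally.size.bind(tally)

// An arrow function wrapper is a common alternative to bind().
const wrapped = () => tally.size()

console.log(detachedError instanceof TypeError, bound(), wrapped())
```

Either the bound function or the arrow wrapper could be handed to addEventListener; the tutorial uses bind().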

When either event fires, it counts the words and characters and writes the totals into the spans we created in the HTML.
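
To see why countWords() trims the value and splits on /\s+/ rather than on a single space, it helps to run the same logic on a few awkward inputs. This sketch pulls the logic into a standalone function so it can run outside the browser. The name countWordsIn is invented here and does not appear in the class above.

```javascript
// Same logic as Count.countWords(), taking the text as an argument.
// `countWordsIn` is a hypothetical name used only for this sketch.
function countWordsIn(text) {
    const value = text.trim()
    if (!value) return 0
    return value.split(/\s+/).length
}

const messy = "  hello   world  "

// Splitting on a single space keeps an empty string for every extra space.
console.log(messy.split(" ").length) // 8

// Without the empty check, an empty string still yields one element: [""].
console.log("".split(/\s+/).length) // 1

console.log(countWordsIn(messy)) // 2
console.log(countWordsIn("")) // 0
console.log(countWordsIn("This is sample text that should be counted on load.")) // 10
```

The last call uses the sample text from the HTML, so the page should show 10 words on load.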

Sample Styles

I created some sample styles in Sass. You don't have to use them in your project; they lay out the cards and make the inline count spans bold. Copy the following to match the look of this tutorial.

@import url('https://fonts.googleapis.com/css2?family=Open+Sans&display=swap');

$font-family: 'Open Sans', sans-serif;
$font-size:16px;
$primary-color: #38a4ef;

body {
    font-size: $font-size;
    font-family: $font-family;
    width:100%;
    min-height:100vh;
    background-color:$primary-color;
    padding:0;
    margin:0;
    color:#fff;
    overflow:hidden;
    .main {
        width:60%;
        text-align:center;
        margin:10% auto;
        h1,h2 {
            margin:0;
            padding:0;
        }
        .card {
            background-color:#fff;
            padding:3rem;
            border-radius: 5px;
            width:200px;
            display:inline-block;
            margin:1rem;
            color:#333;
            box-shadow: 15px 15px 0px 0px rgba(0,0,0,.2);
            span {
                font-size: 2rem;
                display:block;
            }
        }
        textarea {
            display:block;
            width:calc(100% - 2rem);
            height:200px;
            border:0;
            margin:1rem 0;
            padding:1rem;
            font-size: $font-size;
            border-radius: 5px;
            &:focus {
                outline:none;
            }
        }
        .word-count, .character-count {
            font-weight: bold;
        }
    }
}

Conclusion

There you go. With a few lines of code, you can create your own internal character and word counters using vanilla JS. I hope this helps. Let me know if you have any questions.

Read more articles on DevDrawer

Top comments (13)

Jon Randy 🎖️ • Edited

Interestingly, the regex metacharacter for whitespace, \s, does not match zero-width spaces (U+200B), so the word count (using your code) for the Thai 'น้อยก็หนึ่ง' comes out as 1 when there are actually 3 words: 'น้อย', 'ก็', and 'หนึ่ง'. Other languages would have similar issues.

A better way to do this is with Intl.Segmenter - which is language aware:

const segmenterTh = new Intl.Segmenter('th', { granularity: 'word' })
const string1 = 'น้อยก็หนึ่ง'
const wordCount = [...segmenterTh.segment(string1)].length
console.log(wordCount)   // 3

One drawback is that, at the time of writing, Intl.Segmenter was not yet supported in Firefox.
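
Since support varies between runtimes, one option is to use Intl.Segmenter when it exists and fall back to the whitespace split otherwise. The following is a minimal sketch under that assumption, not a tested cross-browser solution. The name countWordsIntl and its locale parameter are illustrative, and the isWordLike check keeps punctuation and spaces out of the total.

```javascript
// Hypothetical helper: language-aware when possible, whitespace split otherwise.
function countWordsIntl(text, locale = "en") {
    if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
        const segmenter = new Intl.Segmenter(locale, { granularity: "word" })
        let count = 0
        for (const part of segmenter.segment(text)) {
            if (part.isWordLike) count += 1
        }
        return count
    }
    // Fallback: the tutorial's original approach.
    const value = text.trim()
    return value ? value.split(/\s+/).length : 0
}

console.log(countWordsIntl("Hello, world! How are you?")) // 5
console.log(countWordsIntl("")) // 0
```

Note that the two branches can disagree on some inputs (hyphenated words, for example), so the fallback is an approximation rather than an equivalent.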

lionel-rowe

Wait, Thai is delimited with zero-width spaces? Is that a standard thing that gets done automatically with common input methods? If so, that makes it much easier to implement an approximate word-counting algorithm that works cross-linguistically (CJK is still a problem, but counting Script=Han characters as each being 1 "word" is usually an acceptable alternative — e.g. MS Word does that. Not sure about kana, though)
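
The approximation described here (each Han character counts as one word, everything else is split on whitespace) could be sketched roughly as follows. This is an assumption about how such a counter might look, not how MS Word or any other tool actually implements it. The function name approxWordCount is invented, and CJK punctuation and kana are not handled specially.

```javascript
// Hypothetical approximate counter: Han characters count as one word each,
// the remaining text is split on whitespace.
function approxWordCount(text) {
    const hanPattern = /\p{Script=Han}/gu
    const hanChars = text.match(hanPattern) || []

    // Replace Han characters with spaces so they also act as word boundaries.
    const rest = text.replace(hanPattern, " ").trim()
    const otherWords = rest ? rest.split(/\s+/).length : 0

    return hanChars.length + otherWords
}

console.log(approxWordCount("我喜欢 JavaScript")) // 4
console.log(approxWordCount("hello world")) // 2
console.log(approxWordCount("")) // 0
```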

Jon Randy 🎖️

If it is done correctly, then the zero width spaces are there (not sure how input methods/apps handle this) - in reality however, people don't usually bother putting any spaces in (except between sentences, which is normal). There are tools around to automatically add the zero-width spaces though - but I imagine writing those would be no fun.

Jon Randy 🎖️

Just did some quick googling on Thai input methods - apparently one spacebar hit for zero-width, and two hits for real space is common.

lionel-rowe • Edited

Yeah if I go to thai.tourismthailand.org/Home, grab the first longish span of text, and split on /[^\p{L}\p{N}\p{M}]+/u, it gives me

| Thai "word" | Machine-translated English |
| --- | --- |
| ททท | TAT |
| เปิดตัวโครงการ | project launch |
| 365 | 365 |
| วัน | day |
| มหัศจรรย์เมืองไทยเที่ยวได้ทุกวัน | Amazing Thailand, you can travel every day. |
| ชวนผู้ประกอบการธุรกิจท่องเที่ยวเสนอดีลพิเศษผ่าน | inviting travel business operators to offer special deals through |
| LAZADA | LAZADA |
| ร่วมสร้างตำนานการท่องเที่ยวไทยครั้งใหม่ตลอดปี | Join to create a new legend of Thai tourism throughout the year. |
| 2566 | 2023 |

Guessing "Join to create a new legend of Thai tourism throughout the year" isn't considered a single word in Thai 😅

Edit: Intl.Segmenter seems to give much better results, though.

[...new Intl.Segmenter('th', { granularity: 'word' }).segment(str)].filter(x => x.isWordLike).map(x => x.segment).slice(0, 10)

Results:

| Thai "word" | Machine-translated English |
| --- | --- |
| ททท | TAT |
| เปิด | open |
| ตัว | one |
| โครงการ | project |
| 365 | 365 |
| วัน | day |
| มหัศจรรย์ | amazing |
| เมือง | city |
| ไทย | Thai |
| เที่ยว | travel |

Would love to know how it works for Thai, even without ZWSPs as cues.

Jon Randy 🎖️

If memory serves from the Thai lessons I've had, there are many rules about how words can begin and end (what letters can be used in what order etc.) - these would probably get you a lot of the way there.

lionel-rowe • Edited

It might just be some sort of massive dictionary lookup, as it even does a decent approximation for Chinese, which has no such rules:

const str = `聚苯乙烯塑料、聚苯乙……、……乙烯塑料`
;[...new Intl.Segmenter('zh', { granularity: 'word' }).segment(str)].filter(x => x.isWordLike).map(x => x.segment)
// ['聚苯乙烯', '塑料', '聚', '苯', '乙', '乙烯', '塑料']

Kamonwan Achjanis

Don't forget to filter by isWordLike property of each segment:
dev.to/kamonwan/the-right-way-to-b...
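
A small sketch of why that filter matters: with word granularity, the segmenter also returns punctuation and whitespace as segments, and isWordLike is what separates them from actual words. The example assumes a runtime that supports Intl.Segmenter.

```javascript
const segmenter = new Intl.Segmenter("en", { granularity: "word" })
const segments = [...segmenter.segment("Hello, world!")]

// Every segment, including the comma, the space and the exclamation mark.
console.log(segments.map((s) => s.segment))

// Only the segments the segmenter considers words.
const words = segments.filter((s) => s.isWordLike).map((s) => s.segment)
console.log(words) // ["Hello", "world"]
```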

lionel-rowe

Why does Count need to be a class? I'd sort of understand if it was a Web Component, but it just references global state via document and window. Instantiating it with new is meaningless, as it doesn't encapsulate any of its own state.

You'd be better off just using a module instead. Better still, 2 modules — one for DOM manipulation and another for the counting logic, as they're quite different concerns (though including all the logic in the same file is fair enough when the entire app has <100 lines of JS, IMO).
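
One way to read that suggestion is to keep the counting functions pure and put the DOM wiring in a plain function. The sketch below is only an assumption about what such a split could look like, shown in one block for brevity rather than as two files. attachCounter and its parameters are invented names, and the stub objects at the end stand in for DOM elements so the sketch runs outside a browser.

```javascript
// Counting logic: pure functions, no DOM access.
const countWords = (str) => {
    const value = str.trim()
    return value ? value.split(/\s+/).length : 0
}
const countChars = (str) => str.length

// DOM wiring: hypothetical function that connects a textarea to its outputs.
function attachCounter(textArea, wordEls, charEls) {
    const update = () => {
        const words = String(countWords(textArea.value))
        const chars = String(countChars(textArea.value))
        wordEls.forEach((el) => { el.textContent = words })
        charEls.forEach((el) => { el.textContent = chars })
    }
    textArea.addEventListener("input", update)
    update() // show the initial counts immediately
    return update
}

// In a browser this would be called roughly as:
// attachCounter(
//     document.querySelector(".count__textarea"),
//     document.querySelectorAll(".word-count"),
//     document.querySelectorAll(".character-count")
// )

// Minimal stand-ins for DOM elements, for demonstration only.
const fakeTextArea = {
    value: "two words",
    listeners: {},
    addEventListener(type, handler) { this.listeners[type] = handler },
}
const fakeWordEl = { textContent: "" }
const fakeCharEl = { textContent: "" }

attachCounter(fakeTextArea, [fakeWordEl], [fakeCharEl])
console.log(fakeWordEl.textContent, fakeCharEl.textContent) // 2 9

fakeTextArea.value = "now three words"
fakeTextArea.listeners.input() // simulate the "input" event
console.log(fakeWordEl.textContent, fakeCharEl.textContent) // 3 15
```

Because the counting functions never touch the document, they can be tested or swapped out without loading a page.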

You might also want to look into improving the counting algorithms — for example:

export const countWords = (str) => {
    const value = str.trim()
    if (!value) return 0
    return value.split(/\s+/).length
}

export const countChars = (str) => {
    return str.length 
}

countChars('🚀') // 2 (expected 1)
countWords('web-development tutorial') // 2 (expected 3)

I'll leave this here 😉 Counting symbols in a JavaScript string — Mathias Bynens
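
The gap between length and what a reader would call a character comes from UTF-16: length counts code units. As a rough sketch, spreading the string counts code points, and Intl.Segmenter with grapheme granularity counts user-perceived characters. The function names are invented here, and the grapheme version assumes a runtime that supports Intl.Segmenter.

```javascript
// UTF-16 code units: what String.prototype.length reports.
const countCodeUnits = (str) => str.length

// Code points: string iteration keeps surrogate pairs together.
const countCodePoints = (str) => [...str].length

// Grapheme clusters: what a reader perceives as one character.
const countGraphemes = (str) => {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
    return [...segmenter.segment(str)].length
}

const rocket = "🚀"
const accented = "e\u0301" // "e" followed by a combining acute accent

console.log(countCodeUnits(rocket), countCodePoints(rocket), countGraphemes(rocket)) // 2 1 1
console.log(countCodeUnits(accented), countCodePoints(accented), countGraphemes(accented)) // 2 2 1
```

Which of the three is the right count depends on the use case: storage limits often care about code units or bytes, while a counter shown to a writer usually wants graphemes.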

The Dev Drawer

Thanks for the tips. I did it as a class to simply show how it can be used in the tutorial or be added as a module. I like OOP versions of code so my preference bled over in the tutorial. I understand it is a small file that could essentially be done by calling the functions directly but I wanted to show it as an OOP way.

I understand what you are saying, though. Sometimes the KISS approach is better, but I wanted the script to show not only how to get the result but also how to build it as part of something larger, even if it is a simple script.

Also, I did not account for symbols or, as the other comment points out, other languages in this tutorial. I was hoping to get a quick tutorial out for something I recently used. It may be a bit specific, but I had to use it as part of a larger project, so I wanted to share it.

lionel-rowe

I'd argue that instantiating a singleton class that doesn't encapsulate any state and instead accesses global state outside of itself isn't really OOP, despite the class and new keywords... though I guess it depends on your definition of OOP. Practically speaking, it's a module that for some reason needs to be instantiated. It'd make sense for such a thing to be a class in Java, because everything has to be a class in Java, even things that really shouldn't be... but JS has no such limitation.

BTW I hope my feedback doesn't come across as too negative — it's a nice-looking app, and this article has already inspired me in 2 ways in my own code. Firstly by reminding me that class-based encapsulation can be pretty damn useful (I usually opt for more of a mixed functional/imperative style), and secondly by Jon Randy alerting me to the existence of Intl.Segmenter in the other comment thread. Both turned out to be extremely useful for my current project of creating a locale-aware (encapsulating the locale data) term checker for translations.

The Dev Drawer

Any feedback is good feedback for me, so you are good. I used this code as part of a larger project with many other JS classes so it was not really a singleton in how it was being used but I see your point. I just thought it was cool so I put something together quickly for the video using the basic aspects of what I was doing in my other project.

Also, I saw the other comment and the segmenter is something I was unaware of. I am glad other commenters helped show you something new as well. I will definitely be using it in the future.

lionel-rowe

Sorry for late reply — meant to reply earlier then forgot. Hooray for stale Chrome tabs! So, when I say "singleton", I mean specifically the Singleton design pattern. That doesn't mean the class doesn't coexist and interact with other classes; it simply means only 1 instance of that class is supposed to exist at any given time.