Most if not all developers have used some sort of character counter online to validate SEO, or to just see how many characters a string has. You ca...
Why does `Count` need to be a class? I'd sort of understand if it were a Web Component, but it just references global state via `document` and `window`. Instantiating it with `new` is meaningless, as it doesn't encapsulate any of its own state. You'd be better off just using a module instead. Better still, use two modules, one for DOM manipulation and another for the counting logic, since they're quite different concerns (though keeping all the logic in the same file is fair enough when the entire app is under 100 lines of JS, IMO).
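A minimal sketch of that two-module split, assuming the app only needs character and word counts. All names here (`countCharacters`, `countWords`, `bindCounter`) and the element IDs are hypothetical, not taken from the article. In a real project the counting functions would be exported from one module and `bindCounter` would live in another module that imports them.

```javascript
// --- Counting logic (would be its own module, e.g. count.js) ---
// Pure functions: string in, number out, no access to document or window.

function countCharacters(text) {
  // Note: .length counts UTF-16 code units, not user-perceived characters.
  return text.length;
}

function countWords(text) {
  // Split on runs of whitespace; filter(Boolean) drops the empty strings
  // produced by leading/trailing whitespace (and by an empty input).
  return text.split(/\s+/).filter(Boolean).length;
}

// --- DOM wiring (would be a separate module, e.g. ui.js) ---
// Hypothetical element IDs; this is the only function that touches the DOM.
function bindCounter(doc = globalThis.document) {
  const input = doc.getElementById('text-input');
  const output = doc.getElementById('count-output');
  input.addEventListener('input', () => {
    const text = input.value;
    output.textContent =
      `${countCharacters(text)} characters, ${countWords(text)} words`;
  });
}
```

Because the counting functions never reference `document`, they can be unit-tested in Node without a browser; only `bindCounter` needs a real page.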
You might also want to look into improving the counting algorithms — for example:
I'll leave this here 😉 Counting symbols in a JavaScript string — Mathias Bynens
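A minimal sketch of the distinction that article is about: `.length` counts UTF-16 code units, spreading a string counts code points, and `Intl.Segmenter` with `granularity: 'grapheme'` counts user-perceived characters. The function names are hypothetical, and the grapheme version assumes a runtime that supports `Intl.Segmenter`.

```javascript
// Three different answers to "how many characters does this string have?"

// UTF-16 code units: what String.prototype.length reports.
// Astral symbols such as most emoji count as 2.
function countCodeUnits(text) {
  return text.length;
}

// Unicode code points: the string iterator walks code points,
// so a surrogate pair is counted once.
function countCodePoints(text) {
  return [...text].length;
}

// Grapheme clusters: a base letter plus combining marks counts once.
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].length;
}

// Usage
console.log(countCodeUnits('💩'), countCodePoints('💩'), countGraphemes('💩'));
// → 2 1 1
console.log(countCodeUnits('e\u0301'), countCodePoints('e\u0301'), countGraphemes('e\u0301'));
// → 2 2 1  ("é" written as "e" + COMBINING ACUTE ACCENT)
```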
Thanks for the tips. I made it a class simply to show how it can be used in the tutorial or added as a module. I like OOP versions of code, so my preference bled over into the tutorial. I understand it is a small file that could essentially be done by calling the functions directly, but I wanted to show an OOP approach.
I understand what you are saying, though: sometimes the KISS approach is better. But I wanted the script to showcase not only how to get the result but also how to build it as part of something larger, even if it is a simple script.
Also, I did not account for symbols or, as mentioned in the comment above, other languages in this tutorial. I was hoping to get a quick tutorial out for something I had recently used. It may be a bit specific, but I needed it as part of a larger project, so I wanted to share.
I'd argue that instantiating a singleton class that doesn't encapsulate any state, and instead accesses global state outside of itself, isn't really OOP, despite the `class` and `new` keywords... though I guess it depends on your definition of OOP. Practically speaking, it's a module that for some reason needs to be instantiated. It'd make sense for such a thing to be a class in Java, because everything has to be a class in Java, even things that really shouldn't be... but JS has no such limitation.
BTW, I hope my feedback doesn't come across as too negative. It's a nice-looking app, and this article has already inspired me in two ways in my own code: first, by reminding me that `class`-based encapsulation can be pretty damn useful (I usually opt for more of a mixed functional/imperative style), and second, through Jon Randy alerting me to the existence of `Intl.Segmenter` in the other comment thread. Both turned out to be extremely useful for my current project, a locale-aware term checker for translations (which encapsulates the locale data).
Any feedback is good feedback for me, so no worries. I used this code as part of a larger project with many other JS classes, so it was not really a singleton in how it was being used, but I see your point. I just thought it was cool, so I put something together quickly for the video using the basic aspects of what I was doing in my other project.
Also, I saw the other comment; the segmenter is something I was unaware of. I am glad other commenters helped show you something new as well. I will definitely be using it in the future.
Sorry for the late reply; I meant to reply earlier, then forgot. Hooray for stale Chrome tabs! So, when I say "singleton", I mean specifically the Singleton design pattern. That doesn't mean the class can't coexist and interact with other classes; it simply means only one instance of that class is supposed to exist at any given time.
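For reference, a minimal sketch of the Singleton pattern in JS, using a hypothetical `TextCounter` class: a static `getInstance()` hands out one shared instance, and the constructor rejects direct `new` calls. Worth noting alongside it: an ES module is evaluated once and then cached, so an object exported from a module is already effectively a single instance without any of this machinery, which is the module-vs-class point made earlier in the thread.

```javascript
// Hypothetical Singleton: callers get the one shared instance via getInstance().
class TextCounter {
  static #instance = null;
  static #allowConstruction = false;

  constructor() {
    // Reject direct `new TextCounter()`; only getInstance() may construct.
    if (!TextCounter.#allowConstruction) {
      throw new Error('Use TextCounter.getInstance() instead of new');
    }
  }

  static getInstance() {
    if (TextCounter.#instance === null) {
      TextCounter.#allowConstruction = true;
      try {
        TextCounter.#instance = new TextCounter();
      } finally {
        // Close the door again even if construction throws.
        TextCounter.#allowConstruction = false;
      }
    }
    return TextCounter.#instance;
  }

  countWords(text) {
    // Simple whitespace-based count, as in the original approach.
    return text.split(/\s+/).filter(Boolean).length;
  }
}

// Usage: both calls return the same object.
const a = TextCounter.getInstance();
const b = TextCounter.getInstance();
console.log(a === b); // → true
```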
Interestingly, the regex metacharacter for whitespace, `\s`, does not match zero-width spaces, so the word count (using your code) for the Thai 'น้อยก็หนึ่ง' comes out as 1 when there are actually 3 words: 'น้อย', 'ก็', and 'หนึ่ง'. Other languages would have similar issues. A better way to do this is with `Intl.Segmenter`, which is language-aware.
One drawback here is that `Intl.Segmenter` is not yet supported in Firefox.
Wait, Thai is delimited with zero-width spaces? Is that a standard thing that gets done automatically with common input methods? If so, that makes it much easier to implement an approximate word-counting algorithm that works cross-linguistically (CJK is still a problem, but counting each Script=Han character as one "word" is usually an acceptable alternative; e.g. MS Word does that. Not sure about kana, though.)
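A quick check of the `\s` claim, plus a rough sketch of the approximate cross-linguistic counter described in the comment above. `approxWordCount` is a hypothetical name, and its two heuristics (treat U+200B ZERO WIDTH SPACE as a separator, count each Han character as one word) are the ones proposed in the comment, not a standard algorithm. It still counts Thai typed without zero-width spaces as a single word and does nothing special for kana.

```javascript
// \s does not include U+200B ZERO WIDTH SPACE.
console.log(/\s/.test('\u200B')); // → false

// Hypothetical heuristic counter: whitespace OR zero-width space separates
// words, and each Han character counts as its own word.
const SEPARATORS = /[\s\u200B]+/u;
const HAN = /\p{Script=Han}/u;

function approxWordCount(text) {
  let count = 0;
  for (const chunk of text.split(SEPARATORS)) {
    if (chunk === '') continue; // leading/trailing separators
    let inRun = false; // currently inside a run of non-Han characters?
    for (const ch of chunk) { // iterates code points, not code units
      if (HAN.test(ch)) {
        count += 1; // each Han character is one "word"
        inRun = false;
      } else if (!inRun) {
        count += 1; // start of a new non-Han run
        inRun = true;
      }
    }
  }
  return count;
}

// Usage
console.log(approxWordCount('hello world')); // → 2
console.log(approxWordCount('我爱你'));       // → 3
```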
If it is done correctly, then the zero-width spaces are there (not sure how input methods/apps handle this). In reality, however, people don't usually bother putting any spaces in (except between sentences, which is normal). There are tools around that automatically add the zero-width spaces, though; I imagine writing those would be no fun.
Just did some quick googling on Thai input methods: apparently pressing the spacebar once for a zero-width space and twice for a real space is common.
Yeah, if I go to thai.tourismthailand.org/Home, grab the first longish span of text, and split on `/[^\p{L}\p{N}\p{M}]+/u`, it gives me:
Guessing "Join to create a new legend of Thai tourism throughout the year" isn't considered a single word in Thai 😅
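To reproduce the effect with the regex from that comment: it keeps letters (`L`), numbers (`N`) and marks (`M`) together, and keeping marks matters because Thai vowel signs and tone marks are combining marks (category Mn). But Thai written without any spaces gives it nothing to split on. `splitOnNonWordChars` is a hypothetical helper name; the sample words are the ones quoted earlier in the thread.

```javascript
// Split on runs of anything that is not a letter, number, or combining mark.
function splitOnNonWordChars(text) {
  return text.split(/[^\p{L}\p{N}\p{M}]+/u).filter(Boolean);
}

// English splits as expected:
console.log(splitOnNonWordChars('Hello, world!')); // → [ 'Hello', 'world' ]

// Thai without separators stays as one "word"; with zero-width spaces
// (category Cf, so matched by the negated class) it splits into three.
const words = ['น้อย', 'ก็', 'หนึ่ง'];
console.log(splitOnNonWordChars(words.join('')).length);       // → 1
console.log(splitOnNonWordChars(words.join('\u200B')).length); // → 3
```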
Edit: `Intl.Segmenter` seems to give much better results, though. Results:
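A minimal sketch of pulling word segments out of `Intl.Segmenter`, assuming a runtime that supports it. Unlike the regex split, the output also contains whitespace and punctuation segments, and concatenating every segment gives back the original string. `wordSegments` is a hypothetical helper name.

```javascript
// Return the raw segments Intl.Segmenter produces at word granularity.
function wordSegments(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  // Each segment object has .segment (the substring), .index and .isWordLike.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Usage
console.log(wordSegments('Hello, world!', 'en'));
// → [ 'Hello', ',', ' ', 'world', '!' ]
```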
Would love to know how it works for Thai, even without ZWSPs as cues.
If memory serves from the Thai lessons I've had, there are many rules about how words can begin and end (which letters can be used in what order, etc.); these would probably get you a lot of the way there.
It might just be some sort of massive dictionary lookup, as it even does a decent approximation for Chinese, which has no such rules:
Don't forget to filter by the `isWordLike` property of each segment:
dev.to/kamonwan/the-right-way-to-b...
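Counting words then means keeping only the segments whose `isWordLike` is `true`, which excludes whitespace and punctuation. A sketch assuming a runtime with `Intl.Segmenter`; `countWordLikeSegments` is a hypothetical name. On the dictionary question above: as far as I know, V8 (and so Chrome and Node) implements `Intl.Segmenter` on top of ICU, which uses dictionary-based break engines for Thai and Chinese rather than rules alone, so exact results for those languages depend on the ICU version.

```javascript
// Count only word-like segments (letters, numbers, ideographs, etc.).
function countWordLikeSegments(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) count += 1; // skip spaces and punctuation
  }
  return count;
}

// Usage: 5 segments, but only 2 are word-like.
console.log(countWordLikeSegments('Hello, world!', 'en')); // → 2
```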