DEV Community

Discussion on: Spam sucks

ben profile image
Ben Halpern

Ironically, the spammers are probably reading this post, and that pull request... Such is the nature of open source. In the long run, we seek to make this a game they cannot win because we will all collaborate to ensure our mitigation strategies rely on sophistication, not obfuscation.

destynova profile image
Oisín • Edited

I couldn't determine the details of your spam prevention system from just this one post, but do you have a spam classifier already set up? I started using John Graham-Cumming's "POPFile" to filter my POP email back in 2000 or so and it was amazingly effective, with at least 99% accuracy and a very low false positive rate. And that was just a bag-of-words Naïve Bayes classifier; even a small deep neural net with word embeddings would probably do much better.

Anyway, if you classify each submission as it comes in, it could be automatically flagged for review and hidden if the spam likelihood is greater than some threshold (95%?). Then mods could attend to the flagged queue, which would hopefully consist almost entirely of obvious spam.
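
As a rough illustration of the idea above, here is a minimal bag-of-words Naïve Bayes sketch in Python. It is not how DEV or POPFile actually works; the class name `NaiveBayesSpamFilter`, the helper `should_flag`, and the tokenizer are all hypothetical, and the 0.95 threshold is just the "95%?" guess from this comment.

```python
import math
import re
from collections import Counter

SPAM_THRESHOLD = 0.95  # the "95%?" guess from the comment; tune on real data


def tokenize(text):
    """Split text into lowercase bag-of-words tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical two-class (spam/ham) Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled example; label must be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | text). Needs at least one example of each class."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = min(log_scores["ham"] - log_scores["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


def should_flag(spam_filter, text, threshold=SPAM_THRESHOLD):
    """True if the submission should be hidden and queued for mod review."""
    return spam_filter.spam_probability(text) > threshold
```

Posts for which `should_flag` returns true would go to the moderation queue rather than being deleted outright, so a false positive only costs a moderator a quick look.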

Apologies if I've just described something you've already been doing for ages -- but just in case :)

crimsonmed profile image
Médéric Burlet

Hey Ben, here are a few more generic rules that could be added. These are similar to the ones I implemented on a game a while ago:

  • Limit the number of posts that can be posted just after registration
  • Simple title analysis (removing numbers) will reveal a high similarity percentage (60%+) between spam titles, as the number is the way for them to make it unique.
  • Auto ban or disable posting when more than 5 articles are published in less than an hour
  • Check age of account vs post rate (this would need a bit of balance, but someone who has created an account in the past few days and is posting 10 messages a day is a bit suspicious)
  • Add reCAPTCHA to the publishing action
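
The title-analysis rule in the list above could be sketched roughly as follows in Python, using the standard library's `difflib.SequenceMatcher`. The function names and the example titles are made up for illustration, and the 0.60 cutoff is simply the "60%+" figure from the list.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.60  # the "60%+" figure from the list; an assumption


def normalize_title(title):
    """Drop digits, collapse whitespace, and lowercase the title."""
    without_digits = re.sub(r"\d+", "", title)
    return " ".join(without_digits.lower().split())


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two titles once digits are removed."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def resembles_recent_title(title, recent_titles, threshold=SIMILARITY_THRESHOLD):
    """True if the title is close to any recently seen title."""
    return any(title_similarity(title, other) >= threshold for other in recent_titles)
```

For example, `resembles_recent_title("Best casino bonus 90177", ["Best casino bonus 48213"])` compares two titles that become identical once the digits are stripped.
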
darkwiiplayer profile image
𒎏Wii 🏳️‍⚧️

the number is the way for them to make it unique.

Could easily be done with numbers, random words from a dictionary, etc. Filtering out the numbers will not have any lasting effect.

Auto ban or disable posting when more than 5 articles are published in less than an hour

From what I see, it's mostly one post per account, so that won't help.

Allowing a new account to make a single post is enough; any limitation on top of that will mostly get in the way of legitimate users.

crimsonmed profile image
Médéric Burlet

Could easily be done with numbers, random words from a dictionary, etc. Filtering out the numbers will not have any lasting effect.

For the long term, I had other suggestions on the list. There is never any perfect spam protection; you will always have services like account creators, captcha bypassers, and others. The goal is to mitigate current and future threats as much as possible by taking past and possible attempts into consideration.

From what I see, it's mostly one post per account, so that won't help.

Sorry, but I saw a few single accounts with 24 posts in 10 minutes, and those accounts were created on the same day.
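
A burst like that is what the rate and account-age rules from earlier in the thread are meant to catch. Here is a minimal Python sketch of those two checks; the function names are hypothetical, "past few days" is assumed to mean three days, and the limits are the numbers quoted in the thread rather than tuned values.

```python
from datetime import datetime, timedelta

MAX_POSTS_PER_HOUR = 5                # "more than 5 articles ... in less than an hour"
NEW_ACCOUNT_AGE = timedelta(days=3)   # assumed reading of "past few days"
NEW_ACCOUNT_DAILY_POSTS = 10          # "10 messages a day"


def posts_within(post_times, now, window):
    """Count posts made in the `window` leading up to `now`."""
    return sum(1 for t in post_times if timedelta(0) <= now - t <= window)


def is_suspicious(account_created_at, post_times, now):
    """Apply the hourly burst rule, then the new-account post-rate rule."""
    if posts_within(post_times, now, timedelta(hours=1)) > MAX_POSTS_PER_HOUR:
        return True
    is_new_account = now - account_created_at <= NEW_ACCOUNT_AGE
    posts_today = posts_within(post_times, now, timedelta(days=1))
    return is_new_account and posts_today >= NEW_ACCOUNT_DAILY_POSTS
```

A new account with a single post passes both checks, so the one-post-per-account spam mentioned above would still need a content-based filter.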

jamesrweb profile image
James Robb • Edited

Captcha is the devil for accessibility, so never ever add captcha anywhere... ever. There are other ways to validate without wrecking your application's accessibility for users with access needs.

Otherwise good points overall.

crimsonmed profile image
Médéric Burlet

I totally disagree; there are now invisible captchas, for example 😊

jamesrweb profile image
James Robb

Which is inaccessible too.

Captcha is always in the top 10 issues users with access needs bring up, doesn’t matter the type of captcha. Captcha in its current form is a problem and not a solution. The users have spoken 🤷‍♂️

crimsonmed profile image
Médéric Burlet

Really? We have it on 13 of our commercial solutions and have never had an issue with it, so :/
Let's agree to disagree.

jamesrweb profile image
James Robb • Edited

“Pages with ReCAPTCHA had 12.6 more [accessibility] errors on average than those without.” - WebAIM Million

Captcha has improved, but it’s not yet accessible to all users, or even a plurality. Issues remain, old versions are still in the wild, etc.

There are valid captcha alternatives that solve many of captcha’s issues (across all versions). Still, as I said, captcha has improved; it’s just not all the way there for users with access needs or those with privacy concerns.