DEV Community

Spam sucks

Ben Halpern on October 06, 2020

If you've been around DEV for the last few days, we apologize for the feed having too much spam. The spam fight is an ongoing battle for any platf...
 
Ben Halpern

Ironically, the spammers are probably reading this post, and that pull request... Such is the nature of open source. In the long run, we seek to make this a game they cannot win because we will all collaborate to ensure our mitigation strategies rely on sophistication, not obfuscation.

Médéric Burlet

Hey Ben, here are a few more generic rules that could be added. They are similar to ones I implemented on a game a while ago:

  • Limit the number of posts an account can publish just after registration
  • Simple title analysis (with numbers removed) will show a high similarity percentage (60%+), as the number is the way for them to make it unique.
  • Auto ban or disable posting when more than 5 articles are published in less than an hour
  • Check age of account vs post rate (this would need a bit of balancing, but someone who created an account in the past few days and is posting 10 messages a day is a bit suspicious)
  • Add reCAPTCHA to the publishing action
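
As a rough illustration of the title-analysis rule, here is a minimal Python sketch. It assumes titles are compared with the standard library's `difflib.SequenceMatcher`; the function names and the 0.6 cutoff (taken from the "60%+" figure) are illustrative assumptions, not anything DEV is known to run.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase the title, drop all digits, and collapse whitespace."""
    without_digits = re.sub(r"\d+", "", title.lower())
    return " ".join(without_digits.split())


def title_similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two digit-stripped titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def looks_like_duplicate(title: str, recent_titles: list[str], threshold: float = 0.6) -> bool:
    """True if the title is at least `threshold` similar to any recent title."""
    return any(title_similarity(title, other) >= threshold for other in recent_titles)
```

A new title would be checked against a window of recently published titles; the size of that window is a tuning decision this sketch leaves open.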
 
James Robb • Edited

Captcha is the devil for accessibility, so never ever add captcha anywhere... ever. There are other ways to validate without bombing your application’s accessibility for users with access needs.

Otherwise good points overall.

Médéric Burlet

I totally disagree; there are now invisible captchas, for example 😊

James Robb

Which is inaccessible too.

Captcha is always in the top 10 issues users with access needs bring up, doesn’t matter the type of captcha. Captcha in its current form is a problem and not a solution. The users have spoken 🤷‍♂️

Médéric Burlet

Really? We have it on 13 of our commercial solutions and have never had an issue with it, so :/
Let's agree to disagree.

James Robb • Edited

“Pages with ReCAPTCHA had 12.6 more [accessibility] errors on average than those without.” - The WebAIM Million

Captcha has improved, but it’s not accessible to all users yet or even a plurality. Issues remain. Old versions in the wild. Etc.

There are valid captcha alternatives that solve many of captcha’s issues (across all versions). Still, as I said, captcha has improved; it’s just not all the way there for users with access needs or those with privacy concerns.

𒎏Wii 🏳️‍⚧️

“the number is the way for them to make it unique.”

Could easily be done with numbers, random words from a dictionary, etc. Filtering out the numbers will not have any lasting effect.

“Auto ban or disable posting when more than 5 articles are published in less than an hour”

From what I see, it's mostly one post per account, so that won't help.

Allowing a new account to make a single post is enough; any limitation on top of that will mostly get in the way of legitimate users.

Médéric Burlet

“Could easily be done with numbers, random words from a dictionary, etc. Filtering out the numbers will not have any lasting effect.”

For the long term, I had other suggestions on the list. There is never any perfect spam protection; you will always have services like account creators, captcha bypassers, and others. The goal is to mitigate current and future threats as much as possible by taking past and possible attempts into consideration.

“From what I see, it's mostly one post per account, so that won't help.”

Sorry, but I saw a few single accounts with 24 posts in 10 minutes, and those accounts had been created that same day.
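
The pattern described here (a burst of posts from a freshly created account) could be caught by a check along these lines. This is only a sketch: `is_suspicious_burst` and its default limits are hypothetical, loosely based on the "more than 5 articles in less than an hour" and account-age rules suggested earlier in the thread.

```python
from datetime import datetime, timedelta


def is_suspicious_burst(
    account_created: datetime,
    post_times: list[datetime],
    now: datetime,
    max_posts: int = 5,
    window: timedelta = timedelta(hours=1),
    new_account_age: timedelta = timedelta(days=3),
) -> bool:
    """Flag a new account that published more than `max_posts` within `window`."""
    # Count only posts that fall inside the look-back window ending at `now`.
    recent_posts = [t for t in post_times if timedelta(0) <= now - t <= window]
    is_new_account = now - account_created <= new_account_age
    return is_new_account and len(recent_posts) > max_posts
```

Long-standing accounts are deliberately exempt in this version, which is the "bit of balance" the age-versus-rate rule calls for; a real system would likely still cap them, just at a higher limit.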

Oisín • Edited

I couldn't determine the details of your spam prevention system from just this one post, but do you have a spam classifier already set up? I started using POPFile (John Graham-Cumming's mail filter) to filter my POP email back in 2000 or so and it was amazingly effective, with at least 99% accuracy and a very low false positive rate. And that was just a bag-of-words Naïve Bayes classifier; probably even a small deep neural net with word embeddings would be much better.

Anyway, if you classify each submission as it comes in, it could be automatically flagged for review and hidden if the spam likelihood is greater than some threshold (95%?). Then mods could attend to the flagged queue, which would hopefully consist almost entirely of obvious spam.
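
To make the idea concrete, here is a small bag-of-words Naive Bayes sketch in plain Python with add-one smoothing. It is not POPFile's code or DEV's pipeline; the class name, tokenizer, and `should_hide` helper are assumptions for illustration, and the 0.95 default mirrors the "95%?" threshold floated above.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Tiny two-class (spam/ham) bag-of-words classifier."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed class prior, so an unseen class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 reserves mass for unknown words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


def should_hide(spam_filter: NaiveBayesSpamFilter, text: str, threshold: float = 0.95) -> bool:
    """Hide the submission and queue it for mod review above the threshold."""
    return spam_filter.spam_probability(text) > threshold
```

Anything hidden this way would still land in the moderators' queue, so a false positive costs a delay rather than a deletion.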

Apologies if I've just described something you've already been doing for ages -- but just in case :)

Abhijit Hota

I was going to address this in an email or something. Glad this was taken seriously.

P.S.
Honestly though, I'd report more spam posts if the verification process (recognizing cars and fire hydrants) was not this rigorous.

Sometimes I give up midway after recognizing them 10 times in a row.

Vitor Paladini

Now they are spamming forem issues. Seems like the fix made them salty, haha

A screenshot of Forem repo issues

Ben Halpern

Those fuckers

Cal

I really don't understand how that could be effective for the spammer?! Who reads that and thinks, oh I must call that number immediately. 🤔

Vitor Paladini

I'd bet that it is probably an SEO thing. Having those words and numbers in other places might bump their website a bit.

It's kind of like when WordPress websites get hacked and the abuser creates thousands of pages linking to their websites.

Yash Dave

It is. Not only are they trying to artificially improve their SEO, but it is also a kind of phishing attempt. They make Google display the wrong number in its top results for legit brands (like Google Pay, etc.) and end up scamming unaware folks who think that these are legit customer care phone numbers / websites. This kind of fraud has been doing the rounds in India recently.

Pretty toxic stuff. 😔

Gregory Witek

Thank you for taking care of it, I know how hard it is to fight bad actors, it's a neverending thing! 💜

Question to DEV team: if we notice such spam, is it useful for you if we report it, or does it just add noise to your list of reports and cause more harm than good?

Karl Heinz Marbaise

First, I would like to congratulate the whole dev.to team behind this, because I can imagine it's a cat-and-mouse game... The team's response time is awesome, and apart from that, a big thank you for making this platform.

Shannon Crabill

I appreciate the action you all are taking regarding SPAM. I'll keep reporting it when I see it. I will say, reporting SPAM accounts, comments, posts, etc. makes for a productive alternative to doomscrolling.

I did have a question. If I have to view a post/comment to confirm a post/comment is spam before reporting it, does that view figure into the algorithm that determines which posts should be more visible? Or does reporting/vomiting/marking as abuse cancel out any views, etc?

marcellothearcane

@ben , can we get feedback on how we (trusted users - thanks for that by the way) are doing? I'm clicking away downvotes and reports on things that look like spam, but I don't know if it's doing a good job or hindering.

Can I see a list of things that I've reported that were successfully deleted as spam? A bit like how Stack Exchange does it with flags: meta.stackexchange.com/questions/1...

anurag-anand

I too reported 2 spam posts, but both times it was that kind of infinite captcha, the likes of which you get in Tor Browser. I still did it twice because I love dev.to, but the third time I didn't have that much patience.

Yogi • Edited

In our project (taskord.com), we flag users if more than 2 users are associated with the same IP. We also count profanity in posts: if the profanity count exceeds 10, the system automatically flags the user. And in the first place, we don't allow disposable emails, to prevent fake accounts.

When a user is flagged, all their entities are hidden from the public and return 404, which improves the UX for other users.

After flagging, all staff are notified, and we take the necessary action: either suspend or un-flag the user.
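
The rules described in this comment can be sketched roughly as follows. This is not Taskord's actual code: the `User` shape, the function names, and the blocklist entries are hypothetical, and only the thresholds (more than 2 users per IP, profanity count above 10) come from the comment itself.

```python
from dataclasses import dataclass

# Hypothetical blocklist; a real one would be far longer and kept up to date.
DISPOSABLE_DOMAINS = {"mailinator.com", "disposable.example"}


@dataclass
class User:
    email: str
    ip: str
    profanity_count: int = 0
    flagged: bool = False


def can_register(email: str) -> bool:
    """Reject sign-ups whose email domain is on the disposable blocklist."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain not in DISPOSABLE_DOMAINS


def should_flag(user: User, all_users: list[User],
                max_users_per_ip: int = 2, max_profanity: int = 10) -> bool:
    """Flag when too many accounts share the user's IP or profanity piles up."""
    users_on_same_ip = sum(1 for other in all_users if other.ip == user.ip)
    return users_on_same_ip > max_users_per_ip or user.profanity_count > max_profanity


def public_status(user: User) -> int:
    """Flagged users' pages answer 404 to the public until staff review them."""
    return 404 if user.flagged else 200
```

Note that the shared-IP rule will also catch offices, universities, and households behind one address, which is presumably why a flag leads to staff review rather than an automatic suspension.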

Next, I plan to implement some ML models to find spammy posts and users, and I'm also working on rate-limit-based flagging!

This is our mini-mod panel, where all the action takes place!

edA‑qa mort‑ora‑y

The biggest problem in blocking spam isn't blocking spam, it's allowing ham through.

Filters can create often undetectable bubbles of information, where legitimate information is suppressed. This happens frequently with big providers, like, say, Gmail, where certain addresses are shunted to spam for no apparent reason.

You need to have a feedback mechanism to report incorrectly marked spam.

manish srivastava

If you find 10 digits in the title... it's most probably spam.

Shannon Crabill

That's what I was thinking about this current batch of SPAM.

Thomas Bnt ☕

FIGHT

Nieuwe Pixels • Edited

Ben, as suggested in another discussion, can't we get a flag option? Above a threshold, a user's messages get, say, delayed. Above a second threshold, their ability to post is (temporarily) revoked. For fairness, false flagging will be penalized too.
This way you delegate the problem, and I'm sure most members are willing to help. It's at least a solution until something better is in place.
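
A two-threshold scheme like this might look as follows in Python. Every number and name here is an assumption made for illustration: the thresholds of 3 and 6, the 0.25 penalty, and the idea of penalizing false flaggers by shrinking the weight of their future flags are just one possible reading of the proposal.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    REVOKE = "revoke"


def reporter_weight(false_flags: int, penalty: float = 0.25) -> float:
    """Each past false flag reduces how much a reporter's flag counts."""
    return max(0.0, 1.0 - penalty * false_flags)


def action_for(reporters_false_flags: list[int],
               delay_threshold: float = 3.0,
               revoke_threshold: float = 6.0) -> Action:
    """Map the weighted flag total on a user to an action.

    `reporters_false_flags` holds, for each member who flagged the user,
    how many of that member's earlier flags were judged false.
    """
    score = sum(reporter_weight(n) for n in reporters_false_flags)
    if score > revoke_threshold:
        return Action.REVOKE
    if score > delay_threshold:
        return Action.DELAY
    return Action.ALLOW
```

Weighting flags by reporter history means a group of habitual false flaggers cannot silence someone on their own, while flags from reliable members still add up quickly.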

Sergey Kislyakov

Community mods do not have to solve the captcha. Just saying... Though I don't even know how one could become a community mod. I guess there are some algorithms behind that.

Karan Gandhi

From what I have seen:

The spam posts have 4 buzzwords.
They have a phone number with a random letter/s attached.
I think a regex-based spam filter can combat the issue effectively.
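
A regex-based filter along those lines could be sketched like this. The pattern, the buzzword list, and the `min_buzzwords` default are guesses made for illustration (the thread only mentions "customer care number" style posts), so treat them as placeholders to tune against real spam.

```python
import re

# Ten or more digits, each optionally followed by up to two letters, dots, or
# hyphens, which tolerates numbers padded with a random letter like 98765x43210.
PHONE_LIKE = re.compile(r"(?:\d[a-zA-Z\-.]{0,2}){10,}")

# Hypothetical buzzword list, not taken from any real filter.
BUZZWORDS = {"customer", "care", "helpline", "support", "number", "toll", "free"}


def buzzword_hits(title: str) -> int:
    """Count how many distinct buzzwords appear in the title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return len(words & BUZZWORDS)


def looks_like_phone_spam(title: str, min_buzzwords: int = 2) -> bool:
    """True if the title has a phone-like run of digits plus enough buzzwords."""
    return bool(PHONE_LIKE.search(title)) and buzzword_hits(title) >= min_buzzwords
```

Requiring both signals keeps ordinary titles with a year or version number from matching, although spammers can adapt to any fixed pattern, as others in the thread point out.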

Jacob Evans

Is AI/Machine Learning something you're all looking into for this problem? Sorry if it was already mentioned; I skimmed most of it.

Pavan Belagatti

Yesterday I reported two threads that were spam. Thanks, Ben, for making the DEV community great again.

Sandor Dargo

Thanks for following up on this and taking it seriously. By the way are you interested in my customer care number? 😂

Robert Lippens

Glad to find this post - I just scrolled through and reported a few and stumbled upon this, glad to see it's being addressed. Thanks to the DEV team!

Sharoz Ijaz

Thanks for listening to us. I recently reported 4 or 5 spam-related accounts.

leob

I'm seeing it occasionally, it's pretty rare ... not a big problem by any means. When I do come across it, I'm like "wtf", I chuckle a bit, and I move on (like probably almost everyone is doing).