That might sound like a clickbait title, but it really isn't. You have, with 99.9999% certainty, helped Google build self-driving cars without knowing it.
Oh, and you have probably helped Google with Optical Character Recognition (OCR), Google Images and Google Street View too (if you are as old as me 🤣)!
In the interests of speed, you are doing this every time you fill out a Captcha.
Now if you already knew that, before you leave, I would suggest you read the last couple of sections on why you should stop using Captcha on your site.
Captchas - more than spam filtering
Did you ever wonder why Captchas ask you to identify cars, buses etc?
Most people would probably think that this is because they are things that are hard for a computer to identify automatically (and they are - kind of) and so it helps stop spam.
But the truth is that Google shows you whatever its image recognition system is struggling to identify at the moment, and feeds the information you provide back into the system to improve its accuracy.
Nothing new
This isn't something new, in case you were wondering.
Google has been doing this since 2009 when they used Captcha to digitise printed materials.
They started with digitising the New York Times archives in 2009, before moving on to Google Books in 2011.
Back then, a Captcha was two words that you had to type into a box. They were designed to be difficult for computers to recognise, which stopped bots from submitting the form.
The thing is, one of those words was also used to help Google's Optical Character Recognition (OCR) system when it couldn't identify a word correctly.
So for example, you would type "Woodward peacock" into the text box and the first word would be used to check that the computer had guessed the word correctly in the text it was scanning (or tell it what word it didn't understand).
If enough people agreed, it would add that to its algorithm (if it had guessed correctly) and use it to find further uses of the word "Woodward". If it had guessed wrong, it would take that into account as something that needed further improvement. (This is massively simplified.)
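To make the "if enough people agreed" step concrete, here is a minimal sketch in Python. It is not Google's actual pipeline: the function name, the vote threshold and the agreement ratio are all made up for illustration, and it assumes one of the two words is a known "control" word while the other is the word the OCR system is unsure about.

```python
from collections import Counter


def resolve_unknown_word(submissions, control_answer, min_votes=3, min_agreement=0.75):
    """Return the agreed reading of the unknown word, or None if there is no consensus.

    submissions: list of (control_typed, unknown_typed) pairs, one per person.
    control_answer: the word whose correct reading is already known.
    min_votes / min_agreement: hypothetical thresholds, chosen for this example only.
    """
    # Only trust readings from people who typed the known control word correctly.
    trusted = [
        unknown.strip().lower()
        for control, unknown in submissions
        if control.strip().lower() == control_answer.lower()
    ]
    if len(trusted) < min_votes:
        return None  # not enough trusted answers yet

    # Accept the most common reading only if enough trusted people agree on it.
    word, votes = Counter(trusted).most_common(1)[0]
    if votes / len(trusted) >= min_agreement:
        return word
    return None


submissions = [
    ("peacock", "Woodward"),
    ("peacock", "woodward"),
    ("peacock", "Woodward"),
    ("peecock", "Woodword"),  # control word typed wrong, so this answer is ignored
    ("peacock", "Woodvard"),
]
print(resolve_unknown_word(submissions, "peacock"))
```

With these made-up submissions, three of the four trusted answers agree, so the sketch settles on "woodward". With only two submissions it would return None and keep waiting for more people.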
Then you helped with image identification for Google Images
Google wanted to identify pictures of cats.
The exact same principle applied: they asked you to identify pictures of cats to help train their model.
Obviously it wasn't just cats; it was everything. Google used Captcha to help identify any picture it wasn't sure about.
Then you helped with Google Street View
Once Google had nailed Optical Character Recognition and recognising images of cats and such, a new challenge appeared.
How do you recognise house numbers in Google Street View? These are massively complex images that may have random numbers anywhere (phone numbers on signage, etc.).
Captcha to the rescue once again!
This time you would be presented with blurry pictures of numbers to identify. This helped Google identify house numbers on Street View and increase the accuracy of its results.
So how are you helping with autonomous cars?
Nowadays they are showing you pictures of buses, street signs, trains etc.
As far as Google is concerned, you are feeding its training set and model with valuable information.
Their ML model isn't quite sure if that thing is a bus or a train. You complete the captcha and tell it what it is, and the model learns and improves.
Ultimately you are partially responsible for teaching this model how to identify things.
Why would a model need to identify trains, buses, fire hydrants etc.? So that when it is used to make decisions in a self-driving car, the decisions are accurate!
Is Captcha actually a terrible idea?
Personally I think Captcha is one of the worst things that happened to the web.
Not very effective
Nowadays Captcha isn't very effective.
While these images may still be difficult for computers to identify, they are a lot easier than they used to be.
Google tells you "identify the buses in these pictures".
Because the prompt names the target (a bus), you can point a (relatively) basic Machine Learning model at exactly that kind of object, and it will soon be able to pick the buses out.
For this reason captcha is not as effective as it once was.
And don't forget that you can pay about $5 to get actual humans to complete captchas for you. That is $5 for 1000 solves by the way, which isn't much money if someone really wants to spam a service!
They ruin accessibility
They are still, to this day, one of the least accessible parts of the web and exclude people with disabilities on a daily basis.
I am still not sure how someone hasn't brought a lawsuit against Google for Captcha, but I wouldn't be surprised if it happens.
It ruins your conversions
It gets even worse though. Not only do Captchas offer little protection from spam and make your site less accessible, they also introduce a lot of friction into completing forms.
How many times have you faced a captcha and got annoyed?
Someone getting annoyed is the last thing you want if somebody is signing up to your newsletter or signing up to create an account.
We invest hours into optimising sales processes, trying to squeeze every last conversion out of visitors to our sites and then introduce an "are you a robot" roadblock that is significant enough to cost a sale.
Rounding Up
I personally believe that if you are using Captcha you should stop.
Not because it is feeding our Alphabet (Google) overlord, as avoiding that is next to impossible!
No, you should ditch it because it is still a massive accessibility problem and adds an extra hurdle to any conversion process.
It is also pretty ineffectual nowadays. If somebody really wants to spam you, they will find a way. You only have to look at the number of bot accounts on YouTube to realise Captcha doesn't stop bots!
What do you think? Should we still be using Captcha?
Let me know in the comments!
P.S. My "Shower Thought" on Captcha
Just as an aside, I wonder how Captcha is legal?
I didn't agree to let Google use my image recognition abilities to train its algorithms. You would imagine there would be something in the law about that?
Anyway, I know nothing about the legal side of it. It was just a thought I had that perhaps someone can educate me on (is there something in the terms that site owners are meant to have in their site policies?)!
P.P.S.
Starting in December my content is changing (for the better I hope!)
My new series on building perfect UI components is massive, and will be released on a Tuesday every week!
So be sure to follow me for more on that!
See you soon!
I hope you enjoyed this article!
Have a great weekend everybody!
Top comments (13)
Perfect! 🤣
I was a big supporter of Google's captcha when they were digitizing written works, but now they aren't helping humanity and are super annoying, so I agree they should be avoided.
To your shower thought: no, Google using your captcha answers to train their algorithm isn't illegal. Fundamentally, it's no different than a site keeping track of which articles get the most clicks and using that information to order a feed of popular articles. Honestly, I'm at a loss for any reason such a use wouldn't be legal.
Like I said, it was just a silly shower thought, as I hadn't given permission to use my "data". I should perhaps have made it clearer that it was just a silly thought!
No worries, you were clear enough. My bad if my response came off as overly serious. I probably could have worded the last sentence better as an inquiry for your reasoning. I'll blame it on a leafblower outside making it hard to think clearly. :P
You referring to it as your data does give me an idea where you're coming from.
The difference is between "account A clicked item B" and "item B was clicked". What your account has clicked would be your data, but what things have been clicked would be the company's data.
If they were using specific or personal information from your account together with the clicks to train their bot, then that would probably fall afoul of things like the GDPR or at least need to be covered in their ToS, but if they are using generic information (i.e. thing was clicked) then they aren't doing anything illegal and arguably aren't even being unethical as far as information usage goes.
Hope that helps answer your shower thought, now it's time for me to stop procrastinating on this site and try to actually accomplish some work for the day. :)
I suppose it's OK for me to plug my own alternative captcha here, one that does not rely on tracking users or having them label data. hCaptcha is owned by a data labeling company too :(.
It's a bit different as it's based on proof of work - so instead of paying in human labor your device does some computation. If we think you're a bot this task gets more difficult. This means that we don't have to collect user data and as there is no task for the user, it's as accessible as a captcha will get :).
friendlycaptcha.com
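For readers who haven't met proof of work before, here is a minimal sketch of the general idea in Python. It is not Friendly Captcha's actual implementation: the challenge string, the SHA-256 puzzle and the `difficulty_bits` parameter are assumptions made purely for illustration. The point is that finding a solution costs the client many hash attempts, while checking it costs the server a single hash.

```python
import hashlib
from itertools import count


def _hash_value(challenge: str, nonce: int) -> int:
    """Hash the challenge together with a nonce and read the digest as a big integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: try nonces until the hash falls below the target.

    Each extra difficulty bit roughly doubles the expected number of attempts.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _hash_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    return _hash_value(challenge, nonce) < (1 << (256 - difficulty_bits))


# Hypothetical challenge the server would issue for one form submission.
challenge = "signup-form-42"
nonce = solve(challenge, difficulty_bits=12)  # about 4,096 attempts on average
print(verify(challenge, nonce, difficulty_bits=12))
```

In a scheme like this, a server that suspects a bot could simply hand out a larger `difficulty_bits`, so each request costs that client more computation.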
Interesting, without digging through the site how does the proof of work prove it isn't a bot?
Could I not use puppeteer to navigate the page, how does it prevent that? Or does it not work in a headless browser?
I think the most straightforward answer is that it doesn't prove that it isn't a bot. Instead it adds a small cost to whatever action on your website you want to protect (e.g. submitting a form). Some of our customers set the difficulty quite high if their goal is to "slow down" scrapers and make them uneconomical. As it runs in the background anyway, real users don't really notice because they are busy filling in the form.
As you mentioned in your article, if you have even a modest budget you can pass any captcha ($5 per 1000 human reCAPTCHA solves is actually on the higher end... with a bit of searching you will find <$1 per 1000). Our captcha doesn't lock out anyone with disabilities or other accessibility concerns, doesn't kill conversion with forced tasks, and doesn't sell out your customers' privacy. Unlike our competitors, we are on the same side as you: your visitors' data is a burden for us, not an asset. It's in our interest to collect as little of it as possible.
There are some anti-headless checks client-side, but those are pretty basic so I don't think they would pose a huge obstacle. So if you script yourself around those and are willing to pay xx to xxx seconds of computation per request eventually, you can get through.
Sorry for the long answer, I just hope it highlights the difference in tradeoffs.
No that is a great answer.
Seriously, I never had any idea how Google benefits from it. Implementing reCAPTCHA is free, and that naturally made me wonder what Google's business model was.
Thanks to you, I've cleared up my long-standing doubt.
It is one of those things that the second you hear it you go “ah, of course”! I didn’t put the self driving cars part together until recently, I thought they were just training Google images again 😜
Google is going big brain
And that's why hCaptcha is there for us :D