About a year ago at work, I had the chance to spend a great day on something called "FedEx Day". It's a hackathon-like event where you try to deliver a software solution in just one day, like FedEx (great marketing btw ;p).
I worked with one of my teammates, who has strong knowledge of various topics (Przemo, it's about you), so we decided to work on something unusual. My team manages authentication for multiple services in a few European countries (Switzerland, Germany, Poland, Belgium). We handle about 100 million auth requests per month, and we are still working on improving the security of our systems. So we decided to prepare a simple CAPTCHA mechanism based on canvas and a proof-of-work mechanism with leading zeros, like in Hashcash.
CAPTCHA is the more popular name for a genre of software called HIP, which stands for Human Interactive Proof. This kind of software is based on a reverse Turing test, in which a computer tries to distinguish a real user from an artificial one.
HIPs are a very popular topic because, as the Internet grows, more and more bots try to break into systems. They use different mechanisms to crack passwords, from brute-force attacks to various types of dictionary attacks. We can limit the number of requests per second, set up firewalls to allow connections only from specific IP ranges, and do a lot of other things to secure login forms and applications. We can also use a HIP to make sure there is a human on the other side of the monitor.
HIPs come in different forms. Some of them are based on pareidolia, the tendency to perceive familiar things in vague shapes and details. This type of HIP usually generates an image with text on the server side, stores the text value temporarily, and compares it with the response from the frontend. However, there are tools on the market, like Captcha Sniper, that solve most of those CAPTCHA tests with high accuracy.
Other HIP software is based on audio: the server generates a short audio track with text-to-speech, distorts it slightly, and returns it to the user. However, there are speech recognition services that can handle this. Even Google has its own speech recognition service on Google Cloud.
In our case, we are going to make a CAPTCHA that requires interaction with some Web APIs, like Canvas and Web Crypto. The user will drag a jigsaw piece to the right place and then solve a small leading-zero challenge. To keep bots out, we are going to check the response on the server side only when both factors have been sent.
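To make the leading-zero challenge a bit more concrete, here is a minimal sketch of how a browser could solve one with Web Crypto. Everything specific here is an assumption for illustration, not our final design: SHA-256 as the hash, the `prefix + challenge` concatenation, the hex counter used as the prefix, and the names `countLeadingZeroBits` and `solveChallenge`.

```typescript
const encoder = new TextEncoder();

// Count how many zero bits the hash starts with.
function countLeadingZeroBits(hash: Uint8Array): number {
  let bits = 0;
  for (let i = 0; i < hash.length; i++) {
    if (hash[i] === 0) {
      bits += 8; // a whole zero byte
      continue;
    }
    // Math.clz32 works on 32-bit values, so drop the 24 bits above the byte.
    return bits + Math.clz32(hash[i]) - 24;
  }
  return bits;
}

// Try prefixes "0", "1", "2", ... (in hex) until SHA-256(prefix + challenge)
// starts with at least `requiredZeroBits` zero bits.
// Assumption: the server hashes the same UTF-8 string when it verifies.
async function solveChallenge(
  challenge: string,
  requiredZeroBits: number
): Promise<string> {
  for (let counter = 0; ; counter++) {
    const prefix = counter.toString(16);
    const digest = await crypto.subtle.digest(
      "SHA-256",
      encoder.encode(prefix + challenge)
    );
    if (countLeadingZeroBits(new Uint8Array(digest)) >= requiredZeroBits) {
      return prefix;
    }
  }
}
```

Each extra required bit doubles the expected number of hashes, so the difficulty can be tuned so that a single user barely notices it while a bot sending thousands of requests pays for every one of them.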
Our project will be written in TypeScript, with Node.js as the web server and React and Pixi.js on the frontend. Of course, there will be a lot more libraries, but for now, that's enough. On the server side, for each request ID, we will store the position where the puzzle piece is placed on the image and an array of challenges for the leading-zero proof of work.
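As a rough sketch of that server-side state, the snippet below keeps one record per request ID in an in-memory `Map` and verifies both factors together. The names (`CaptchaRecord`, `createCaptcha`, `verifyCaptcha`), the SHA-256 hash, the 12-bit difficulty, the 5-pixel tolerance, and the in-memory store are all assumptions made for illustration; a real deployment would likely use a shared store with expiry.

```typescript
import { createHash, randomBytes, randomInt, randomUUID } from "node:crypto";

// Hypothetical shape of what the server keeps for each request ID.
interface CaptchaRecord {
  puzzleX: number;      // where the puzzle piece belongs on the image
  puzzleY: number;
  challenges: string[]; // random strings for the leading-zero proof of work
}

const REQUIRED_ZERO_BITS = 12;   // assumed difficulty
const POSITION_TOLERANCE_PX = 5; // assumed tolerance for the drop position

const store = new Map<string, CaptchaRecord>();

function leadingZeroBits(hash: Uint8Array): number {
  let bits = 0;
  for (let i = 0; i < hash.length; i++) {
    if (hash[i] === 0) {
      bits += 8;
      continue;
    }
    return bits + Math.clz32(hash[i]) - 24;
  }
  return bits;
}

// Issue a new CAPTCHA. The position stays on the server; only the
// request ID and the challenges are returned to the widget.
function createCaptcha(imageWidth: number, imageHeight: number, challengeCount = 3) {
  const requestId = randomUUID();
  const record: CaptchaRecord = {
    puzzleX: randomInt(imageWidth),
    puzzleY: randomInt(imageHeight),
    challenges: Array.from({ length: challengeCount }, () =>
      randomBytes(16).toString("hex")
    ),
  };
  store.set(requestId, record);
  return { requestId, challenges: record.challenges };
}

// Check both factors in one step, and only once per request ID.
function verifyCaptcha(requestId: string, x: number, y: number, prefixes: string[]): boolean {
  const record = store.get(requestId);
  if (!record) return false;
  store.delete(requestId); // a record cannot be replayed
  if (prefixes.length !== record.challenges.length) return false;

  const positionOk =
    Math.abs(x - record.puzzleX) <= POSITION_TOLERANCE_PX &&
    Math.abs(y - record.puzzleY) <= POSITION_TOLERANCE_PX;
  const workOk = record.challenges.every((challenge, i) => {
    const hash = createHash("sha256").update(prefixes[i] + challenge).digest();
    return leadingZeroBits(hash) >= REQUIRED_ZERO_BITS;
  });
  return positionOk && workOk;
}
```

Deleting the record before checking anything means every request ID gets exactly one attempt, so a bot cannot probe the position and the proof of work separately.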
On the frontend, we are going to use React to render the CAPTCHA widget and display an image from the backend. React will render the canvas element from the Web API and handle user input. In this case, that just means moving the puzzle piece in four directions on the canvas. By the time the user submits a response, the widget will have calculated the prefixes for the leading-zero challenges, so it can send them to the server together with the puzzle position.
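The movement part of that input handling can be sketched as a small pure function, independent of React and Pixi.js. The names `Direction`, `Bounds`, and `movePiece`, as well as the idea of clamping the piece to the canvas, are assumptions for illustration only.

```typescript
type Direction = "up" | "down" | "left" | "right";

interface Point {
  x: number;
  y: number;
}

interface Bounds {
  width: number;     // canvas width in pixels
  height: number;    // canvas height in pixels
  pieceSize: number; // side length of the (square) puzzle piece
}

// Return the new top-left position of the piece after one step,
// clamped so the piece never leaves the canvas.
function movePiece(pos: Point, dir: Direction, step: number, bounds: Bounds): Point {
  const deltas: Record<Direction, Point> = {
    up: { x: 0, y: -step },
    down: { x: 0, y: step },
    left: { x: -step, y: 0 },
    right: { x: step, y: 0 },
  };
  const delta = deltas[dir];
  const maxX = bounds.width - bounds.pieceSize;
  const maxY = bounds.height - bounds.pieceSize;
  return {
    x: Math.min(maxX, Math.max(0, pos.x + delta.x)),
    y: Math.min(maxY, Math.max(0, pos.y + delta.y)),
  };
}
```

Because the function returns a new position instead of mutating the old one, it fits naturally into a React state update, for example `setPosition(p => movePiece(p, "left", 10, bounds))`.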
Below is a sample, universal flow chart for submitting a form (registration, login, comment, or any other form) with our new HIP mechanism.
In the next article, we'll prepare an environment for our work and start coding. If you want to be notified about the next part, follow me on DEV.to. Stay tuned, this is going to be a legendary series! 😉