Introduction
As we traverse the digital frontier, the importance of robust security measures, devised to shield against harmful elements, has become paramount. CAPTCHAs – an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart – serve this purpose, playing a pivotal role in authenticating online interactions. Since their introduction in 2003 by L. von Ahn and colleagues [1], CAPTCHAs have become a commonplace presence, attempting to discern humans from automated bots, thereby safeguarding digital spaces from malicious activities.
CAPTCHAs have long been the go-to solution for differentiating between human users and automated bots on online platforms. These tests typically involve tasks like deciphering distorted characters, selecting specific images, or solving puzzles. While initially effective in combating bots, CAPTCHAs have become increasingly inadequate and burdensome for users in recent years.
[Figure: an early CAPTCHA, as shown in the paper by von Ahn et al.]
The rapid growth of the internet, combined with the increasing sophistication of malicious actors, has led to the widespread adoption of CAPTCHAs by countless websites and online platforms. These tests have become the gatekeepers of digital spaces, serving as the first line of defence against automated attacks, fraudulent activities and data breaches.
By presenting users with challenges that are easy for humans to solve but difficult for machines, CAPTCHAs aim to accomplish two crucial objectives. First, they seek to authenticate human presence, ensuring that real users can access and utilise online resources while maintaining a secure environment. Second, they aim to hinder or altogether prevent malicious bots from infiltrating websites, safeguarding sensitive information, and maintaining the integrity of online interactions.
In recent times, the prevalence of CAPTCHAs seems to have risen considerably, and it’s not hard to understand why. The growing reliance on digital platforms for various tasks, such as online shopping, social media engagement, and financial transactions, has made them attractive targets for actors seeking to exploit vulnerabilities and gain unauthorised access.
To keep pace with these evolving threats, CAPTCHAs have been continuously evolving as well. From deciphering distorted text or numbers to identifying specific objects in images or solving logical puzzles, the variety of CAPTCHA types has expanded to offer different levels of difficulty and security. However, these conventional CAPTCHAs are not without their limitations and drawbacks, often leading to frustrating user experiences and accessibility challenges for certain individuals.
[Figure: reCAPTCHA version 2, a typical modern CAPTCHA]
Issues with CAPTCHAs
The rise of sophisticated bots armed with advanced algorithms and machine learning techniques has rendered traditional CAPTCHAs significantly less effective. Bots can now solve CAPTCHAs with surprising accuracy, compromising the security of online platforms and leaving them vulnerable to various malicious activities. As a result, the need for a more robust and reliable solution has become paramount.
Furthermore, traditional CAPTCHAs often lead to a subpar user experience. Users frequently encounter distorted images, complex puzzles, or illegible text, causing frustration and hindering their ability to access the desired online content. Not only does this affect user satisfaction, but it also leads to potential abandonment of platforms, resulting in lost opportunities and revenue.
There is yet another concern with CAPTCHAs. Building any system at Internet scale that can answer with certainty ‘is this a real person?’ is inherently a very hard problem. Keeping up with the continuous arms race between CAPTCHA designers and CAPTCHA solvers is simply beyond the reach of most individuals and organisations.
In turn, this has led to the consolidation of CAPTCHA-as-a-Service providers, i.e., third-party services that offer integrations for embedding CAPTCHAs in our sites and applications. The largest such provider is Google’s reCAPTCHA, advertised the following way: ‘reCAPTCHA is a free service that protects your site from spam and abuse. It uses advanced risk analysis techniques to tell humans and bots apart’.
The centralised nature of mainstream CAPTCHA systems concentrates vast amounts of user data in the hands of a few entities. This concentration of data power undermines the principles of decentralisation and user autonomy, reinforcing the existing power dynamics in the digital landscape. The potential for data misuse or abuse looms large, especially considering the far-reaching influence these providers have over individuals’ online experiences.
These privacy concerns are inherently difficult to address: these services likely work in large part because of the vast amounts of data they collect, which they can analyse to infer what typical human interaction looks like versus typical bot behaviour. This point is further reinforced by services like reCAPTCHA v3, which have largely done away with puzzle-solving and instead assign each visitor a score, with minimal or no interaction. Hence, it is probably safe to say that a privacy-friendly CAPTCHA is unlikely to be effective against sophisticated attackers, and that, even if it were, it may very well be difficult for humans to solve.
Introducing Privacy Pass
Cloudflare are a prominent Internet infrastructure company offering a range of services, including protection against automated attacks. Among these is a feature called ‘Challenge’, which enables website owners to verify that visitors are human, protecting their sites against automated attacks. The specific factors taken into account when issuing a challenge are undisclosed and dependent on configuration, but in many cases a CAPTCHA has been used as one of its signals.
Because a vast swath of the Internet sits behind Cloudflare with such challenges enabled, many sites in practice started requiring a CAPTCHA to be solved on first visit. This was particularly inconvenient for privacy-conscious users, such as those using the Tor Browser, who would often need to complete one or several CAPTCHAs for every site visited. ‘Regular’ private users are in many ways indistinguishable from malicious ones, which led to the unfortunate result of forcing them to solve a vast number of CAPTCHAs or, worse, blocking their access entirely.
Circa 2017, Cloudflare, in collaboration with other actors in industry and academia, developed what came to be known as ‘Privacy Pass’ [2]. In its original form, it was a browser extension that allowed users to prove their humanity by solving a CAPTCHA, in exchange receiving special signed tokens that could later be sent to servers as proof that a challenge had been successfully solved.
The main innovation of Privacy Pass was its ‘blind’ nature: servers cannot link the tokens they sign with the ones redeemed later. As a result, users can maintain their privacy without compromising their communication with the server.
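To make the blinding idea concrete, here is a toy sketch in TypeScript using textbook RSA blind signatures. This is purely illustrative and an assumption on my part for exposition: the parameters are tiny demo numbers, and the deployed Privacy Pass protocols use a verifiable oblivious PRF or properly padded blind RSA, never raw RSA like this.

// Toy illustration of the 'blinding' idea using textbook RSA.
// NOT secure: real deployments use padded blind RSA (RFC 9474)
// or a VOPRF; this sketch only shows why signing is unlinkable.

// Tiny demo parameters (illustrative only).
const p = 61n, q = 53n;
const n = p * q;   // public modulus
const e = 17n;     // public exponent
const d = 2753n;   // private exponent (e*d ≡ 1 mod φ(n))

const modPow = (base: bigint, exp: bigint, mod: bigint): bigint => {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
};

// Extended Euclid, used to compute the unblinding factor r^-1 mod n.
const modInverse = (a: bigint, m: bigint): bigint => {
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const qt = oldR / r;
    [oldR, r] = [r, oldR - qt * r];
    [oldS, s] = [s, oldS - qt * s];
  }
  return ((oldS % m) + m) % m;
};

// Client: blind the token m with a random factor r coprime to n.
const m = 42n;
const r = 7n;
const blinded = (m * modPow(r, e, n)) % n;

// Issuer: signs the blinded value; it never sees m itself.
const blindSig = modPow(blinded, d, n);

// Client: unblind, yielding a valid signature over m.
const sig = (blindSig * modInverse(r, n)) % n;

// Anyone can verify with the public key, yet the issuer cannot
// link sig back to the blinded value it signed earlier.
console.log(modPow(sig, e, n) === m); // true

Note how the issuer only ever computes over the blinded value: even a complete log of every signing request cannot later be matched against redeemed tokens.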
Moving on and into the browser
As innovative as Privacy Pass may have been, its lack of native browser support limited its ability to take off in practice. Installing a browser extension requires users to take the additional steps of downloading and configuring it, and that is without taking into account that they must be aware of it in the first place. Adding an extension is also not always possible due to policy or platform limitations, and, as with all extensions, it comes with an increased attack surface.
The IETF Privacy Pass working group are now standardising the underlying protocols so that they can be natively supported on all devices as a CAPTCHA alternative.
At a high level, the standard leverages the existing mechanisms in the HTTP protocol for authentication. When a server requires proof that the user is not automated, it transmits a challenge in the WWW-Authenticate header, just like it would for any other authentication scheme. It looks something like this:

WWW-Authenticate: PrivateToken challenge="<base64url encoded data>", token-key="<base64url encoded data>"
The challenge part is a serialised and base64url encoded representation of the following structure:
struct {
    uint16_t token_type;
    uint16_t issuer_name_length;
    char issuer_name[1..2^16-1];
    uint8_t redemption_context_length;
    char redemption_context[0..32];
    uint16_t origin_info_length;
    char origin_info[0..2^16-1];
} TokenChallenge;
The TokenChallenge structure as shown in the latest draft of the Privacy Pass HTTP authentication scheme, with some changes to the types used to more closely resemble C syntax.
With the issuer_name field meant to contain the hostname of the token issuer, and the optional origin_info field meant to contain the hostname of the party issuing the request, note that there is little space left for including personally identifiable information about the user. The one place where a unique identifier could fit is the optional 32-byte redemption_context field, which is not visible to the token issuer.
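To see how compact this structure really is, here is a minimal TypeScript sketch that decodes a base64url-encoded challenge according to the layout above. It assumes big-endian (network order) integers, as in the draft’s TLS-style presentation syntax, and elides error handling.

// Minimal sketch: decoding a base64url TokenChallenge per the
// structure above. Error handling is omitted for brevity.

interface TokenChallenge {
  tokenType: number;
  issuerName: string;
  redemptionContext: Uint8Array; // empty or exactly 32 bytes
  originInfo: string;
}

function parseTokenChallenge(b64url: string): TokenChallenge {
  const pad = "=".repeat((4 - (b64url.length % 4)) % 4);
  const bytes = Uint8Array.from(
    atob(b64url.replace(/-/g, "+").replace(/_/g, "/") + pad),
    (c) => c.charCodeAt(0),
  );
  const view = new DataView(bytes.buffer);
  let offset = 0;

  // All integers are big-endian, per TLS presentation syntax.
  const tokenType = view.getUint16(offset); offset += 2;

  const issuerLen = view.getUint16(offset); offset += 2;
  const issuerName = new TextDecoder().decode(
    bytes.subarray(offset, offset + issuerLen),
  );
  offset += issuerLen;

  const ctxLen = view.getUint8(offset); offset += 1;
  const redemptionContext = bytes.subarray(offset, offset + ctxLen);
  offset += ctxLen;

  const originLen = view.getUint16(offset); offset += 2;
  const originInfo = new TextDecoder().decode(
    bytes.subarray(offset, offset + originLen),
  );

  return { tokenType, issuerName, redemptionContext, originInfo };
}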
Once a supporting client receives a challenge, it initiates a protocol with the issuer (as specified in the issuer_name field), first by requesting the resource at /.well-known/token-issuer-directory and then by requesting that a token be issued. The issuer is not provided with the raw value of the TokenChallenge issued to the client, but rather with a digest of it, meaning that it cannot read what little information can be encoded in the challenge.
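A client-side fetch of that directory might look like the following sketch. The JSON field names (‘issuer-request-uri’, ‘token-keys’) follow the issuance protocol drafts, but treat them as assumptions and check whichever draft revision you target.

// Sketch: discovering an issuer's keys and request URI. The JSON
// shape here follows the issuance protocol drafts and may change
// between revisions; treat the field names as assumptions.

interface IssuerDirectory {
  "issuer-request-uri": string;
  "token-keys": { "token-type": number; "token-key": string }[];
}

async function fetchIssuerDirectory(
  issuerName: string,
): Promise<IssuerDirectory> {
  const res = await fetch(
    `https://${issuerName}/.well-known/token-issuer-directory`,
  );
  if (!res.ok) throw new Error(`directory fetch failed: ${res.status}`);
  return (await res.json()) as IssuerDirectory;
}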
After receiving a token from the issuer, the client relays this value to the origin that requested it using the standard HTTP Authorization header, for example like this:

Authorization: PrivateToken token="<base64url encoded data>"
The structure of the token is the following, where Nid and Nk are fixed integers that depend on the token_type value:
struct {
    uint16_t token_type;
    char nonce[32];
    char challenge_digest[32];
    char token_key_id[Nid];
    char authenticator[Nk];
} Token;
The Token structure as shown in the latest draft of the Privacy Pass HTTP authentication scheme, with some changes to the types used to more closely resemble C syntax.
The origin requesting the token must verify that the authenticator field contains a valid signature over the entire value of the token excluding the authenticator itself (i.e., over the first 66 + Nid bytes), and that the key identified by token_key_id is valid.
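For illustration, the following TypeScript sketch performs those origin-side checks using the Web Crypto API. It assumes token type 0x0002 (blind RSA over a 2048-bit key, so Nid = 32 and Nk = 256) and RSABSSA-SHA384 parameters; a real implementation must dispatch on token_type and follow the relevant draft.

// Sketch of the origin-side token checks. Assumes token type 0x0002
// (blind RSA 2048: Nid = 32, Nk = 256) and that token_key_id is the
// SHA-256 hash of the issuer's serialised public key.

const NID = 32;
const NK = 256;

async function sha256(data: Uint8Array): Promise<Uint8Array> {
  return new Uint8Array(await crypto.subtle.digest("SHA-256", data));
}

// Note: use a constant-time comparison in production code.
const equalBytes = (a: Uint8Array, b: Uint8Array): boolean =>
  a.length === b.length && a.every((v, i) => v === b[i]);

async function verifyToken(
  token: Uint8Array,          // the decoded Token structure
  challenge: Uint8Array,      // the serialised TokenChallenge we issued
  issuerKey: CryptoKey,       // RSA-PSS public key, imported with SHA-384
  issuerKeyBytes: Uint8Array, // the same public key, serialised
): Promise<boolean> {
  // token_type (2) + nonce (32) + challenge_digest (32) + Nid + Nk
  if (token.length !== 66 + NID + NK) return false;

  // challenge_digest must match the challenge this origin sent out.
  const digest = token.subarray(34, 66);
  if (!equalBytes(digest, await sha256(challenge))) return false;

  // token_key_id must identify a key we trust.
  const keyId = token.subarray(66, 66 + NID);
  if (!equalBytes(keyId, await sha256(issuerKeyBytes))) return false;

  // The authenticator signs the first 66 + Nid bytes of the token.
  return crypto.subtle.verify(
    { name: "RSA-PSS", saltLength: 48 },
    issuerKey,
    token.subarray(66 + NID),      // authenticator
    token.subarray(0, 66 + NID),   // signed message
  );
}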
This approach is deceptively simple and yet tremendously powerful: it provides an often-required facility (closing off certain resources to automated requests) in a privacy-friendly manner that does not leak information about the user, and in a way that is also resistant to correlation attacks by the issuer. What is more, by implementing it as a standard using existing authentication mechanisms as building blocks, it is possible to support this feature while providing graceful degradation for clients that do not support the standard.
When and where can I use this?
Although the standard has not been finalised, it is already used in the wild and can be used right now. Apple announced support for Private Access Tokens (PATs) in June last year, in iOS 16 and macOS Ventura. It requires, however, being logged in to iCloud at the device level. As the specification is finalised, it is likely that other vendors will add support for the standard.
PATs are already being used by Cloudflare, for example for clients using ‘managed challenges’ or the Turnstile service. Likewise, the feature is available to Fastly customers.
However, you do not need to rely entirely on a third-party service to use PATs: we have developed an open-source JavaScript library, PrivacyPass, also available on NPM, to support issuing PAT challenges and redeeming tokens. The only, but crucial, missing part is the issuer. For this, you’ll need to use an existing open one, like those operated by Cloudflare and Fastly, or you can develop your own after gaining access to Apple’s attester service.
How will this not lead to more centralisation?
The underlying problem, thwarting automation and automated attacks, is a difficult one to solve anonymously and at scale, and it is not surprising that solutions predating PATs, like CAPTCHA services, require data collection to tell ‘good’ and ‘bad’ users apart. What is perplexing is that a solution that purportedly addresses the issue privately requires a specific brand of devices and being signed in to the vendor’s account.
This is unfortunate but not unsalvageable. For one, the Privacy Pass protocols do not mandate any particular vendor, and they are intended to make it easy for several parties to perform the attestation role.
Then there is the issue of why there needs to be an attestation at all, since it introduces a bottleneck and closes off the protocol. The reason for requiring attestation is that, without a trusted third party vouching for a particular user, it is impossible to enforce any system against automation. In particular, with self-attestation, an attacker can simply mint as many tokens as they desire, and then a few more.
We have explored some alternatives, such as proof of work, that would obviate the need for ‘trusted’ parties. The main downside of these approaches, even setting aside the environmental concerns of ‘wasting power’, is simply that compute power is too cheap. Let’s do some simple calculations to drive this point home. Say we want to build a system to prevent spam, and that a cost of just 1 cent would be enough to deter spammers; we then need a proof-of-work system where producing a proof costs at least 1 cent. A spammer can get a server for, say, $100 per month (likely an inflated number; the server could in fact cost far less), which comes to just shy of 0.004 cents per second of server time. Therefore, for a proof of work worth the equivalent of 1 cent, we’d need an algorithm that takes around 250 seconds, over 4 minutes, to construct a proof. Such a long-running proof of work would likely be unacceptable in any real-world scenario.
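For the sceptical reader, the arithmetic is easy to reproduce; the prices in this snippet are the assumptions stated above, not measurements.

// Back-of-the-envelope check of the proof-of-work economics above.
const serverCostCentsPerMonth = 100 * 100;   // $100/month, in cents
const secondsPerMonth = 30 * 24 * 60 * 60;   // ≈ 2.59M seconds
const centsPerSecond = serverCostCentsPerMonth / secondsPerMonth;
console.log(centsPerSecond.toFixed(4));      // ≈ 0.0039 cents/s

const targetCostCents = 1;                   // desired cost per proof
const secondsOfWork = targetCostCents / centsPerSecond;
console.log(secondsOfWork.toFixed(0));       // ≈ 259 s, over 4 minutes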
Going back to attestations, we believe they can be done in an environment that fosters competition, remains privacy-friendly and avoids vendor lock-in. For better or for worse, most consumer hardware these days comes equipped with some form of trusted hardware, such as a TPM, that could be leveraged to produce the rate-limited tokens required for this protocol, without any privacy-compromising changes to the way those devices work today. As an alternative, external modules (such as a cheap USB dongle) could be made specifically for this purpose.
Closing remarks
We have explored Privacy Pass and its potential to change the way we approach CAPTCHA-related challenges. We began by examining the limitations and drawbacks of traditional CAPTCHAs, acknowledging their impact on user experience and their vulnerability to automated attacks. This led us to Privacy Pass, an approach that combines cryptographic techniques with user-centric design.

Privacy Pass offers several advantages over traditional CAPTCHAs, most notably in terms of user experience. By generating tokens that vouch for a user’s humanity, it removes the need for repetitive CAPTCHA-solving tasks, letting users access online content swiftly and effortlessly, while its cryptographic underpinnings offer robust protection against automated attacks.

While Privacy Pass is an innovative and promising solution, it is important to acknowledge its shortcomings. Most notably, it relies on remote attestation, which shuts out users who do not have, or do not want, hardware with such capabilities.

Looking ahead, the field of CAPTCHA mitigation is ripe with possibilities. As the technology matures, we can expect further refinements to Privacy Pass and similar solutions, and continued collaboration between researchers, industry and privacy advocates should strike an ever better balance between security, user experience and privacy.

In conclusion, Privacy Pass represents a significant step forward in the mitigation of automated threats, and with real-world deployments already in place at leading organisations, it is clearly shaping the landscape of online security. We encourage readers to explore it further and to consider adopting it as an alternative to traditional CAPTCHAs.
[1] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford, “CAPTCHA: Using hard AI problems for security”, Lecture Notes in Computer Science, pp. 294–311, 2003.

[2] A. Davidson, I. Goldberg, N. Sullivan, G. Tankersley, and F. Valsorda, “Privacy Pass: Bypassing Internet challenges anonymously”, Proceedings on Privacy Enhancing Technologies, vol. 2018, no. 3, pp. 164–180, 2018.