Wake-up call: why it's urgent to deal with your hardcoded credentials

#git #security #devops #opensource

It is now clear that 2021 will go down in the annals of IT security as the year organizations truly became aware of their inescapable dependence on open source and, more importantly, of the risks posed by unsupervised supply chains.

High-profile security incidents like the SolarWinds, Kaseya, and Codecov data breaches have shaken enterprises’ confidence in the security practices of third-party service providers.

Today, corporations, open source projects, nonprofit foundations, and even governments are all trying to figure out how to improve global software supply chain security. While these efforts are more than welcome, there is, for the moment, hardly any straightforward way for organizations to make progress on that front.

Meanwhile, one type of vulnerability remains largely unmonitored, despite being widely acknowledged as one of the most common entry points for attackers: hard-coded credentials in source code.

In this article, we want to make a simple point: focusing on what you can control now can greatly improve your organization's security posture.

Secrets in code are something you can and should start monitoring now. Here's why.

Hard-coded secrets have never been easier to find

On July 3, 2022, the CEO of cryptocurrency giant Binance warned of a massive breach.

The cause? A fragment of source code containing the secret to a huge database of personal information had allegedly been copied and pasted into a developer's blog post on the Chinese CSDN network.

A code snippet copied to a CSDN blog contained a critical secret, possibly causing a massive breach.

Whether secrets are left in code through malice or negligence, they are always a boon for attackers. All sorts of playbooks can be deployed once leaked credentials are found, even seemingly innocuous ones such as a Twitter API key: from spear phishing to privilege escalation and data exfiltration.

Secrets (usernames and passwords, API tokens, encryption keys, etc.) are the most sought-after digital assets for cybercriminals, and they have never been easier to find: last year, we detected 6 million secrets pushed (mostly inadvertently) in commits to public GitHub, twice as many as in 2020. According to the latest Verizon Data Breach Investigations Report (DBIR), the “use of stolen credentials” is by far the most common way to breach web applications: more than 80% of breaches are attributed to this attack vector, while “vulnerability exploitation” is directly responsible for less than 20% of cases.

The use of stolen credentials was particularly prevalent in Basic Web Application Attacks (BWAA), which the DBIR team defines as attacks in which an actor "directly" targets exposed instances, such as web servers and email servers. The report also calls this a "low-cost, high-pay-off strategy" that appeals to a wide array of attackers.

Of course, the term "stolen credentials" covers a variety of cases here, probably including phishing and user data bought on the dark web. But for an organization building digital services and products, the conclusion applies equally to secrets in code: for a code-producing organization, keeping secrets out of source code should be as obvious as implementing SSO and MFA.

The same report notes:

There’s been an almost 30% increase in stolen credentials since 2017, cementing it as one of the most tried-and-true methods to gain access to an organization for the past four years.

IBM's Cost of a Data Breach Report 2022 came to the same conclusion:

Use of stolen or compromised credentials remains the most common cause of a data breach. Stolen or compromised credentials were the primary attack vector in 19% of breaches in the 2022 study and also the top attack vector in the 2021 study, having caused 20% of breaches. Breaches caused by stolen or compromised credentials had an average cost of USD 4.50 million. These breaches had the longest lifecycle — 243 days to identify the breach and another 84 days to contain the breach. Phishing was the second most common cause of a breach at 16% and also the costliest, averaging USD 4.91 million in breach costs.

How, then, can we explain that secrets in code remain one of the most overlooked vulnerabilities in application security? Our conclusion is that hard-coded secrets are still poorly understood compared to other application security vulnerabilities.

Secrets are not a runtime vulnerability (they’re much worse)

In the 2022 CWE Top 25 Most Dangerous Software Weaknesses list, "Use of Hard-coded Credentials" (CWE-798) ranks 15th, up from 16th the previous year. But the most interesting fact is not the ranking itself: unlike all the other "weaknesses" on the list, the use of hard-coded secrets is not a runtime vulnerability. In other words, the software does not need to be running for it to be exploitable.

When we hear about application vulnerabilities, we are used to thinking about Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), XML External Entity (XXE), logic flaws, etc. They all require the software to be running in order to be exploited. With hard-coded credentials, it's the source code itself that can be exploited. Therefore, your attack surface includes your repositories and your entire software factory. This unique characteristic has major implications.

First, hard-coded credentials go wherever source code goes, which makes tracking them almost impossible. Source code is routinely cloned, checked out, and forked multiple times a month on machines inside and outside an organization's perimeter, not to mention outright leaks.

Leaks do happen. After the Twitch and Samsung codebases were publicly exposed, we examined their repositories with the same tool we use to protect our clients. In both cases, we found between 6,500 and 7,000 secrets ready to be used: company email passwords, cloud service API keys, third-party tokens, and internal service credentials. If this also happened to NVIDIA and Microsoft, do you really think it can't happen to your organization?

Second, let's not forget that code under version control has a permanent history. A version control system (VCS) such as Git keeps track of every modification made to a codebase and is also used to propagate those changes. Because hard-coded credentials remain exploitable until they are revoked, still-valid secrets can be hiding anywhere in the repository's history. This adds a new dimension to the attack surface, one that most security analyses never see because they only consider the current, ready-to-deploy state of a codebase.
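
To make the point about history concrete, here is a minimal Python sketch that scans every commit reachable from any ref (`git log --all -p`) instead of only the checked-out files. The three regular expressions and the function names are illustrative assumptions, nowhere near the coverage of a dedicated scanner; the key idea is that a secret deleted from the current tree still shows up as an added (`+`) line in the commit that introduced it.

```python
import re
import subprocess

# Illustrative detection rules (assumptions); real scanners ship many more
# detectors plus entropy and validity checks.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets_in_patch(patch_text):
    """Return (commit, rule) pairs for added lines in `git log -p` output."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]  # the SHA follows the "commit" keyword
        elif line.startswith("+") and not line.startswith("+++"):
            # A secret deleted later still appears as "+" in the commit that added it.
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((commit, rule))
    return findings


def scan_history(repo_path="."):
    """Scan every commit reachable from any ref, not just the checked-out tree."""
    patch = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_secrets_in_patch(patch)
```

Calling `scan_history("path/to/repo")` returns `(commit, rule)` pairs. Any hit whose secret is still valid has to be revoked, not just deleted: deleting it only adds another commit on top of a history that still contains it.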

Therefore, unlike any other kind of vulnerability, hard-coded credentials accumulate over time. Much like technical debt, managing them is a game of Tetris: “It’s OK if it builds up a bit, as long as you have a plan to reduce it later.” Except that you don’t see the bricks until you put detection in place. The more developers there are (past, present, or future), the higher the probability that a secret has been or will be committed, and the bigger the codebase, the higher the number of potentially exploitable secrets. Past a certain point, it simply becomes unmanageable.
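
A toy probability model illustrates why this build-up follows from scale alone. Assume, purely for illustration, that every commit independently has the same small probability p of containing a secret; the rate used below is hypothetical, not a measured figure. For N commits, the chance that at least one secret has been committed is 1 - (1 - p)^N, and the expected number of such commits is p * N.

```python
def prob_at_least_one_leak(developers, commits_per_dev, p_leak):
    """P(at least one commit contains a secret), assuming independent commits."""
    n_commits = developers * commits_per_dev
    return 1 - (1 - p_leak) ** n_commits


def expected_leaks(developers, commits_per_dev, p_leak):
    """Expected number of secret-bearing commits: linear in history size."""
    return developers * commits_per_dev * p_leak


# Hypothetical rate for illustration: 1 commit in 1,000 contains a secret.
P_LEAK = 0.001
for devs in (10, 100, 400):
    print(devs,
          round(prob_at_least_one_leak(devs, 100, P_LEAK), 3),
          expected_leaks(devs, 100, P_LEAK))
```

Under these assumptions the probability saturates near 1 very quickly, while the expected number of secrets to clean up keeps growing with every developer and every commit.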

According to our own report, this is the situation a typical software company faces on average:

On average, in 2021, a typical company with 400 developers and 4 AppSec engineers would discover 1,050 unique secrets leaked upon scanning its repositories and commits. With each secret detected in 13 different places on average, the amount of work required for remediation far exceeds current AppSec teams' capabilities (1 AppSec engineer for 100 developers).

This is why the right time to start acting against secrets sprawl is always now (if not yesterday!).

How to avoid hard-coded credentials?

What actions can you take? Implement the right controls to stop credentials from entering the codebase in the first place (we often use the analogy “stop the bleeding”). Catching hard-coded secrets in real time is the best way to curb secrets sprawl: catch them before they even become commits.
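
One minimal way to sketch such a guardrail is a Git pre-commit hook that inspects only the staged changes and refuses the commit when an added line looks like a secret. The patterns and the `flag_added_lines` helper below are hypothetical and for illustration only; in practice you would rely on a maintained scanner rather than a handful of regular expressions.

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook sketch: save as .git/hooks/pre-commit and make it executable."""
import os
import re
import subprocess
import sys

# Illustrative rules only (assumptions); a maintained scanner covers far more formats.
PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|secret|token|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]


def flag_added_lines(diff_text):
    """Return the file path of every added line that matches a pattern."""
    hits = []
    current_file = None
    in_header = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_header = True  # file header lines follow until the first hunk
        elif line.startswith("@@"):
            in_header = False
        elif in_header:
            if line.startswith("+++ "):
                current_file = line[6:] if line.startswith("+++ b/") else line[4:]
        elif line.startswith("+") and any(p.search(line) for p in PATTERNS):
            hits.append(current_file)
    return hits


def main():
    # Inspect only what is staged, i.e. exactly what the commit would record.
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = flag_added_lines(diff)
    for path in sorted(set(hits)):
        print(f"possible secret staged in {path}", file=sys.stderr)
    return 1 if hits else 0  # a non-zero exit status aborts the commit


# Run the check only when this file is invoked as the Git hook itself.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-commit":
    sys.exit(main())
```

Because the hook runs before the commit is even created, a hit costs the developer a few seconds instead of a full remediation cycle. Git also lets developers skip hooks with `git commit --no-verify`, so a local check like this is worth pairing with scanning further down the pipeline.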

Let’s take a concrete example: a developer mistakenly commits a secret in their local working environment. If they are alerted at that moment, remediating that single incident takes less than a minute. The credential has only been exposed on the local workstation, and the developer can run a few commands to "amend" their commit. Conversely, if the credential reaches the central repository, it should be considered compromised, since it has become available to anyone with read access, unauthorized developers and potential APTs included. From that point on, a full remediation cycle should be triggered.
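
The difference between those two situations can be checked mechanically. The sketch below assumes the commit in question is `HEAD` and that this clone's remote-tracking branches are reasonably up to date. It asks Git whether any remote-tracking branch already contains the commit (`git branch -r --contains`) and maps the answer to one of the two remediation paths described above. The function names and step lists are illustrative, not a prescribed procedure.

```python
import subprocess


def commit_is_pushed(commit="HEAD", repo_path="."):
    """True if any remote-tracking branch already contains `commit`.

    Remote-tracking branches reflect the last fetch or push from this clone,
    so a push made from another machine may go unnoticed until the next fetch.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "branch", "-r", "--contains", commit],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() != ""


def next_steps(pushed):
    """Map exposure to one of the two remediation paths (illustrative steps)."""
    if not pushed:
        # Local only: fix the file and rewrite the last commit before sharing it.
        return [
            "remove the secret from the file",
            "git add <file>",
            "git commit --amend",
        ]
    # Shared: assume compromise; rewriting history alone does not help.
    return [
        "revoke the exposed secret",
        "rotate it and update every consumer",
        "then clean up the repository history if needed",
    ]
```

Inside a repository, `next_steps(commit_is_pushed())` returns the path that applies to the latest commit.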

Revoking and rotating keys typically involves multiple teams. A developer will probably need to submit a request to a Cloud Ops team, which in turn may have to halt a few workflows in the process, such as CI/CD pipelines or even production workers.

Time adds up quickly: in our experience, the average cost of remediating a single incident is often at least in the 2 man-hour ballpark.

The math is straightforward: for a 400-developer shop where we uncovered 1,050 unique secrets, we can estimate that at least 2,100 man-hours (1,050 × 2) would be required to reach “zero secrets in code”, assuming no new secrets leak in the meantime!
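
Here is the same back-of-the-envelope estimate written out so the inputs can be adjusted. The 1,050 unique secrets, the 13 occurrences per secret, the 4 AppSec engineers, and the 2 hours per incident come from the figures quoted above; the 40-hour work week used to convert hours into weeks is an added assumption.

```python
def remediation_backlog(unique_secrets, hours_per_incident, appsec_engineers,
                        occurrences_per_secret, hours_per_week=40):
    """Back-of-the-envelope remediation cost for an existing stock of leaked secrets."""
    total_hours = unique_secrets * hours_per_incident
    return {
        "locations_to_clean": unique_secrets * occurrences_per_secret,
        "total_hours": total_hours,
        "hours_per_engineer": total_hours / appsec_engineers,
        "weeks_per_engineer": total_hours / appsec_engineers / hours_per_week,
    }


# Figures for the typical 400-developer company (4 AppSec engineers).
print(remediation_backlog(1_050, 2, 4, 13))
# {'locations_to_clean': 13650, 'total_hours': 2100, 'hours_per_engineer': 525.0, 'weeks_per_engineer': 13.125}
```

That is roughly three months of full-time work for each AppSec engineer before a single new leak is counted.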

The bottom line: you will get a much better ROI by starting small, starting fast, and growing from there than by trying to fix everything at once (in particular, the “stock” of past incidents).

Focusing on resolving past incidents while doing nothing to prevent new ones will not achieve the intended result of lowering the total number of exposed secrets and, in turn, reducing the remediation effort needed in the future.

In our experience, setting up the right guardrails frees up considerable AppSec resources without creating unnecessary friction, since the team no longer has to deal with hundreds of verified incidents every month. You can read more about it in this article:

How to remediate thousands of hardcoded secrets incidents

Don’t sleep on secrets

Keeping up with cybersecurity threats in real time is not just difficult, it is impossible. Not only do new adversaries and TTPs emerge daily, but in recent years the bar for a successful attack has been lowered so much that it is no longer surprising to see teenagers breach some of the biggest tech companies.

We recognize that information security is a complex field and that no decision should be based on opinion, intuition, or simply fear. But too often the basics are overlooked, even though they are consistently identified as the root cause of most breaches. For organizations where developers write code, secrets must be closely supervised so as not to offer low-hanging fruit to malicious actors.

Dealing with hard-coded credentials is urgent: the longer an organization waits, the riskier the situation gets and the costlier the security debt becomes.