The architecture of modern software development has changed. We now rely on hundreds of microservices, SaaS platforms and cloud infrastructure to build our applications. Secrets like API tokens, credentials and security certificates are the glue that connects these services. Think of them as the modern-day keys to the kingdom. Developers who understand how sensitive these keys are also know that leaking them into public git repositories can lead to serious data breaches. But how worried should you actually be if you leak secrets into public git? Andrzej Dyjak recently gave a great answer to this question in a popular experiment posted to Twitter.
What was the experiment
To understand how attackers find and use leaked secrets, Andrzej set up a simple experiment:
- Generate a secret
- Commit to a public repository
- Monitor the results
To monitor the traffic from the leaked secret, a canary token was used. Canary tokens is a service that lets you generate tokens for popular services such as Slack or AWS, and it alerts you through a webhook or email when someone tries to use them.
For this experiment, an AWS token was generated with canary tokens and then uploaded to public repositories on both GitHub and GitLab.
As it happened
Because this experiment ran on two platforms, we will break down the results separately.
See the original timeline and post from Andrzej Dyjak
GitHub
- 15:27 The token was pushed into a GitHub repository.
- 15:34 (7 minutes) An email from GitGuardian was delivered to the commit email address, warning of a potential data breach.
- 15:38 (11 minutes) The AWS token was compromised for the first time.
- 17:40 (2 hours +) The token was compromised an additional 5 times, with traffic coming from Germany, the Netherlands, the United Kingdom and Ukraine.
According to the User-Agent headers, the bots used by the malicious users were built on Python and Node.js SDKs.
GitLab
- 16:24 The token was pushed to GitLab.
- 17:26 (62 minutes) The token was compromised for the first and last time, with traffic coming from France.
According to the User-Agent header, the attacker used a bot built on a Python SDK.
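The elapsed times quoted in both timelines can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `minutes_between` is made up for illustration, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times copied from the two timelines above.
github_alert = minutes_between("15:27", "15:34")      # push -> GitGuardian email
github_first_use = minutes_between("15:27", "15:38")  # push -> first compromise
gitlab_first_use = minutes_between("16:24", "17:26")  # push -> first compromise

print(github_alert, github_first_use, gitlab_first_use)  # prints: 7 11 62
```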
Results
Comparing the results from GitHub and GitLab, we can see significantly more black hat activity on GitHub. There are two contributing factors. The first is almost certainly the considerable size difference between the two repository hosts, in both users and activity. GitHub boasts 50 million users with 2.5 million commits made per day, while GitLab sits at only 1% of this with 500,000 users. The second is GitHub's open API, which offers a read-only feed of all public events. While this API enables powerful community features that can be used for legitimate and helpful applications, it also makes it easier for malicious actors to scan code. GitLab does have an API, but it does not expose public events in the same way GitHub does, which makes it more difficult to scan through public code at scale.
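To give a feel for how little effort it takes to watch public activity, here is a minimal sketch that reads GitHub's public events feed (`https://api.github.com/events`) and picks out pushes. It is an illustration only, not the tooling used by anyone in the experiment. The function names and the sample data are hypothetical, and the exact fields inside each event (`repo.name`, `payload.head`) are assumptions about the response layout that may change over time.

```python
import json
import urllib.request

# Public, read-only events feed. Unauthenticated calls are rate limited.
EVENTS_URL = "https://api.github.com/events"


def fetch_public_events(url: str = EVENTS_URL) -> list:
    """Fetch one page of recent public events (performs a network call)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def pushed_repos(events: list) -> list:
    """Return (repository name, head commit) pairs for every PushEvent."""
    pushes = []
    for event in events:
        if event.get("type") == "PushEvent":
            # Field names are assumed; missing fields fall back to "".
            repo = event.get("repo", {}).get("name", "")
            head = event.get("payload", {}).get("head", "")
            pushes.append((repo, head))
    return pushes


# Hypothetical, hand-written events so the example runs offline.
SAMPLE_EVENTS = [
    {"type": "WatchEvent", "repo": {"name": "example/one"}, "payload": {}},
    {"type": "PushEvent", "repo": {"name": "example/two"}, "payload": {"head": "abc123"}},
]

print(pushed_repos(SAMPLE_EVENTS))  # prints: [('example/two', 'abc123')]
```

Calling `pushed_repos(fetch_public_events())` would run the same filter against live data. The same feed that makes this trivial for an attacker is what lets community tools alert developers within minutes.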
This experiment highlights what many security professionals already know: GitHub and GitLab are both well known to malicious actors as places that hold a trove of sensitive information like secrets. This is backed up by many breaches, including the now infamous Uber data breach, in which an Amazon S3 bucket was exploited after credentials were uploaded into a public git repository. But this experiment also highlights something much more important: malicious actors are indiscriminate in who they target. The problem does not only affect large corporations; it affects everyone, right down to the individual developer. This is why it is important to have not only security tools and infrastructure in place for organizations, but also community tools that alert individual developers.
Prevention
Preventing such a data breach can be difficult because it is often the result of human error. While human error is hard to eliminate, there are fortunately tools that can reduce both the risk and the impact.
Check out this great resource for API best practices to help prevent a data breach in the future.
Secrets detection
Leaking secrets is a unique challenge because it is caused largely by human error, which is very hard to defend against. In addition to implementing best practices, you should also have secrets detection in place, not just in public repositories but also in private ones. For secrets detection you can choose between commercial detection methods like GitGuardian and open-source solutions like truffleHog.
While open-source projects are appealing, in the case of secrets detection they can produce a large number of false positives, which can disrupt your workflow. They also come without alerting capabilities, which means a leak may be missed completely, and as the experiment showed, time is key. Check out this comparison of truffleHog and GitGuardian and decide for yourself which way to go. The one thing to say for sure is that a protection layer should be in place.
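To make the idea of secrets detection concrete, here is a minimal sketch that scans text for strings shaped like AWS access key IDs. It is not how GitGuardian or truffleHog work internally; it is a single regular expression based on the commonly documented key ID format (a prefix such as `AKIA` followed by 16 uppercase letters or digits). The function name is hypothetical, and the sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Assumed shape of an AWS access key ID: "AKIA" (long-term) or "ASIA"
# (temporary) followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list:
    """Return (line number, matched string) for every key-like string."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            findings.append((line_number, match.group(0)))
    return findings


# The key below is the documentation placeholder, not a real credential.
sample = 'region = "eu-west-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'

print(find_aws_key_ids(sample))  # prints: [(2, 'AKIAIOSFODNN7EXAMPLE')]
```

A check like this could run before each commit, but a single pattern only covers one kind of secret, and it will not tell you whether a match is a live credential. That gap is where false positives and missed leaks come from.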
Zero Trust
Zero trust infrastructure is a great way to add another layer of protection, but on its own it is not enough. Zero trust means that even someone with valid authentication credentials still needs to prove they are an authorized party. This can be done with IP range validation and 2-factor authentication, for example. Unfortunately, this is not possible with all services and it is not a failsafe, but as part of a wider strategy it can prove effective. Read more about zero trust with regard to leaked credentials.
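As a rough illustration of the IP range validation mentioned above, the sketch below only accepts a request when the source address falls inside an allowed network and a second factor has already been verified. The network ranges, the function name and the `second_factor_ok` flag are all hypothetical; a real service would take these from its own configuration and authentication flow.

```python
import ipaddress

# Hypothetical allow list: a documentation range and a private range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_request_allowed(source_ip: str, second_factor_ok: bool) -> bool:
    """Valid credentials alone are not enough: also check origin and 2FA."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    in_allowed_range = any(address in network for network in ALLOWED_NETWORKS)
    return in_allowed_range and second_factor_ok


print(is_request_allowed("203.0.113.7", True))   # prints: True
print(is_request_allowed("198.51.100.1", True))  # prints: False
```

With a check like this in front of a service, a leaked token used from an attacker's machine in another network would be refused even though the token itself is valid.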
Wrap up
What we can see without question is that malevolent actors are scanning GitHub and GitLab with automated tools to find and exploit leaked credentials. It doesn't matter if you are a Fortune 500 company or a developer working on personal projects: a leaked credential can affect anyone, and it takes only a matter of minutes for attackers to find and exploit it.
To prevent this, implement secret scanning with automated alerts in your git repositories and, where possible, implement zero-trust environments.
Top comments (5)
I hate to be pedantic, but: GitHub and GitLab aren't two version control systems. Git itself is the VCS that both use. GitHub and GitLab are two repository hosting services.
Cheers for the note. I will make an adjustment now :)
Now you are being pedantic Oh You : D
It's not listing two-factor authentication. Is there a reason why?
I guess 2FA falls more towards credentials and not so much API tokens. But if you consider other objectives from a zero-trust framework, IP whitelisting is covered in the article.