DEV Community

What actually happens when you leak credentials on GitHub: The experiment

mackenziejj ・ Originally published at thesecuritybash.com

The architecture of modern software development has changed. We now rely on hundreds of microservices, SaaS platforms and cloud infrastructure to build our applications. Secrets like API tokens, credentials and security certificates are the glue that connects these services. Think of them as the modern-day keys to the kingdom. Developers who understand how sensitive these keys are also know that leaking them into public git repositories can lead to serious data breaches. But how worried should you actually be if you leak secrets into public git? Andrzej Dyjak recently gave this question a great answer in a popular experiment posted to Twitter.

What was the experiment

To understand how attackers find and use leaked secrets, Andrzej set up a simple experiment:

  1. Generate a secret
  2. Commit to a public repository
  3. Monitor the results

To monitor the traffic from the leaked secret, a canary token was used. Canarytokens is a great service that lets you generate tokens for popular services such as Slack or AWS. The tokens track when someone tries to use them and alert you through a webhook or email.

Try Canarytokens for yourself

For this experiment, an AWS token was generated with Canarytokens and then uploaded to public repositories on both GitHub and GitLab.

As it happened

Because this experiment happened across two platforms, we will break down the results separately.

See the original timeline and post from Andrzej Dyjak

GitHub

  • 15:27 The token was pushed into a GitHub repository.
  • 15:34 (7 minutes) An email from GitGuardian was delivered to the commit email address alerting of a potential data breach.
  • 15:38 (11 minutes) The AWS token was compromised for the first time.
  • 17:40 (2 hours +) The token was compromised an additional 5 times with traffic coming from Germany, Netherlands, United Kingdom and Ukraine.

According to the User-Agent headers, the bots used by the malicious users were built on Python and Node.js SDKs.
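
As a rough illustration of how a User-Agent header can hint at the SDK behind a request, here is a minimal sketch. The sample strings are illustrative stand-ins, not the headers captured in the experiment, and the product names in the lookup sets are assumptions about how AWS SDKs commonly identify themselves.

```python
def classify_sdk(user_agent: str) -> str:
    """Guess the SDK family from the first product token of a User-Agent."""
    # The first token usually looks like "Product/version".
    product = user_agent.split("/", 1)[0].strip().lower()
    if product in {"boto3", "botocore", "boto"}:
        return "python"
    if product in {"aws-sdk-nodejs", "aws-sdk-js"}:
        return "node.js"
    return "unknown"


# Illustrative header values (assumed format, not taken from the experiment).
samples = [
    "Boto3/1.12.0 Python/3.8.2 Linux/5.4.0 Botocore/1.15.0",
    "aws-sdk-nodejs/2.600.0 linux/v12.16.1",
    "curl/7.68.0",
]
for ua in samples:
    print(classify_sdk(ua), "<-", ua)
```

Keep in mind that a User-Agent is set by the client, so it is a hint rather than proof of what tool was used.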

GitLab

  • 16:24 The token was pushed into a GitLab repository.
  • 17:26 (62 minutes) The token was compromised for the first and last time with traffic coming from France.

According to the User-Agent header, the attacker used a bot built on a Python SDK.
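
The elapsed times quoted in the two timelines can be checked with a few lines of Python. This is only a sketch: it assumes all timestamps fall on the same day and in the same time zone, and the event labels are my own shorthand.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Push time, then the events reported for each platform.
timeline = {
    "GitHub": ("15:27", {
        "GitGuardian alert": "15:34",
        "first use of the token": "15:38",
        "last timestamp in the timeline": "17:40",
    }),
    "GitLab": ("16:24", {
        "first use of the token": "17:26",
    }),
}

for platform, (pushed, events) in timeline.items():
    for label, at in events.items():
        print(f"{platform}: {label} after {minutes_between(pushed, at)} min")
```

On GitHub the first use came 11 minutes after the push; on GitLab it took 62 minutes.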

Results

Firstly, comparing the results from GitHub and GitLab, we can see significantly more black hat activity on GitHub. There are two contributing factors. The first is most certainly the considerable size difference between the two repository hosts in both users and activity: GitHub boasts 50 million users with 2.5 million commits made per day, while GitLab sits at only 1% of this with 500,000 users. The second is GitHub's open API, which exposes a read-only feed of all public events. While this API enables powerful community features that can be used for legitimate and helpful applications, it also makes it easier for malicious actors to scan code. Although GitLab does have an API, it does not offer a public events feed in the same way GitHub does, which makes it more difficult to scan through public code at scale.
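
To see why a public events feed makes monitoring at scale easy, for defenders and attackers alike, consider the sketch below. It works on a list of already-fetched event dictionaries and makes no network calls. The sample data is invented, and the field names (`type`, `repo`, `name`) are assumptions based on the general shape of GitHub's public events, so check the current API documentation before relying on them.

```python
from typing import Dict, Iterable, List


def pushed_repos(events: Iterable[Dict]) -> List[str]:
    """Return the unique repository names that received a push, in feed order."""
    seen: List[str] = []
    for event in events:
        if event.get("type") != "PushEvent":
            continue  # ignore stars, forks, issues and so on
        name = event.get("repo", {}).get("name")
        if name and name not in seen:
            seen.append(name)
    return seen


# Invented sample feed (hypothetical repositories).
sample_feed = [
    {"type": "WatchEvent", "repo": {"name": "alice/notes"}},
    {"type": "PushEvent", "repo": {"name": "bob/webapp"}},
    {"type": "PushEvent", "repo": {"name": "carol/infra"}},
    {"type": "PushEvent", "repo": {"name": "bob/webapp"}},
]
print(pushed_repos(sample_feed))
```

A single feed like this tells a scanner which repositories just changed, so it never has to crawl the whole platform.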

This experiment highlights what many security professionals already know: GitHub and GitLab are both well known to malicious actors as places that contain a trove of sensitive information like secrets. This is backed up by many breaches, including the now infamous data breach at Uber, which had an Amazon S3 bucket exploited when credentials were uploaded into a public git repository. But this experiment also highlights something much more important: malicious actors are indiscriminate in who they target. The problem affects not only large corporations but everyone, right down to the individual developer. This is why it is important to have not only security tools and infrastructure in place for organizations, but also community tools that alert developers.

Prevention

Preventing such a data breach can be difficult because it is often the result of human error. While human error is difficult to eliminate, there are fortunately tools that can reduce the risk and impact.

Check out this great resource for API best practices to help prevent a data breach in the future.

Secrets detection

Leaking secrets is a unique challenge because it is caused largely by human error, which is very hard to defend against. In addition to implementing best practices, you should also have secrets detection in place, not just in public repositories but also in private ones. For secrets detection, you can choose between commercial tools like GitGuardian and open-source solutions like TruffleHog.
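
To make the idea of secrets detection concrete, here is a deliberately simple sketch that looks for strings shaped like AWS access key IDs. It is not how GitGuardian or TruffleHog work internally. The pattern (`AKIA` followed by 16 uppercase letters or digits) is a commonly used heuristic for long-term access key IDs, and the sample key is the placeholder value used in AWS documentation.

```python
import re
from typing import List, Tuple

# Heuristic pattern for AWS access key IDs (an assumption, not an official spec).
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> List[Tuple[int, str]]:
    """Return (line number, match) pairs for every key-like string in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = """[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
region = eu-west-1
"""
print(find_aws_key_ids(sample))
```

A real scanner covers many more credential types and also inspects the full git history, since a secret removed in a later commit is still present in earlier ones.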

While open-source projects are appealing, in the case of secrets detection they can produce a large number of false positives, which can disrupt workflow. They also come without alerting capabilities, which means a leak may be missed completely, and as the experiment showed, time is key. Check out this comparison of TruffleHog and GitGuardian and decide for yourself which way to go. The only thing to say for sure is to make sure a protection layer is in place.
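
One reason simple detectors produce false positives is that some of them flag any string that looks random. The sketch below shows the idea using Shannon entropy. The threshold of 3.0 bits per character and the sample strings are arbitrary assumptions for illustration, not values taken from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def looks_random(token: str, threshold: float = 3.0) -> bool:
    """Flag a token whose entropy exceeds an (arbitrary) threshold."""
    return shannon_entropy(token) > threshold


# A made-up 40-character hex string, shaped like a commit hash or checksum.
checksum_like = "0123456789abcdef0123456789abcdef01234567"
print(looks_random(checksum_like))  # flagged, although it is not a secret
print(looks_random("password"))     # not flagged, although it might be one
```

The checksum-like string is flagged even though it is harmless, which is exactly the kind of noise that interrupts a developer's workflow.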

Zero Trust

Zero trust infrastructure is a great way to add another layer of protection, but on its own it is not enough. Zero trust means that even someone with valid authentication credentials still needs to prove they are an authorized party. This can be done with IP range validation and two-factor authentication, for example. Unfortunately, this is not possible with all services and is not a failsafe, but used as part of a wider strategy it can prove effective. Read more about zero trust with regard to leaked credentials.
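
IP range validation can be sketched with Python's standard `ipaddress` module. The allowed networks below are documentation-only ranges chosen for illustration, and `is_allowed` is a hypothetical helper. In practice this check is usually enforced by the service itself (for example, through access policy conditions) rather than in application code.

```python
import ipaddress

# Example allowlist using documentation-only address ranges (assumed values).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip parses and falls inside an allowed range."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is rejected, never trusted
    return any(addr in network for network in ALLOWED_NETWORKS)


print(is_allowed("203.0.113.7"))  # inside the first range
print(is_allowed("192.0.2.1"))    # outside every range
```

With a check like this in place, a leaked token used from an unknown address is rejected even though the credential itself is valid.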

Wrap up

What we can see without question is that malevolent actors are scanning GitHub and GitLab with automated tools to find and exploit leaked credentials. It doesn't matter if you are a Fortune 500 company or a developer working on personal projects: a leaked credential can affect anyone, and it takes only a matter of minutes for attackers to find and exploit it.

To prevent this, implement secret scanning within git repositories with automated alerts and, where possible, implement zero-trust environments.

Discussion (5)

webbureaucrat

the two version control systems (VCS)

I hate to be pedantic, but: GitHub and GitLab aren't two version control systems. Git itself is the VCS that both use. GitHub and GitLab are two repository hosting services.

mackenziejj Author

Cheers for the note. I will make an adjustment now :)

Tea

Now you are being pedantic Oh You : D

Jan Küster

Check out this great resource for API best practices to help prevent a data breach in the future.

It's not listing two-factor authentication. Is there a reason why?

mackenziejj Author

I guess 2FA applies more to credentials than to API tokens. But if you consider other elements of a zero-trust framework, IP whitelisting is covered in the article.