How leaky can it Git
This amazing “octoverse” gathers more than 50 million developers working on their personal and/or professional projects. So when 60 million repositories are created in a year and nearly 2 billion contributions* are added, some mistakes can happen, such as leaked secrets, Intellectual Property or PII.
Some companies may think: I don’t really care about public GitHub, we are not open sourcing our code, everything is stored on our private repositories. But what about the developers of these companies… they most likely have open source repositories and can leak secrets.
Let’s now focus on secrets. You would say that storing secrets in internal Version Control Systems is a very bad practice, but in fact it is much more frequent than you would think. Why is that?
API keys, database connection strings, private keys, certificates, usernames and passwords… As organizations move to cloud architectures, SaaS platforms and microservices, developers handle increasing amounts of sensitive information, more than ever before.
To add to that, companies are pushing for shorter release cycles, developers have many technologies to master, and the complexity of enforcing good security practices increases with the size of the organization, the number of repositories, the number of developer teams and their geographical spread.
At GitGuardian, we’ve been monitoring every single commit pushed to public GitHub since July 2017. Three and a half years later, we have detected millions of secrets and sent nearly 1 million pro bono alerts to developers in 2020 alone.
“We launched this audit, and several leaked secrets were brought to our attention. What was very interesting and what we didn’t anticipate was that most of the alerts came from the personal code repositories of our developers.”
Anne Hardy, CISO
Usually these leaks are unintentional, not malevolent. They happen because:
• Developers typically have one GitHub account that they use both for personal and professional purposes, sometimes mixing the repositories.
• It is easy to misconfigure git and push the wrong data.
• It is easy to forget that the entire git history is still publicly visible even if sensitive data has since been deleted from the current version of the source code.
Human error exists, but the key is to be alerted and be able to take appropriate action when a leak is found.
Secrets are digital authentication credentials that grant access to services, systems and data (API keys, usernames and passwords, or security certificates). The volume and diversity of these credentials are growing fast as architectures move to the cloud and rely on more and more components and apps.
Our larger customers, with 2,000 or more employees, deploy an average of 175 apps per customer, while our smaller customers, with 1,999 or fewer employees, deploy an average of 73 apps per customer.*
All these categories of secrets expose companies to easy and direct attacks. Leaked cloud provider and data storage secrets can lead to data loss and also allow an attacker to delete infrastructure. Leaked identity provider and messaging system secrets allow an attacker to act under a legitimate identity.
Publicly disclosed examples of recent data breaches through leaked credentials.
Hackers discovered credentials in a personal public repository on GitHub that granted access to a database containing private information of thousands of Uber drivers.
JumpCloud API key found in GitHub repository.
Such knowledge of leaked credentials comes with a great responsibility. We alert developers in a pro bono manner. Here is an idea of the volume of alerts we sent in 2020.
GitGuardian’s algorithm reacts to a leak in 4 seconds on average (Mean Time To Detect). The alert is sent right away.
The Median Time To React is 25 minutes. The developer is on the front line of the issue, which makes it possible to nullify most of the potential damage very quickly if the developer takes immediate action after the alert.
When a secrets detection solution is in place, security teams also receive dual alerts to make sure they can follow up, remediate and report easily on security incidents.
If you leave your keys to your house in the lock and you notice they are gone then you change the locks.
A .gitignore file lets you tell git which files you don’t want to commit. Files containing your secrets should be listed in your .gitignore file, but the secrets themselves should never be written in plain text in it… Hundreds of developers made this mistake in 2020.
If you search GitHub for “removed AWS key” you will see thousands of results. Removing a hardcoded secret and pushing a new commit only buries the secret in the history, making it harder for you to find but still accessible to attackers.
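To make the point about history concrete, here is a minimal sketch of how a removed string can still be found in a repository's past commits. It relies on git's `-S` "pickaxe" option for `git log`, which selects commits whose diff adds or removes a given string. The helper names `build_history_search` and `search_history` are hypothetical, chosen only for this illustration, and the key used in the usage note is the placeholder value from AWS documentation, not a real credential.

```python
import subprocess


def build_history_search(needle: str) -> list[str]:
    """Build a git command listing every commit whose diff adds or removes `needle`."""
    # -S<string> is git's "pickaxe" search; --all covers every ref, not only the current branch.
    return ["git", "log", "--all", "--oneline", f"-S{needle}"]


def search_history(repo_path: str, needle: str) -> list[str]:
    """Run the search inside `repo_path` and return one line per matching commit."""
    result = subprocess.run(
        build_history_search(needle),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `search_history(".", "AKIAIOSFODNN7EXAMPLE")` in a local clone would list the commits that introduced or deleted that string. If anything comes back, the value is still readable by anyone who has the history, so the safe response is to revoke the credential rather than only delete it from the latest version.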
Companies can’t avoid the risk of secrets exposure even if they put in place centralized secrets management systems. These systems are typically not deployed across the whole perimeter, and they are not coercive: they do not prevent developers from hardcoding credentials that are stored in the vault.
Solutions are available for them to automate secrets detection and put in place the proper remediation, but the market is far from mature on this subject.
Companies need to scan not only public repositories but also private repositories to prevent lateral movement by malicious actors.
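As a rough illustration of what automated secrets detection involves, the sketch below combines two common techniques: a specific pattern (the widely documented shape of an AWS access key ID) and a generic check that flags long quoted tokens with high Shannon entropy. This is not GitGuardian's detection engine. The function names, the generic pattern and the entropy threshold of 4.0 are assumptions made for this example only.

```python
import math
import re
from collections import Counter

# Widely documented shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Hypothetical generic rule: any quoted token of 20 or more key-like characters.
CANDIDATE = re.compile(r"""["']([A-Za-z0-9+/=_\-]{20,})["']""")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_secrets(line: str, min_entropy: float = 4.0) -> list[str]:
    """Return the suspicious tokens found in one line of text."""
    findings = [m.group(0) for m in AWS_KEY_ID.finditer(line)]
    for m in CANDIDATE.finditer(line):
        token = m.group(1)
        # Keep generic candidates only if they look random enough and are not already reported.
        if token not in findings and shannon_entropy(token) >= min_entropy:
            findings.append(token)
    return findings
```

For example, `find_secrets('aws_key = "AKIAIOSFODNN7EXAMPLE"')` reports the AWS documentation placeholder key, while a quoted run of one repeated character is ignored because its entropy is zero. A purely entropy-based rule will also flag harmless strings, which is one reason precision and recall matter when comparing detection tools.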
Some best practices can be followed to limit the risk of secrets exposure or the impact of a leaked credential:
• Never store unencrypted secrets in git repositories
• Don’t share your secrets unencrypted in messaging systems like Slack
• Store secrets safely
• Restrict API access and permissions.
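One simple way to apply the first and third practices is to keep the secret out of the source code entirely and read it at runtime, for instance from an environment variable populated by a vault or a deployment system. The sketch below assumes that setup. The helper name `load_secret` and the variable name `PAYMENTS_API_KEY` are hypothetical.

```python
import os


def load_secret(name: str) -> str:
    """Return the secret held in environment variable `name`, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        # Refuse to continue rather than fall back to a hardcoded default.
        raise RuntimeError(f"Missing required secret: set the {name} environment variable")
    return value
```

A service would then call `api_key = load_secret("PAYMENTS_API_KEY")` at startup. The code that reaches the repository contains only the name of the variable, never its value.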
Developer training programs should be put in place, although they do not eradicate the risk of leaked credentials.
Following best practices is not sufficient, and companies need to secure the SDLC with automated secrets detection. When choosing a secrets detection solution, they need to take into account:
• The capacity to monitor developers’ personal repositories
• Secrets detection performance* – Accuracy, precision & recall
• Real-time alerting
• Integration with remediation workflows
• Easy collaboration between Developers, Threat Response and Ops teams.
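For the detection performance criterion above, precision and recall can be computed from a labeled evaluation set. Precision is the share of raised alerts that are real secrets, and recall is the share of real secrets that were actually flagged. The sketch below is a generic illustration, and the counts in the usage note are invented for the example, not measurements from this report.

```python
def precision_recall(
    true_positives: int, false_positives: int, false_negatives: int
) -> tuple[float, float]:
    """Return (precision, recall) for a detector evaluated on labeled data."""
    flagged = true_positives + false_positives  # everything the detector alerted on
    actual = true_positives + false_negatives   # every real secret in the data
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall
```

With 90 correct alerts, 10 false alarms and 30 missed secrets, `precision_recall(90, 10, 30)` gives a precision of 0.9 and a recall of 0.75. Low precision floods teams with false alerts, and low recall leaves real leaks undetected, so both figures are worth asking for.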
There are millions of commits per day on public GitHub. How can organizations look through the noise and focus exclusively on the information that is of direct interest to them? How can they make sure their secrets are not ending up in their developers’ personal repositories on GitHub?
They can’t prevent developers from having personal repositories, so they need automated detection and efficient remediation tools.
In this State of Secrets Sprawl on GitHub analysis, we focused on secrets, although they are not the only sensitive information that can end up being publicly exposed: Intellectual Property, personal and medical data are also at risk. But this is for another State of Report!
About GG detection engine, data gathering & methodology