What are secrets in the software development world?

In everyday language, a secret can be any sensitive data that we want to keep private. When discussing secrets in the context of software development, secrets generally refer to digital authentication credentials that grant access to systems or data. These are most commonly API keys, usernames and passwords, or security certificates.

Secrets exist in the context of applications that are no longer standalone monoliths. Applications nowadays rely on thousands of independent building blocks: cloud infrastructure, databases, and SaaS components such as Stripe, Slack, and HubSpot.

Secrets are what tie together these different building blocks of a single application by creating a secure connection between each component.

What is secret sprawl?

Secret sprawl is the unwanted distribution of secrets like API keys and credentials through multiple systems. 

In modern software development, secrets are regularly used by developers and applications. As a result, they often get shared through services like Slack or email and end up stored in multiple locations, including different machines, git repositories, and company wikis. This is a phenomenon we call secret sprawl.

What are some of the best practices to securely manage secrets like API keys?

Storing and managing secrets like API keys and other credentials can be challenging. Even the most careful policies can sometimes be circumvented in exchange for convenience. We have put together a helpful cheat sheet and article explaining the best practices for managing secrets.

At a minimum, here are some points you should consider:

Never store unencrypted secrets in git repositories
Avoid wildcard commands such as git add *
Add sensitive files to .gitignore
Don’t rely on code reviews to discover secrets
Use automated secrets scanning on repositories
Don’t share your secrets unencrypted in messaging systems like Slack
Store secrets safely (method to be used depends on every use case)
Default to minimal permission scope for APIs
Whitelist IP addresses where appropriate
Use short-lived secrets
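
To make the "automated secrets scanning" point concrete, here is a minimal sketch in Python of what pattern-based detection can look like. It is an illustration under stated assumptions, not how any particular scanner works: the names scan_text, Finding and PATTERNS are hypothetical, and the three patterns are a tiny illustrative sample. Real scanners use far more detectors and usually validate candidates before alerting.

```python
import re
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    line_number: int
    rule: str


# Illustrative patterns only (an assumption of this sketch, not a complete rule set).
PATTERNS = {
    # AWS access key IDs start with AKIA or ASIA followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM header that precedes a private key.
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # A quoted value of 8+ characters assigned to a suspicious-looking name.
    "generic_assignment": re.compile(
        r"""\b(?:api_key|secret|password|token)\b\s*[:=]\s*['"][^'"]{8,}['"]""",
        re.IGNORECASE,
    ),
}


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) match.

    The matched value is deliberately not returned, so that the scanner's own
    output does not become one more place where the secret is stored.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield Finding(number, rule)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nname = "demo"\n'
    for finding in scan_text(sample):
        print(f"line {finding.line_number}: {finding.rule}")
```

A check like this could run before each commit or in CI. Pattern matching alone produces both false positives and false negatives, which is one reason it complements the other practices in this list rather than replacing them.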

What are the threats associated with secret sprawl?

When secrets sprawl across multiple systems, they increase what is referred to as the ‘attack surface’: the number of points where an unauthorized user could gain access to your systems or data. Each time a secret enters another system, it creates another point where an attacker could gain access to your secrets.

Most internal systems are not an appropriate place to store sensitive information, even if those systems are private. No company wants credit card numbers in plaintext in databases, PII in application logs, or bank account credentials in a Google Doc. Secrets must benefit from the same kind of protective measures.

As a general security principle, where feasible, data should remain safe even if it leaves the devices, systems, infrastructure or networks that are under an organization’s control, or if those are compromised. This helps prevent credential theft, which is a well-known adversary technique described in the MITRE ATT&CK framework:

“Adversaries may search local file systems and remote file shares for files containing passwords. These can be files created by users to store their own credentials, shared credential stores for a group of individuals, configuration files containing passwords for a system or service, or source code/binary files containing embedded passwords.”

Secrets accessed by malicious threat actors can lead to information leakage and allow lateral movement or privilege escalation, as secrets very often lead to other secrets. Furthermore, once an attacker has the credentials to operate like a valid user, it is extremely difficult to detect the abuse and the threat can become persistent.

What makes secret sprawl such a common problem?

We know that secrets are necessary to tie together different components of an application. Development and Operations teams need constant access to these secrets to build, connect, test and deploy applications. As a result, secrets represent a special type of information that needs to be both tightly wrapped and widely distributed.

It is no secret that enforcing good security practices at the organization level is hard. Developers today face high turnover inside companies and are often spread across many different teams and geographies. They must master a growing number of technologies and are under increasing pressure due to shortened release cycles. This makes secret management a complicated and ever-changing challenge.

This is further complicated when a version control system (VCS) like git is introduced, because secrets can be buried deep inside the history. The current version of the source code might look clean, while the history might contain credentials that were added, removed, and then completely forgotten about. Valid secrets in the git history represent a real threat, and any secret that reaches the VCS must be considered compromised.

What is the state of secrets sprawl on GitHub?

At GitGuardian, we’ve been monitoring every single commit pushed to public GitHub since July 2017. Three and a half years later, we’ve uncovered millions of secrets and sent nearly 1 million pro bono alerts to developers in 2020 alone. The State of Secrets Sprawl report measures this exposure of secrets within public repositories on GitHub and how this serious threat is evolving year to year.