
THE STATE OF SECRETS SPRAWL 2023
10M (+67%) new secrets detected in public GitHub commits in 2022

We have never detected as many secrets, and secrets sprawl has been accelerating yearly since 2020.

Hard-coded secrets increased by 67% compared to 2021, whereas the volume of scanned commits rose by 20% (from 860M to 1.027B commits between 2021 and 2022).

Hard-coded secrets have never been a more significant threat to the security of people, enterprises, and even countries worldwide.

IT systems, open-source projects, and entire software supply chains are vulnerable to the exploitation of keys left by mistake in source code.

As the world's digital footprint grows, millions of such keys accumulate every year, not only in public spaces such as code-sharing platforms but especially in closed spaces such as private repositories or corporate IT assets.
In other words, secrets sprawl on GitHub is only the tip of the iceberg.

This wouldn’t be so concerning if credential theft weren’t the most common cause of data breaches. The 2022 editions of Verizon’s DBIR and IBM’s Cost of a Data Breach report highlighted that this attack vector has remained the top concern since 2021:

“Use of stolen or compromised credentials remains the most common cause of a data breach. Stolen or compromised credentials were the primary attack vector in 19% of breaches in the 2022 study and also the top attack vector in the 2021 study, having caused 20% of breaches. Breaches caused by stolen or compromised credentials had an average cost of USD 4.50 million.”*

*From the IBM Cost of a data breach 2022


A look back at 2022 major incidents

Secrets were involved, in one way or another, in most of the security incidents that happened in 2022. We can classify these incidents into three categories:

Secrets exploited in an attack

Sept. 15

An attacker breached Uber and used hard-coded admin credentials to log into Thycotic, the firm’s Privileged Access Management platform. They achieved a full account takeover of several internal tools and productivity applications.

Dec. 29

An attacker leveraged malware deployed on a CircleCI engineer’s laptop to steal a valid, 2FA-backed SSO session. They could then exfiltrate customer data, including customer environment variables, tokens, and keys.

Stolen source code repositories

Feb. 25

NVIDIA source code is leaked by the Lapsus$ group.

Mar. 7

200GB of Samsung source code is leaked, revealing 6,695 hard-coded secrets.

Mar. 22

250 Microsoft projects are leaked, revealing 376 hard-coded secrets.

Aug. 11

LastPass source code is stolen, leaking credentials and keys used months later to access and decrypt storage volumes.

Nov. 1

Dropbox disclosed that 130 stolen code repositories contained API keys.

Dec. 21

Okta admitted a breach of its GitHub repositories resulting in source code theft.

Dec. 27

Slack employee tokens are stolen and misused to download private code.

Secrets exposed publicly

Sept. 1

Research reveals 18,000+ Android apps leak hard-coded secrets.

Oct. 7

Toyota disclosed that a contractor had left a credential giving access to user data exposed on GitHub for five years.

Nov. 16

Tom Forbes disclosed that Infosys had leaked FullAdminAccess AWS keys on PyPI for over a year (and later disclosed 57 other AWS keys on PyPI).

1.027B commits scanned by GitGuardian (+20% compared to 2021)

About GitHub in 2022

94M developers (+27%)

HCL (HashiCorp Configuration Language) is the fastest-growing language on GitHub.

PUBLIC MONITORING

85.7M+ new repositories (+20%)

From the Octoverse 2022

How leaky was 2022?

10M secrets occurrences detected in 2022 (3M unique secrets)

1 in 10 authors exposed a secret in 2022

5.5 commits out of 1,000 exposed at least one secret (+50%)

GitHub's organic growth and the improvements to our detection engine (including 35 new detectors in 2022) partly explain the growth in the number of detected secrets. But all things being equal, there is no doubt:

Secrets sprawl continues to expand worldwide.
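
As a rough sanity check on that claim, the sketch below normalizes the detected secrets by the volume of scanned commits, using only figures quoted on this page. The 2021 secrets count is an assumption: it is back-computed from the +67% growth figure rather than taken from the report, and the variable names are illustrative.

```python
# Figures quoted on this page.
commits_2021 = 860e6      # commits scanned in 2021
commits_2022 = 1.027e9    # commits scanned in 2022
secrets_2022 = 10e6       # secret occurrences detected in 2022
secrets_growth = 0.67     # +67% compared to 2021

# Assumption: the 2021 count is derived from the growth rate,
# not read from the report itself.
secrets_2021 = secrets_2022 / (1 + secrets_growth)


def per_thousand_commits(secrets: float, commits: float) -> float:
    """Secret occurrences per 1,000 scanned commits."""
    return 1000 * secrets / commits


density_2021 = per_thousand_commits(secrets_2021, commits_2021)
density_2022 = per_thousand_commits(secrets_2022, commits_2022)

commit_growth = commits_2022 / commits_2021 - 1
density_growth = density_2022 / density_2021 - 1

print(f"commit growth:  {commit_growth:.1%}")
print(f"density 2021:   {density_2021:.1f} occurrences per 1,000 commits")
print(f"density 2022:   {density_2022:.1f} occurrences per 1,000 commits")
print(f"density growth: {density_growth:.1%}")
```

With these inputs, commit volume grows by about 19% while the number of occurrences per 1,000 commits grows by roughly 40%, so the increase cannot be attributed to GitHub's growth alone. Note that this density counts occurrences, so it is not the same metric as the share of commits exposing at least one secret.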

Secrets sprawl over the years

Map of secrets leaks

For GitHub profiles mentioning location.

Ranking of countries by number of secrets leaked:

01 India
02 China
03 USA
04 Brazil
05 Germany
06 Nigeria
07 South Korea
08 Bangladesh
09 France
10 Russia

GitGuardian uses two classes of detectors: specific and generic.

Specific detectors match recognizable secrets, like an AWS access key or MongoDB database credentials.
In 2022, our specific detectors accounted for 33% of the secrets detected. Here are some of the top specific secrets caught in 2022.

Graph: Spread by category for unique specific secrets
Graph: Spread by detector for unique specific secrets
Graph: Spread by detector for generic secrets
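
To make the idea of a specific detector concrete, here is a minimal sketch for one recognizable secret type. It is not GitGuardian's detector: it assumes that a long-term AWS access key ID is 20 characters long, starting with `AKIA` and followed by 16 uppercase letters or digits, and the function name is hypothetical.

```python
import re

# Assumption: long-term AWS access key IDs look like "AKIA" + 16 characters
# taken from uppercase letters and digits. Real detectors use more checks.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_access_key_ids(text: str) -> list[str]:
    """Return every substring that matches the access key ID pattern."""
    return AWS_ACCESS_KEY_ID.findall(text)


# The key below is the placeholder used in AWS documentation, not a live key.
sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
print(find_aws_access_key_ids(sample))
```

Because the pattern is tied to a fixed prefix and length, a match is a strong signal on its own, which is what makes this class of detector precise.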

On the other hand, generic detectors match a broad range of secrets, for example, a company email and a password that would end up hard-coded in a file.

In a detection strategy, generic detectors are essential to ensure that no valid secrets fall through the cracks of specific detectors. To maximize precision and avoid false positives, each uses a carefully crafted set of conditions (regarding the filename, the file path, the surrounding context, etc.)

In 2022, they accounted for 67% of the secrets detected, which shows their importance.
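
The sketch below illustrates how a generic detector might combine such conditions. It is an assumption-laden toy, not GitGuardian's engine: the keyword list, the ignored path fragments, the entropy threshold, and all names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Surrounding context: a variable whose name suggests a credential,
# assigned a quoted literal of at least 8 characters.
ASSIGNMENT = re.compile(
    r"""\b(\w*(?:password|passwd|secret|token|api_?key)\w*)\s*[:=]\s*["']([^"']{8,})["']""",
    re.IGNORECASE,
)

# File path condition: crude substring filter to skip tests and samples.
IGNORED_PATH_PARTS = ("test", "example", "fixture")

# Arbitrary threshold (bits per character) to drop values like "aaaaaaaa".
MIN_ENTROPY = 3.0


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def find_generic_secrets(path: str, text: str) -> list[tuple[str, str]]:
    """Return (variable name, value) pairs that pass every condition."""
    if any(part in path.lower() for part in IGNORED_PATH_PARTS):
        return []
    return [
        (name, value)
        for name, value in ASSIGNMENT.findall(text)
        if shannon_entropy(value) >= MIN_ENTROPY
    ]


text = 'db_password = "s3cr3t-Xk29!vQz"\npassword = "aaaaaaaaaa"\n'
print(find_generic_secrets("config/settings.py", text))
print(find_generic_secrets("tests/test_settings.py", text))
```

Each condition trades recall for precision: the low-entropy value and the file under `tests/` are skipped, while the random-looking password in a configuration file is reported.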

How does secrets sprawl threaten software supply chain security?

When weighing the risk posed by secrets sprawl, it’s essential to consider the full set of hard-coded plaintext secrets rather than individual secrets taken separately: the more secrets there are, the more potential attack vectors there are for a malicious actor.

The keychain [...] symbolizes the collection of one or more scattered secrets the attacker finds throughout the target environment. Although both components are individually unhygienic, they form a fatal compound when combined.


Hell’s Keychain illustrates how scattered plaintext credentials across your environment can impose a huge risk on your organization by impairing its integrity and tenant isolation. Moreover, the vulnerability emphasizes the need for strict network controls and demonstrates how pod access to the Kubernetes API is a common misconfiguration that can result in unrestricted container registry exposure and scraping.

From code to cloud: Infrastructure as code

A single misconfiguration in an IaC manifest can break a security policy and make the deployed infrastructure vulnerable to attacks.

Infrastructure as code is an entirely new attack surface to protect.

We estimate that 21.52% of all Terraform repositories have one or more security policy vulnerabilities.

Graph: Number of vulnerabilities per Terraform repos

The most common IaC vulnerabilities are:

  1. Networking misconfigurations: unrestricted egress or ingress traffic can expose assets to attacks such as remote code execution. The use of HTTP instead of HTTPS is also frequent.
  2. Data exposure misconfigurations: S3 buckets without encryption can lead to data leakage.
  3. Secrets: exposing a sensitive environment variable in the configuration can leak credentials in plaintext.
  4. Permission misconfigurations: using the default service account on a compute instance allows an attacker to spread through the network.
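
The four categories above can be sketched as a toy linter over Terraform source text. This is a minimal illustration under stated assumptions, not the policy engine behind the figures on this page: the rules are hypothetical regular expressions, they use AWS and Google Cloud resource names as examples, and a real scanner would parse HCL rather than match text.

```python
import re

# Hypothetical text-matching rules, one or more per category listed above.
RULES = {
    "networking: traffic open to 0.0.0.0/0": re.compile(
        r'cidr_blocks\s*=\s*\[[^\]]*"0\.0\.0\.0/0"'
    ),
    "networking: HTTP instead of HTTPS": re.compile(r'protocol\s*=\s*"HTTP"'),
    "secrets: hard-coded credential": re.compile(
        r'\b\w*(?:password|secret|token)\w*\s*=\s*"[^"$]+"', re.IGNORECASE
    ),
    # Assumption: the default Compute Engine service account email
    # ends with this suffix.
    "permissions: default compute service account": re.compile(
        r"-compute@developer\.gserviceaccount\.com"
    ),
}


def lint_terraform(source: str) -> list[str]:
    """Return the names of the rules that the Terraform source violates."""
    findings = [name for name, pattern in RULES.items() if pattern.search(source)]
    # Crude heuristic: a bucket is declared and encryption is never mentioned.
    if 'resource "aws_s3_bucket"' in source and "server_side_encryption" not in source:
        findings.append("data exposure: S3 bucket without encryption settings")
    return findings


sample = '''
resource "aws_security_group" "web" {
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

resource "aws_db_instance" "db" {
  password = "hunter2-not-a-real-password"
}
'''

for finding in lint_terraform(sample):
    print(finding)
```

On this sample, the sketch reports the open ingress rule, the hard-coded database password, and the bucket without encryption settings. Values passed through variables or interpolation are deliberately not flagged by the secrets rule.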

Conclusion

Secrets sprawl continues accelerating and there will never be a better time to act. With 10 million secrets discovered in public GitHub commits and countless more silently accumulating behind closed doors, this is one of the biggest threats to the security of the digital world.

With attackers recognizing that compromising machine or human identities yields a higher return on investment, especially when targeting software supply chain components, this trend will unfortunately not dissipate soon.

Companies understand that source code is one of their most valuable assets and must be protected. Scaling these efforts will require bringing security and development teams closer together and moving towards shared responsibility for application security. The same holds for the detection and remediation of incidents.

The very first step is always to get a clear audit of the organization's security posture regarding secrets: where and how are they used? Where do they leak? How should the organization prepare for the worst? You can start right away by taking the (anonymous) secrets management maturity questionnaire and learning where to go from there with this white paper.

Or you can request a complimentary audit of your company’s exposed secrets on GitHub.

Organizations must be prepared for secrets sprawl and have the right tools and resources to promptly detect and remediate any issues.

It’s time to take action!

About GitGuardian

GitGuardian is the leader in automated secrets detection. The company has raised a total of $56M from Eurazeo, Sapphire, Balderton, and notable tech entrepreneurs like Scott Chacon, co-founder of GitHub, and Solomon Hykes, co-founder of Docker.

GitGuardian Internal Monitoring helps organizations detect and fix vulnerabilities in source code at every step of the software development lifecycle. With GitGuardian’s policy engine, security teams can monitor and enforce rules across their VCS, DevOps tools, and infrastructure-as-code configurations.

With more than 230k installs, GitGuardian is by far the #1 security application on the GitHub Marketplace.

Its enterprise-grade features enable AppSec and Development teams to deliver secret-free code in a truly collaborative manner. By pulling developers closer to the remediation process, organizations can achieve shorter fix times.

Its detection engine is trained against more than a billion public GitHub commits every year, and it covers 350+ types of secrets such as API keys, database connection strings, private keys, certificates, and more.