Why is it hard to detect secrets like API keys and other credentials?

Secrets detection is probabilistic: it is not always possible to determine with certainty whether a candidate is a true secret (a true positive). Because secrets share very few common, distinctive features, decisions must be made by aggregating weak signals into strong predictions.

One of the most common features is that almost all secrets are strings that look random. We call these high entropy strings. The issue is that this feature is not very distinctive: 99% of strings that look random in source code aren’t secrets. They are, for example, database IDs or other non-sensitive identifiers that would be false positives if flagged.
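To make the notion of a high entropy string concrete, here is a minimal Python sketch that scores strings with Shannon entropy. The function name and the sample strings are invented for illustration, and Shannon entropy is only one common way to measure apparent randomness, not necessarily what any particular scanner uses.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample strings, invented for this example.
samples = {
    "ordinary word": "configuration",
    "database ID (UUID)": "3f2b8c1e-9d4a-4e6f-8b7a-1c2d3e4f5a6b",
    "made-up token": "q8Zr2LmX0vT5bNcY7wKd1HsJ4gPf9AeU",
}

for label, value in samples.items():
    print(f"{label}: {shannon_entropy(value):.2f} bits/char")
```

The UUID-style identifier scores well above the ordinary word even though it is not a secret, which is why entropy can only serve as one weak signal among several.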

Of course, some secrets have fixed patterns: AWS access key IDs often start with AKIA, for example. But most secrets do not. The code surrounding them also varies widely, depending on what the secrets are used for and how developers use them in a specific application. Usernames and passwords can be used to authenticate in many different ways, and it is really hard to distinguish real credentials from fake ones used as placeholders.
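For the few secrets that do have a fixed pattern, a regular expression is often enough. The sketch below assumes that an AWS access key ID is "AKIA" followed by 16 uppercase letters or digits, which is a widely used heuristic rather than an official specification. The function name is hypothetical, and the key shown is the example value AWS uses in its own documentation.

```python
import re

# Assumption: "AKIA" followed by 16 uppercase letters or digits.
# This is a common heuristic, not a format guaranteed by AWS.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> list[str]:
    """Return every substring of `text` that matches the assumed pattern."""
    return AWS_KEY_ID.findall(text)


snippet = '''
aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"
order_id = "9f8e7d6c5b4a39281706"
'''

print(find_aws_key_ids(snippet))
```

The second line of the snippet is just as long and random-looking as the first, but it has no distinctive prefix, so a pattern-based rule ignores it. The same property means that this approach cannot help with the many secrets that have no fixed pattern at all.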

All this makes it extremely challenging to capture all true secrets without also capturing false positives. At some point, a line needs to be drawn that weighs the cost of a secret going undetected (a false negative) against the cost of handling too many false positives. Different organizations, with different people, cultures and workflows, will draw that line in different places!

Why do code reviews fail at finding secrets in source code?

Code reviews are great overall for detecting logic flaws or maintaining certain good coding practices. But they are not adequate protection for detecting secrets, mostly for two reasons:

Reviews generally consider only the net difference between the current and proposed states, not the entire history of changes. If one commit adds a secret and a later one deletes it, the net effect is zero and reviewers never see it. But the secret is still in the repository history, so the vulnerability remains.
Reviewers prefer to focus on errors that cannot be automatically detected, like design flaws. As a general principle, security automation should be implemented wherever it can be, so that humans focus on where they bring the most value.
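The first reason can be illustrated with a small simulation. The sketch below models a hypothetical two-commit history as plain Python data (it does not call Git) and uses a toy AWS-style pattern as a stand-in for a real detector. All names are invented for illustration.

```python
import re

# Toy stand-in for a real detector (assumed AWS-style key ID pattern).
TOY_DETECTOR = re.compile(r"AKIA[0-9A-Z]{16}")

# Hypothetical history: the first commit adds a key, the second removes it.
commits = [
    {"id": "c1",
     "added": ['API_KEY = "AKIAIOSFODNN7EXAMPLE"'],
     "removed": []},
    {"id": "c2",
     "added": ['API_KEY = os.environ["API_KEY"]'],
     "removed": ['API_KEY = "AKIAIOSFODNN7EXAMPLE"']},
]


def net_added_lines(history):
    """Lines that survive the whole history: what a reviewer sees."""
    lines = []
    for commit in history:
        for line in commit["removed"]:
            if line in lines:
                lines.remove(line)
        lines.extend(commit["added"])
    return lines


def scan_history(history):
    """IDs of commits whose added lines contain a secret candidate."""
    return [c["id"] for c in history
            if any(TOY_DETECTOR.search(line) for line in c["added"])]


in_review = any(TOY_DETECTOR.search(line) for line in net_added_lines(commits))
print("visible in the net diff:", in_review)
print("commits containing a secret:", scan_history(commits))
```

The net diff contains only the environment lookup, while scanning each commit's additions still surfaces the commit that introduced the key.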

What is a "good" secrets detection algorithm?

Detecting secrets in source code is like finding needles in a haystack: there are a lot more sticks than there are needles, and you don’t know how many needles might be in the haystack. In the case of secrets detection, you don’t even know what all the needles look like!

Ideally, you want your detection system to achieve at the same time:

A low number of false alerts. We call this high precision. Precision answers the question: "What percentage of the candidates you flag are actual secrets?" This matters because security teams are often already overwhelmed with alerts.
A low number of missed secrets. We call this high recall. Recall answers the question: "What percentage of the actual secrets do you detect?" Because a single undetected credential can have a major impact on an organization, some organizations prefer to triage more false alerts in order to make sure they don’t miss a secret.
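Both measures can be computed from three counts. The sketch below uses the standard definitions of precision and recall; the function name and the example counts are hypothetical and do not come from any real benchmark.

```python
def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) from raw detection counts."""
    flagged = true_positives + false_positives          # everything reported
    actual = true_positives + false_negatives           # every real secret
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 40 real secrets found, 10 false alerts, 20 secrets missed.
precision, recall = precision_recall(40, 10, 20)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Loosening a detector usually moves false negatives into true positives while adding false positives, so recall rises as precision falls. That is the trade-off each organization has to settle for itself.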

Balancing the equation to ensure that the algorithm captures as many secrets as possible without flagging too many false results is an intricate and extremely difficult challenge. Read more on evaluating secrets detection algorithms.

What is a false positive in secrets detection?

A false positive in secrets detection occurs when a candidate string is wrongly flagged as a true secret although it is in fact non-sensitive.

Typical examples of strings that can be mistaken for true secrets are:

UUIDs (Universally Unique Identifiers) that are used for example by databases as unique keys.
High entropy URLs or file paths. They often contain random strings that look like keys.
Test keys. Service providers sometimes give developers a set of test credentials with a very restricted scope, so that they can exercise parts of the API without charging their account.
Public keys. Surprisingly, a very small number of keys are really meant to be public (like Firebase keys, see this Stack Overflow conversation).
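Some of these categories can be screened out with simple rules before a candidate is reported. The sketch below is one possible approach under stated assumptions: the function name is hypothetical, the UUID rule only covers the canonical hyphenated form, and the test-key prefix is an assumed example that would need to be adapted per provider. It does not attempt to recognize public keys.

```python
import re
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
# Anything that starts with a URL scheme such as "https://".
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
# Assumed example prefix for provider test keys; adjust per provider.
TEST_PREFIXES = ("sk_test_",)


def false_positive_reason(candidate: str) -> Optional[str]:
    """Return why `candidate` is probably harmless, or None if unknown."""
    if UUID_RE.match(candidate):
        return "uuid"
    if URL_RE.match(candidate):
        return "url"
    if candidate.startswith(TEST_PREFIXES):
        return "test key"
    return None


for value in (
    "3f2b8c1e-9d4a-4e6f-8b7a-1c2d3e4f5a6b",
    "https://example.com/files/9f8e7d6c5b4a3928",
    "q8Zr2LmX0vT5bNcY7wKd1HsJ4gPf9AeU",
):
    print(value, "->", false_positive_reason(value))
```

A result of None does not mean the candidate is a secret, only that none of these rules ruled it out.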

Are secrets detection algorithms language-dependent?

Secrets detection is, for the most part, not language-specific. Of course, there are some subtleties to take into account, like the way variables are assigned in each programming language. But there is no need to support every syntax in full detail. This means that the same algorithms can be applied to any project, in any programming language, without using things like Abstract Syntax Trees.
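As an illustration of this idea, a single loose pattern can pick up quoted assignments across several languages without parsing any of them. The pattern and function name below are assumptions made for this sketch; a real detector would handle many more cases.

```python
import re

# A deliberately loose, language-agnostic pattern for `name <op> "value"`.
ASSIGNMENT = re.compile(
    r"""(?P<name>[A-Za-z_][A-Za-z0-9_.-]*)      # variable or key name
        \s*(?::=|=>|=|:)\s*                     # common assignment operators
        (?P<quote>["'])(?P<value>[^"']+)(?P=quote)  # quoted value
    """,
    re.VERBOSE,
)


def extract_assignments(line: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs for every quoted assignment in `line`."""
    return [(m.group("name"), m.group("value"))
            for m in ASSIGNMENT.finditer(line)]


# Hypothetical lines in Python, JavaScript, YAML and Go syntax.
lines = [
    'password = "hunter2hunter2"',
    "const apiKey = 'abc123';",
    'token: "xyz789"',
    'secret := "s3cr3t"',
]

for line in lines:
    print(extract_assignments(line))
```

The extracted values would then be passed to the actual detection logic, which stays the same whatever language the line came from.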