What is considered sensitive information?

Sensitive information can be many things, such as a password, an SSH key, or an authentication key.

Where does the data come from?

All the data is obtained through the GitHub API, by scanning public commits.

What do you do with sensitive information when you discover it?

Every time we flag a leak of sensitive information, our first and only action is to email the developer who made the commit.

How is sensitive information detected?

Some authentication keys share a known prefix and can therefore be matched easily with regular expressions. But sometimes it is not obvious that a given piece of information is sensitive. For example, authentication keys can often be confused with database IDs, because both are high-entropy strings. We disambiguate these cases with machine learning algorithms that automatically analyze the context in which the presumably sensitive information appeared: the code surrounding it, the filename, and the commit message.
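As a rough illustration of the first two ideas above (prefix matching and high entropy), here is a minimal Python sketch. It is not GitGuardian's actual detector: the pattern list, the function names, and the 4.0 entropy threshold are all assumptions chosen for the example, and the machine learning step on the surrounding context is not shown.

```python
import math
import re
from collections import Counter

# Illustrative prefix patterns only (an assumption, not an actual rule set).
# AWS access key IDs start with "AKIA"; GitHub personal tokens with "ghp_".
PREFIX_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_personal_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Any long run of key-like characters is a candidate for the entropy check.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(s):
    """Return the Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_candidates(text, entropy_threshold=4.0):
    """Return (kind, value) pairs for strings that look like secrets."""
    findings = []
    # Step 1: keys with a known prefix are matched directly.
    for kind, pattern in PREFIX_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.group(0)))
    already_found = {value for _, value in findings}
    # Step 2: remaining long tokens are kept only if their entropy is high.
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        if token not in already_found and shannon_entropy(token) >= entropy_threshold:
            findings.append(("high_entropy", token))
    return findings
```

Note that the second step cannot tell an authentication key from a database ID, since both can exceed the threshold. That ambiguity is exactly why the context (surrounding code, filename, commit message) has to be taken into account afterwards.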

There is some sensitive data in my repository. What should I do?

Once you have pushed sensitive information to GitHub, you must consider it public, even if you deleted it within a few seconds. If you committed a password, change it. If you committed a key, revoke it along with all the rights associated with it, and generate a new one.

Why did you flag my commit as sensitive when it is perfectly fine?

Because we use machine learning algorithms to automatically detect sensitive information based on many variables, our algorithms sometimes misjudge the situation. Let us know when this happens so that we can take action.

How do you finance this project?

As stated on our home page, GitGuardian is entirely free for developers and always will be. Companies and organizations can subscribe to a paid plan.