What is considered sensitive information?
Sensitive information can be many things, such as a password, an SSH key, or an authentication key.
Where does the data come from?
All the data is obtained through the GitHub API, by scanning public commits.
What do you do with sensitive information when it is discovered?
Every time we flag a leak of sensitive information, our first and only action is to email the developer who made the commit.
How is sensitive information detected?
Some authentication keys share a known prefix and can therefore be matched easily with regular expressions. But it is not always obvious that a given piece of information is sensitive. For example, authentication keys are easily confused with database IDs, because both are high-entropy strings. We disambiguate these cases with machine learning algorithms that automatically analyze the context in which the presumably sensitive information appeared: the code surrounding it, the filename, and the commit message.
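As an illustration of the two signals mentioned above (a known prefix and high entropy), here is a minimal Python sketch. It is not our actual detection code. The AWS access key ID pattern is just one example of a prefixed credential, and the function names, the 3.5-bit threshold, and the 20-character minimum length are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Example of a prefixed credential: AWS access key IDs start with "AKIA"
# followed by 16 uppercase letters or digits. Other providers use other prefixes.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_prefixed_keys(text: str) -> list[str]:
    """Return every substring of `text` that matches the known-prefix pattern."""
    return AWS_KEY_ID.findall(text)


def shannon_entropy(token: str) -> float:
    """Shannon entropy of `token`, in bits per character."""
    if not token:
        return 0.0
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_high_entropy(token: str, threshold: float = 3.5, min_length: int = 20) -> bool:
    """Heuristic only: long tokens with high entropy *might* be secrets.

    The threshold and minimum length are illustrative assumptions, not tuned values.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

For instance, `find_prefixed_keys('key = "AKIAIOSFODNN7EXAMPLE"')` returns the example key ID used in AWS documentation. The entropy check alone cannot tell an authentication key from a random-looking database ID, which is why the surrounding context is needed to decide.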
There is some sensitive data in my repository. What should I do?
Once you have pushed sensitive information to GitHub, you must consider it public, even if you deleted it within a few seconds. If you committed a password, change it. If you committed a key, revoke all the rights associated with that key and generate a new one.
Why did you flag my commit as sensitive when it is perfectly harmless?
Because we use machine learning algorithms that automatically detect sensitive information based on many variables, they sometimes misjudge the situation. Let us know so that we can take action.
How do you finance this project?
As stated on our home page, GitGuardian is entirely free for developers and always will be. Companies and organizations can subscribe to a paid plan.