Gartner®: Avoid Mobile Application Security Pitfalls

The State of Secrets Sprawl 2022 - Breaking down the largest research project on leaked secrets

To participate in questions, polls and chat, register for the event on Crowdcast: https://www.crowdcast.io/e/state-of-secrets-sprawl-2022-gitguardian

On March 2nd GitGuardian will be releasing its annual ‘State of Secrets Sprawl Report’. This report analyzes all public commits made to GitHub throughout the year. How many secrets do you think GitGuardian detected in 2021?

Highlights from last year's report:
- 2 million secrets (API keys, credential pairs, security certificates)
- India: the number 1 country to leak secrets
- Google API keys: the number 1 most leaked secret

This year we are going deeper. Not only are we looking into the number of secrets leaked in public GitHub repositories (we have some big surprises here!) but we are also looking into secrets leaked in public Docker images and even private Git repositories. Mackenzie Jackson, GitGuardian’s Developer Advocate, will be joined by Henri Hubert, Head of R&D, to discuss the big trends observed in 2021 and how to explain them. There will also be opportunities to ask the presenters questions directly and chances to win great prizes like swag bags and Amazon gift cards.

Video Transcript

[Music]

Mackenzie: All right, hi everyone, we are live. I'm really excited to be here with you, and excited to have a guest with us who has been with us before. Henri, welcome to the live stream.

Henri: Thanks.

Mackenzie: If you give me a few minutes, I'm going to sort out this presentation. Welcome, everyone, it's a very exciting day. I apologize to anyone who was here last week: we had a quick change of date while we were waiting for some things to come through. But we're here now, and we're super excited to release the 2022 State of Secrets Sprawl. Give me a minute, I'm just going to share my screen.

"Will this be available on demand?" Yes, you'll be able to watch this on demand. For anyone watching on YouTube: we're streaming this in two places, Crowdcast and YouTube. There are going to be a lot of polls and prizes, and you can ask questions, but to participate you have to do that in Crowdcast, because we won't be monitoring YouTube; I can only monitor one thing at a time. There is a link in the YouTube channel to get across to Crowdcast if you want to come over, ask questions and participate.

We're going to do some intros, talk about the State of Secrets Sprawl report, and break it into three categories: public Git repositories, public Docker images, and internal repositories.

While we wait a few minutes for everyone to come online (I can see there are quite a few of us now), just to let everyone know that there are going to be prizes. We usually have swag bags, but this time we've upped it with Amazon gift cards. Henri, you cannot win an Amazon gift card, and anyone at GitGuardian, you're banned. We're going to be giving away swag bags for participation, and we're going to be asking a lot of questions; if you get the questions right, you can win an Amazon gift card. Just to
clarify, not everyone who gets it right will get one; we'll pick a random winner.

Let us know in the chat where you're tuning in from; it's my favorite part, I always like to know. We've already got a couple here. I can see sunny PA, [inaudible], whoa, cool, welcome, welcome. Manhattan, Denver, Florida, London, Netherlands, Tel Aviv, Greece, Rwanda, hey, that's awesome. Alabama, Munich in Germany, Canada, France (bonjour, bienvenue), California, Canada, India, wow, Kenya. A few people from the African continent, that's great. Paris; we are also tuning in from Paris. Denmark, Lyon, wow, some cool places. Iran, oh wow, we're really getting across the board here. And a lot of people from Canada. Uruguay, very cool.

All right, let's kick this off. What is the State of Secrets Sprawl report? Each year GitGuardian does an annual report where we look at all the leaked secrets that we've discovered, and we publish our findings. This one is a little bit different. Last year we looked only at public GitHub repositories, so all the commits that were made to GitHub. This year we extended that: we're also looking at Docker images on Docker Hub and at internal, private repositories that GitGuardian is monitoring. We've extended the analysis, so it's going to be pretty exciting and we're going to have some great results.

Some quick definitions, which you can read on the screen if you're unsure. When we talk about secrets, we're talking about digital authentication credentials. Henri, what are some digital authentication credentials?

Henri: You have passwords, first, which can be used either by humans or by computers. You also have database credentials, and finally security certificates and, obviously, API keys. That's most of the secrets you can find.

Mackenzie: So anything that authenticates a person or a
system to another service?

Henri: Yeah.

Mackenzie: Okay. And secrets sprawl, we talk about this a lot. What's secrets sprawl?

Henri: Secrets sprawl is about secrets being sprinkled everywhere: you have a secret, and then it ends up in your code, in your systems, in your computers, in someone else's computer, everywhere.

Mackenzie: Okay, so when we think about secrets, it's something we obviously want to protect, and we want to know where our secrets are.

Henri: Yeah, and secrets sprawl is when we basically lose control of that.

Mackenzie: Okay, so those are some quick definitions. The next thing I want to talk about, Henri, is the methodology for finding these secrets. We're going to release some pretty staggering, pretty large numbers. How can we be sure they're actually correct? What is the process we go through to detect these secrets?

Henri: The process for repositories is that we take every commit of a repository and scan its content, so basically code, and we try to find patterns that are related to secrets: patterns for AWS keys, for Google keys, for security certificates. That's the first method we use. It's called specific detectors, because we know which service the secret relates to. Then we have what we call generic detectors, which act as a safety net and allow us to find secrets for which we don't know the related service. That is based purely on context: you find something called "secret" with high entropy, in code that is related to secrets because it calls, let's say, a database, and because of that you can be pretty sure it's a secret.

Mackenzie: Okay, so specific detectors and generic detectors. How many specific detectors do we have?

Henri: We have about 350.
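Editor's illustration: the two detector families Henri describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not GitGuardian's actual detectors. The specific detector relies on the widely documented shape of long-term AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits). The generic detector, the variable-name hints, the 16-character minimum and the 3.5-bit entropy threshold are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Specific detector: the provider is known from the pattern itself.
# Long-term AWS access key IDs start with "AKIA" followed by 16 characters.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

# Generic detector (hypothetical): a quoted value assigned to a name that
# hints at a secret. The name list and minimum length are illustrative only.
GENERIC_ASSIGNMENT = re.compile(
    r"(?i)\b(\w*(?:secret|token|password|api_?key)\w*)\s*[:=]\s*[\"']([^\"']{16,})[\"']"
)


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of the string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str, min_entropy: float = 3.5) -> list[tuple[str, str]]:
    """Return (detector_name, matched_value) pairs found in the text."""
    findings = []
    # Pass 1: specific detectors, matched purely on a known key format.
    for match in AWS_ACCESS_KEY_ID.finditer(text):
        findings.append(("aws_access_key_id", match.group(1)))
    # Pass 2: generic detector, a suggestive name plus a high-entropy value.
    for match in GENERIC_ASSIGNMENT.finditer(text):
        candidate = match.group(2)
        if shannon_entropy(candidate) >= min_entropy:
            findings.append(("generic_high_entropy", candidate))
    return findings
```

Calling `scan()` on the text of a commit returns one entry per occurrence, which matches the "we count occurrences" convention discussed later in the talk. A real scanner would also need the validation step described below, since a pattern match alone cannot tell a live key from a revoked or dummy one.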
Mackenzie: So 350 different types of secrets that we're discovering, not including the generic detectors. How many did we have for last year's report?

Henri: Last year we had 250.

Mackenzie: Okay, so we have a much wider search this year than last year. Will that affect the results at all?

Henri: Yeah, it will affect the results a lot.

Mackenzie: So that's something to keep in mind when we compare, especially to last year: because we've widened our detection capabilities, we've also increased the numbers that we've discovered. Now, what about the final step, validation? A lot of developers will put what looks like a secret in the code but change some characters, just as an example. Is there a way we can filter out those results?

Henri: Yeah, in some cases we use what we call validation to filter out those dummy credentials. This is done by making a non-intrusive API call to the provider, so obviously we have to know the provider; it doesn't work with generic detectors. Depending on the result, we then know whether the secret is still valid, whether it's invalid because it has been revoked, or whether it's a dummy credential from a developer trying to obfuscate it.

Mackenzie: Got it. So we can be pretty confident in the results we have. And obviously in our results we remove multiple occurrences, so when we give out numbers, these are unique occurrences, not multiple occurrences?

Henri: No, we count occurrences.

Mackenzie: Okay. So let's start with public monitoring. Public monitoring is GitHub.com, and we scan every single commit that's made, is that right?

Henri: Yeah, every new commit goes through a scanner, and we detect secrets in it.

Mackenzie: And how many commits a year are we looking at?

Henri: More than a billion.

Mackenzie: More than a billion, with a B. That's a lot of commits. Okay, so we have our first poll here: how many secrets do you think
were discovered in 2021? Down the bottom there's a polls tab; if you click on it you'll see the polls, and they'll become visible as we go. We have 21 votes here, 23... Okay, the consensus seems to be five to six million, so we'll see. As a reminder, let's look at what we found last year: two million secrets from a billion commits scanned, about a 20% increase from the year before. I'm going to give everyone a few minutes to get their votes in, and then I'll do a cutoff. Okay, Henri, I'm going to let you do the honours. How many secrets did we find in 2021?

Henri: In 2021 we found 6 million secrets.

Mackenzie: 6 million secrets. That's interesting, because looking at the polls, that's the consensus: 51 percent of people thought it would be 5 to 6 million. I'm impressed, well done. So that's a huge increase in the secrets that we see. We have here six million detected, and we're saying this is a 2x increase from the previous year. Now, I'm not known to be amazing at maths, but with 2 million last year, 2 times 2 does not equal 6.
So can you walk me through how we computed this and how we ended up with the figures of 6 million and 2x?

Henri: There are several effects. The first is that last year we announced 2 million secrets, but it was a conservative estimate; we took this number, recomputed it, and actually found 2.6 million secrets. That's the first part of the increase. The second is that the volume on GitHub also rose by 23%, so we had more commits to scan and we found more secrets this year. This leads us to the six million.

Mackenzie: So when we compare apples to apples, we get to a 2x increase and six million. Perfect, I'm on board now. We have some other numbers here: 56 million users, and a 25% increase in repositories created last year. So 25% more repositories on GitHub, and our scanning increased by 23%. Are these numbers correlated? Why aren't they the same? If repositories are increasing, shouldn't we see a parallel increase in commits?

Henri: They are obviously related, but they can be different, and in fact they are. Developers could have created no new repositories this year but continued to commit to existing repos, which would have led to a 0% increase in repositories and a 23% increase in commits.

Mackenzie: Got it.

Henri: So those numbers are quite similar because developers haven't changed the way they work; they still continue to create new repositories with new commits. But the two are not necessarily completely correlated, so we could see a big difference.

Mackenzie: Come a bit closer, you're falling out of the frame there. [Laughter] All right, we get to the first exciting part of this webinar: we can announce the first winner of an Amazon gift card. I'm just going to find a random person... I have Joanne G. from Washington, United States. Congratulations, you just won a $50 Amazon gift card. I will get in touch with you after
the webinar. You got the correct number, along with about 50 of you, so sorry to everyone who missed out on that. But don't worry, we have more opportunities as we go through; we've got a lot more to discuss.

Next, we're looking at the secrets leaked by category. Obviously we have 350 individual detectors plus generic detectors, and we can't list everything because it would be an extremely long graph, so we've broken it into categories. What can you tell me about the categories? What can we find in each of them, for example?

Henri: I think most of the categories have a name that is self-explanatory; data storage is related to databases, for example. The one that is interesting is the "other" category. It has, first, all the secrets that are in none of the previous categories, but also the generic detectors, where we cannot really know the topic they are related to.

Mackenzie: Got it. So how does this compare to last year's State of Secrets Sprawl? Right off the bat I can see a huge difference in the "other" category. How can you explain that massive jump, from 12 percent "other" last year to 32.8 percent this year? How do we get to that computation?

Henri: In the past year we made a huge effort on secret detection with generic detectors, because they allow us to vastly increase the range of detection. This is why you have those numbers for the "other" category.

Mackenzie: Excellent, I understand that. We've seen some other shifts in there, but I guess because the "other" category has changed so drastically, it's hard to provide any analysis on the rest of what we've discovered.

Okay, we have another poll here; I'll just check that it is live. How many GitHub tokens (we're specifically talking about GitHub tokens) were
discovered in the commit message? So this isn't in the code, this is in the message of the commit. I'm going to give everyone a few minutes to go through this. All right, interesting: 31 votes for 600-plus, 11 votes for 200 to 599, one vote for none. Surely not; I like that person's optimism. [Laughter] A couple of votes for just a few. If you give me one minute, we're going to find someone with the correct answer. Actually, I lied: we're going to combine this with the next poll as well. But if you want to know the answer: how many did we find?

Henri: We found 500 commit messages containing GitHub tokens.

Mackenzie: 500. Can you walk me through why a GitHub token would be in a commit message?

Henri: I have no idea, strictly no idea.

Mackenzie: Could it be an authentication mistake, thinking that we authenticate ourselves with GitHub in the commit message? Perhaps a copy-and-paste error? All of the above?

Henri: I would go for the copy-paste, but I have no idea.

Mackenzie: What does everyone think? Does anyone in the comments section have any reasons why someone would commit one? Obviously by mistake, but maybe someone here did; you can let us know. Or don't admit to that. All right, let's move on.

Now we're going to look at cloud providers, because there's some interesting stuff here. We had cloud providers as one of the categories, so let's... "for fun"? Elliott, I'm not liking your definition of fun. Okay, let's focus on some cloud providers; I was distracted. We see a bunch of different cloud providers here: Google Cloud, AWS, Azure, Alibaba, Scaleway. Here's something interesting: there are a lot of massive developments in this, but we can also see that there is a decrease in AWS keys. Can you walk me through why we're seeing a decrease? Is it positive, or are people using
AWS less? Is it something to do with the market, or are we actually seeing improvement on this?

Henri: I don't think people are using AWS less; I think it's always growing. On the other hand, I think people are more aware that AWS keys are sensitive, because everyone is advertising that AWS keys are sensitive. It's always the example we take for secret detection, and people get used to it and understand that all AWS keys should be kept private.

Mackenzie: I think there is definitely a huge movement there, and it is always my go-to example. I did a video on what actually happens when you leak an AWS key on GitHub; it's on our YouTube channel if you want to check it out. Spoiler alert: bots started to exploit it within a couple of minutes. So with everyone looking for these keys and a lot of information out there about how sensitive they are, we're winning a battle here; we're starting to see some improvement. Can you give me that piece of good news from this slide?

Henri: Yeah, we are seeing some improvements. Unfortunately, only on AWS keys.

Mackenzie: Okay, here we go. Moving along, here are some names you may not recognize: we've got PlanetScale and Supabase. These are cloud providers, correct? They're fairly new, and we're actually seeing an increase. This graph, same as the other one, shows the number of detected secrets per 1,000 commits, and we're seeing an increase. What can we extrapolate from this data? What are we seeing with the increase in these secrets?

Henri: First, we are seeing an increase in the popularity of those systems. Then we also see that it's important to have secret detection that is aware of new providers, or has ways to detect new providers, such as generic detectors.

Mackenzie: Okay, so what we're seeing is that we can't just focus on AWS, we can't just focus on GCP; we need to have
a very wide range to capture these new types of secrets that we're using. I mean, it's a good thing for these companies that we're seeing an increase in their services being used, but then we also have the danger of these keys being leaked, and, the opposite of AWS, we don't have that awareness around these keys yet.

Okay, new poll. I think this one's visible; yes, it is. What day of the week did we leak the most secrets this year? This is a fun one, and there's a prize at the end of it. I tricked you on the last one, but I promise a prize for right answers this time; if you got both of them right, we'll increase your chances. So what days do we leak the most secrets? We have Mondays, because we're tired from the weekend (depending on what your weekend is like; I'm tired on Mondays). Wednesdays, with the hump-day blues. Fridays, because we're getting ready for the weekend, with those good old Friday releases, we love them. Weekends, when we're working on some personal projects. Or all days are relatively equal and there's no clear winner in the data. What do we think? Interesting: Fridays is definitely winning at the moment. I'll give you a quick minute. So 24 votes for Friday, eight votes for Monday.

Okay, here we go: the winner is weekends. On weekends we see the most secrets committed. This seems counterintuitive to me. Why are we seeing so many leaks on weekends? And as the slide shows, if we exclude weekends, we're seeing a lot on holidays. Is there any relationship here? Why this phenomenon?

Henri: I'm not sure, but my hypothesis is that during weekends people work more on open source or personal projects, which are more likely to be public.

Mackenzie: Okay, so that makes sense for the holiday phenomenon too. If they're working on personal projects, is there still a risk of a lot
of these keys being corporate keys, or would we see that these are nearly all personal keys? Is there still a risk of corporate keys leaking on the weekends?

Henri: Yeah, there is still a risk, especially in these COVID times, where people tend to work from home and maybe share devices between work and personal use. Because of that, the risk of making a mistake, like copying the wrong file or using the wrong secret, is huge.

Mackenzie: That makes sense. All right, I found someone who got both of the last two questions right, so we have another Amazon gift card to give away. We have Par from Bangkok, Thailand. Par, congratulations, you just won a $50 Amazon gift card.

A quick reminder to those viewing on YouTube: if you want to participate and win prizes, you have to come across to the Crowdcast livestream; there's a link in the description. So if you're screaming into the YouTube chat and wondering why no one's listening to you, that's why. You can also ask questions; there's a questions tab, and I see we have a couple, so feel free. We'll get to those at the end. You can also ask questions in the chat; I won't see them at the moment, but we have team members there who can answer. Asking in the questions section will help you win a swag bag.

All right, let's talk about Docker. This one is quite interesting. Can you explain what a Docker image is, to start?

Henri: A Docker image is a sort of copy of a computer's file system that can run on almost any computer. It allows you to build an app, share it with other people and other computers, and it will run perfectly.

Mackenzie: Okay, I got it. So it's kind of like a virtual machine, but a lot lighter?
Henri: Yeah, a lot simpler.

Mackenzie: Okay. So we started scanning Docker images last year as well. Unlike GitHub, where we scan every commit, we don't scan every Docker image that comes through, correct?

Henri: No.

Mackenzie: But there are 8.8 million Docker images on Docker Hub, and we scanned 10,000 of them. We also did a smaller experiment earlier in the year (you were part of that webinar), with a much larger experiment this time. Can you walk me through the results? What percentage of Docker images contain secrets?

Henri: On those 10,000 images, we found that 4.6% contained at least one secret, and this resulted in about 4,000 occurrences of secrets found.

Mackenzie: Got it, that's a lot. So 4,000 secrets, 1.2 thousand unique secrets. We have an interesting stat here: six secrets per 100 layers. So what is a layer, and why are we computing secrets per layer?

Henri: Docker images are composed of layers that are built one upon the other. We thought that would be interesting because you can compare layers to commits: commits are built on top of each other, and so are layers. A secret can be hidden in a layer that is not the top one, so you don't see it in the current state of the image, but it's still there in the history.

Mackenzie: Okay, I got it. So layers are like Git: if I commit a secret to a repository, then remove that secret and commit over it, the secret is still in the history. Layers are similar: if I'm building a Docker image and I add a secret to connect to, let's say, a package manager, then remove that secret, it's still in my image layers, correct?

Henri: Yeah.

Mackenzie: So layers are comparable to commits. Really interesting stuff. Do we see the same types of secrets in Docker that we do in Git repositories?

Henri: It's not exactly the same types of secrets. We tend to find more infrastructure-related secrets, such as Terraform tokens and AWS keys, because Docker is more tied to
infrastructure than code in general is.

Mackenzie: Okay, so more infrastructure-type secrets, and that makes sense. Another poll now. I saw some chat about this too, and we had some conversation about this particular result: what country leaked the most secrets? India was number one last year, so we're going with the same candidates as last year: USA, Brazil, Nigeria and France. These were all in the top results last year; perhaps we're going to see a completely new player. Let me just check we have the poll... yep. India and USA... France, no one's guessing France. There we are.

All right, let's have a look: India is still number one, USA has actually moved up, then Germany, France. So we've had some different results. The main one: Brazil has moved down from second to ninth, so it's a big drop. What does this tell us about the countries that are leaking these secrets? Is India bad, or are there other factors behind this?

Henri: That's absolutely not what it's saying. India is not bad; India is probably the country that commits the most on GitHub. This map is more related to the activity and number of developers in those countries than to whether they are good or bad developers.

Mackenzie: So when we see secrets being leaked, do we see it from a specific type of person? Is there a specific profile (junior, senior, AppSec), or is it everyone?

Henri: It's everyone.

Mackenzie: Okay, so a lot of it comes from human mistakes. We can get some good information from this: India is committing a lot, they've got a large engineering population, and they've maintained that number one spot. I'm going to argue that this isn't necessarily a bad thing. It doesn't reflect that Indian developers are bad or leaking more; it just indicates that there are more developers in India. I think that's an important point to make when this slide
here comes up. Some questions: going back to this question, "Is this Docker or is this public?" This is the public information; I'm just trying to spread out the polls, so it's not just Docker, this is across the board.

All right, so who is going to win? Let's find another winner; it takes me a few minutes to sort through the results, so give me one second. Okay, I have one: we have Nlush from Nepal, N-L-U-S-H. I'm sorry if I'm pronouncing that wrong; you can ask Henri, I'm quite bad with names. Anyway, congratulations, you've won a $50 Amazon gift card. I'll contact you after the stream and let you know how to claim it. Congratulations to everyone else who got it right and didn't win a prize; apologies about that.

Okay, let's move away from public stuff. We've talked about public Docker images and public GitHub repositories; let's change the conversation and look at internal repositories. I'm sure we're all aware, but just in case: what's an internal repository compared to a public repository? It sounds obvious, but...

Henri: What we call an internal repository is a repository that is not open to anyone on the internet. Mostly they are private projects or organization projects. It's everything that's not open source.

Mackenzie: Okay, I got it. So this is something you obviously see a lot, whether you're working in a company or on a personal project that you've made internal. In this analysis we only focused on repositories owned by organizations, and these are organizations that are scanning their code with GitGuardian. If you're wondering whether we have access to your private repositories: we don't, but if you're using GitGuardian, we take those results from
there. All right, let's look at some of the information about internal repositories. We've made a big focus of this in this year's report: internal repositories, with a lot of focus on AppSec engineers. There are some big numbers here, but I want to focus on the left of the page first. We have 1,050 unique secrets leaked, found upon scanning repositories and commits. This refers to a company with 400 developers and, let's say, roughly four AppSec engineers. Is that a lot? What can we see from this?

Henri: Well, 400 developers is quite a big company. It's not a huge one, but it's already a big company.

Mackenzie: And for a company that size, is it okay to have a thousand secrets in your repositories? Is it bad practice? Maybe answer that, then I'll have a follow-up.

Henri: Obviously it's a bad idea, but it happens. I won't say it's normal, but it's usual.

Mackenzie: So why is it a bad idea? I have my private repository, let's say I even have multi-factor authentication on it, so it's not public, and I trust GitHub, or whichever version control system I'm using, not to leak that information. Why is this a problem? Why shouldn't I have secrets in a private, internal repository?

Henri: In most companies the code is shared across the whole company, or at least with all the developers. But you don't want your production secrets to be accessible to just anyone; only the people who need them should be allowed to access those credentials. If they are in the code, first, they will end up in many places: many places in the code, on many computers. And your developers will have access to them on their computers, which they can bring home, or take to a bar...

Mackenzie: Take to a bar? I don't do that. We don't encourage that behavior at GitGuardian, developing at the bar. Okay, so I have a
couple more questions around this. Is source code secure, then? Can we rely on these VCSs to hold our source code? I guess the obvious one to look at is Twitch, which recently had its source code exposed. Should we assume that anything in our private repositories could be made public?

Henri: I think you should always assume that everything can become public at some point. It's more or less likely, and you should do whatever you can to keep it private, but you never know what's going to happen.

Mackenzie: Okay, now we have some additional maths here that I need to do, and I've been struggling with it, so hopefully you can explain this to me. We have here, in big bold letters, 3.4 thousand occurrences of secrets detected per AppSec engineer. If we have four AppSec engineers and they're discovering a thousand unique secrets, how do we get to 3.4 thousand secrets per AppSec engineer?

Henri: It's because it's one thousand unique secrets, and then you have occurrences: the number of places in your code where a secret appears, the number of files or places in a file. That tells us each secret is, on average, copied 12 or 13 times.

Mackenzie: Wow, okay. Let's have a think about this. I'm an AppSec engineer in this company (shout out to all the AppSec engineers watching). 3.4 thousand occurrences: doing some quick maths, I'm dealing with about 10 a day, and I'm working weekends. Is that achievable?

Henri: No, I don't think it is achievable.

Mackenzie: And are these the only vulnerabilities AppSec engineers have to think about?

Henri: Definitely not.

Mackenzie: Okay, so this seems like such a mammoth task. The only thing I can think of that's harder is learning French. [Laughter] So do we need to make some fundamental changes to deal with this amount of information? Do we need more AppSec engineers? Do
we need to make fewer mistakes? What's the solution to how we can bring down this number?

Well, I think to bring down this number we must make fewer mistakes, and give developers tools that help them reduce the number of mistakes. And then they must work together with the AppSec engineers in order to solve those issues.

Right, so this is really seeing DevSecOps come to life, and shift left: we all need to be responsible for security to help reduce that in the end. Let's have a look at the types of secrets that we have here. A couple of interesting things on this graph: this is comparing public and private secrets across a number of repositories. Obviously we have a huge number more in internal repositories, but they're closely correlated. So does having secrets in an internal repository reflect the amount of secrets that could potentially leak out in a public repository? Why are these seemingly so closely related?

That's a good question. I think secrets in your private repositories may end up public at some point. Say you're a company building an awesome project, an awesome library to help you manage your infrastructure, for instance, and you want to share it with other people because you think it's so amazing that it could really help other people's jobs. If you have secrets hidden in the history, then they will end up public when you release it.

Okay, so we can see a correlation there. If we want to prevent secrets being leaked publicly, we've got to start with our internal repositories as well. Got it, interesting. Okay, so we've talked a little bit about solving the problem; now let's talk further about that. We've talked about how the solution needs to be widespread, but let's start with AppSec engineers, with security teams, and move down to developers. I'm an AppSec engineer. What can I do to help reduce this
problem?

Well, the first thing you have to do is to be able to monitor what's going on in your perimeter, your repositories, right? That's the first thing. So using a solution that will scan the commits arriving on your VCS, and maybe the history, will help you understand the problem. And then you can implement feedback loops with your developers, because with the developers you will be more efficient at solving those problems, like replacing the secrets where they are needed in the code. Let's say your developers put the secrets directly in the code: they have to change the code, and they know the code, so they will be much more efficient at doing so.

Got it, got it. Okay, really interesting. So let's now talk about developers. GitGuardian offers solutions, and we offer solutions primarily aimed at security teams, AppSec engineers. What can developers do? How can they be part of the shift-left movement?

I think if they want to be part of it, they have to automate the parts that verify that they don't make mistakes.

So it's not about giving them extra things to think about, it's about giving them tools that enable them to take action on this?

Yes.

Okay, and what kind of tools for developers are out there? What can we do for this?

Well, currently at GitGuardian we have a pre-commit hook that allows any developer to have his commits scanned just before he commits them, and this is a great tool to be sure that you never have a commit containing secrets.

Okay, so pre-commit or pre-push hooks, they can be automated. You don't need to think about it, but it will block something if there's a mistake made?

Yeah.

Okay, you only think about it when there is an issue. Does GitGuardian have a tool like that, and is it free?

Yes, it is.

Okay, that's important. Okay, so interesting. I have another poll here that I'm curious about. This slide on my screen here is talking about
the workflow of this. You can detect secrets pre-commit or pre-push. You probably don't need to do both, but there are slight advantages and disadvantages to each. We have videos on how to do this on our YouTube channel, so check that out, and make sure you subscribe to the channel; it helps the Google overlords know that we exist. And then as we go down, we can see the remote repositories; this can be for AppSec engineering teams and integrates with GitGuardian's other products.

Okay, poll time. Do you guys have any git hooks to detect secrets? I'm really curious. So: you have pre-commit detection, pre-push detection, you don't have any secret detection but you have other git hooks, or you have no git hooks. And if you have none of those, then let us know in the chat why; that'd be interesting. So we have some results here: 11 people with pre-commit git hooks, one person with pre-push. Personally I'm a fan of pre-push; controversial topic, but I'm a pre-push man. It sounds like an oxymoron for something; let's move on quickly. And then 18 votes for no. I like the honesty. Let's get onto that.

What's the YouTube channel address? I'm not sure. If you type in GitGuardian I'm sure you'll be able to find it. I'm not sure if we have a specific URL for that, but yeah, it's definitely in there.

Okay, so really interesting. This is a personal curiosity: I wanted to see if all the videos I've been making about pre-push and pre-commit have been making a difference. I'm going to take it as a personal win: 15 people have pre-commit detection. If you don't have one, get on board, team pre-push. Okay, so now we've come to the end of the formal slides of the presentation. Thanks, everyone, for being here. We have some Q&A, so we have a lot of questions to go through.
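
As a rough illustration of the pre-commit idea discussed above, the sketch below shows how a hook could inspect the staged diff and block the commit when an added line looks like it contains a secret. It is not ggshield and not GitGuardian's detection engine: the two regular expressions, the function names `added_lines`, `find_suspects`, and `main`, and the way the hook is wired up are all assumptions made for illustration.

```python
import re
import subprocess
import sys

# Hypothetical patterns for illustration only; real detectors are far more thorough.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded assignment": re.compile(
        r"(?i)\b(?:password|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def added_lines(diff_text):
    """Return the lines added by a unified diff, without the leading '+'."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def find_suspects(diff_text):
    """Return (pattern name, line) pairs for added lines that match a pattern."""
    return [
        (name, line.strip())
        for line in added_lines(diff_text)
        for name, pattern in PATTERNS.items()
        if pattern.search(line)
    ]


def main():
    """Scan the staged changes and return the exit status for the hook."""
    # Only staged changes are scanned, i.e. what is about to be committed.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    suspects = find_suspects(diff)
    for name, line in suspects:
        print(f"possible secret ({name}): {line}", file=sys.stderr)
    # A non-zero exit status from a pre-commit hook makes git abort the commit.
    return 1 if suspects else 0
```

To try it as a hook, the file would end with `sys.exit(main())` and be saved as an executable `.git/hooks/pre-commit`. As described in the webinar, the developer only notices it when something is flagged.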
People have posted the link to the YouTube channel in the chat, by the way, I'm seeing that. So we're going to get to some of these questions now. You can upvote questions, so we'll go with the ones that have the most upvotes. Here we go, question number one. I haven't filtered these, so hopefully there's nothing nasty in there.

Why do we consider entropy for a generic password? Entropy tells whether a password is weak or strong. We should detect secrets whether strong or weak. Why do we need an entropy score?

Interesting question. Actually, to detect passwords we don't use entropy that much, because many passwords have a very low entropy; they are composed of words from the English dictionary. So if you use entropy methods on those, you don't get good results, and you end up losing a lot of secrets.

How do we detect passwords then? And when we're referring to generic detectors and entropy, what are we detecting with entropy?

Entropy allows us to detect strings that were generated by a computer. Most of the API keys that you use are generated randomly by a computer, so they will have some properties, especially around entropy, and they won't look like something that you've created yourself, like, I don't know, "Mackenzie's super strong password one".

Tell everyone my password!

So we don't use entropy that much on passwords.

Okay, so we're just detecting things with high entropy. Now, we're talking about generic detectors. We have specific detectors, we have 350 specific detectors. Do we just have one generic detector, or do we have generic detectors for passwords, for entropy, for different things?

Yeah, we have different generic detectors, especially the two that you mentioned.

Okay, so the generic high-entropy secret and the generic password. Okay.

What are some of the differences between, say, container security scanning tools
like Snyk, Anchore, Xray, and GitGuardian for container images?

Interesting question again. Don't say GitGuardian's better, give a different answer.

Yeah, GitGuardian scans for secrets in the container, and it scans the history. If I'm not mistaken, the other scanners are scanning the top layer, so the final state of your repository, and they're not especially looking for secrets but more for other vulnerabilities, like you're running your Docker as root, or you have open ports that should be closed, and vulnerabilities like that.

So these other tools are complementary to GitGuardian, not necessarily competitors in that space?

Yeah.

Okay, great. All right: could the GitHub tokens in commit messages number be connected to incorrect pipeline setups? Oh yeah, I have no idea, so I'm totally fielding this to you.

I have no idea, but that's an interesting option. I think I'll have a look at that after the meeting.

Yep, okay, very interesting. Thank you for your input. Yeah, thanks Robin. Oh, we have another one from Robert. That last one was hard, I don't know if I want to let Robert... I've got to go for it. Do you have any resources... yeah, I like it already. Do you have any resources on how best to include, or not to include, secrets in Docker images?

So actually I can answer this one. You may have noticed a handsome fella in the chat called Thomas Segura. Thomas actually wrote a great cheat sheet on Docker image security. This is on our blog. Thomas, give yourself a shout out, post a link to that blog post. It's a great resource: there's a long article that explains all the steps and why they're there, and there's also a cheat sheet, which is a one-pager just to help you remember this. I should have stopped sharing my screen; there we are.

Have you thought about scanning public cloud images, AMIs? What are your thoughts on that? Public cloud images, perhaps public S3 buckets, or... No, it's
not something that we've done yet. Basically we've been focusing for now on source code, but that's definitely something we could do in the future.

Yeah, okay, maybe next year. Is it possible to scan Docker images on private registries in the same way you can scan private GitHub repositories?

Yeah, this can be done using our CLI tool, ggshield. It allows you to scan any Docker image that you have and find the secrets that are in it.

Okay, great. And so we can scan Docker images; can you plug it in so that it scans the Docker image after it's created, so you can automate the process?

Yeah, of course, you can automate the process in your CI, or do it manually if you prefer.

Oh, I like this next question. Is there a public API call that I can make to just scan, automated, to generate reports and so on, basically exposing the required APIs? The question, I believe, is: does it have an API where they can use the secrets detection?

Yeah, sure, there is one. You can find the documentation on our website. To use it you just have to create an account and get an API key. It's free, and then you can scan whatever text you want. Of course it's optimized for source code, but it will work on anything.

I created a video, if you're interested, on how to use this API key. There's a tutorial on our YouTube channel from a while ago, which is me writing a little Python script to scan directories with the API key. Now, that in itself probably isn't the most useful thing to do with the API, but we did it that way because it was very clear and showed you how the API worked, so that you can build your own stuff. So hopefully that helps.

Will the webinar be recorded after? Yep, we will have this webinar here. All right, last question: how do you relate a secret found in a random public repository to the company? Oh, I like this question too. So maybe I'll
provide some context before you give your answer. One of the things that GitGuardian does is we can detect secrets in public that aren't related to an organization, so the organization doesn't own the repository, and link that back to an organization. So how does GitGuardian do that, Henri?

Currently I'm not working on this part of the software, so I'm not sure of the details, but mainly it works by analyzing the contributions of a developer to the repos of the company, and to the company's activity in general.

So I can jump in here for a little bit more context. Basically, what GitGuardian does is create what we call a perimeter around an organization. This will include all the employees, and it will also include the activity. This is all automated, so we can automatically find all the employees, and then we start monitoring the commits made and alerting if there are secrets in those. So that's how we relate it. Now, they could be personal secrets, they're not always connected to the company, but it's still good practice not to do that. So that's how we cross-relate that.

Okay, are there any more questions? Did I miss any? What is the number of credentials found in code? Six million. Six million this year we found in code. Anything else? No? Well, this is great. We do have more prizes, though.

Is there some for me?

Not for you. I said I'll buy you a beer, calm down. [Laughter]

All right, let's have a look at some of the most active people. We have two swag bags to give away. Swag bags have some cool merch: we have Rubik's cubes, we have hats, we have other merch in this nicely packaged box, so you can be the most stylish billboard for GitGuardian imaginable. We're going to give these away to the people that were most active. I just have to check: Robin, have you got a swag bag before? I recognize that name. So if you haven't, you've
won a swag bag, but if you have, I'm going to randomly pick someone else. But I will pick another person as well. Okay, so David's asked about most active. I'm not actually sure how the analytics works, but in my control panel here I can get a little button for the people that were the most active, and David, good thing you asked, that is you, so you have won a swag bag. Congratulations. And Robin hasn't either. All right, sorry Robin, I was confusing you with another Robin; I thought I recognized that name. So Robin and David, you are the two winners of the swag bags. Congratulations. Congratulations also to everyone that won an Amazon gift card. As I said, I'll be sending out emails following on from this about the process. So yeah, this was fun. Henri, I'd like to thank you. Thanks for sharing your expertise. I know we had some questions scripted, but I think I just threw like eighty percent of them at you, so you did fantastic. I was trying to stump you, it was a personal game, but it didn't happen, so congratulations to you too, you've deserved your beer. And thanks to everyone that tuned in. If you like the content, stay tuned: we will be doing another webinar at the end of March, so please look out for that. If you haven't already, like us on our YouTube channel and subscribe; that will be a great way of appeasing the algorithm lords. So thanks, everyone.
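
The point made in the Q&A about entropy can be made concrete with a small sketch. It computes Shannon entropy per character and shows why a randomly generated key tends to score higher than a password built from dictionary words. The function name `shannon_entropy` and both sample strings are made up for illustration, and this is not GitGuardian's actual detector.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of the string's characters, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Made-up samples: a word-based password and a random-looking token.
password = "mackenziesuperstrongpassword1"
token = "q7Zp2LxV9aKd0RtY3mWb8sEf5HcJ"

print(round(shannon_entropy(password), 2))
print(round(shannon_entropy(token), 2))
```

The token scores higher because none of its characters repeat, while the word-based password reuses a small set of letters. That is consistent with the answer given in the session: an entropy threshold is useful for machine-generated keys but would miss many human-chosen passwords.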