Finding leaked credentials in Docker images - How to secure your Docker images

Docker can be a blind spot for security. In this video we look at leaked credentials inside Docker images. We evaluate how secrets like API keys and certificates are leaked into Docker images, how we can detect them, and how we can protect our own images.

Resources:
Research into leaked credentials in Docker images: https://blog.gitguardian.com/hunting-for-secrets-in-docker-hub/
Dive, tool to view Docker images: https://github.com/wagoodman/dive
GG-Shield, tool to scan Docker images: https://github.com/GitGuardian/ggshield
GitGuardian, secrets detection solution: https://dashboard.gitguardian.com
Cheatsheet, protecting Docker images: https://blog.gitguardian.com/how-to-improve-your-docker-containers-security-cheat-sheet/

Intro: 0:00
What are secrets: 0:49
What is Docker: 2:10
Inside Docker images: 3:24
Examples of leaked secrets: 5:19
How secrets leak in Docker images: 7:08
Docker security research: 10:00
Scanning Docker for secrets: 11:40
Wrap-up: 16:41

Video Transcript

[Music]

Hello and welcome back to another video. Today we're going to be talking about Docker images, and more specifically we're going to be talking about protecting our Docker images and making sure we don't leak sensitive information through them. Now, Docker is an amazing technology, and we'll talk a little bit about it, but it can also be a bit of a blind spot and a bit of a mystery, even to those who use it. So we're going to unpack some of the mechanisms that make up Docker and have a look at how this can be used against us by attackers.

Now, there may be people here with different levels of understanding, so I want to talk a little bit about what kind of sensitive information we can leak, and predominantly I'm talking about secrets. So what am I talking about when I'm referring to secrets? Well, secrets can really be defined as digital authentication credentials. Typically in software development these are things like API keys; you might have credential pairs like database usernames and passwords; you might have security certificates. So, anything that authenticates us, or our application, or our services with each other. Now these are highly sensitive, and we know that they're highly sensitive because we put a lot of effort into protecting them. They can accidentally leak out, perhaps through code, maybe in a Git repository that's misconfigured, or they can leak out in Docker images, and if they do, then an attacker can use them to penetrate our services, move laterally, or elevate their privileges. One of the scary things about secrets is that the barrier to entry is quite low. What I mean by that is you don't need a lot of technical skill to be able to use them in a malicious way, and once you're properly authenticated into a service, it's hard to detect that a malicious actor is there. That's what makes it so problematic.

So what does this all have to do with Docker? Now, I'm assuming that most people here have a pretty good understanding about what
Docker is, so rather than talking about how Docker is used, let's talk a little bit about how Docker images are made up. We have really three components: we have our Dockerfile, we have our Docker image, and we have our Docker container. The Dockerfile is really what describes our Docker image; all of the things that our application needs to run are described in our Dockerfile. Our Docker image is built from that file, and it contains all the information about our application and all the dependencies that it needs to run. That Docker image creates a Docker container, and this is what our application is running within. So: in the Dockerfile we describe our image, the Docker image contains everything that the application needs to be able to run, and once we run that application it runs within a Docker container.

Now, to get a better understanding about all of this, let's actually take a look at a Docker image, because it's important to be able to visualize where the sensitive information may be leaking. Up on the screen I have an awesome open source project called dive, by a guy called Alex Goodman. This is available on GitHub, and I encourage you to use this tool and of course give Alex a star, because it's an awesome tool that really helps you understand Docker in a new way. In the top left you'll see a bunch of lines. These are actually layers of a Docker image, and as we go through these layers you'll see that the information on the right is changing. What these layers are is essentially our Dockerfile: every time we add new instructions or new dependencies, or add or remove files, we create a new layer, and we can actually move through these layers. Near the bottom we'll see this layer that has a bunch of green files, which indicates that they've been added into the Docker image on this layer. What these files are is our application. I'm in an image called ggshield, which is a GitGuardian project, and within this Docker image you
can see all the files from that project, so this is our actual application. Sometimes when we're using Docker it can feel like it's this complicated thing that really doesn't make any sense, or it's just kind of a black box. Actually, it's a file with a lot of layers that contains our code to make our application run. It contains extractable code.

I really want to hammer on that point, that it contains our code, because we're talking about leaking sensitive information, and one of the key areas where sensitive information leaks is in our code. So if our code is in our Docker image, and our Docker image is public or we distribute it widely, then the secrets that are in our code are going to be exposed.

But this isn't just hypothetical. Leaking sensitive information like credentials through a Docker image is something that actually does happen. You may remember that earlier on in 2021 Codecov reported that they had a serious security incident. At first we didn't understand much, but we began to understand that this was actually a supply chain attack, where Codecov was the target, which allowed attackers to move on to other victims. We started realizing this when companies like Rapid7, monday.com, and even Twilio had to disclose that, because of this, bad actors had accessed their private code repositories. I have a full video on the Codecov supply chain attack, but essentially this is what happened. Codecov had a misconfigured Docker image: essentially, they had a leaked secret in their Docker image. This was a public Docker image that anyone could pull down. Attackers searched through it and realized there was a credential, and this credential gave them access to the code repository (I'm assuming it was a Git repository). The attackers were then able to authenticate themselves and insert malicious code into Codecov. Once they inserted that malicious code, they were able to move laterally from Codecov into their victims' private Git repositories. They did this because, when Codecov was running, the attackers
took the environment variables (namely, within those environment variables there was a Git credential), and the malicious code sent all those variables, all those secrets, back to the attacker. The attacker was then able to use those credentials to correctly authenticate themselves into private repositories. But essentially, all this started because of a secret that was inside a Docker image.

So how do secrets end up in Docker images? Well, there are a number of ways. Of course, one way is through the source code. This is when you have secrets either hard-coded into your source code, or perhaps you have a .env file, an environment variable file, that gets captured when you're building your Docker image, and that file is essentially there. Attackers that know what they're doing can extract the sensitive information and then use it to authenticate themselves.

But there are some Docker-specific ways that secrets end up inside these images. One of those ways is actually through the Dockerfile. Remember, the Dockerfile describes our Docker image. The secrets that you might include in your Dockerfile would be something like credentials to a package manager. Depending on the privileges of these credentials, this could actually be really damaging, as an attacker could update your packages and turn your applications malicious without you even realizing it. But this feels like a fairly obvious one: if you know how Docker works, then you know that you don't want to put secrets inside your Dockerfile.

Well, there's actually another way that we've seen where secrets have ended up inside Docker images. Again, this is through a Dockerfile, but it's surprising because it shows that there is an understanding about how Docker works, but also a misunderstanding at the same time. In this example, essentially what we have is a developer that's got credentials, again let's say to pip, in a file called .netrc. So they store these credentials in this .netrc file. Now the problem is that this .netrc file is
in our Docker image, and obviously we don't want that, because it's going to expose our secrets. So then we go ahead and we remove it: you'll see that the last command in this file is to remove that .netrc file. The logic here is that we're going to pass the credentials through to the image, and then once we've used them we're going to remove them so they're not in our final image. Well, this may work in theory, but it doesn't work in practice, because of the layers that Docker uses. Remember, when we looked at our Docker image we could move back to different layers, so we could essentially just go back to the layer where that .netrc file existed and access it. This is very similar to Git history, and if you've seen any of my videos before you'll know that I talk a lot about security in Git, so I won't harp on about that in this video.

Now, of course, we wanted to quantify how big of a problem this was. Is it something that we really need to be concerned about? Is it widespread? We know that secrets inside places like Git repositories are widespread, but how does that translate into slightly more advanced technology like Docker? Well, there's still a huge amount of sensitive information in Docker images. If we take Docker Hub, the largest public repository host of Docker images, we downloaded two terabytes of data to analyze, to give us a benchmark of how many of these images actually contain secrets, and it turned out that seven percent of these Docker images contained at least one secret. If you want to learn more about this particular research project, there's a link down below for that. But just to give you some additional idea, without going too deep in this video, I think you might find it interesting that the types of secrets that we find are very different to what we find in, say, Git repositories. As you know, we've done massive research projects in public Git repositories, and if we break these kinds of secrets into different
categories, what we'll find is that the secrets that relate more to an application are very widespread in Git repositories but appear very little in Docker images. Instead, it's private keys that relate to internal services, internal security, and your infrastructure: these are typically the secrets that we find most in Docker images.

All right, so now that we've gone through how secrets can end up in Docker images, what can happen when they get there, and how they're exploited, let's take a look at how we can protect ourselves. We're going to be using a tool called ggshield to scan Docker images for secrets, and it's a very simple tool to use. It's a tool by GitGuardian that leverages the GitGuardian secrets detection engine, and it does this via an API. So the first thing we need to do, obviously, is install this tool. I'm going to use pip to install it, but you can use brew or any other tools that you have, and obviously I already have it installed.

Now that this is installed, as I mentioned, it uses the GitGuardian detection engine via an API, so we need an API key to authenticate ourselves with that service. This is very simple to create: you just head over to gitguardian.com and create an account if you don't have one, or head to dashboard.gitguardian.com. Creating an account is free and takes about one minute. Then we're going to hop on down to the API tab and create a new API key, and we're just going to give this the scan scope for the moment, because we don't need to worry about other permissions for our API key; we just want to scan some information. We're going to take this key and copy it to our clipboard, and we need to add this key as an environment variable. We can do this however we normally would; I'll just use the export command, and that will save that key as an environment variable so that we can authenticate ourselves.

All right, so we've installed ggshield and we've got ourselves an API key. Now let's scan some
Docker images. It's really simple: we just run the command ggshield scan docker, and then we need the name of our Docker image. Now, because we've scanned a lot of Docker images, I know of some that have secrets within them, so I'm going to blur out the name here, because I don't want to expose any sensitive information, and we're going to go ahead and scan this Docker image. If we already have the image saved locally, then ggshield will scan it locally, and if not, it will pull the image down from Docker Hub.

All right, so we've completed scanning this, and it's found two incidents. We can see the incidents have both been found in a .env file. So what's happened is they've created their Docker image and accidentally included their environment variable file in this image, and so now that's exposed in the Docker image. We have here an AWS access key ID and a secret access key, so now we could gain access to their AWS services, and we also have a SendGrid API key. So, obviously very sensitive files here, and these are the exact type of files that we can often see in Docker images when sensitive files have been caught up in the code.

Let's run another scan of a different image and see the different types of secrets that can come up. So we've finished scanning our next image, and we can see here we've got some different types of secrets. We have a secret token here. You'll notice the difference between this and the last one is that this isn't for a named service, so this could be a secret token to an internal service, it could be a secret token to a package manager, or anything else, so that's infrastructure. And then we also have a username and a password as a credential pair here. Again, we don't know exactly what this gives access to, but if we looked into the code I'm sure we would get an idea about what it does. So, two different types of secrets that we can find in a Docker image, using some real-life examples.

The great thing about this CLI tool is that we can
actually insert this into our software development lifecycle. We can inject it into our CI/CD pipelines, we can scan Docker images immediately after creation and block them from being uploaded if they contain secrets, and we can stop commits coming into our Git repository if they contain secrets. We're going to uncover a lot of these more advanced features in later videos, so if you want to know how you can automatically scan your Docker images for secrets after they've been created, then make sure you subscribe and we'll give you a notification when we release that video.

So that's it for today. I hope you enjoyed the video. If you have any questions, ask them in the comment section or reach out to me on Twitter at the handle advocate mac. Until next time, I hope you consider liking this video if you found it useful. Thanks.
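
The point made in the transcript about layers, that deleting a .netrc file in a later Dockerfile step does not remove it from the image, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Docker or ggshield is implemented: each layer is an in-memory tar archive, a deletion is recorded as a `.wh.`-prefixed whiteout entry as in the OCI image layer format, and the file paths, hostname, and password are made up for illustration. The helper names (`make_layer`, `final_filesystem`, `find_in_layers`) are hypothetical.

```python
import io
import tarfile

WHITEOUT_PREFIX = ".wh."  # OCI image layers mark a deleted file with this prefix


def make_layer(files):
    """Build an in-memory tar archive standing in for one image layer."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, content in files.items():
            data = content.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def final_filesystem(layers):
    """Apply layers in order, honoring whiteouts, as a running container sees them."""
    merged = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as archive:
            for member in archive.getmembers():
                directory, _, name = member.name.rpartition("/")
                if name.startswith(WHITEOUT_PREFIX):
                    removed = name[len(WHITEOUT_PREFIX):]
                    target = f"{directory}/{removed}" if directory else removed
                    merged.pop(target, None)
                else:
                    merged[member.name] = archive.extractfile(member).read().decode("utf-8")
    return merged


def find_in_layers(layers, filename):
    """Return (layer_index, content) for every layer that still stores `filename`."""
    hits = []
    for index, layer in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as archive:
            for member in archive.getmembers():
                if member.name.rpartition("/")[2] == filename:
                    content = archive.extractfile(member).read().decode("utf-8")
                    hits.append((index, content))
    return hits


# Hypothetical three-step build: copy .netrc in, add the app, then delete .netrc.
layers = [
    make_layer({"root/.netrc": "machine pypi.example.com login build password FAKE-PASSWORD"}),
    make_layer({"app/main.py": "print('hello')"}),
    make_layer({"root/.wh..netrc": ""}),  # the layer a later `rm` step would produce
]

visible = final_filesystem(layers)
leaked = find_in_layers(layers, ".netrc")
print(".netrc visible in final filesystem:", "root/.netrc" in visible)
print(".netrc recoverable from layers:", [index for index, _ in leaked])
```

In this model the merged filesystem no longer shows `.netrc`, yet the first layer still stores it with the credential intact, which is why secrets need to stay out of every layer rather than being deleted in a later step.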
