CodeSecDays 2024 - Join GitGuardian for a full-day exploration of cutting-edge DevSecOps solutions!

Introduction to Wolfi OS & Building Declarative Containers - CodeSecDays

Wolfi OS is a community Linux distro designed for the cloud-native era. It relies on the environment to provide a kernel and allows for smaller, more granular containers. Learn how to use Wolfi as a base container image and build your own images using open-source tools.

Video Transcript

Yeah, so thank you all for joining me, and thank you GitGuardian for having me. I'm excited to talk about Wolfi, which is an open source project that we started over at Chainguard. We'll get into that in a second. Quick bit about me: you can find me on the internet at eddiezane. I live in Denver, Colorado, where I like to climb big mountains, and I'm a maintainer for the Kubernetes and Sigstore projects, so I can answer lots of questions about those things if you have any.

Quick agenda for today: we're going to talk about software supply chain security, kind of define it, talk about some scary stuff, and then we'll talk about how we build containers today, and then we'll look at Wolfi and why we built Wolfi. Feel free to drop any questions in the chat as things are going.

So, software supply chain security. It's definitely the modern buzzword. You may have heard this in a bunch of different places. This is AKA spooky time, where we kind of get to talk about why you've heard it so much. And if you were in the previous talk by Sonia, you would have seen quite a lot of these facts in a very similar position.

I like to start with this slide. This is the original announcement that Linus Torvalds sent about the Linux kernel. He sent this to the Minix mailing list, and I'll give you all a second to read it, but the important part I like to call out here is this first sentence, where he said, "I'm doing a free operating system, just a hobby, won't be big and professional like GNU." This is from August 25th, 1991.
It's just kind of wild to think that this started as a hobby project for him, and now it powers the modern economy and all companies, right? It's very difficult these days to not find Linux somewhere in your stack, whether it's a provider or you're running it yourself. So open source is awesome in how powerful it is, and thankfully we know of lots of uses for Linux, but that's kind of the root of the problem with this software supply chain security that we talk about.

Roughly 70 to 90 percent of any software stack consists of open source software. It is basically impossible today to build any sort of application or any software that doesn't use open source in some form. Whether it's your kernel, your compiler, or your programming language, it is just relatively impossible to do that without using open source. That's just kind of where we are as a society, and why we've been able to progress as an industry so far is thanks to open source. This is a slide that kind of puts that in reality for us: your open source software is the underlying part of that iceberg that you don't see, and all the proprietary custom stuff you add on top is just the tip.

It is an awesome industry, but the problem that I have as an open source maintainer is I have no idea who's using my software, right? I can build, or you can build, an open source library and put it up on GitHub for anyone to use, change, or modify, and you don't know who's really consuming it unless they tell you. So you have no idea who to reach out to when you have a security issue or some sort of breach. The cost of a data breach in the US in 2022, according to IBM's research report, was 9.44 million dollars, and that's kind of just the starting point. There's a whole lot that goes into different countries and regulations based on what data was leaked and whose data, so it is a very expensive thing to deal with.

You might hear this term, software supply chain attacks, quite a bit. This is really a term that became mainstream when it came to the SolarWinds attack and the Colonial Pipeline attack. This is the idea that I, as an open source maintainer, or even as a regular application developer, am not building every piece of software in my stack. So maybe a dependency of mine, or a transitive dependency of mine, gets compromised, and that slowly makes its way through the dependency web into my application, into the artifacts that I'm shipping to my customers. Like I said earlier, if you were part of the panel, it's something that developers kind of thought in the back of their head: oh yeah, this could be bad one day, I'm just downloading code from the internet and executing it. But it's now bad, and it's now that day, so we're trying to figure out how to secure it and make this a better ecosystem for everyone.

A bunch of other things have popped up. PyPI has had quite a few attacks, and npm. It's becoming super prominent, and honestly it's a surprise that it took this long. Back in the day when I was going to school, we found that GitHub actually had a policy that you could request a GitHub user handle for a user that was no longer active. So a bunch of my friends from school managed to get single-character GitHub accounts, like i and x and k, and GitHub just kind of handed them over. Back in the day that was the policy: if they determined that the account was inactive, they would release the username. That's kind of crazy today. You would never be able to just release a GitHub username. If anyone writes Go, Go import paths are based entirely on GitHub usernames and paths to repositories. So it's kind of wild that all of this was possible but didn't really start being exploited until recently.

Some quick numbers: we've seen a 742 percent average yearly increase in these types of attacks since 2019. When it comes to open source, we're expecting 3.1 trillion total requests for downloads, consumption, and usage. And when it comes to your dependencies and your open source stack, your transitive dependencies, so not your direct dependencies but the dependencies of your dependencies and so on down the web, account for six out of seven vulnerabilities affecting open source projects. So you may be vetting the one project and the code that you're looking at, but unless you're inspecting all of their dependencies, and transitively from there, you're susceptible to this somehow.

So with all the spooky stuff out of the way, I'd like to talk about how we build containers and how we arrived at where we are today, shipping software through these container artifacts. The idea of containers came from the shipping industry, or the nomenclature at least, where containers for cargo ships are standardized, right? You have to have a standard unit that people can transport their products across the ocean with. So we have a standardized shipping container: what you put in there gets put onto a boat, the boat can fit X number of these, and it's all standard, and all the tooling and cranes and companies that make them follow the same spec. That was the idea behind containers, but where we're at today is that how we build them, or what we put inside those containers, still has no standardization, and you'll wind up with all sorts of different things.

So if you've used Docker before, this is a Dockerfile. If you're a Go developer, this probably looks 90 percent like your current Dockerfiles. Just about every Go developer's Dockerfile is going to look like this. Quick walkthrough: we're starting from that golang base image. This is a multi-stage build, so I'm going to build in one step and then put my artifact in another to ship it. So I add my dependencies, I add my code, and I run my build command. This is a command that I can't get rid of, right? When I think about building my code or building a Dockerfile, there's always that one step that is the build-my-stuff command, whether that's your npm install for dependencies or a pip install of your requirements.txt. There's always this one big step you can't get rid of or abstract away. This is the build-my-stuff command. Everything else in here is things like copying over these SSL certs, and we're not even copying the time zone database over. FROM scratch means you can start from an empty image. If you're doing a Go build or a Rust build, you can usually start from an empty image, and it's a more secure and smaller image for sure. But everything else that's in here I call bash script and duct tape, right? Just like all modern CI and modern CD is all bash script and duct tape strapped together. We can't get rid of this line, but everything else in here is tacked on.

When we talk about these Dockerfiles, I always think that they just do too much, right? They do that build-my-stuff step. They provision the entire environment. Thankfully with Go we can run from that scratch image, but otherwise I have to provision my environment, and there are tools that are built purposely for that, like Ansible: install my packages for my OS, install ImageMagick, install libzip. You're doing those multiple things, and then at the end you always have to start my service, right? That is the job of systemd or some other init system. Your Dockerfile is doing all of these things, and it's not really doing any of them well. The Unix philosophy of do one thing well is to do one thing well. So this is kind of the problem when we look at Dockerfiles. When you're analyzing these and figuring out what goes into them, when you run those arbitrary RUN commands like apt install libzip or something, that is all very hard to attest and track as it's going into your container.

So, without complaining, what else is there? Well, if you're a Go developer, you may have heard of ko. It's a tool that can build a Docker container using no Docker daemon. It builds the layers of the Docker image for you and knows how to compile your code. I think all it requires is a Go toolchain under the hood that it calls out to, and it will be able to build you cross-architecture containers. You pretty much just run ko build and you get a container without Docker. So super great if you're a Go developer, not so great if you're using a different language. Jib is kind of similar to ko. It's the same thing with Java. You still have to have that Java compiler, but you can still build that container. But these are purpose-built tools for each language, and every ecosystem or every language could build a tool like this, but then you have all this fragmentation and different opinions. When you're building tools you can be opinionated, but this is a lot to learn if you're a polyglot programmer or making policy decisions for how you build things across your org.

Bazel's another great one for cross-platform builds. This is based on Google's internal build tool called Blaze. People have lots of mixed opinions on Bazel. It builds Docker containers declaratively, mostly by being able to build everything itself: build your Python compiler, build your GCC compiler using a bootstrap. It's a tool that is kind of difficult for orgs to get started with, and you kind of need to be an expert to learn and maintain it. So Bazel definitely gets a lot of people to groan or get upset when you suggest it or want to use it.

You may have heard of distroless before if you're in the container world. A distroless image contains only your application and its runtime dependencies. They do not contain package managers, shells, or any other programs you would expect to find in a standard Linux distribution. Distroless was a project started by the Google containers team. It was an idea of how do we pull all those things out of a container that we don't need, like a package manager. It is based on Debian, and it was a great start. It kind of started as a five percent project at Google, and then Kubernetes said they were going to switch to it, and the people working on it were like, oh, we should probably put some more effort into distroless. So it was a great start and a pioneer in this area. The summation of distroless is that you don't need a Linux distribution in your container. If you're running a container, you are using the host's kernel, so the container runtime will share that host kernel. So you don't need a kernel, and you don't need a Linux distro for your container. You only need what's required to run your stuff. In the event of a Go binary or a Rust binary, you typically don't need anything if it's all statically built, but if you're running Python you need your Python interpreter, and for Node you need your Node interpreter.

And so that brings us to Wolfi. That's the history and background of why we built Wolfi. Wolfi is what we call distroless v2. The people who built distroless at Google kind of all left and started Chainguard, so this was the continuation of that work, and what they think is done the right way from the start. So we refer to this as distroless v2. It's an undistro, which kind of just means it doesn't have a kernel yet. We'd like to add a kernel one day to be able to run this in more secure systems, like real-time operating systems and 5G and IoT stuff, but as of today there's no kernel, so we call it an undistro. And when you think about it, really, what is a Linux distribution? Is it much more than a package manager and maybe some decisions made? That's a philosophical one we can have later.

Wolfi has a rolling release, so we are continuously building and shipping the latest versions of packages, all upstream-supported stuff. So Go 1.19, Go 1.20; when Go 1.19 falls out of support, it would be pulled out of Wolfi. The idea is that as things are released, we'll keep stable versions and supported versions around. There's no version of Wolfi. It uses the Alpine Linux package format, so APKs. There's a bunch of reasons for this, which I'll talk about in a second. It ships SBOMs for all the packages, so software bills of materials. This kind of has all the bits that went into that package. Some of the hidden files that are around, we call that dark matter. We keep a receipt of everything that goes into a package, so you can easily find and look at this. Wolfi is declaratively and reproducibly built with some tools that we'll take a look at later, and it supports amd64 and arm64 right now, so two flavors of Wolfi for your architecture.

So why APK? Well, APK is declarative and reproducible. We declare the package format, and it goes and builds the package and zips it up. When you're building an APK package, you get a directory that starts as the root directory, and anything you put in there just gets squashed. So you start with your folder and add in a usr folder and a bin folder, and all that gets extracted to root. Pretty straightforward. APK will not leave the system in a broken state. APK has this concept of the world: if you look in your /etc/apk/world file, it's kind of just a list of dependencies. When you go to add a package, it adds it to the world file and then it runs a resolver, so it will be able to check if anything cannot be installed before it actually goes and starts. This means that there are no rollbacks. If you've ever had to do a Debian upgrade and you've got a broken package in the middle, like maybe a permission on a file was locked up or something, the rollback for that kind of just leaves the system in a broken state. APK has no rollbacks because it will resolve first before it does anything, so this is perfect for automated pipelines.

We have a Wolfi SDK image that you can use and look at. I have a link to these slides at the end too, but this is what you would run to get all the tools to work on Wolfi. And then I have an image scanner. Wolfi is an open source project with open governance, and we're hoping to grow the community a lot more than we currently have. At Chainguard, one of the products we offer is that we take Wolfi packages and build container images out of them. These images are free and open source for folks to use.

Now I'm just going to run a quick Grype scan. This is a container scanner tool. You can use a tool like Snyk or Trivy; they're all going to do the same thing. They're going to take that container image, look at all the packages, and find any known vulnerabilities or CVEs in it. Running this has to pull down that python image, which I think is about a gigabyte in size, so it might take a second. Then it's going to go through, scan all the packages, and spit out the results. So I have found 638 vulnerabilities, and this is just the regular Docker Hub upstream library image. If you run docker run python, this is the image you would get. So there are 678 vulnerabilities: one critical, 47 high, 464 negligible. Looking through the list here, we see a Git vulnerability that's labeled as won't fix, and this kind of blew my mind when I first looked at it. Like, what is a high vulnerability that won't be fixed? This has to do with how Debian does backports. Debian won't pull from upstream; they'll actually cherry-pick patches onto their branch for building Debian packages, which has its pros and cons. But what you wind up with is all noise to me, right? I have to sit here and vet all these different versions and all these different CVEs. Organizations have told us that they get mad at the results of these container image scanners and they kind of just stop running them, so this helps nobody.

If we compare that to one of the images that we've built with Wolfi, this is our python image: much smaller, and it has no known vulnerabilities in here. We rebuild these images every day. They're free and open source for you to use on GitHub. I have some links to that at the end, but you can start using these today. The important thing that I like to call out, other than the vulnerabilities, and sure, that's a cool big flashy number, is that there are only 45 packages in the image that we built, and these include Python packages and system packages. If you compare that to the upstream one, which has that entire Linux distribution in there, there are 435 packages. So it is a pretty significant difference, and what you wind up with is just a bigger surface area for vectors of attack and issues and all that stuff. So I like to point out the vulnerability thing, but just having a small container that only has what you need to run your Python application in there is super great.

So that's scanning those images. Here are those commands that you can run. The node image will be the same; I think the node image has a lot more vulnerabilities in it. Here's a quick graph of what that looks like when compared. All of the images that we build out of Wolfi for our Chainguard Images are a lot smaller and have significantly fewer dependencies, which is the best part. If we look at the size of those containers too, you can see that the upstream python image is a gigabyte in size and our python image is 45.8 megabytes, and that just has to do with the size of Python when it's compiled. Same thing with node: our node image is 109 megabytes and the upstream one is a gigabyte. There are ways to pull smaller images from Docker Hub, like node slim or node alpine, but they all kind of have the same problem. It's just a little exaggerated here.

So, looking at what Wolfi is: if you ever wind up on this Wolfi repo, the GitHub org and repo are just wolfi-dev/os. This is Wolfi. What you come to find are kind of just a bunch of YAML files and folders. These are all the packages that actually make up Wolfi. So again, what is a Linux distribution other than a package manager, packages, and a repository to pull from? You can see we have repackaged all these things from source into Wolfi, so we rebuild all these packages. There's actually a huge bootstrap process that we wrote about on our blog that's really cool, because when you go to bootstrap something like a GCC compiler, well, you can't compile a C compiler without a compiler. So you have to bootstrap from an old one that's untrusted: download one that's built already and then use that to build yours. Then you go down this chain where version one can build version two, but version two can't build four, only three can, so then you have to build three. It's a real interesting problem to solve that we wrote about on the blog. But taking a look at some of these, here's the Python package for 3.11.
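The bootstrap chain described above (version one builds two, only three can build four) can be pictured as a path search over a "can build" relation. The following is a minimal Python sketch under stated assumptions: the version labels and the `CAN_BUILD` table are hypothetical and only mirror the example in the talk, not Wolfi's actual bootstrap process.

```python
from collections import deque


def bootstrap_chain(can_build, seed, target):
    """Return the shortest ordered list of versions to build, from seed to target.

    can_build maps a compiler version to the versions it is able to compile.
    Returns None when the target cannot be reached from the seed.
    """
    queue = deque([[seed]])
    seen = {seed}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in can_build.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical relation mirroring the talk: v1 builds v2, v2 builds v3,
# and only v3 can build v4. "v1" stands for the prebuilt, untrusted seed.
CAN_BUILD = {"v1": ["v2"], "v2": ["v3"], "v3": ["v4"]}

print(bootstrap_chain(CAN_BUILD, "v1", "v4"))  # ['v1', 'v2', 'v3', 'v4']
```

The breadth-first search returns the shortest chain, which matches the point in the talk: you cannot jump from two to four, so three has to be built along the way.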
I'll explain more about what this package format is, but you can see here that we're just downloading Python from upstream and then compiling it and building it ourselves. All this is declarative and it's all reproducible. So that's what Wolfi looks like in repository form. Again, it's just YAML files for packages, and then we take all those packages and build the Chainguard Images that I was talking about.

This is the Chainguard Images repo. All of these images here are free for you to use for open source and other uses. We limit them to pulling by the latest tags, so if you need something like Python 2.7, we offer a catalog, and you can have a subscription to pull those. But our main focus is providing current, supported upstream versions for free to developers. So we build versions for Python, Go, and nginx; I call these appliance images. Definitely take a look at these. These are the ones that we built using Wolfi, and the cool thing is that all of the configs and how we build them are public and open source, and with the GitHub Actions pipeline you can go back and check any sort of build for any sort of image. It's all attestable, and it's all done with SBOMs and signatures. If we look at what that file looks like for building Python 3.11, all it has is really the package that's in that image, the command to run, and a bunch of user information. We'll talk more about this file format in a second, but this is our declarative and reproducible pipeline.

Using these images built with Wolfi is a drop-in replacement for pretty much all of your applications. We have examples for most of them in the repos there. You can see here that I've been able to cut out those SSL certs by using this static image. Static just has SSL certs in there and a few other bits, like the time zone database, really just what you need to run a statically compiled binary. Take a look at that config: it has time zone data and that's pretty much it, and it will also pull in the CA certs bundle. So: drop-in replacement, easy to get started with, and it significantly cuts down on your image size.

Now I want to show you a comparison of what this looks like to build. I have a repo that you can grab all this in at the end. If we look at a regular Dockerfile for building a Python app, I have a very simple Flask application here, which is just importing Flask and setting up the hello world listener on port 80. If we look at what the Dockerfile looks like for that, we start from Python 3.11, copy in our requirements.txt, do our dependency install, copy in our main file, expose port 80, and start up the application. Building this with docker build is going to go out, download those dependencies, run through it all, and do that pip install, putting in a bunch of untracked files. We don't know what all these individual Python files are. Now I'm going to run that, and it should work. Let's just make sure. Yeah, we got our hello world. Pretty cool. So that's the docker build, and if we look at how big that is, it is a gigabyte, right? We're starting from that base python image and then adding on our dependencies and source file, so we wind up with this pretty big image to be running, and that's obviously a bunch of resources to be taking up.

Moving on from the Docker image, we can do that drop-in replacement with Wolfi. This one is a bit less of a drop-in and more of a you-have-to-know-how-Python-loads-dependencies, and this is just the nature of running as rootless. All of our images by default will run as rootless. One of the things I meant to show you up here is that in our docker build, pip is actually yelling at us because we're doing a pip install of dependencies as root. It even warns you here that this can leave the system in a broken state and that you should use a virtualenv or other stuff. So all of the images that we've built run as rootless by default. It's the same kind of looking thing. This is using a multi-stage build, so we start off with our dev variant. We have latest and latest-dev; dev usually has a shell and some other stuff in there. Then we have our install of dependencies and our copy of dependencies. You have to know where those dependencies go, which is under our non-root user, and copy those over. Same kind of deal. So let's build that, and we're going to see that this is a lot smaller. What do we wind up with? Yeah, 56 megabytes versus a gigabyte. So a dramatically smaller image, because it only has Python and my code in it. So that's Wolfi and that's Chainguard Images in a nutshell.

I have a few minutes left, so I'll quickly show you the power user mode and how we build Wolfi. We have two tools that we've built called apko and melange. These are also open source and free to use, though we don't recommend most people jump into using these right away. Again, this is the power user mode. apko is an APK-based OCI image builder. It's fully reproducible by default. It's attestable, so for everything that apko installs we generate SBOMs, and you have full attestations for everything. There are no RUN statements. All apko does is take a list of packages and install them into a container, which is how you wind up with those super small images. It builds images super fast. This is what an apko file looks like. This is what I was showing you before. We have our CA cert bundle for my Python image here, my Go image, and then I have my code package, which I add, and then I set some other stuff.

melange is how we build an APK package from your code or from upstream code. It does multi-arch builds by default, generates signatures and SBOMs, and it has signing keys. This is what I was showing as well. It has a package, the build dependencies of that package, and the pipeline to go through. You can see here that this is the build-my-stuff command. I've taken that out of my Dockerfile and put it into a single thing that does one thing well, which is build my code into a package. If we run that real quick, we'll see what it looks like. If I do a melange build, this is going to build my Python app into a package. It's basically zipping it up with all the bits in there that it needs. It does my pip install, but it has actually generated SBOMs for all of those pip dependencies, so it has a receipt for every file that has gone into this package, so I know it exists.

Showing you what that looks like in apko.yaml, it's the same kind of deal: build my Python application, copy my dependencies over. And if we look at what those SBOMs look like, it's in a smaller directory. So I can take that melange package and now I can turn that into an image. This is building a Python image, and it flies by in the blink of an eye. All this does is take a list of packages and install them into a container image. You can see here there's the list of dependencies, and somewhere in here it has my Python app as a dependency. I can load that up with Docker, and you'll see it's 27 megabytes, super small. We can run it the same way and it should just work, and it does, right? So we wound up with a smaller, more secure image. Again, we don't recommend or tell you to use these tools. You can if you want to. They still have a bit of usability work to be done, but they were built to solve the problem of how we build containers.

Some other resources that you can take a look at, as I'm running out of time: we have our Wolfi community. Wolfi is an open source project that anyone can use, so we're trying to grow the governance there. We have an education site that you can use to learn more about Wolfi and images in general, and then there's a link to that repository, and here's a link to the slides if you want to grab them. You can scan the QR code or grab that Bitly link. But that's all I have time for, and thank you all so much for joining and having me.
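As a footnote to the APK discussion earlier in the talk, the "resolve first, then install" behaviour around the world file can be sketched in a few lines of Python. This is an illustration only, not apk's real resolver: the package names, the `INDEX` table, and the functions `resolve` and `add_package` are all hypothetical.

```python
def resolve(world, index):
    """Return an install order covering every package in world.

    index maps a package name to its list of dependencies. A LookupError is
    raised if any package is missing, before anything would be installed.
    """
    order = []
    in_progress = set()

    def visit(name):
        if name in order or name in in_progress:
            return  # already planned, or part of a dependency cycle
        if name not in index:
            raise LookupError(f"cannot satisfy {name!r}")
        in_progress.add(name)
        for dep in index[name]:
            visit(dep)
        in_progress.discard(name)
        order.append(name)

    for name in world:
        visit(name)
    return order


def add_package(world, index, name):
    """Return (new_world, plan); the caller's world is left untouched on failure."""
    candidate = list(world) + [name]
    plan = resolve(candidate, index)  # raises before any state changes
    return candidate, plan


# Hypothetical package index for illustration.
INDEX = {
    "python-3.11": ["libssl", "zlib"],
    "libssl": [],
    "zlib": [],
    "broken-app": ["missing-lib"],
}

world = ["zlib"]
world, plan = add_package(world, INDEX, "python-3.11")
print(plan)  # ['zlib', 'libssl', 'python-3.11']

try:
    add_package(world, INDEX, "broken-app")
except LookupError as err:
    print("refused:", err)  # the world list is unchanged
```

Because the whole plan is computed against a copy of the world list, a request that cannot be satisfied fails before anything changes, which is the property the talk credits for making rollbacks unnecessary.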