The Power of SecretOps: Automating Secrets Workflows - CodeSecDays

Learn about secrets management and automating workflows to enhance security in DevOps. Covers storage, governance, orchestration, lifecycle management, and observability. Gain insights and best practices for SecretOps. Empower your team with tools and knowledge for streamlined secrets management and robust security. Speaker: Nic Manoogian, Senior Software Engineer at Doppler.

Video Transcript

So, my name is Nic Manoogian. I'm a software engineer at Doppler, and I'm going to be talking today about SecretOps. I'm going to dive in not by talking about SecretOps, but about applications. Every app has three parts. It's got your code, which is your application logic. Everybody likes the code. You've got compute, which is where the code actually runs; usually this is a VM or some serverless platform someplace. And then you've also got your secrets, and secrets are the configuration that your app needs to run. These could be sensitive values like API keys for third-party services, they could be encryption keys like an SSL certificate or private key, or they could be just non-sensitive configuration: port numbers, log levels, things like that. All these things are required to run your app, and I want you to take a minute to think about how your org deals with workflow around each of these components.

For code, we've got really mature processes as an industry for managing changes. At the base layer you've got Git, which is tracking absolutely everything; the whole repository is just a series of diffs stacked on top of each other. Then you've got a layer on top of that, something like GitHub or GitLab, which allows your developers to submit changes for review, and then a layer on top of that, which is CI/CD, that actually makes sure that the code is acceptable and actually facilitates shipping it, which is pretty amazing, in multiple environments most of the time: staging and production or wherever you're going. So we've got really mature processes around our code.

On the compute side it's a little bit newer, but we've got similarly mature processes for managing our compute. We've got systems like Terraform and Pulumi that allow us to manage our infrastructure as code, with similar processes. It's very rare that you have to rsync your code up to a
server, and it's very rare that you have to go into the AWS dashboard and click around and make resources. We don't manage our resources that way anymore.

But think for a minute about how your organization manages secrets. These processes are a lot less standard. Some orgs have more mature processes than others, but for the most part it's safe to say that they're pretty scattered. This is where the idea of SecretOps comes in: it is the equivalent of CI/CD for code, or of infrastructure as code for infrastructure. SecretOps is the orchestration layer for secrets.

So, I work for Doppler, and Doppler makes a product in this space. My goal is not to sell you on Doppler, but it is to sell you on the idea of SecretOps. There are a couple of layers to it that we're going to go through in the course of this talk. They're not really features or boxes to be checked; they're categories in which features would fit. Your organization is doing some form of this as it stands: if you're operating an app, it's got configuration, and you're necessarily using some kind of secret. So it's helpful to contextualize each of these layers as a place where you can be doing work, or a place that may or may not need improvement.

Without further ado, let's dive in to the first layer, which is secret storage, and I'll start with a story. When I was fresh out of school I got my first engineering job. I got my laptop and I cloned the repo. There's a Python app, and I want to go spin up the server, so I typed python app.py, Enter, and I immediately got an error: the GOOGLE_MAPS_API_KEY is missing. So I talked to my manager: hey, I got this error. He said, oh, you need to go talk to Steve. OK, so I ping Steve, and after a few minutes he comes back to me and he's like, oh, you need
this .env file. So he gives me the .env file, and immediately there are some questions going through my head. Are these your secrets that I have to change? Do I have to update any of the values in here? My first assignment was to add something related to email, so I was going to add the SendGrid API key. When the time is right, do I just add that in? Do I need to drop it in Slack: hey everyone, here's the SendGrid API key, make sure you add it to your .env file? That aside, how does this work on staging and production? Who do I need to talk to to get the secret landed in staging and production? And are there any processes that would prevent me from accidentally shipping code that requires a secret that I haven't submitted? I hope that the answers to these questions aren't "submit a ticket with the ops team."

So the point of secret storage is not just to make sure that secrets are stored securely, because that's table stakes. I would hope that everyone has a secure place to put their secrets. The challenge is: is it one place, and do your developers know where that one place is? How do I get the secrets to run my app? How do I add a secret?

This all ties into the idea of sprawl. We're talking about the local development scenario with Steve and the .env file, but in staging and production, if you're running a more complicated app, it's unlikely that your secrets are only consumed in one place. Maybe you're running some server that needs to consume secrets from AWS Secrets Manager, so that's where some of your staging and production secrets live. And then you've also got a front end that's got some edge stuff associated with it, and those secrets are in something like Vercel. It becomes really hard to get a handle on where your secrets actually
live, which services actually consume them, and what work needs to be done to add new ones. It's a bit of a mess. So the first thing to do is to centralize all this information.

I promised not to sell you on Doppler, but I do have to show you some screenshots to show you what these things look like. With that disclaimer out of the way, this is how we do it in Doppler. Here we're looking at a project in Doppler: we've got the backend project, and we're in the local environment, or dev config. We've got some secret secrets: our database password is a secret, and we want to keep our user and host secret, though that's arguable. But then there are some completely non-secret values, which we call unmasked values, like the port number and the log level.

So that's great. If I had this when I was starting out as an engineer, I would add a SendGrid API key. I gave this talk a dry run with somebody and they were like, you'd better call out that that's not a real API key, so I'm doing that now. That's a random hex string. You can try it against the SendGrid API if you want; it won't work. With that out of the way: I would have added this API key and hit save, and when I go back up and look at my higher-order environments, I get a warning. The assumption is that you should have those secrets in all of your environments, so you get a warning saying, hey, the secret is missing from this environment. That's the kind of benefit that you get from centralization. You've got one source of truth, one place where all of your secrets live, and your developers know exactly where they need to go to manage their secrets.

Once you've got secure storage out of the way and everything is together, the obvious next question is: who has access to these secrets? That's for each team to decide. At Doppler, no engineers have access to production. That's how we like it. There are trade-offs to that, but
nobody on the engineering team, or effectively the operations team, has access to production. All requests have to go through a very select number of individuals. But regardless of what your policies are, your tools should be able to build those policies, and that absolutely means setting up automation. You should have some construct like roles and groups so that you can easily say: when an engineer joins, they should be in this group. You should be able to put people into those groups, hopefully automatically, with tools like SAML and SCIM, so that when you're scaling you don't need to tell every engineer to go to Steve. That's the benefit. And then when you're offboarding, when a team member leaves the team, you know that they're going to lose access to all the secrets that are in that one place. This isn't possible if there are 20 places where you need to manage secrets; you'd have to go set up 20 automations. But when you've got everything together, you can add the access control layer.

The other piece of this is tracking. Going back to this idea that we store all of our code in Git: I can't imagine going back to a world where I didn't know what code had changed. So that's absolutely a layer of this: being able to see what the changes were over time. This is, again, what it looks like in Doppler. I make a change, that change is tracked and timestamped, and you should be able to roll back to any version. So if something breaks in production, you've got a clear indication of what changed.

So you've got everything stored in the same place, and it's all locked down. Steve and I had this figured out, right? In staging and production, after I'd been working there for a little while, we just put everything in 1Password. All of our production secrets were in 1Password, and to actually get them into production or staging, Steve would go into 1Password and copy the value. That
was our source of truth. Then we would SSH into the production machines and he'd copy that value into secret.json. Their processes have since matured, but that was what it was for a time. So that is the orchestration piece: that was our version of orchestration, delivering the secret to the compute instance that actually needs to consume it. That's not a good way to do it. If Steve forgets to copy that value, we're in trouble.

A better way to do that: you've got your source of truth, you've got something like Doppler, so let's set up some automation to sync those secrets to where they need to be. And it shouldn't be a process where a tool asks you to completely overhaul how you consume secrets and it takes you months to embed this tool everywhere. We're of the opinion that a tool like this should meet you where you are, and that typically involves syncing secrets to where they need to go. What that concretely looks like is you put your secrets into a tool like Doppler, and then you set up automations where those secrets are synced to where you need them. Those places are going to be different depending on the environment and the context of what the code actually does. In local development, maybe your developers are building a server; the CLI should inject those secrets into your app running locally. That's probably the easiest way for your developers to get access. In staging and production, maybe the easiest way for you to consume secrets is through AWS Secrets Manager, in which case let's just sync the secrets to that place. Nobody needs access to those except the compute instances that need to consume from them. If that's the easiest place, and it meets your high availability requirements and your cost requirements, why would you change that? So get the secrets to where you need them, and the
way that we achieve this is with integrations. We try to implement as many as we can, but that's something to be looking for: how easy is it to actually deliver the secret into your production environments?

The other piece of this, which is really handy and an area where we're continually investing, is automatic redeployment. We support this right now specifically in Kubernetes. If you set up our Doppler operator, we can sync secrets into a native Secret in your cluster, so it's available for you to mount as a file or to inject as environment variables, which is handy. But after that operator completes the sync from Doppler into a native Secret, it can go identify all the workloads that are relying on that Secret and restart them. End to end, what this looks like is that the developers who have access to the production environment can make a change and, with one click, deploy that change. A few slides back we updated our log level from warn to info. As soon as I click that button, my apps will restart with whatever processes I have in place for health checks, and they'll come back up with that log level. That is really what we think is the holy grail of orchestration: you've effectively set up a CI system for your configuration.

Once you have orchestration in place, I just have this graphic to show how we think of it at Doppler. It's sort of like a hub and spoke. We're trying to set up syncs with as many integrations as we can because we want to meet developers where they are. So that's something to be looking for: the number of integrations that a tool like this has.

Once you've set up storage, governance, and orchestration, it becomes much easier to manually rotate secrets. Folks generally know that rotating secrets is a good idea; you want to reduce the surface area of your secrets. In the round table we were talking about, you
know, long-lived keys that were generated on a developer's laptop years ago. That's not a good story. You don't want to have long-lived secrets. So Steve and I knew we had to rotate secrets, but this is where we stopped in the SecretOps process, because we just didn't have a good handle on all of the apps that were consuming the secrets that we would want to rotate. When the time came to rotate that SendGrid secret, we just weren't sure which workloads were consuming it, and we didn't have the confidence that if we updated the value in all the places that we knew about, we would actually get everything. So we had to pick between putting ourselves in an insecure position by not rotating, or putting ourselves in a risky situation from an uptime perspective, because rotating might mean we bring down our apps. That's not a good situation to be in.

Once you've got confidence that your secrets are being used in the places that you know about, and that you've got the appropriate orchestration to deliver those secrets to those stores, just going into the SendGrid API or the SendGrid dashboard, generating a new key, and plopping it in Doppler is enough to give you the confidence that rotation is going to work. If you've got that end-to-end setup, and you know that updating the secret in the Doppler dashboard is going to trigger redeployments and get your apps restarted, then it becomes way easier just to do that.

But you can actually take it a step further. If you've got the confidence that changing values in your Doppler environments will actually propagate, you can set up automated rotation. This is, again, something we're continually investing in. In this example, instead of generating an API key for SendGrid and just dropping it in my environment, I've set up a rotation policy so that this API key gets rotated every 30 days. After I created it, I set up the policy
for what I want this API key to be able to do. The rotated secret values live alongside the static values in my config. Because they're rotated, they can't be modified by hand; they're managed by the Doppler engine, which is pretty handy. What's also really cool about this: you'll notice that all these masked values here are hidden behind an obfuscation. There are six dots there. The values aren't actually six characters long, and the underlying values aren't actually in the DOM here; they require an API hit on our servers to download. That means that until someone intentionally clicks that SENDGRID_SECRET value, that secret has never been seen by a human, which is really, really cool. That's really what you want. That secret has never been in a developer's clipboard; it's never been seen by human eyes. That's what we think is super valuable. If you've got the confidence to be able to rotate on a cadence like this, it becomes really easy to set up this kind of automation. So that's managing the secret's lifecycle.

The last piece that ties this all up, and this really wasn't even on Steve's and my radar, to be honest with you, is tracking who has seen a secret. That goes back to that question: have any human eyes ever seen the secret before? That's an interesting question to ask. Every time you download a secret, or a secret is transmitted onto a server and is used, it's a risk of a leak. That's an opportunity for that secret to end up in the logging system, in an HTTP log, or an opportunity for it to get squirreled away into a bug tracker if it's embedded in the auth headers of an error. Those are all opportunities for a leak to happen. And if you want to answer the question, "the secret was leaked; where was it leaked from?", it's pretty difficult to answer that unless you've got
some sort of observability into fetching. So observability is about asking the question: who has seen this secret? The converse of that question is: what are all the secrets that this person has seen? If you're offboarding someone and they've seen the secret, they can take it with them. They could easily have copied some value, and now they've got that. And there are tons of SaaS products that don't have any kind of protection against this. If an employee saw your Stripe API key and they take that with them, they can make requests against your Stripe account. So being able to ask the question, which secrets has this person seen, is a really tricky one to answer.

This is, again, an area that we're investing in, but the functionality as it exists in Doppler right now is, as I mentioned, that secrets aren't read until you actually click into them. Here I've clicked in to reveal this value for the password field (again, random hex), and I can open this view and see, for the current version of the secret, all of the actors that have seen it. Coupled with rotation, if we discover that there's a leak and we have the leaked value, we can in theory go back to this, look at the version that matches that value, and see every actor, service or user, that has ever seen that secret. And if we're rotating on a cadence like every 30 days, that list might be really small. Being able to plug leaks like this is just not something that most teams can do right now. This is still pretty rudimentary, and we're making big investments in it right now; it's something that we're really excited about.

So those are the layers as they exist right now, the overarching ideology. I wanted to take a couple of minutes to talk a little bit about what's on the horizon, things that we're really excited about.
So, identity auth came up in the round table. We are in the secrets business, but portable credentials are really not the best way to be operating. Open standards like OIDC allow workloads to authenticate to other services without a portable credential. Any time a developer is accessing secrets in Doppler over the wire, they're doing so with a portable credential, like an API key. So we're really excited to see the elimination of some of those things, and, for the gaps where we can't do identity auth, doing heavy rotation and then putting that behind identity auth through Doppler. This eliminates the idea of the zeroth secret. The zeroth secret is the Doppler API key that gets you in the door so that you can fetch your other secrets. There'll always be that zeroth secret, but not if you're using identity auth.

The second piece to this, which is somewhat related, is dynamic secrets, which you can imagine as really fast rotation, I guess. Imagine that your workload, when it fires up, instead of having a list of rotated secrets that it can use for a period of time, actually requests a lease for a new resource. You can imagine a migration that you have running, or maybe a deploy job that's running in CI; you would need some secrets to interface with AWS, for example, to deploy the code to where it needs to be. The way we have this feature right now, it's in beta, and the one we currently support is AWS IAM users. You would configure Doppler to say: whenever a workload requests this environment, issue them a brand new AWS IAM user with this policy, and make sure that it does not live for more than 30 minutes or until the service itself requests its revocation. So when the CI job is running, it hits Doppler and asks for all the secrets, and Doppler issues all these resources. Some of them are these dynamic secrets, and that actually creates a brand new AWS IAM user, applies the policy, and generates a
key, all in one shot, and then returns that out to the user or to the workload. Once that work is done, the workload can revoke it at the end, or if that job fails or whatever, it gets cleaned up automatically. What's amazing about this is that the credential only lived for that small period of time. So if you find that the credential has leaked, you can trace it back to this one 30-minute window that it must have happened in, because nothing else has ever seen that credential. It's not shared across any other actors; it's so isolated. And if you're only running a deployment every couple of days or every week or something like that, there's just no need to have a loose credential hanging around. If it's only needed for a short window, why wouldn't you just make that resource when you need it? So this idea of limiting access to third-party data is what this is really about.

That's what's up next for us on the horizon, stuff that we're particularly excited about. I'm coming in a little under time. This is mostly what I had to talk about, so I wanted to open up to any questions y'all have. Thanks so much for listening.
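
Editor's sketch: the missing-secret warning described in the storage part of the talk (a secret added to the dev config is flagged as absent from higher environments) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Doppler's implementation: the `find_missing_secrets` helper, the environment names, and the sample secret names are all hypothetical, and only secret names are compared, never values.

```python
from typing import Dict, List, Set


def find_missing_secrets(configs: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per environment, secret names defined elsewhere but absent here.

    Hypothetical helper for illustration only. `configs` maps an environment
    name to the set of secret NAMES it defines.
    """
    # Every name that appears in at least one environment.
    all_names: Set[str] = set().union(*configs.values())
    return {
        env: sorted(all_names - names)
        for env, names in configs.items()
        if all_names - names
    }


# Hypothetical configs: SENDGRID_API_KEY was added to dev only.
configs = {
    "dev": {"DATABASE_PASSWORD", "PORT", "LOG_LEVEL", "SENDGRID_API_KEY"},
    "staging": {"DATABASE_PASSWORD", "PORT", "LOG_LEVEL"},
    "production": {"DATABASE_PASSWORD", "PORT", "LOG_LEVEL"},
}

missing = find_missing_secrets(configs)
for env, names in missing.items():
    print(f"warning: {env} is missing {', '.join(names)}")
```

A check like this only works when every environment's secret names are visible from one place, which is the centralization argument the talk makes.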

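Editor's sketch: the dynamic-secrets idea from the end of the talk (issue a fresh credential per workload, revoke it when the job finishes, and clean it up after at most 30 minutes otherwise) can be modeled as a lease with a time-to-live. The code below is an in-memory illustration only, assuming hypothetical `Lease` and `LeaseBroker` names; it makes no AWS or Doppler calls and passes the clock in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    lease_id: str
    credential: str
    expires_at: float  # seconds, on the same clock passed to the broker


class LeaseBroker:
    """Hypothetical in-memory stand-in for a service issuing short-lived credentials."""

    def __init__(self, ttl_seconds: float = 30 * 60) -> None:
        self.ttl_seconds = ttl_seconds
        self._active: Dict[str, Lease] = {}

    def issue(self, now: float) -> Lease:
        # Each request gets a brand new credential; nothing is shared between callers.
        lease = Lease(
            lease_id=secrets.token_hex(8),
            credential=secrets.token_hex(16),
            expires_at=now + self.ttl_seconds,
        )
        self._active[lease.lease_id] = lease
        return lease

    def revoke(self, lease_id: str) -> None:
        # Called by the workload when its job is done; revoking twice is harmless.
        self._active.pop(lease_id, None)

    def cleanup(self, now: float) -> None:
        # Safety net for jobs that failed before revoking their lease.
        expired = [i for i, lease in self._active.items() if lease.expires_at <= now]
        for lease_id in expired:
            self.revoke(lease_id)

    def is_valid(self, lease_id: str, now: float) -> bool:
        lease = self._active.get(lease_id)
        return lease is not None and now < lease.expires_at


broker = LeaseBroker()
job_a = broker.issue(now=0.0)
job_b = broker.issue(now=0.0)
broker.revoke(job_a.lease_id)   # job A finished and revoked its own credential
broker.cleanup(now=31 * 60)     # job B never revoked; the TTL sweep removes it
```

Because each lease belongs to exactly one job and one time window, a leaked credential in this model can be traced to a single issuance, which is the property the talk highlights.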