The Codecov breach was carried out by sophisticated attackers who exploited a mistake in how Codecov built its Docker images. In this video, we discuss the Codecov breach, how the attackers got into Codecov, how they stole sensitive information from Codecov customers for months, and what you should do if you are a Codecov user.
In this video we're going to talk about the Codecov breach. We're going to quickly discuss exactly what happened. Then we're going to take a look at Codecov itself and why this type of breach is significant. We're going to talk about exactly the steps that the attackers took to compromise Codecov. We're going to discuss who was affected and what to do if you have been affected. Finally, we'll spend a minute discussing whether or not Codecov, or other companies in this industry, are safe to use. So first, let's quickly discuss exactly what happened.
On January 31st, 2021, malicious actors were able to modify the bash uploader script in Codecov. They did this by leveraging credentials they were able to extract from a Docker image.
Between January 31st and April 1st, the attackers were able to squat inside Codecov and extract the environment variables of Codecov's customers. We'll discuss why this is so significant in a minute. On April 1st, it was actually one of Codecov's customers who noticed that the bash uploader had a different hash value from the one published on Codecov's website, indicating that something was wrong. Codecov investigated and was able to fix the issue. On April 15th, after some thorough investigation, Codecov announced to the public that it had been breached and notified its customers. Now, if that didn't make a whole bunch of sense to you, or you want more information, don't worry: we're going to unpack this step by step.
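The hash check that exposed the breach can be sketched in a few lines. This is a minimal illustration, not Codecov's or the customer's actual tooling: SHA-256 is an assumed algorithm choice, and the two script bodies are hypothetical stand-ins for the real uploader and a tampered copy.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_hash(script: bytes, published_hex: str) -> bool:
    """True only if the script's digest equals the published one."""
    actual = sha256_hex(script)
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, published_hex.strip().lower())


# Hypothetical stand-ins for the real uploader and a tampered copy.
original = b"#!/usr/bin/env bash\necho 'upload coverage report'\n"
tampered = original + b"echo 'one extra line added by an attacker'\n"

# What the vendor would publish alongside the script (assumption).
published = sha256_hex(original)

print(matches_published_hash(original, published))
print(matches_published_hash(tampered, published))
```

In a pipeline, the same comparison would run on the downloaded script before it is executed, and the job would stop if the check fails.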
Codecov is a code coverage tool. Essentially, this means it lets you know how much of your application is being tested. When we're building modern applications with continuous integration and continuous deployment, we want automated tests in place so that when we release a new feature we can be confident that it works as intended and hasn't unintentionally broken other features of the application. Now obviously we want to be able to test every line of code during this process, every function and every feature, but this requires quite mature test automation. Codecov can help develop that because it lets you know which lines of code aren't being tested in your CI environment. Now, why this is really significant is that this type of attack is a supply chain attack: Codecov is part of your supply chain.
Using a very simple example of the process from when code is created to when it ends up in production, we can see why this is significant. We create some code, make a commit, push it into our repository, and then send it into our CI pipeline. What this is going to do is build our application and run some tests. We're going to test the functionality of our app, maybe we have some security testing built in there too, and we could also have some reporting. Really, we can do a lot of powerful things during this process. But because of how we build applications, everything that runs within our CI pipeline needs to use sensitive information to do its job. For example, it needs to access a database, if you have one, the payment system and, most importantly, the git repository. Now hopefully during this stage we're not using production data, we're using sandboxes or pre-production environments so that we don't inadvertently release sensitive information. But still, it is common to see production credentials within a CI environment.
What's significant is that Codecov has access to your git repository and git repositories are known to be high-value targets to attackers.
So how did the attackers do it? Well, the first step was that the attackers were able to get access to a credential because of a misconfiguration in the way Codecov was creating its Docker images. Using this credential, they were able to modify Codecov's bash uploader script. Now, a bash script is just a set of instructions, similar to what you would type into your bash shell or terminal, but written out as a program. The attackers added another step to this list of instructions. That step was essentially to take the environment variables from the CI environment we just discussed and send them to the attackers' remote server, essentially taking the sensitive information that makes your application run and sending it to an attacker.
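To get a feel for what a script running in a CI job can see, here is a small sketch that lists the environment variable names that look like secrets. The name fragments in the pattern and the example environment are assumptions made for illustration; they are not taken from Codecov or from any affected customer.

```python
import os
import re

# Name fragments that often indicate a secret; this list is an assumption.
SENSITIVE = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL|PRIVATE_KEY)",
    re.IGNORECASE,
)


def sensitive_names(env):
    """Return the sorted names of variables that look like secrets."""
    return sorted(name for name in env if SENSITIVE.search(name))


# Hypothetical CI environment, used for illustration only.
example_env = {
    "PATH": "/usr/bin",
    "CI": "true",
    "GITHUB_TOKEN": "placeholder",
    "DATABASE_PASSWORD": "placeholder",
    "STRIPE_API_KEY": "placeholder",
}

print(sensitive_names(example_env))
# → ['DATABASE_PASSWORD', 'GITHUB_TOKEN', 'STRIPE_API_KEY']

# To inspect a real job, pass os.environ instead of example_env.
```

Anything this function returns for a real job is something that any script executed in that job, including a third-party uploader, can read.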
Now, this could have included credentials for a database or for the different tools that you're using, if you haven't segmented these out. But importantly, it will likely contain access to your git repository, and this is significant because this is how the attackers are known to have accessed and compromised other organizations and developers. So who was affected?
Well, Codecov has 23,000 customers or users. Really, anyone who was using the compromised version of Codecov between January 31st and April 1st would have been affected. Large organizations such as Twilio, HashiCorp, Rapid7 and Confluent have released their own statements about how this has affected them.
Now, one interesting case that we can examine is that of Twilio, because it gives us good insight into what these attackers were after and how they used the sensitive information that they stole via Codecov. GitHub let Twilio know that there was suspicious activity in certain repositories, and Twilio realized that there were credentials within the repositories that were being accessed and cloned. So here we can see that the attackers took git credentials from the CI environment, cloned or accessed that repository, perhaps squatted inside it, searched it for sensitive information such as Twilio keys, and then compromised those accounts.
So here we can see the workflow of one possible way that the attackers ultimately used this breach. This isn't to say that it is the only way the attackers were able to use the sensitive information. They may have been able to access databases, payment systems, messaging systems, really anything that either your CI environment, or the credentials they then found within your git repository, had access to. But it really does illustrate why a supply chain attack like this is so critical.
So what should you do if you were affected? Well, if you were using Codecov between January 31st and April 1st, then it's very important that you take action. The first thing you should do is rotate all your credentials. This means revoking and replacing all the credentials that were used in your CI environment, but also all the credentials that may be hidden in your git repository or in other data stores that the CI environment had access to.
The next thing is to analyze your logs for any suspicious activity. This will give you an indication of whether or not the attackers have penetrated your systems. Now, as I've just been saying, it's very important to make sure our version control systems are clean, that our git repositories are free of secrets, because these were one of the high-value targets that the attackers were after. A great tool for this is GitGuardian's internal monitoring system, and there is a quick hack that lets you use it.
You can sign up for free, and personal accounts are free for life. But if you have an organization with private accounts that you want to scan, you can do this with a simple little hack: activate the 30-day free business trial for GitGuardian. It takes a few seconds and you don't need a credit card. Then you can scan the full history of all your repositories, private and shared, and make sure that no sensitive information was leaked. This will give you a good starting point, and of course it's free.
So why not? If you don't use GitHub, or you have on-premise installations, you can also use CLI tools such as ggshield (GitGuardian Shield) to do the same thing and search file directories, git repositories or other version control systems to make sure that there are no hidden credentials in there.
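To make the idea of searching for hidden credentials concrete, here is a deliberately simple sketch of pattern-based scanning. It is not how GitGuardian or ggshield work internally; the two patterns and the sample text are assumptions for illustration, and a real scanner covers far more secret types and also walks the git history.

```python
import re

# Two illustrative patterns (assumptions); real scanners use many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(password|secret|token|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Hypothetical file content; none of these values are real credentials.
sample = '''import os
db_url = os.environ["DATABASE_URL"]
password = "hunter2-not-a-real-secret"
aws_key = "AKIAABCDEFGHIJKLMNOP"
'''

print(scan_text(sample))
# → [(3, 'generic_assignment'), (4, 'aws_access_key_id')]
```

Line 2 is not flagged because it reads the value from the environment instead of hardcoding it, which is the pattern you want to see in a clean repository.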
And the final thing you can do, which I really want to put as a bonus because it can be quite complicated, is to set up two-way authentication for machine secrets. So what do I mean by this? Well, using a vault product like HashiCorp Vault, you can set up two-way authentication so that these credentials can't be used outside of their environments. This sounds like a great idea, and it is, but it can be complicated, costly and quite intensive to set up.
Okay, so now let's talk about something that's probably a bit uncomfortable: is Codecov, or any of the other tools that you use in your CI, safe? Well, I think what's important to say in security is that you can never have a zero percent chance of a data breach. No company can achieve this, because new vulnerabilities are discovered every day and, as we just saw, the modern software development process has lots of different steps within its supply chain.
Well, what I can say is that the underlying problem has been fixed, and Codecov has probably done some serious work to make its own supply chain and its own processes much more secure than before. The other point to make is that Codecov has been pretty forthcoming about this. They've released new information as it has become available and have a pretty helpful FAQ on their website. I would highly recommend you go take a look at it (the link will be in the comments) to decide whether or not this is something that you're comfortable moving forward with. But just to finish up: understanding the risks is always a good thing, and we should be critical of all the companies that are within our supply chain. However, this risk is part of modern application development and we also need to mitigate it. This means that we have to take responsibility for making sure that our supply chain and everything around it is protected: making sure that we use pre-production environments, that we separate our credentials, and that our git repositories are clean. If we can do all this, then the impact of a supply chain breach is greatly reduced.
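One way to act on the point about separating credentials is to run third-party steps with only the variables they need. The sketch below assumes a hypothetical allowlist and a hypothetical COVERAGE_UPLOAD_TOKEN variable, and a child Python process stands in for the third-party tool; it is an illustration of the idea, not a feature of Codecov or of any particular CI system.

```python
import os
import subprocess
import sys

# Variables the third-party step may see; an assumed, minimal allowlist.
ALLOWED = {"PATH", "HOME", "LANG", "COVERAGE_UPLOAD_TOKEN"}


def minimal_env(full_env, allowed=ALLOWED):
    """Copy only allowlisted variables into a new environment mapping."""
    return {k: v for k, v in full_env.items() if k in allowed}


def run_isolated(command, full_env=None):
    """Run command with the reduced environment and return its stdout."""
    env = minimal_env(os.environ if full_env is None else full_env)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical job environment with one needed and one unrelated secret.
job_env = dict(
    os.environ,
    COVERAGE_UPLOAD_TOKEN="placeholder",
    DATABASE_PASSWORD="placeholder",
)

# Stand-in for a third-party tool: it prints the variable names it can see.
child = [sys.executable, "-c", "import os; print(sorted(os.environ))"]
print(run_isolated(child, job_env))
```

With this setup, a compromised tool could still leak the one token it was given, but not the database password or the git credentials sitting elsewhere in the job.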
There you go, a diplomatic answer!
If you liked this video, then please help us out by clicking the like button, and leave a comment about things that you agree or disagree with, or about other data breaches that you would like us to unpack. If you have specific questions, you can reach out to me on Twitter at @advocatemack or using the hashtag #askmack. I hope you enjoyed this video, and I'll see you next time for the next breach analysis.