As more organizations, both small and large, begin to implement orchestrators like Kubernetes, many are faced with the same problem - security and compliance.
As all engineers know, security is rarely high on the list of priorities from the start; it usually gets attention only after something catastrophic happens. Instead, organizations are usually more worried about features. If you take a look at the “Developer to Security Engineer” ratio, it becomes quite apparent.
Perhaps there’s a reason for that. Maybe security is looked at as too difficult to implement and organizations need tools, platforms, or a protocol that helps developers implement security best practices in an easier way.
To think about it from a different perspective, on the opposite side of the spectrum is compliance: frameworks like SOC2, HIPAA (which protects PHI), HITRUST, and others that government, healthcare, and other sectors use to meet regulatory requirements.
That’s where CIS comes into play.
In this blog post, you’ll learn about what CIS is, why it’s important, and a couple of great tools to get started with.
A recent State Of Kubernetes Security report from Red Hat came out, and although it covered a ton of great information, here are some highlights directly related to the security landscape:
- 93% of respondents experienced at least one security incident in their Kubernetes environments in the last 12 months.
- More than half of respondents (55%) have had to delay an application rollout because of security concerns.
- Around 70% of security issues in Kubernetes are due to misconfigurations (according to Gartner it’s 99%).
When you look at the percentages above, there’s a trend - security is a huge issue in the Kubernetes space.
Although Kubernetes is extremely popular and a “hot” topic in today’s cloud-native world, the number of organizations implementing it at scale is quite low. In fact, only 10% of environments using Kubernetes have fifty (50) clusters or more. The reason is that although Kubernetes is a popular topic, engineers and organizations are still trying to figure out how to implement it. Do they want full Kubernetes? A hybrid solution? On-prem or in the cloud? The list of questions goes on and on, and because of that, they aren’t even thinking about security.
The number of security engineers compared to the number of developers in an organization is extremely low. Security is always an afterthought, which shouldn’t be the case. A lot of companies start thinking about security only once the platform, features, and bugs are squared away. At that point, it’s most likely already too late. There will be so much tech debt that companies end up having to rip out 50-60% of what was built to implement security practices. Because of the lift, companies will ignore the fact that security is crucial until a catastrophic security incident occurs.
In short, the current Kubernetes security landscape is nothing short of a mess.
What is CIS?
The Center for Internet Security (CIS) is a nonprofit organization that publishes a set of best practices and standards, known as the CIS Benchmarks, that are used to assess a system’s security and confirm that it is configured in the best possible way from a security perspective.
CIS isn’t just Kubernetes specific, and because of that, let’s first talk about CIS as a whole.
If you’ve heard of “system hardening” or “hardening”, then you know exactly what CIS does. It gives you the ability to ensure that your security practices are up to date in areas such as:
- Application patching
- System patching (like Windows Updates)
- Insecure logins
- Data encryption, both at rest and in transit
It also covers several other security implementation details.
As with all security practices in general, no one can ever stop all threats from occurring. It’s impossible to say “this environment will never get breached”. As we’ve seen from large organizations, like Microsoft, Google, Amazon, etc., any organization can be breached regardless of its security details.
Because of that, CIS isn’t meant to stop all threats. It’s meant to mitigate as many threats as possible with best practices that are constantly and consistently updated to reflect the current security landscape.
CIS is the security standard for many organizations and compliance implementations, including SOC2, HIPAA, PCI DSS, SRG, and NIST.
Although CIS covers far more than Kubernetes, there’s a huge need in the Kubernetes community for proper and common security standards. Because of that, since 2017, CIS has been working with Kubernetes engineers to publish a specific benchmark around Kubernetes.
Like all CIS benchmarks, it’s a set of best practices and security standards to follow when implementing Kubernetes in any environment. It covers both containerized applications from a developer perspective and cluster/infrastructure-specific implementations for infrastructure engineers.
Let’s take a common scenario in today’s Kubernetes world - Managed Kubernetes Services. If you use a service like Azure Kubernetes Service (AKS), Elastic Kubernetes Service (EKS), or Google Kubernetes Engine (GKE), the control plane/API server is managed for you. You don’t have to worry about scaling, managing control plane components (Etcd, Scheduler, Controller, etc.) or anything of the sort. It’s managed by the cloud provider. However, the worker nodes are still very much your concern. Even though it’s a cloud service, a piece of that cloud service (the worker nodes) is still running on virtual machines. As with all virtual machines, they must be hardened. A lot of organizations are forgetting about this and not doing simple maintenance, like patching and updates. This leaves Managed Kubernetes Services extremely vulnerable.
Because of the increased popularity of CIS, vendors and cloud providers have started to create their own CIS sub-projects in response to the need for implementing CIS. For example, because CIS doesn’t directly cover Google Kubernetes Engine (GKE), Google created a “child” version of CIS to scan specifically for GKE-related CIS issues.
You can find it here.
If you’d like to see one of the full Kubernetes CIS benchmarks, take a look here.
Why is CIS Important?
As with all systems and applications, engineers must develop a baseline and benchmark. Because not all security risks can be stopped, they should be mitigated as much as possible. The thing is, these baselines are going to change constantly and there’s no “one size fits all”. If you’re running workloads in the cloud vs on-prem, your baseline will be different. On-prem would have a focus on physical security and where the servers exist, whereas in the cloud, engineers wouldn’t have that control because they can’t mitigate that risk. CIS not only helps you mitigate security concerns but also sets you up with a baseline.
As you learned in a previous section from the Red Hat report, security concerns are a big deal in organizations. Security has always been a big deal, but the landscape is changing: data now lives in multiple locations and workloads run all over the world, whereas 10 years ago they sat in a data center that engineers managed themselves. Think about it - when data was in a data center, or multiple data centers, you knew exactly where it was. Now, you know it’s in a region, but you have no idea where the data actually is.
CIS is important because all engineers, not just security engineers, must follow a standard to keep infrastructure, applications, and data secure. Also, engineers must follow best practices to ensure that environments are running as expected. This mentality is the make or break between a thriving engineering department and layoffs across the entire engineering org.
Scanning is one of the most important security implementations an organization can perform. Every engineer that cares about security can agree with that. However, there’s a major problem with scanning.
When you run a scan, or even better, when you create any report at all, what typically happens? It gets sent to someone’s inbox, or a group of people, and it sits there. Do people look at it? Sure. Does anything usually come out of it? Not really. It ends up just being “another email” that’s left unread, or that’s read during morning coffee and forgotten about ten minutes later.
Another scenario is that the reports are taken seriously and get brought up in a meeting. The meeting ends up becoming a constant debate, or a back and forth debacle, and the concern that was brought up in the first place ends up getting pushed back. Maybe the fix itself was what got debated, or maybe the lift was too big and engineers didn’t have time to implement it. Regardless of the reason, fixing the concern on the report is a can that gets kicked down the road until everyone eventually forgets about it.
The scenarios above are a big reason why security concerns are overlooked and ultimately end up in a breach.
A larger issue with scanning is that there aren’t many actionable items and outcomes. Engineers scan, get a ton of alerts, and don’t really do anything with them, similar to the report that lands in someone’s inbox and collects dust. The primary reason is that so many alerts come in that you have no idea where to start fixing them, and you don’t have the time to do so.
This is why assisted remediation is needed.
If there’s a system that has automatic remediation, engineers don’t always have to think about putting out fires from a security perspective. Instead, they can focus on value-driven work while an automated system takes care of the low-hanging fruit. That’s true whether the automatic remediation is “true” automation, as in, it goes into your system and mitigates the risk, or it does something like open up a Pull Request (PR) to give you the code that will fix the issue.
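To make the two remediation styles concrete, here is a minimal sketch in Python. It assumes a Pod-style manifest that has already been loaded into a dictionary, and it uses one hardening setting as the example: forcing allowPrivilegeEscalation to false on each container. The function name and the sample manifest are hypothetical and are not taken from any of the tools discussed in this post. The function returns a patched copy plus a list of what changed, so the result could either be applied directly or be turned into a Pull Request for a human to review.

```python
import copy


def remediate_privilege_escalation(manifest):
    """Return (patched_manifest, changed_container_names).

    Sets securityContext.allowPrivilegeEscalation to False on every container
    of a Pod-style manifest that does not already set it to False.
    """
    patched = copy.deepcopy(manifest)  # never mutate the caller's manifest
    changed = []
    for container in patched.get("spec", {}).get("containers", []):
        security_context = container.setdefault("securityContext", {})
        if security_context.get("allowPrivilegeEscalation") is not False:
            security_context["allowPrivilegeEscalation"] = False
            changed.append(container["name"])
    return patched, changed


# Hypothetical sample manifest, used only for illustration.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {"name": "app", "image": "nginx:1.25"},
            {
                "name": "sidecar",
                "image": "busybox:1.36",
                "securityContext": {"allowPrivilegeEscalation": False},
            },
        ]
    },
}

patched_pod, changed_containers = remediate_privilege_escalation(pod)
print(changed_containers)
```

Returning a copy rather than editing in place is a deliberate choice here: the original manifest stays untouched, which is what makes the “open a PR and let an engineer decide” style possible.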
Can Assisted Remediation Help all Use Cases?
The short answer is no, and here’s why. Every organization can use the same technology, but the way that it’s implemented, and the security concerns they have, will be vastly different. In all of the organizations I’ve worked in and consulted for, the technology was similar or the same, but the problems they were facing and how the technology was used/implemented were vastly different.
In short, there’s never going to be a platform that can fix everything and anything. As discussed, security isn’t about mitigating everything. It’s about mitigating as much as possible and minimizing the attack surface.
With that being said, there is an opportunity for tools and platforms that offer automatic remediation to give engineers the ability to create their own. That way, once a security issue comes up, an engineer can write the code, create the policy, or perform the action that automatically fixes, and keeps fixing, the specific problem they’re having.
For example, one of the biggest standards is to not use the “latest” tag of a container image. Although a CIS scan can report on that, automatically remediating a container image version may not be realistic because the auto-remediation platform has no way of knowing which version of the container image an organization wants to use in production.
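The detection half of that example is easy to sketch, which shows why reporting is simple even when remediation is not. The Python below is an illustrative sketch only. The function names and the sample manifest are hypothetical, and it assumes a Pod-style manifest already loaded into a dictionary. An image counts as using “latest” when it has no tag at all or has the tag “latest”; images pinned by digest are treated as fine.

```python
def uses_latest_tag(image):
    """Return True when an image reference resolves to the mutable "latest" tag."""
    if "@" in image:
        # Pinned by digest (name@sha256:...), so the tag does not matter.
        return False
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference belongs to a registry port such as localhost:5000.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag at all means "latest" is pulled
    return last_segment.rsplit(":", 1)[1] == "latest"


def find_latest_images(manifest):
    """List (container name, image) pairs in a Pod-style manifest that use "latest"."""
    containers = manifest.get("spec", {}).get("containers", [])
    return [(c["name"], c["image"]) for c in containers if uses_latest_tag(c["image"])]


# Hypothetical sample manifest, used only for illustration.
pod = {
    "spec": {
        "containers": [
            {"name": "app", "image": "nginx"},
            {"name": "cache", "image": "redis:7.2"},
            {"name": "tools", "image": "localhost:5000/tools:latest"},
        ]
    }
}

flagged = find_latest_images(pod)
print(flagged)
```

Notice that the sketch can say which containers are affected but cannot say what the replacement tag should be. That decision still belongs to the team that owns the workload.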
If automatic remediation follows CIS standards and protocols, and an organization wants to follow those standards and protocols, the organization should be able to implement the automatic remediation and cover roughly half or more of the security issues. Of course, there will be specific CIS protocols that cannot be automatically implemented.
In this section, you’re going to receive a breakdown of a couple of the current tools that scan against the CIS database. The information in this section was gathered from a hands-on, practitioner-led perspective by utilizing these tools in real Kubernetes environments. As with all platforms, each has pros and cons, and this section should help you decide which tool to go with.
The primary factors that went into this were:
- If the tool/platform has automatic remediation.
- Reporting capabilities.
- Automation capabilities. For example, running the tool in CICD.
- Overall Kubernetes support.
Automatic scanning with Checkov can be done, but automatic remediations do not exist.
In terms of automatic scanning - you can utilize the Checkov CLI in, for example, a pipeline and run CIS-related scans against your infrastructure. Checkov also has plugins that you can use depending on the CICD platform. As an example, below is a screenshot utilizing the Checkov GitHub Action.
Because Checkov runs as a Cronjob in your Kubernetes cluster, it may take a while to see the cluster come up in the Bridgecrew portal, and therefore, you must wait to start CIS scans. The current implementation of the Cronjob could also cause problems in later versions of the Kubernetes API if it’s not updated.
Checkov gives you the ability to create custom policies, but only in the paid version of Bridgecrew, which can be a hassle for organizations attempting to evaluate which product to go with.
With Checkov, you can automatically generate reports as PDFs for SOC2, HIPAA and PCI-DSS.
Where Checkov shines is its vast array of CIS scans. As mentioned in a previous section, CIS doesn’t specifically target all aspects of Kubernetes (like a GKE cluster), so cloud providers like Google must create their own framework to cover it. Checkov scans and covers all of the “child” CIS projects.
All environments (AWS, Azure, on-prem, etc.) are supported.
Strengths: Checkov has been available for a long time and is known in the landscape.
It scans against both CIS and the AWS Foundations Benchmark.
Weaknesses: Although you can create custom policies, it’s only in the paid model. No automatic remediation. There is also no container image scanning or compliance reporting without Bridgecrew.
Kubebench specifically targets all CIS checks in the Kubernetes CIS benchmark.
However, it does not scan outside of that, which means it won’t cover CIS child projects like the GKE CIS scope. If you’re looking for a tool that does that specifically, you may be better served to utilize Checkov.
Kubebench can be used in CICD. Although there aren’t specific plugins that you can get in CICD, like with Checkov or Kubescape to run CIS scans, you can utilize the CLI inside of a CICD pipeline.
Kubebench does not give the ability to generate reports and does not give the ability to have any type of automatic remediation. The best that you can get from a report perspective is the output from the terminal. That output can be pushed to some type of text file and be turned into a report, but that would be considered a “duct tape” option in comparison to other tools.
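As a rough illustration of the “duct tape” route described above, the Python sketch below turns saved terminal output into a CSV report. It assumes each check appears on its own line in the shape "[STATUS] <check id> <description>". That line shape, the function names, and the sample text (which is made up for this example) are all assumptions, so verify them against the output of the version you actually run.

```python
import csv
import io
import re

# Assumed line shape: "[STATUS] <check id> <description>"
CHECK_LINE = re.compile(r"^\[(PASS|FAIL|WARN|INFO)\]\s+(\d+(?:\.\d+)*)\s+(.*)$")


def parse_scan_output(text):
    """Extract one finding per matching line; lines in any other shape are skipped."""
    findings = []
    for line in text.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            status, check_id, description = match.groups()
            findings.append({"id": check_id, "status": status, "description": description})
    return findings


def to_csv(findings, only_status=None):
    """Render findings as CSV text, optionally keeping a single status."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "status", "description"])
    writer.writeheader()
    for finding in findings:
        if only_status is None or finding["status"] == only_status:
            writer.writerow(finding)
    return buffer.getvalue()


# Made-up sample text in the assumed shape, not real scan results.
sample_output = """\
[INFO] 1 Control Plane Security Configuration
[PASS] 1.1.1 Ensure that the API server pod specification file permissions are restrictive
[FAIL] 1.2.1 Ensure that anonymous requests to the API server are disabled
[WARN] 5.1.1 Ensure that the cluster-admin role is only used where required
"""

findings = parse_scan_output(sample_output)
failed_report = to_csv(findings, only_status="FAIL")
print(failed_report)
```

A CSV file opens in any spreadsheet tool, so even this small step makes the results easier to hand to someone who will not read terminal output.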
All environments (Azure, AWS, on-prem, etc.) are supported.
Strengths: The results and recommendations are split up into five categories - Control Plane security configs, Etcd configs, Control Plane configs, Worker Node security configs, and Kubernetes policies. This is great because engineers don’t have to hunt down which category they’re attempting to fix; each has a self-explanatory title.
Weaknesses: No UI, no reporting, and no automatic remediation.
KSOC has a great method when it comes to remediation. Instead of attempting to automatically go in and fix code specifically, for example, in a Kubernetes Manifest, it will create a Pull Request with recommendations as to what should be changed. Then, as an engineer, you can go in and confirm that you want the changes, or decline the changes.
Although this is great, the current concern is that KSOC is geared to only working with Kustomize. Although Kustomize is extremely popular, organizations may not be using it. Whether they’re using Helm or raw Kubernetes Manifests, utilizing KSOC will depend on the organization's current workload.
KSOC does not give the ability to generate reports, and at this time, KSOC is all UI-based. Because of that, there’s no way to scan for vulnerabilities via the command line or an automated technique with CICD.
Although the KSOC ruleset is based on CIS, it’s not directly correlated with CIS, as in, it’s not scanning the CIS benchmark directly.
Environments outside of AWS and Azure are currently not supported.
Strengths: KSOC has the backing of many industry-led professionals in the Kubernetes and security space. Because of that, the features going into KSOC are coming from individuals that actually went through Kubernetes security issues in real production environments. Because of that, it’s safe to say that they’re solving real-world problems.
Weaknesses: There’s no community edition or a trial that engineers can utilize. They have to speak with KSOC first. At the time of writing this, KSOC is geared towards automatic remediation targeting Kustomize. Although it can work with other resources and objects in Kubernetes, it’s not the direct focus.
Kubescape, in my opinion and other engineers’ opinions, is one of the easiest Kubernetes security tools to use in the space today. You’ll typically see tools that are geared towards just scanning Kubernetes Manifests, or just scanning clusters, and very rarely giving the ability to have automatic remediation. Kubescape does all of it, which also includes a great UI, great CLI, and an RBAC Visualizer that lets you know what’s truly happening in your environment from a user, group, and service account perspective.
Kubescape gives you two ways to automate CIS scans:
- The CLI, which can be utilized in a CICD pipeline.
- Multiple plugins for various CICD platforms.
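Whichever route is used, CLI or plugin, a pipeline step eventually has to turn scan results into a pass or a fail. The Python sketch below shows one way to do that. It assumes the results have already been parsed into a list of dictionaries with "id" and "status" keys. That shape, the check IDs, and the function name are hypothetical and are not Kubescape’s actual output format. Failed checks block the build unless the team has explicitly waived them.

```python
def gate(results, waived=frozenset()):
    """Return (exit_code, blocking_ids) for a list of scan results.

    Each result is a dict with hypothetical keys "id" and "status". A failed
    check blocks the pipeline unless its id is in the waived set.
    """
    blocking = sorted(
        r["id"] for r in results
        if r["status"] == "failed" and r["id"] not in waived
    )
    return (1 if blocking else 0), blocking


# Hypothetical parsed results, used only for illustration.
scan_results = [
    {"id": "CHECK-001", "status": "passed"},
    {"id": "CHECK-002", "status": "failed"},
    {"id": "CHECK-003", "status": "failed"},
]

exit_code, blocking = gate(scan_results, waived={"CHECK-003"})
print(exit_code, blocking)
```

In a real pipeline step, the script would finish by passing the exit code to sys.exit so that the CICD platform marks the job as failed. Keeping the waived list in version control also leaves a record of which risks were accepted and by whom.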
Kubescape doesn’t just cover the major cloud providers in terms of Kubernetes cluster scanning; you can also perform CIS scanning on Kubernetes-based systems like OpenShift. Realistically, the only requirement is that wherever you’re running CIS scans from has access to the cluster. For example, if you want to run a CIS scan against an AKS cluster, as long as you have access to Azure, it will work successfully. Same thing with any other Kubernetes-based environment. You aren’t locked into specific clouds.
Kubescape’s automatic remediation capabilities span from cluster configurations to application configurations. If it exists in the Kubernetes CIS Benchmark, it exists in Kubescape.
As an example, let’s see some security risks from CIS that came back on a Minikube cluster.
Once you see the remediation, you can hover over it, or click it, and see the command that you should run based on the CIS recommendation.
You can also toggle on the remediation for your cluster, which automatically applies the CIS recommendation for you.
To wrap up the Kubescape section, there’s the reporting piece that can help upper management and engineering leads understand what’s happening in a Kubernetes environment without having to go and do the scan themselves.
By clicking the Export button on the configuration scanner, you have two options:
- Export Controls
- Export Resources
As an example, by clicking the Export Controls button, you can see that the output can be in an Excel spreadsheet and shared with management or other engineers on the team to understand what’s happening underneath the hood.
Strengths: Easily scan clusters, Kubernetes Manifests, and have automatic remediation from the infrastructure and developer side. No other tool/platform is giving this combination as the market currently stands.
Weaknesses: Kubescape doesn’t scan child CIS benchmarks like the CIS GKE Benchmark. The public understanding is that it’s on the roadmap.
Kubernetes is still very new for a lot of engineers and will continue to feel this way for quite a while, but even so, proper security practices should be in place. Without them, a Kubernetes environment will never truly be ready for production.
In this blog post, you learned about what CIS is, how it’s useful in Kubernetes and a few key tools/platforms that you can take a look at for implementing scanning best practices and automatic remediation.