
Kubernetes security considerations & best practices

We are in the middle of a big boost in Kubernetes adoption: the number of organizations deploying containers in production with this popular orchestration tool grows every week, and Gartner expects that by 2022 more than 75% of global organizations will run containerized applications in production. Kubernetes makes it easier for developers to deploy applications, but democratizing access to containers isn't necessarily a good thing for security. Attention to security should not wait until the first application is in production and serving customers. In this article I will highlight Kubernetes security considerations & best practices.

Secure by default?

One of the biggest benefits of Kubernetes over other container orchestration tools is its simplified and uniform way of deploying containerized applications. Since its adoption by the CNCF, thousands of developers and other experts have helped support this strategy. The default settings in Kubernetes help you deploy applications quickly, but some of those defaults are not inherently secure. Just to name a few to get started:

  • Without network policies, all applications can talk to each other in an unrestricted way.
  • Even with Role Based Access Control (RBAC) enabled, the “least privilege principle” is not implemented. You need to implement the RBAC rules yourself.
  • The worker nodes on which the pods/containers run are not hardened by default.
  • Without any rules in place (e.g. pod security policies), an application can use any container image from anywhere (e.g. untrusted sources like Docker Hub).
  • No default restrictions exist on containers with a poorly secured configuration (e.g. containers without resource limits, or containers running in privileged mode); see the sketch after this list.
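
To make the last two points concrete, here is a minimal sketch of a more defensive Pod specification with explicit resource limits and privileged mode switched off. The image name, registry and values are illustrative assumptions, not a recommendation:

```yaml
# A sketch of a Pod with explicit resource limits, privileged mode
# switched off and an image pulled from an (assumed) in-house registry.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0   # hypothetical trusted registry
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```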

Scan container images

In a previous article, we did a deep-dive into container security from both a static and a dynamic point of view. Many websites advise you to only use trusted images from official vendors. Yet even those images can contain vulnerabilities and bad security practices; they are not secure by default.

You should also scan the container images of your Kubernetes cluster itself for security problems, for example the images used for the Kubernetes proxy, the scheduler or the network plugin (e.g. the CNI). Wherever your container images come from, be sure to check them using specialized tools, for example Clair or Sysdig.
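
To give an idea of how image scanning fits into a delivery pipeline, here is a hypothetical GitLab CI job that runs the open-source Trivy scanner against a freshly built image and fails the pipeline on high or critical findings. The job name, stage and image reference are illustrative assumptions, and Clair or Sysdig can fill the same role:

```yaml
# A hypothetical GitLab CI job: scan the image built earlier in the
# pipeline and fail on HIGH/CRITICAL vulnerabilities.
scan-image:
  stage: test
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL
      registry.example.com/web:$CI_COMMIT_SHORT_SHA
```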

Considerations

Kubernetes consists of a lot of (highly dynamic) components. These are created and destroyed on the fly, sometimes even without any human interaction, which makes it difficult to secure them all. The following considerations should help you operate a more secure cluster:

Technical aspects

  • Containers are everywhere: almost all Kubernetes components run as containers. Don't forget the deployment patterns which also include sidecar containers (e.g. for logging, audits, pre-boot configuration, etc.).
  • Container images can be downloaded from any (untrusted) source or created in-house. Both approaches can introduce CVEs and other bad security practices. Developers need to be trained, and the organization needs a way to quickly rebuild container images so problematic issues can be resolved. This is not just a technical challenge: operators, security experts and developers need to work together to achieve it.
  • CVEs in Kubernetes (and other core components) are found regularly. Be sure to check your Kubernetes environment for these CVEs and keep an eye on the official Kubernetes security announcements for the latest information. It is important to have a good patch strategy in place, one that also fits zero-touch platforms and prevents downtime. Managed Kubernetes services like EKS, AKS and GKE can help you achieve this and make upgrades easier.

Organizational aspects

  • Pay special attention to monitoring containers, since they are ephemeral by nature. Switch audit logging on and ship all logs to an external system. Containers are never patched in place: when a container is replaced, the old one is destroyed and a new one takes its place. Keep this in mind when collecting evidence about a possibly exploited vulnerability; you might need special forensics software such as Sysdig Secure. Take related triggers and events into account, and avoid creating a big stack of false positives that would overload your Security Operations Center (SOC) teams.
  • Compliance issues are likely to pop up if you do not define standards and guidelines for containers and the underlying infrastructure. DevOps gives teams a lot of freedom to choose their own naming conventions, ways of working with regard to CI/CD processes, and so on. However, try to standardize how you deploy and configure the (application) scripts and components that make up the deployment configuration.

It is advisable for your organization to practice a healthy "DevSecOps culture", since many of these security considerations demand flexibility and speed from the teams involved. For example: it does not make sense to scan your container runtime environment for vulnerabilities if you are not able to patch them in a timely manner.

Upgrades and maintenance

A simple tip for everyone running applications in cloud-native environments: upgrade to the latest version. The latest is most often the greatest. The Kubernetes community releases a new version every quarter, and old versions become deprecated quickly.

Upgrades should be controlled and executed only by authorized administrators. Avoid downtime by upgrading the different components (e.g. worker nodes) one by one. All important administrative actions should be audited so that every executed step can be traced; this way, suspicious behavior can be detected fairly easily. Kubernetes gives you the option to create audit policies to log everything that passes through the API server. The three major cloud providers (GCP, Azure, AWS) also give you the option to switch on audit logging at the control-plane level.
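
As a sketch of what such an audit policy can look like, the following logs full request and response bodies for operations on Secrets and metadata for everything else. The rules are illustrative, not a recommendation:

```yaml
# A minimal audit policy sketch, passed to the API server via the
# --audit-policy-file flag.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record full request/response bodies for operations on secrets.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
  # Record metadata (who, what, when) for everything else.
  - level: Metadata
```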

Authorization

One of the key aspects of securing any Kubernetes cluster is authentication and authorization. Kubernetes supports RBAC to handle permissions in the cluster. RBAC is the "new default" authorization method: always switch it on, and be sure to switch off the deprecated ABAC. After switching on RBAC, apply fine-grained access controls to anyone who should have access to the cluster. For example: in EKS, the person who created the cluster automatically becomes an admin. It is important to adhere to the least privilege principle when adding more users and groups; not everyone in the organization needs to be a Kubernetes admin.

Another great way to protect your Kubernetes API server (the most important component) is to restrict access to private networks only. In EKS, for example, it is now possible to enable this; in AKS a similar option is available in preview mode. This gives you a great level of protection, but it also means you need a safe connection between your own network and the (network) endpoint of your Kubernetes cluster.

One extra tip: pay special attention when connecting your Kubernetes cluster to your CI/CD environment. GitLab CI enables you to connect to your cluster, but only at the admin level, which effectively makes any user who can access GitLab CI an admin on your cluster.

You can use the RBAC API to create Roles, ClusterRoles (roles that apply to the entire cluster), RoleBindings and ClusterRoleBindings (both attach roles to users or groups). Differentiate between developers, who should only deploy applications into their dedicated namespace (a section of the cluster), and operators, who need to maintain and troubleshoot the entire cluster. It is best to start with no permissions at all and slowly add rights as users need them.
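
A minimal sketch of what this looks like for a developer group that may only manage workloads in its own namespace; the namespace, group name and resource list are illustrative assumptions:

```yaml
# A namespaced Role plus a RoleBinding that attaches it to a
# (hypothetical) group of developers.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a
  name: developer
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-a
  name: developer-binding
subjects:
  - kind: Group
    name: team-a-developers          # hypothetical group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```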

Segregation of duties

Remember that Kubernetes makes it easy for developers to deploy their containerized applications? The flip side is that, by default, every container can reach any other container in the cluster. So if you have different (internal) customers operating in a single cluster, their applications can all access each other without any restriction. This might not be desired.

Namespaces

One solution is to operate one cluster per customer, but the downside is that this creates a lot of overhead and cost. If that is a concern, you can segregate your cluster into multiple namespaces, where each customer gets its own namespace to deploy applications in. Even when a single team deploys multiple (unrelated) applications, it is best to create separate namespaces. Keep in mind that this segregation happens on a logical level only; for some use cases it might not be sufficient.
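
Creating such a namespace is a one-liner; a minimal sketch with an illustrative name and label:

```yaml
# One namespace per (internal) customer; the name is illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: customer-a
  labels:
    customer: customer-a
```

Workloads are then deployed into it with the -n customer-a flag or a namespace field in their manifests.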


Network policies

Network policies let you restrict access at the namespace level and prevent containers from talking to each other when they are deployed in separate namespaces.

A network policy is a Kubernetes resource that applies to the containers (Pods) in your cluster. An example of a network policy is to explicitly accept connections to your application only from a certain network or IP address (range). Another example is to restrict outgoing traffic from your application to a whitelist of destinations; note that network policies work at the IP and port level, so filtering on URLs requires an egress proxy or service mesh on top.
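
A minimal sketch of the first example: a NetworkPolicy that only accepts ingress traffic to pods labeled app=backend from a given IP range. The labels, namespace, CIDR and port are illustrative assumptions, and your CNI plugin must support network policies for this to have any effect:

```yaml
# Only allow ingress to the backend pods from one IP range on port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-from-office
  namespace: customer-a
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16      # illustrative corporate range
      ports:
        - protocol: TCP
          port: 8080
```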

With such a network policy in place, the level of isolation is increased and the blast radius (in case of a serious exploit) is limited, so an attacker can do less damage. It is also much easier to apply network policies to namespaces than to an entire cluster.

Dedicated machines

Another way to segregate workloads is to use a dedicated set of machines for sensitive workloads. Consider a worker node pool (the application containers actually run on worker nodes) that resides in a separate section of the network (subnet).

Prevent any non-sensitive workload from running on those dedicated worker nodes by setting taints, tolerations and (node) affinity. All of these are advanced Kubernetes controls, but it is worth the effort to invest time in understanding them; a small sketch follows below.
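
A sketch of how the pieces fit together, with illustrative keys, labels and names: the dedicated nodes get a taint and a label, and only the sensitive pods carry the matching toleration and node affinity:

```yaml
# Prepare the dedicated nodes first (illustrative key/value):
#   kubectl taint nodes <node-name> workload=sensitive:NoSchedule
#   kubectl label nodes <node-name> workload=sensitive
apiVersion: v1
kind: Pod
metadata:
  name: payments
spec:
  tolerations:                       # allows scheduling onto the tainted nodes
    - key: workload
      operator: Equal
      value: sensitive
      effect: NoSchedule
  affinity:
    nodeAffinity:                    # forces scheduling onto the labeled nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values: ["sensitive"]
  containers:
    - name: payments
      image: registry.example.com/payments:1.0   # hypothetical image
```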

Pod security policies

The containers that host the applications are grouped into Pods. Pod security policies enforce (security) checks on containers and Pods before they are admitted to the cluster. A couple of examples of pod security policies for your containers:

  • Prevent containers that do not have memory or CPU limits in place; a single runaway container could overload and crash your entire cluster.
  • Prevent the deployment of Pods that do not have proper labels and annotations (strictly speaking this requires a more general admission controller, since pod security policies focus on security settings). Proper labels and annotations make Pods easier to schedule and monitor, and consistency is king for keeping control and speeding up DevOps processes.
  • Prevent containers from running in privileged mode (which effectively drops all security restrictions).
  • Prevent containers from mounting "illegal" host-level directories. For example: prevent mounting /root read-write inside a container.
  • Prevent containers from acquiring extra capabilities (e.g. modifying process capabilities or changing the UID or GID).

Pod security policies work together with "classic" Linux security mechanisms like AppArmor and SELinux, so a good understanding of those technologies is essential to implement them properly.
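
A minimal sketch of a restrictive policy that covers several of the examples above. The values are illustrative, and note that a PodSecurityPolicy only takes effect when the corresponding admission controller is enabled and a role grants the "use" verb on the policy:

```yaml
# A restrictive PodSecurityPolicy sketch.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false                  # no privileged containers
  allowPrivilegeEscalation: false    # no acquiring extra capabilities
  requiredDropCapabilities: ["ALL"]
  hostNetwork: false
  hostPID: false
  hostIPC: false
  runAsUser:
    rule: MustRunAsNonRoot           # containers may not run as root
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:                           # no arbitrary hostPath mounts
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim
```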

Secure your (worker) nodes

Kubernetes runs the application containers on worker nodes, and it is very important to secure these nodes as well as possible. If attackers gain access to one of your worker nodes, they can do a lot of damage. Some scenarios of what could happen when an attacker gains access to a worker node:

  • access all Pods (and DaemonSets, ReplicaSets, etc.) running on that specific worker node.
  • jump to another worker node and access and control the Pods on that node.
  • read information stored on volumes mounted on that worker node, putting your valuable data at great risk.

Worker nodes run a container runtime engine, and this should also be protected. Sysdig can achieve this by utilizing various detection and scanning mechanisms. For example:

  • run a CIS benchmark on your worker nodes and present the findings together with links to solutions that mitigate any CIS-related issues; the same is true for NIST findings. (An open-source starting point for the CIS checks is sketched after this list.)
  • check your runtime environment for improper configurations, for example permissions that are too wide or wrong paths in security configuration files, and detect files that are modified without being audited.
  • check for viruses and malware at the operating system level.
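
For the CIS checks mentioned above, the open-source kube-bench project can serve as a starting point. A minimal sketch of running it as a Kubernetes Job follows; it is adapted from the upstream example, and the image tag and host mounts are illustrative and may differ per kube-bench version and per distribution:

```yaml
# A sketch of running the open-source kube-bench CIS benchmark as a
# one-off Job on a node. Check the kube-bench documentation for the
# exact mounts your distribution needs.
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true                 # kube-bench inspects host processes
      restartPolicy: Never
      containers:
        - name: kube-bench
          image: aquasec/kube-bench:latest
          command: ["kube-bench"]
          volumeMounts:
            - name: etc-kubernetes  # node-level config the benchmark inspects
              mountPath: /etc/kubernetes
              readOnly: true
      volumes:
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
```

Once the Job has completed, read the results with kubectl logs job/kube-bench.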

Restrict access

You should restrict direct access to your worker nodes as much as possible. It should not be possible to (manually) log in to these worker nodes over the internet. Instead, place your worker nodes in a private subnet and use a bastion host to access them (if access is needed at all). Troubleshooting can be done in DEV environments; in higher environments you might want to block interactive logins entirely.


In case you need to log in using SSH, you can use special software to bind SSH keys to specific devices for your users. In AWS you can use Session Manager to log in to your worker nodes; this way you can close down yet another port in your firewall 🙂

Conclusion

Security is never finished, especially not in complex Kubernetes environments running in the public cloud. I hope this article gives you a good starting point for Kubernetes security considerations & best practices, so you can secure your clusters and run your applications in production with confidence.
