artur-rodrigues.com

Automating multi-architecture workloads in Kubernetes

When scheduling workloads, a vanilla Kubernetes installation does not check whether the container images that make up a Pod are compatible with the architecture of the target node. In the best case, every container image in the workload supports multiple architectures (arm64 and amd64), so any node can be selected to host the workload, and the container runtime on that node pulls the image variant that matches its architecture.

However, a Pod may contain container images built for a single architecture (for example, amd64). In that case, Kubernetes won’t prevent the workload from being scheduled on an arm64 node unless the cluster administrator or the application owner has added extra configuration. What does that configuration look like?

The first possibility is for the cluster administrator to run only amd64 nodes. Because amd64 remains the default architecture in the cloud and most build pipelines target it, nearly all open source tools and images will be compatible. In other words, a cluster that runs only amd64 nodes will probably never face container image compatibility issues.

The other possibility is for the cluster administrator to put a taint on all arm64 nodes in the cluster. In fact, on GKE this is done by default:

By default, GKE schedules workloads only to x86-based nodes—Compute Engine machine series with Intel or AMD processors—by placing a taint (kubernetes.io/arch=arm64:NoSchedule) on all Arm nodes. This taint prevents x86-compatible workloads from being inadvertently scheduled to your Arm nodes

Outside of GKE, a cluster administrator can apply the same kind of taint to the node groups or pools that include arm64 instances. Pods then need a matching toleration to be scheduled on those nodes.
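
To make the taint and toleration relationship concrete, here is a minimal Go sketch of the matching rules in simplified form. The Taint and Toleration types are hypothetical stand-ins for the corev1 types from k8s.io/api, and CanSchedule is an illustrative helper, not the scheduler's actual code.

```go
package main

// Taint and Toleration are simplified, hypothetical stand-ins for the
// corev1.Taint and corev1.Toleration types from k8s.io/api.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// Tolerates reports whether a toleration matches a taint.
func Tolerates(tol Toleration, taint Taint) bool {
	// An empty effect on the toleration matches every effect.
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	// An empty key is only meaningful with Exists and matches every key.
	if tol.Key == "" {
		return tol.Operator == "Exists"
	}
	if tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal": // Equal is the default operator.
		return tol.Value == taint.Value
	default:
		return false
	}
}

// CanSchedule reports whether every taint that blocks scheduling is
// tolerated. PreferNoSchedule is only a preference, so it is skipped.
func CanSchedule(taints []Taint, tolerations []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tolerations {
			if Tolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}
```

With the GKE-style taint kubernetes.io/arch=arm64:NoSchedule, a Pod without tolerations is rejected by CanSchedule, while a Pod carrying a toleration with the same key, value, and effect is accepted.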

Finally, application owners can use node selectors to ensure their workloads are only scheduled on amd64 nodes, by targeting the well-known label kubernetes.io/arch, which the kubelet sets on every node, with the value amd64.
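
As a rough illustration of what a node selector does, the sketch below filters nodes by label in Go. The function names and the nodes map (node name to labels) are hypothetical; the only rule taken from Kubernetes is that every key in the selector must be present on the node with the same value.

```go
package main

import "sort"

// MatchesNodeSelector reports whether a node's labels satisfy a Pod's
// nodeSelector: every selector key must be present with the same value.
func MatchesNodeSelector(nodeLabels, nodeSelector map[string]string) bool {
	for key, want := range nodeSelector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// FilterNodes returns the sorted names of the nodes that match the selector.
// The nodes map (name -> labels) is a hypothetical stand-in for a node list.
func FilterNodes(nodes map[string]map[string]string, nodeSelector map[string]string) []string {
	var names []string
	for name, labels := range nodes {
		if MatchesNodeSelector(labels, nodeSelector) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}
```

Calling FilterNodes with the selector {"kubernetes.io/arch": "amd64"} keeps only the nodes labelled as amd64, which is the effect an application owner gets by setting that nodeSelector on a Pod.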

Automating multi-architecture clusters

As long as the cluster administrator taints the nodes of every node group that might include arm64 instances, we can automate the addition of tolerations with a mutating admission controller. This controller intercepts all Pod creation requests, checks the supported architectures of every container image in the spec, and adds a toleration if all images are multi-arch (compatible with both arm64 and amd64):

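// DoesPodSupportArm64 reports whether every container image in the Pod,
// including init containers, is available for arm64.
func DoesPodSupportArm64(cache Cache, pod *corev1.Pod) bool {
	for _, containers := range [][]corev1.Container{pod.Spec.InitContainers, pod.Spec.Containers} {
		for _, container := range containers {
			// One image without arm64 support rules out the whole Pod.
			if !DoesImageSupportArm64(cache, container.Image) {
				return false
			}
		}
	}
	return true
}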
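
Once that check passes, the webhook has to hand a mutation back to the API server, typically as a JSON Patch (RFC 6902). The sketch below shows one way such a patch could be built. It is not taken from the project: the taint key example.com/multiarch, its value, and the TolerationPatch helper are assumptions made for illustration.

```go
package main

import "encoding/json"

// Toleration mirrors the JSON shape of a Pod toleration (simplified).
type Toleration struct {
	Key      string `json:"key"`
	Operator string `json:"operator"`
	Value    string `json:"value,omitempty"`
	Effect   string `json:"effect"`
}

// PatchOp is a single JSON Patch (RFC 6902) operation.
type PatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// multiArchToleration uses a hypothetical taint key and value; a real
// cluster would use whatever taint its administrator put on the nodes.
var multiArchToleration = Toleration{
	Key: "example.com/multiarch", Operator: "Equal", Value: "true", Effect: "NoSchedule",
}

// TolerationPatch returns the JSON Patch that adds the multi-arch toleration
// to a Pod. It returns nil when the Pod does not support arm64 or already
// carries the toleration, meaning the Pod should be left unchanged.
func TolerationPatch(supportsArm64 bool, existing []Toleration) ([]byte, error) {
	if !supportsArm64 {
		return nil, nil
	}
	for _, t := range existing {
		if t == multiArchToleration {
			return nil, nil
		}
	}
	// "/-" appends to an array that already exists in the Pod spec.
	op := PatchOp{Op: "add", Path: "/spec/tolerations/-", Value: multiArchToleration}
	if len(existing) == 0 {
		// The array may be absent, so create it together with its first entry.
		op = PatchOp{Op: "add", Path: "/spec/tolerations", Value: []Toleration{multiArchToleration}}
	}
	return json.Marshal([]PatchOp{op})
}
```

Distinguishing the two paths matters because appending with "/-" fails when the tolerations array does not exist yet in the submitted Pod.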

A proof of concept is available at arturhoo/k8smultiarcher. It uses regclient/regclient to fetch the platforms and architectures each image supports through the registry's Manifests V2 API, which does not require downloading the full image. The project also incorporates a cache and a fail-open mechanism: in case of failure or timeout, no toleration is added.
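
The idea behind the manifest lookup, the cache, and the fail-open behaviour can be sketched with the standard library alone. This is not the project's code and does not use regclient: the Checker type and its Fetch hook are hypothetical, and the sketch only inspects the manifests[].platform fields of an OCI image index or Docker manifest list.

```go
package main

import (
	"encoding/json"
	"sync"
)

// imageIndex holds the fields of an OCI image index (or Docker manifest
// list) that matter for platform detection.
type imageIndex struct {
	Manifests []struct {
		Platform struct {
			Architecture string `json:"architecture"`
			OS           string `json:"os"`
		} `json:"platform"`
	} `json:"manifests"`
}

// IndexSupportsArm64 reports whether the index lists a linux/arm64 entry.
// A single-platform manifest has no "manifests" list, so this sketch
// treats it as not supporting arm64.
func IndexSupportsArm64(indexJSON []byte) (bool, error) {
	var idx imageIndex
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return false, err
	}
	for _, m := range idx.Manifests {
		if m.Platform.OS == "linux" && m.Platform.Architecture == "arm64" {
			return true, nil
		}
	}
	return false, nil
}

// Checker caches results per image reference. Fetch is a hypothetical hook
// that returns the raw manifest JSON for an image, e.g. via a registry client.
type Checker struct {
	Fetch func(image string) ([]byte, error)

	mu    sync.Mutex
	cache map[string]bool
}

// SupportsArm64 fails open: any fetch or parse error yields false, so the
// caller adds no toleration. Failures are not cached, so a later request
// can retry. The lock is held during Fetch to keep the sketch short.
func (c *Checker) SupportsArm64(image string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ok, hit := c.cache[image]; hit {
		return ok
	}
	raw, err := c.Fetch(image)
	if err != nil {
		return false
	}
	ok, err := IndexSupportsArm64(raw)
	if err != nil {
		return false
	}
	if c.cache == nil {
		c.cache = make(map[string]bool)
	}
	c.cache[image] = ok
	return ok
}
```

A timeout would surface here as an error from Fetch, so it takes the same path as any other failure and the Pod is scheduled as if the webhook had not been involved.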

One important implementation detail is that the toleration added is for a multi-architecture node group, not an arm64-only one. This matters when the cluster's autoscaling strategy (e.g. Karpenter) draws on several instance types and spot offerings, because at certain times it might be cheaper to run amd64 nodes.

A full end-to-end example can be found in the project's test suite.