artur-rodrigues.com

Rate limiting Kubernetes pod creation with dynamic admission control

Resource Quotas and Limit Ranges are common ways to limit the number of pods (or the resources used by pods) in Kubernetes clusters. However, when using Jobs for big-data or machine-learning pipelines, it may also be worth considering the rate at which pods are created, especially if jobs are short-lived and there’s a concern that the control plane might be overwhelmed.

The first line of defence should be the API server flags --max-requests-inflight and --max-mutating-requests-inflight, followed by API Priority and Fairness, which allows fine-grained classes of requests to be deprioritised (and ultimately rate limited) relative to other requests. Finally, the alpha EventRateLimit admission controller can put a ceiling on the number of Event requests per second that the API server accepts, for example per namespace.

Thinking about a final line of defence, I decided to explore implementing an admission webhook that would be configured (through a ValidatingWebhookConfiguration) to intercept all pod creation requests and enforce a rate limit.

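import (
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"golang.org/x/time/rate"
	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// limiter admits one request every 10 seconds, with a burst of 1.
// It is shared by every request that reaches the webhook.
var limiter = rate.NewLimiter(rate.Every(10*time.Second), 1)

func validatingHandler(c *gin.Context) {
	var review admissionv1.AdmissionReview
	// Bind aborts the request with a 400 if the body cannot be decoded.
	if err := c.Bind(&review); err != nil {
		return
	}
	// Guard against a review without a request, which would panic below.
	if review.Request == nil {
		c.AbortWithStatus(http.StatusBadRequest)
		return
	}

	allowed := limiter.Allow()
	var status, msg string
	if allowed {
		status = metav1.StatusSuccess
	} else {
		status = metav1.StatusFailure
		msg = "rate limit exceeded"
	}

	// The response must echo the UID of the request it answers.
	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: allowed,
		Result: &metav1.Status{
			Status:  status,
			Message: msg,
		},
	}
	c.JSON(http.StatusOK, review)
}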

Using golang.org/x/time/rate, we keep a single limiter that allows one request every 10 seconds, with a burst of 1. If the limiter admits the request, we set Allowed to true and return StatusSuccess; otherwise we set Allowed to false and return StatusFailure, and the API server rejects the pod creation with our message.

The configuration itself defines a rule that narrows the scope to pod creation only, with a ‘fail open’ failure policy (failurePolicy: Ignore), so pods are still admitted if the webhook cannot be reached:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: k8slimiter-pod-creation
  annotations:
    cert-manager.io/inject-ca-from: k8slimiter/k8slimiter-certificate
webhooks:
  - name: k8slimiter-pod-creation.k8slimiter.svc
    admissionReviewVersions:
      - v1
    clientConfig:
      service:
        name: k8slimiter-service
        namespace: k8slimiter
        path: "/validate"
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    failurePolicy: Ignore
    sideEffects: None

With those in place, creating pods in quick succession leads to the expected rate limiting behaviour:

$ kubectl run "tmp-pod-$(date +%s)" --restart Never --image debian:12-slim -- sleep 1
pod/tmp-pod-1698005111 created
$ kubectl run "tmp-pod-$(date +%s)" --restart Never --image debian:12-slim -- sleep 1
Error from server: admission webhook "k8slimiter-pod-creation.k8slimiter.svc" denied the request: rate limit exceeded

A full working example can be found at arturhoo/k8slimiter, which uses Gin and cert-manager to achieve a minimal and straightforward admission webhook setup.