Understanding pod placement concepts in Kubernetes can be intimidating. In this post, we have tried to simplify it for beginner and intermediate-level platform engineers.
If you are reading this post, you likely already know what Kubernetes is and even how to use it for container orchestration.
Kubernetes uses a sophisticated scheduler called the "kube-scheduler" to determine where to place newly created pods within a cluster. The scheduler's primary goal is to ensure that workloads run efficiently and resources are utilised optimally.
A lot has already been written about the scheduler itself, much of it very good.
But in this article, we will delve into the details of how Kubernetes determines where to place Pods and the various factors that influence the Pod placement decision. We will also discuss best practices for optimising Pod placement in Kubernetes to achieve better performance and resource utilisation.
Throughout this document, we will be using a hypothetical scenario to illustrate the different aspects of the Pod Placement Algorithm in Kubernetes.
In this scenario, we have a Kubernetes cluster with three nodes, each with different capacity and constraints:
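For concreteness, assume a layout along these lines (the exact capacities are placeholders; the roles follow the examples later in this post):

- node-1: 4 vCPU, 8 GiB memory, reserved for critical workloads
- node-2: 8 vCPU, 16 GiB memory, equipped with a GPU
- node-3: 2 vCPU, 4 GiB memory, backed by SSD storage (labelled disk=ssd)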
We will be deploying a web application that consists of three pods: a frontend, a backend, and a database. Each pod has specific resource requests and limits, as well as certain constraints that need to be taken into account when scheduling them on the nodes.
Using this example, we will explore the different aspects of the Pod Placement Algorithm in Kubernetes, including the different placement strategies, node affinity and anti-affinity, taints and tolerations, and Pod Overhead.
Kubernetes provides several options for placing Pods on Nodes based on the resources required by the Pod and the available resources on the Nodes. Kubernetes uses a node selection algorithm and a Pod scheduling algorithm to place Pods on Nodes. The node selection algorithm determines which Nodes are eligible to run a given Pod, while the Pod scheduling algorithm chooses a specific Node from the eligible Nodes.
Kubernetes provides several strategies for node selection, including label-based selection, node affinity and anti-affinity, and taints and tolerations.
Label-based selection allows administrators to require that a set of labels be present on a Node for it to be eligible to run a given Pod. Pod manifests express this through nodeSelector rules, which declare the Node labels a candidate must carry.
In the example below, we use label-based selection to ensure that only Nodes with the label disk=ssd are eligible to run our nginx Pod.
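A minimal sketch of such a manifest, assuming the target Node has already been labelled (for example, `kubectl label nodes node-3 disk=ssd`); the image tag and resource values are placeholders:

```yaml
# nginx Pod that only schedules onto Nodes labelled disk=ssd
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeSelector:
    disk: ssd           # the Node must carry this exact label
  containers:
    - name: nginx
      image: nginx:1.25   # placeholder tag
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
```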
Node affinity and anti-affinity provide a more flexible way to express scheduling requirements than label-based selection. With node affinity and anti-affinity, administrators can specify more complex rules for node selection based on attributes of the Nodes themselves or their Pods.
Node affinity allows administrators to require or prefer that Pods be scheduled on Nodes that have certain labels or other attributes. Node anti-affinity is expressed through the same mechanism using negative operators such as NotIn and DoesNotExist, keeping Pods away from Nodes with certain labels.
This example defines a Deployment for a web application consisting of three Pods: frontend, backend, and database.
Each Pod has resource requests and limits, and node affinity and anti-affinity rules to ensure they are scheduled on appropriate nodes.
The frontend and backend Pods have node affinity rules matching nodes with specific CPU and memory resources, and tolerations allowing them to be scheduled on nodes with certain constraints.
The database Pod has a preferred node affinity rule matching nodes with SSD disks and a toleration allowing it to be scheduled on nodes with certain constraints.
These rules optimise resource utilisation and ensure high availability of the web application.
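As an illustration of the database piece, here is a sketch of its scheduling section; the disk=ssd label follows our scenario, while the workload=critical taint key is an assumption for illustration:

```yaml
# Database Deployment: prefers SSD nodes, tolerates a reserved-node taint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      affinity:
        nodeAffinity:
          # Soft rule: prefer SSD-backed nodes, but schedule elsewhere if needed
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: disk
                    operator: In
                    values: ["ssd"]
      tolerations:
        - key: workload       # hypothetical taint key
          operator: Equal
          value: critical
          effect: NoSchedule
      containers:
        - name: database
          image: postgres:15  # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 2Gi
```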
Taints and tolerations provide a way to repel Pods from Nodes that are not suitable for them. A taint is a key/value pair with an effect (such as NoSchedule) applied to a Node, signalling that Pods should not be scheduled there unless they carry a corresponding toleration.
Let's say we have a Kubernetes cluster with three nodes: node-1, node-2, and node-3. We want to ensure that node-1 is used exclusively for critical workloads and node-2 is reserved for GPU workloads. To achieve this, we can add taints to these nodes and specify tolerations in our Pod YAML file.
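A sketch of that setup; the taint values follow the scenario above, while the Pod name and image are placeholders:

```yaml
# Taint the nodes first (run via kubectl):
#   kubectl taint nodes node-1 workload=critical:NoSchedule
#   kubectl taint nodes node-2 gpu=true:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: critical-app     # placeholder name
spec:
  tolerations:
    - key: workload
      operator: Equal
      value: critical
      effect: NoSchedule
    - key: gpu
      operator: Exists   # tolerates any value of the gpu taint
      effect: NoSchedule
  containers:
    - name: app
      image: busybox:1.36   # placeholder image
      command: ["sleep", "3600"]
```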
In this example, the Pod has two tolerations specified. The first matches the taint on node-1, with a key of "workload", a value of "critical", and an effect of "NoSchedule"; the second matches the taint on node-2, with a key of "gpu" and an effect of "NoSchedule".
Note that a toleration does not attract or force a Pod onto a tainted node: it only permits the scheduler to place the Pod there, while Pods without the toleration are kept away. To actively steer a Pod onto a reserved node, combine the toleration with node affinity or a nodeSelector.
By using taints and tolerations, we can ensure that Pods are scheduled on nodes that meet their specific requirements and constraints, while also reserving certain nodes for specific workloads.
Once the node selection process has produced the set of eligible nodes, the Pod scheduling process starts. Kubernetes uses the following mechanisms to decide where the Pod is finally placed:
Node Affinity allows you to constrain which nodes your Pod is eligible to be scheduled on, based on labels on the node. There are two types of node affinity: requiredDuringSchedulingIgnoredDuringExecution, a hard rule the scheduler must satisfy, and preferredDuringSchedulingIgnoredDuringExecution, a soft rule the scheduler tries to satisfy but may ignore.
In this example, we have added the Node Affinity rules to the Deployment YAML.
The requiredDuringSchedulingIgnoredDuringExecution rule for the database Pod specifies that it must be scheduled on nodes carrying the label node=database.
The podAntiAffinity rule specifies that the frontend and backend Pods should not run on the same node: each selects the other's labels and uses the topologyKey kubernetes.io/hostname so that matching Pods land on different hosts.
These rules ensure that the Pods are scheduled on nodes that meet their specific requirements and constraints, optimizing resource utilization and ensuring high availability of the web application.
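A sketch of the two rules; the node=database label and the app labels come from the example, everything else is a placeholder:

```yaml
# Database: hard node-affinity requirement on the node=database label
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node
                    operator: In
                    values: ["database"]
      containers:
        - name: database
          image: postgres:15   # placeholder image
---
# Frontend: anti-affinity keeps it off any node already running a backend Pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: ["backend"]
              topologyKey: kubernetes.io/hostname
      containers:
        - name: frontend
          image: nginx:1.25    # placeholder image
```

A mirrored anti-affinity rule on the backend Deployment completes the pairing.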
In Kubernetes, there are different strategies for placing Pods on nodes. The choice of strategy depends on the use case and the requirements of the application. Here are some of the Pod Placement Strategies supported by Kubernetes:
Spread-based placement ensures that Pods are evenly distributed across nodes to maximise availability and fault tolerance. This strategy is useful for applications that require high availability and cannot tolerate node failures. In this strategy, Kubernetes tries to place a new Pod on a node that has the fewest Pods of the same type (based on labels). This ensures that the Pods are spread across nodes as evenly as possible, reducing the impact of node failures.
In this example, we have defined a Deployment for a web application with a replica count of 3 and a spread-based placement strategy. Its topologySpreadConstraints field ensures that the Pods are spread evenly across nodes based on the topology key "kubernetes.io/hostname", with a maximum skew of 1.
The maxSkew field specifies the maximum allowed difference between the number of matching Pods on any two nodes. We have set it to 1, meaning the Pod counts on any two nodes may differ by at most one.
The topologyKey field specifies the node label that should be used to determine the topology of the cluster. In this case, we have set it to "kubernetes.io/hostname" to ensure that the Pods are spread across different nodes.
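A minimal sketch of such a Deployment; the app label and image are placeholders, and whenUnsatisfiable determines whether the skew limit is a hard rule or best effort:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                         # Pod counts per node differ by <= 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule   # use ScheduleAnyway for best effort
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25   # placeholder image
```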
Bin-packing placement optimises resource utilisation by packing Pods as densely as possible onto nodes. This strategy suits workloads that prioritise utilisation (and cost) over fault isolation. In this strategy, the scheduler favours the node that is already most allocated in terms of CPU and memory while still having room for the Pod, rather than the emptiest node.
In this example, we define a Deployment for a web application that consists of three Pods: frontend, backend, and database. The Deployment relies on the scheduler's bin-packing behaviour to pack Pods as densely as possible onto nodes based on their resource requests.
The frontend and backend Pods have specific resource requests and limits for CPU and memory, while the database Pod has a higher memory requirement. All three Pods have an environment variable set to "production".
The Deployment also uses pod anti-affinity to ensure that a Pod is not scheduled onto a node that already runs a Pod of the same app. With the topology key "kubernetes.io/hostname", replicas of one app land on separate nodes even while the scheduler otherwise packs Pods densely.
Overall, this example demonstrates how bin-packing can optimise resource utilisation for a web application while anti-affinity preserves availability where it matters. Note that bin-packing itself is configured on the scheduler, not in the Deployment, as the sketch below shows.
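Bin-packing is enabled through the scheduler's scoring configuration rather than in the workload manifest. A sketch, assuming you can supply a KubeSchedulerConfiguration file to kube-scheduler, or run it as a second, named scheduler (the profile name is a placeholder):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: bin-packing-scheduler   # placeholder profile name
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated            # favour already-busy nodes
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

Pods opt in to this profile by setting spec.schedulerName: bin-packing-scheduler.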
Kubernetes also allows users to define their own Pod Placement strategies using Kubernetes APIs or third-party tools. With custom placement strategies, users can define placement rules based on specific requirements, such as regulatory compliance, network locality, or hardware affinity.
To define a custom placement strategy, users can combine the built-in primitives (labels, affinity rules, taints, and priorities), or extend scheduling itself: Kubernetes supports running an additional custom scheduler that Pods select via the schedulerName field, extending the default scheduler with scheduling framework plugins, or delegating filtering and scoring decisions to an external process through scheduler extenders.
In this example, we define a custom placement strategy for a Pod that requires high priority and needs to be scheduled only on nodes with SSD disks. We also define podAffinity and podAntiAffinity rules that ensure the Pod is scheduled on nodes that match certain labels and topology keys, and tolerations that allow it to be scheduled on nodes with certain constraints.
Note that the priorityClassName field references a PriorityClass defined as a separate resource, which allows different priority levels to be defined for Pods in the same cluster, as the sketch below shows.
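A sketch of both resources; the class name, app labels, taint key, and image are placeholders, while the disk=ssd label follows our scenario:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000            # higher value = higher scheduling priority
globalDefault: false
description: "For workloads that must preempt lower-priority Pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-db       # placeholder name
  labels:
    app: critical-db
spec:
  priorityClassName: high-priority
  affinity:
    nodeAffinity:
      # Hard requirement: SSD-backed nodes only
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disk
                operator: In
                values: ["ssd"]
    podAntiAffinity:
      # Keep replicas of this app on separate nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: critical-db
          topologyKey: kubernetes.io/hostname
  tolerations:
    - key: workload        # hypothetical taint key
      operator: Equal
      value: critical
      effect: NoSchedule
  containers:
    - name: db
      image: postgres:15   # placeholder image
```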
Finally, a few best practices help ensure effective Pod placement: set accurate resource requests and limits so the scheduler has real data to work with, use consistent and well-documented labels and selectors, prefer soft (preferred) affinity rules over hard (required) ones where possible, and reserve specialised nodes with taints rather than by convention. Following these practices keeps your application well-organised, easy to manage, and reliable.
Pod placement is a critical component of Kubernetes that enables users to optimise their applications' resource utilisation and performance. With spread-based placement, bin-packing, and custom placement strategies, Kubernetes gives users the flexibility and control to place Pods on nodes according to their specific requirements.