Topology Manager

The Topology Manager is a kubelet component that coordinates the components responsible for CPU and other hardware acceleration optimizations. It acts as a single source of truth that other kubelet features consult when making topology-aligned resource allocation decisions. This allows latency-sensitive workloads that need high-throughput parallel processing to sustain peak performance, because data from the CPU Manager and device plugins is taken into account when placing pods. Without the Topology Manager, the CPU Manager and Device Manager make their decisions independently, which can result in undesirable pod placements.

The Topology Manager also gathers information from Hint Providers, which identify the available NUMA nodes. Hint Providers implement an interface for sending and receiving topology information. The hints they return indicate which NUMA nodes are available and what resource allocation is needed. The configured policy evaluates the provided hints and selects the best one as defined by that policy; the selected hint is then stored by the Topology Manager. Depending on the policy, the pod is either admitted to or rejected from the node based on the selected hint. To enable this feature, the kubelet must be started with the Topology Manager feature gate enabled.
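As an illustration, the kubelet might be started with flags like the following (these flag and feature-gate names come from upstream Kubernetes; on a managed platform such as PMK they are normally set on your behalf, and the feature gate is enabled by default in recent Kubernetes releases):

```shell
# Sketch of a kubelet invocation with Topology Manager enabled.
# The static CPU Manager policy is required for the Topology
# Manager to receive CPU hints.
kubelet \
  --feature-gates=TopologyManager=true \
  --cpu-manager-policy=static \
  --topology-manager-policy=single-numa-node
```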

Topology Manager Policies

There are four supported policies for the Topology Manager:

  1. None: This is the default policy and uses no topology arrangement.
  2. Best-effort: When kubelet uses the best-effort topology management policy, it calls each Hint Provider to discover the resource availability for every container within a Pod. Using this information, the Topology Manager stores the preferred NUMA node affinity for that container. If no preferred affinity can be found, the Topology Manager records this and admits the pod to the node anyway. The Hint Providers then use this data when making future resource allocation decisions.
  3. Restricted: When kubelet uses the restricted topology management policy, it requests information from each Hint Provider to determine the available resources for the containers in a Pod. Using this information, the Topology Manager stores the preferred NUMA node affinity for each container. If that affinity cannot be satisfied, the Topology Manager rejects the pod from the node. The pod then enters a Terminated state with a pod admission failure, and because the pod is in a Terminated state, the Kubernetes scheduler will not try to reschedule it.
  4. Single NUMA node: When kubelet uses the single-numa-node topology management policy, it calls each Hint Provider to discover the available resources. The Topology Manager then uses this information to determine whether a single NUMA node affinity is possible. If it is, the Topology Manager stores the affinity and the Hint Providers use it when allocating resources. If it is not, the pod enters a Terminated state with a pod admission failure, and because the pod is in a Terminated state, the Kubernetes scheduler will not try to reschedule it.
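The policies above act on per-container resource requests. As a sketch, a pod that the static CPU Manager and Topology Manager can fully align is one in the Guaranteed QoS class with an integer CPU request (the pod name and image below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-aligned-app              # illustrative name
spec:
  containers:
  - name: worker
    image: example.com/worker:latest  # illustrative image
    resources:
      requests:
        cpu: "2"        # integer CPU request -> eligible for exclusive cores
        memory: "1Gi"
      limits:
        cpu: "2"        # limits equal to requests -> Guaranteed QoS class
        memory: "1Gi"
```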

Configuration on PMK

PMK now supports configuring the Topology Manager. The following fields have been added to the qbert cluster create API: cpuManagerPolicy, topologyManagerPolicy, and reservedCPUs.


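A sketch of how these fields might appear in a cluster create request body (the field names are taken from this document; the surrounding structure and values are illustrative, and the exact format of reservedCPUs may differ):

```json
{
  "cpuManagerPolicy": "static",
  "topologyManagerPolicy": "single-numa-node",
  "reservedCPUs": "0,1"
}
```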
The default value of cpuManagerPolicy is none, which is expected to work for most applications.

When cpuManagerPolicy is set to static, it turns on the static CPU Manager policy, which is required for the Topology Manager to work.

The topologyManagerPolicy field sets the policy for the Topology Manager. Its possible values are listed below:

  • none: Topology Manager is turned off.
  • best-effort: Kubelet uses Hint Providers to determine the best placement for a pod. The pod is scheduled even if not all affinity requirements are met.
  • restricted: Kubelet uses Hint Providers to find the preferred NUMA node affinity for each container. If that affinity cannot be satisfied, the pod is rejected and terminated with a pod admission failure. Since the pod is in a Terminated state, the Kubernetes scheduler will not reschedule it.
  • single-numa-node: Kubelet uses Hint Providers to find a single NUMA node that meets all the affinity requirements. The pod is scheduled if such a node exists, and is otherwise moved to a Terminated state.

You can read more about these policies in the upstream Kubernetes Topology Manager documentation.

reservedCPUs is a list of CPUs reserved for general purpose system use. Kubelet will not schedule pods to run on these CPUs.

Limitations

  • The Topology Manager supports a maximum of eight NUMA nodes. With more than eight NUMA nodes, there is a state explosion when enumerating the possible NUMA affinities and generating their hints.
  • The scheduler is not topology-aware, so a pod can be scheduled on a node and then fail admission on that node due to the Topology Manager.
  • On the node itself, the policy is set via the kubelet --topology-manager-policy option.
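When a pod is rejected under the restricted or single-numa-node policy, the failure is visible on the pod itself. For example (the pod name is illustrative, and the exact status string may vary by Kubernetes version):

```shell
# Inspect a pod rejected by the Topology Manager; the status
# typically reports a topology affinity error on admission.
kubectl describe pod my-pod
```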