PMK Scale Guide

Each PMK customer is provided a Management Plane (also known as a Deployment Unit/DU/KDU) at onboarding. This section outlines scale recommendations and best practices for it.

The following values are listed per Management Plane Instance:

| Criteria | Value |
| --- | --- |
| Maximum number of nodes | 2500 |
| Maximum number of clusters (single-node clusters) | 300 |
| Maximum number of clusters (small clusters, up to 8 nodes) | 30 |
| Maximum number of clusters (medium clusters, up to 200 nodes) | 8 |
| Maximum number of clusters (large clusters, up to 400 nodes) | 5 |
| Maximum number of clusters (combination of medium and large clusters); test configuration: 2 x 400-node, 2 x 250-node and 4 x 200-node clusters | 8 |
| Maximum number of nodes onboarded in parallel | 30 |
| Maximum number of clusters created in parallel (single-node clusters) | 10 |

Note: The above values are based on the latest Platform9 standard tests and are listed to provide guidance to users. Platform9 support can help you scale to different numbers if the standard results above do not match your requirements. Higher scale can be achieved with multiple Management Plane Instances to go beyond the node and cluster limits listed above.
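
To sanity-check a planned cluster layout against these limits, a small calculation like the one below can help. This is a minimal sketch, not a Platform9 tool: the function name and the rule of counting clusters with more than 8 nodes (medium or large) and more than 200 nodes (large) are assumptions derived from the table above.

```python
# Minimal sketch: check a planned set of clusters against the per-Management-Plane
# limits listed above. Thresholds are copied from the table; adjust them if Platform9
# support has validated different numbers for your environment.

MAX_NODES_PER_DU = 2500           # maximum nodes per Management Plane Instance
MAX_MEDIUM_OR_LARGE_CLUSTERS = 8  # tested combination of medium and large clusters
MAX_LARGE_CLUSTERS = 5            # clusters of up to 400 nodes

def fits_in_one_du(cluster_node_counts: list[int]) -> bool:
    """Return True if the planned clusters stay within the tested per-DU limits."""
    total_nodes = sum(cluster_node_counts)
    medium_or_large = sum(1 for n in cluster_node_counts if n > 8)
    large = sum(1 for n in cluster_node_counts if n > 200)
    return (
        total_nodes <= MAX_NODES_PER_DU
        and medium_or_large <= MAX_MEDIUM_OR_LARGE_CLUSTERS
        and large <= MAX_LARGE_CLUSTERS
    )

# The tested mix from the table: 2 x 400-node, 2 x 250-node and 4 x 200-node clusters.
print(fits_in_one_du([400, 400, 250, 250, 200, 200, 200, 200]))  # True
```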

The following values are listed per PMK cluster running on a Management Plane Instance:

Maximum number of nodes: 400

Test configuration:

  • Master & worker count: 5 masters, 395 workers
  • Kubernetes version: 1.26 - 1.29 (PMK 5.9 and 5.10)
  • Master node size: 18 vCPUs, 30 GB memory
  • Worker node size: 2 vCPUs, 6 GB memory
  • Pod density: 23
  • Cluster CPU usage max: 63%
  • CNI: Calico
  • Calico BGP: True; with Route-reflectors (3 nodes)
  • MetalLB BGP: True

Maximum number of nodes: 300

Test configuration:

  • Master & worker count: 5 masters, 395 workers
  • Kubernetes version: 1.22 - 1.25 (PMK 5.6.8, 5.7.3 and 5.9.2)
  • Master node size: 18 vCPUs, 30 GB memory
  • Worker node size: 2 vCPUs, 6 GB memory
  • Pod density: 23
  • Cluster CPU usage max: 63%
  • CNI: Calico
  • Calico BGP: False
  • MetalLB BGP: False

Maximum number of node upgrades in parallel in a cluster: 40 (10% of the 400 nodes tested)

Test configuration:

  • Master & worker count: 5 masters, 395 workers
  • Kubernetes version: 1.26 - 1.29
  • Master node size: 18 vCPUs, 30 GB memory
  • Worker node size: 2 vCPUs, 6 GB memory
  • Pod density: 23
  • Cluster CPU usage max: 65%
  • CNI: Calico
  • Calico BGP: True; with Route-reflectors (3 nodes)
  • MetalLB BGP: True
  • Upgrade versions tested: 1.26->1.27, 1.27->1.28, 1.28->1.29

Maximum number of nodes to be attached to a cluster in parallel: 15

Maximum number of nodes to be detached from a cluster in parallel: 30

Maximum number of pods per node: 110 (Kubernetes default)
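
The parallelism limits above translate directly into batch sizes for rolling operations. The sketch below is a minimal illustration of that arithmetic (the 10% upgrade rule capped at 40, and 15 or 30 nodes per attach/detach round); the function names are illustrative and not part of any PMK tooling.

```python
# Illustrative batch-size arithmetic derived from the per-cluster limits above.

MAX_PARALLEL_UPGRADES = 40   # tested at 10% of a 400-node cluster
MAX_PARALLEL_ATTACH = 15     # nodes attached to a cluster in parallel
MAX_PARALLEL_DETACH = 30     # nodes detached from a cluster in parallel

def upgrade_batch_size(total_nodes: int) -> int:
    """Upgrade roughly 10% of the cluster at a time, never above the tested cap."""
    return max(1, min(MAX_PARALLEL_UPGRADES, total_nodes // 10))

def rounds_needed(node_count: int, batch_limit: int) -> int:
    """How many rounds it takes to process node_count nodes at batch_limit per round."""
    return -(-node_count // batch_limit)  # ceiling division

print(upgrade_batch_size(400))                  # 40
print(rounds_needed(100, MAX_PARALLEL_ATTACH))  # 7 rounds to attach 100 nodes
print(rounds_needed(100, MAX_PARALLEL_DETACH))  # 4 rounds to detach 100 nodes
```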

Some Test Observations:

Test configuration:

  • Master & worker count: 5 masters, 395 workers
  • Kubernetes version: 1.26 - 1.29
  • Master node size: 18 vcpus, 30 GB memory
  • Worker node size: 2 vcpus, 6GB memory
  • Pod density: 23
  • Cluster cpu usage max: 63%
  • CNI: Calico
  • Calico BGP: Calico with Route-reflectors (3 nodes)
  • Metallb BGP: True

Observations:

  • Number of pods: 9230
  • Number of pods per node: 23
  • Number of namespaces: 3000
  • Number of secrets: 15
  • Number of config maps: 1046
  • Number of services: 144
  • Number of pods per namespace: up to 7600 in a single namespace
  • Number of services per namespace: 100
  • Number of deployments per namespace: 100

Component resource recommendations:

| Number of nodes | Component | Limits | Requests | Additional data |
| --- | --- | --- | --- | --- |
| 350 to 400 nodes | | cpu: 200m, memory: 400Mi | cpu: 25m, memory: 100Mi | Test configuration: pod density of 23 and CPU usage around 60% |
| 300 nodes | Prometheus | cpu: 2510m, memory: 12266Mi | | Requests and limits can be set based on this observation; they depend on multiple factors such as the number of nodes, the number of Prometheus exporters being queried, the amount of time-series data being stored, the number of calls to Prometheus, etc. |
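
Because the right requests and limits here depend on your own environment, one practical approach is to measure what the Prometheus container actually consumes over time and size from that, as the note above suggests. The sketch below queries a Prometheus server's own HTTP API for the container's peak memory and CPU; the URL and the container="prometheus" label are assumptions to adapt, and it presumes cAdvisor/kubelet container metrics are being scraped.

```python
import requests  # assumes network access to the management-plane Prometheus HTTP API

# Hypothetical address; substitute your Prometheus endpoint.
PROM_URL = "http://prometheus.example.local:9090"

def instant_query(promql: str) -> list:
    """Run a PromQL instant query and return the result vector."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Peak working-set memory and CPU of the Prometheus container over the last day.
# The container="prometheus" label is an assumption; match it to your deployment.
mem = instant_query('max_over_time(container_memory_working_set_bytes{container="prometheus"}[1d])')
cpu = instant_query('max_over_time(rate(container_cpu_usage_seconds_total{container="prometheus"}[5m])[1d:5m])')

for series in mem:
    print("peak memory (Mi):", float(series["value"][1]) / (1024 * 1024))
for series in cpu:
    print("peak cpu (cores):", float(series["value"][1]))
```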

Management Plane Instance resource recommendations

Default (up to 750 nodes)

| Component | Container | Limits | Requests |
| --- | --- | --- | --- |
| Qbert | qbert | cpu: 1500m, memory: 4000Mi | cpu: 40m, memory: 550Mi |
| Resmgr | resmgr | cpu: 1000m, memory: 1500Mi | cpu: 25m, memory: 190Mi |
| Keystone | keystone | cpu: 1000m, memory: 1000Mi | cpu: 250m, memory: 800Mi |
| Prometheus | prometheus | cpu: 1000m, memory: 4000Mi | cpu: 250m, memory: 200Mi |
| Vault | pf9-vault | cpu: 500m, memory: 500Mi | cpu: 25m, memory: 100Mi |
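
If you manage these values yourself, a strategic-merge patch of the deployment is one way to apply them. The sketch below uses the official kubernetes Python client and takes the qbert row above as an example; the deployment name and the pf9 namespace are assumptions, so check how the component is actually deployed in your management plane before patching.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
apps = client.AppsV1Api()

# Default recommendation for the qbert container from the table above.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "qbert",
                        "resources": {
                            "limits": {"cpu": "1500m", "memory": "4000Mi"},
                            "requests": {"cpu": "40m", "memory": "550Mi"},
                        },
                    }
                ]
            }
        }
    }
}

# "qbert" / "pf9" are assumed names; verify the actual deployment and namespace first.
apps.patch_namespaced_deployment(name="qbert", namespace="pf9", body=patch)
```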

Scaled configurations (750 to 2500 nodes)

| Component | Container | Limits (750-1500 nodes) | Requests (750-1500 nodes) | Limits (1500-2500 nodes) | Requests (1500-2500 nodes) | Additional changes |
| --- | --- | --- | --- | --- | --- | --- |
| Prometheus | socat19090 | cpu: 1000m, memory: 1500Mi | cpu: 250m, memory: 400Mi | No change | No change | maxchild: 2500 |
| Prometheus | prometheus | cpu: 1000m, memory: 4000Mi | cpu: 250m, memory: 200Mi | No change | No change | WEB_MAX_CONNECTIONS: 4000 |
| Rabbitmq | socat5673 | cpu: 400m, memory: 1000Mi | cpu: 50m, memory: 50Mi | cpu: 800m, memory: 1800Mi | cpu: 200m, memory: 200Mi | |
| Rabbitmq | rabbitmq | cpu: 1000m, memory: 1500Mi | cpu: 130m, memory: 750Mi | No change | No change | |
| Resmgr | socat18083 | cpu: 1000m, memory: 1500Mi | cpu: 250m, memory: 400Mi | | | |
| Ingress-nginx-controller | socat444 | cpu: 400m, memory: 1000Mi | cpu: 50m, memory: 50Mi | | | |
| Sidekickserver | socat13010 | cpu: 400m, memory: 1000Mi | cpu: 50m, memory: 50Mi | | | |
| Sidekickserver | sidekickserver | cpu: 500m, memory: 1000Mi | cpu: 50m, memory: 100Mi | | | |
| Sunpike-conductor | socat19111 | cpu: 400m, memory: 1000Mi | cpu: 50m, memory: 50Mi | | | |
| Pf9-vault | vault | cpu: 1250m, memory: 800Mi | cpu: 250m, memory: 400Mi | | | |
| Sunpike-apiserver | sunpike-apiserver | cpu: 1000m, memory: 1000Mi | cpu: 500m, memory: 256Mi | | | |
| Sunpike-conductor | sunpike-conductor | cpu: 1000m, memory: 1000Mi | cpu: 200m, memory: 500Mi | | | |
| Sunpike-kine | sunpike-kine | cpu: 1000m, memory: 256Mi | cpu: 25m, memory: 256Mi | | | |
| Sunpike-kube-controllers | sunpike-kube-controllers | cpu: 500m, memory: 1000Mi | cpu: 25m, memory: 800Mi | | | |
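
The additional changes column above also raises two connection-related settings: maxchild on the Prometheus socat sidecar and WEB_MAX_CONNECTIONS for Prometheus itself. The sketch below shows one way to set WEB_MAX_CONNECTIONS, assuming it is consumed as a container environment variable (if your deployment passes it differently, adapt accordingly); the workload kind and the prometheus/pf9 names are assumptions.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

# Set WEB_MAX_CONNECTIONS=4000 on the prometheus container, per the 750+ node guidance.
env_patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "prometheus",
                     "env": [{"name": "WEB_MAX_CONNECTIONS", "value": "4000"}]}
                ]
            }
        }
    }
}

# "prometheus" / "pf9" are assumed names; if Prometheus runs as a StatefulSet in your
# management plane, use apps.patch_namespaced_stateful_set() with the same body instead.
apps.patch_namespaced_deployment(name="prometheus", namespace="pf9", body=env_patch)
```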

MySQL/RDS config changes:

| Configuration | Value (750-1500 nodes) | Value (1500-2500 nodes) |
| --- | --- | --- |
| max_connections | 2048 | No change |
| max_connect_errors | 1000 | No change |
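
To verify the database side after scaling, you can read these variables back from the MySQL/RDS instance. Below is a minimal sketch with PyMySQL; the host and credentials are placeholders, and changing the values themselves is typically done through the RDS parameter group or my.cnf rather than from a client session.

```python
import pymysql  # assumes PyMySQL is installed and you have credentials for the DB

# Placeholder connection details for the management-plane MySQL/RDS instance.
conn = pymysql.connect(host="mysql.example.local", user="admin", password="***", port=3306)

with conn.cursor() as cur:
    # Read back the two settings listed in the table above.
    cur.execute(
        "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('max_connections', 'max_connect_errors')"
    )
    for name, value in cur.fetchall():
        print(f"{name} = {value}")

conn.close()
```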