PMK Scale Guide
Recommended Management Plane practices
Each PMK customer is provided a Management Plane (also known as a Deployment Unit, DU, or KDU) when onboarded. This section outlines recommendations and best practices for it.
The following values are listed per Management Plane instance:
Criteria | Value |
---|---|
Maximum number of nodes | 2500 |
Maximum number of clusters (single-node clusters) | 300 |
Maximum number of clusters (small clusters, up to 8 nodes) | 30 |
Maximum number of clusters (medium clusters, up to 200 nodes) | 8 |
Maximum number of clusters (large clusters, up to 400 nodes) | 5 |
Maximum number of clusters (combination of medium and large clusters) | 8 |
Maximum number of nodes onboarded in parallel | 30 |
Maximum number of clusters created in parallel (single-node clusters) | 10 |
Note: The values above are based on the latest Platform9 standard tests and are provided as guidance. If these standard results do not match your requirements, Platform9 support can help you scale to different numbers. To go beyond the node and cluster limits listed above, higher scale can be achieved with multiple Management Plane instances.
Recommended Cluster configuration practices
The following values are listed per PMK cluster running on a Management Plane instance:
Criteria | Value |
---|---|
Maximum number of nodes | 400 (see test observations below) |
Maximum number of nodes | 300 |
Maximum number of node upgrades in parallel in a cluster | 40 (10% of a 400-node cluster) |
Maximum number of nodes attached to a cluster in parallel | 15 |
Maximum number of nodes detached from a cluster in parallel | 30 |
Maximum number of pods per node | 110 (Kubernetes default) |
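The 110-pods-per-node ceiling is the kubelet's default `maxPods` value rather than a PMK-specific limit. As a minimal sketch, assuming kubelet settings are managed through a standard KubeletConfiguration file (PMK may surface this through its own cluster options instead), the setting looks like this:

```yaml
# Minimal KubeletConfiguration sketch; assumes direct kubelet config file
# management, which PMK clusters may replace with their own cluster options.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# 110 is the upstream Kubernetes default. The tests below ran at a pod
# density of 23, well under this ceiling.
maxPods: 110
```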
Some Test Observations:
Test configuration:
- Master and worker count: 5 masters, 395 workers
- Kubernetes version: 1.26 to 1.29
- Master node size: 18 vCPUs, 30 GB memory
- Worker node size: 2 vCPUs, 6 GB memory
- Pod density: 23 pods per node
- Cluster CPU usage (max): 63%
- CNI: Calico
- Calico BGP: Calico with route reflectors (3 nodes); see the BGP sketch after the observations below
- MetalLB BGP: true
Observations:
- Number of pods: 9230
- Number of pods per node: 23
- Number of namespaces: 3000
- Number of secrets: 15
- Number of config maps: 1046
- Number of services: 144
- Number of pods per namespace: up to 7600 in a single namespace
- Number of services per namespace: 100
- Number of deployments per namespace: 100
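For reference, a route-reflector topology like the one used in this test is typically built by labeling a few nodes as reflectors and peering every node with them through Calico's BGPPeer resource. A minimal sketch, assuming the standard projectcalico.org/v3 API and a `route-reflector` node label of your own choosing (neither is PMK-specific):

```yaml
# Peer all nodes with the designated route reflectors.
# Assumes reflector nodes carry the label route-reflector=true and a
# routeReflectorClusterID set via the projectcalico.org/RouteReflectorClusterID
# annotation; the label name is illustrative, not mandated by PMK.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-with-route-reflectors
spec:
  nodeSelector: all()
  peerSelector: route-reflector == 'true'
```

In a reflector topology the default full node-to-node BGP mesh is usually disabled as well, via `nodeToNodeMeshEnabled: false` in Calico's default BGPConfiguration.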
Component resource recommendations:
Number of nodes | Component | Limits | Requests | Additional data |
---|---|---|---|---|
350 to 400 nodes | | cpu: 200m memory: 400Mi | cpu: 25m memory: 100Mi | Test configuration: pod density of 23 and CPU usage around 60% |
300 nodes | Prometheus | cpu: 2510m memory: 12266Mi (observed) | | Requests and limits can be set based on this observation. Usage depends on multiple factors such as the number of nodes, the number of Prometheus exporters being queried, the amount of time-series data being stored, and the number of calls to Prometheus. |
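As a minimal sketch of how these figures translate into a pod spec, here is a container `resources` stanza built from the Prometheus row above (the container name is an assumption, and the request values are placeholders to be sized from your own baseline):

```yaml
# Illustrative container spec fragment; the container name is an assumption.
containers:
  - name: prometheus
    resources:
      requests:
        cpu: 250m          # placeholder; size from your observed baseline
        memory: 2000Mi     # placeholder; size from your observed baseline
      limits:
        cpu: 2510m         # observed peak at 300 nodes, per the table above
        memory: 12266Mi    # observed peak at 300 nodes, per the table above
```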
Management Plane Instance resource recommendations
Default (up to 750 nodes)
Component | Container | Limits | Requests |
---|---|---|---|
Qbert | qbert | cpu: 1500m memory: 4000Mi | cpu: 40m memory: 550Mi |
Resmgr | resmgr | cpu: 1000m memory: 1500Mi | cpu: 25m memory: 190Mi |
Keystone | keystone | cpu: 1000m memory: 1000Mi | cpu: 250m memory: 800Mi |
Prometheus | prometheus | cpu: 1000m memory: 4000Mi | cpu: 250m memory: 200Mi |
Vault | pf9-vault | cpu: 500m memory: 500Mi | cpu: 25m memory: 100Mi |
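These values land in the corresponding Deployments on the Management Plane. As a minimal sketch of applying the qbert row, assuming a Deployment and container both named `qbert` as in the table (the target namespace is an assumption and may differ on your instance):

```yaml
# patch-qbert-resources.yaml: strategic-merge patch for the qbert Deployment.
# Deployment and container names follow the table above; the namespace you
# apply it in is an assumption.
spec:
  template:
    spec:
      containers:
        - name: qbert
          resources:
            requests:
              cpu: 40m
              memory: 550Mi
            limits:
              cpu: 1500m
              memory: 4000Mi
```

Applied with `kubectl patch deployment qbert -n <management-plane-namespace> --patch-file patch-qbert-resources.yaml`; strategic-merge semantics match containers by name, so only the resources block changes.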
Scaled configurations (750 to 2500 nodes)
Component | Container | Limits (750-1500 nodes) | Requests (750-1500 nodes) | Limits (1500-2500 nodes) | Requests (1500-2500 nodes) | Additional changes |
---|---|---|---|---|---|---|
Prometheus | socat19090 | cpu: 1000m memory: 1500Mi | cpu: 250m memory: 400Mi | No change | No change | maxchild: 2500 |
Prometheus | prometheus | cpu: 1000m memory: 4000Mi | cpu: 250m memory: 200Mi | No change | No change | WEB_MAX_CONNECTIONS: 4000 |
Rabbitmq | socat5673 | cpu: 400m memory: 1000Mi | cpu: 50m memory: 50Mi | cpu: 800m memory: 1800Mi | cpu: 200m memory: 200Mi | |
Rabbitmq | rabbitmq | cpu: 1000m memory: 1500Mi | cpu: 130m memory: 750Mi | No change | No change | |
Resmgr | socat18083 | cpu: 1000m memory: 1500Mi | cpu: 250m memory: 400Mi | | | |
Ingress-nginx-controller | socat444 | cpu: 400m memory: 1000Mi | cpu: 50m memory: 50Mi | | | |
Sidekickserver | socat13010 | cpu: 400m memory: 1000Mi | cpu: 50m memory: 50Mi | | | |
Sidekickserver | sidekickserver | cpu: 500m memory: 1000Mi | cpu: 50m memory: 100Mi | | | |
Sunpike-conductor | socat19111 | cpu: 400m memory: 1000Mi | cpu: 50m memory: 50Mi | | | |
Pf9-vault | vault | cpu: 1250m memory: 800Mi | cpu: 250m memory: 400Mi | | | |
Sunpike-apiserver | sunpike-apiserver | cpu: 1000m memory: 1000Mi | cpu: 500m memory: 256Mi | | | |
Sunpike-conductor | sunpike-conductor | cpu: 1000m memory: 1000Mi | cpu: 200m memory: 500Mi | | | |
Sunpike-kine | sunpike-kine | cpu: 1000m memory: 256Mi | cpu: 25m memory: 256Mi | | | |
Sunpike-kube-controllers | sunpike-kube-controllers | cpu: 500m memory: 1000Mi | cpu: 25m memory: 800Mi | | | |
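The WEB_MAX_CONNECTIONS change in the Prometheus row corresponds, in upstream Prometheus, to the `--web.max-connections` flag. A minimal sketch of the scaled container spec, assuming PMK's Prometheus container reads the value from an environment variable as the table's naming suggests (that wiring is an assumption here):

```yaml
# Illustrative prometheus container fragment for the 750+ node tiers.
# The env var name and resource figures come from the table above; how the
# variable reaches --web.max-connections is an assumption.
containers:
  - name: prometheus
    env:
      - name: WEB_MAX_CONNECTIONS
        value: "4000"
    resources:
      requests:
        cpu: 250m
        memory: 200Mi
      limits:
        cpu: 1000m
        memory: 4000Mi
```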
MySQL/RDS config changes:
Configuration | Value (750-1500 nodes) | Value (1500-2500 nodes) |
---|---|---|
max_connections | 2048 | No change |
max_connect_errors | 1000 | No change |
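On RDS, these are parameter-group settings rather than values you set on a live session. A minimal sketch in CloudFormation, assuming a MySQL 8.0 parameter group (the logical resource name and family are illustrative):

```yaml
# Illustrative CloudFormation fragment; logical name and family are assumptions.
Resources:
  PmkDbParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: PMK Management Plane scale settings (750+ nodes)
      Family: mysql8.0
      Parameters:
        max_connections: 2048
        max_connect_errors: 1000
```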