Platform9 Edge Cloud Release Notes
What's new 2023-12-01 Platform9 Edge Cloud 5.3 LTS Patch #15
Platform9 Kube Version: 1.20.15-pmk.2218
Airctl Release Build Version: v-5.3.0-3085281
Security Fixes
This release includes a fix for CVE-2023-24329, backported to Python 3.9 from Python 3.11.
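For reference, CVE-2023-24329 concerns urllib.parse mishandling URLs with leading blank characters. A minimal spot check, assuming the upstream fix semantics (urlsplit stripping leading C0 control and space characters):

```bash
# Patched builds strip the leading space and recover the scheme ("https");
# vulnerable builds print an empty scheme (assumption: upstream fix behavior).
python3 -c 'from urllib.parse import urlsplit; print(urlsplit(" https://example.com").scheme)'
```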
What's new 2023-06-09 Platform9 Edge Cloud 5.3 LTS Patch #14
Platform9 Kube Version: 1.20.15-pmk.2218
Airctl Release Build Version: v-5.3.0-2710638
Security Fixes
The full list of vulnerabilities reported by the latest Trivy scan is here. Compare this to the previous patch #13 and its list of vulnerabilities here.
Bug Fixes
Known Issues
What's new 2023-05-15 Platform9 Edge Cloud 5.3 LTS Patch #13
This patch is now treated as dead-on-arrival (DOA) and is no longer supported. Please upgrade directly to patch #14 instead.
Platform9 Kube Version: 1.20.15-pmk.2214
Airctl Release Build Version: v-5.3.0-2674451
Bug Fixes
Known Issues
What's new in Supplemental Patches #12.1 & #6.1
2022-07-05 Platform9 Edge Cloud 5.3 LTS Patch #12.1
Platform9 Kube Version: 1.20.15-pmk.2124
Airctl Release Build Version: v-5.3.0-2075501
2022-06-27 Platform9 Edge Cloud 5.3 LTS Patch #6.1
Platform9 Kube Version: 1.20.11-pmk.2119
Airctl Release Build Version: v-5.3.0-2043507
Bug Fixes
Robust API Health Check - Moved the health check from the deprecated healthz endpoint to livez to help with gathering data on the health of the Kube API server.
- Validation: Check that /opt/pf9/pf9-kube/vrrp_check_apiserver.sh is using the https://127.0.0.1:443/livez endpoint. This should improve health check reliability and will be used by both keepalived and nodelet phases for status checks. A spot check is sketched below.
- Reference: https://kubernetes.io/docs/reference/using-api/health-checks/
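For example, a manual spot check from a master node, assuming root access (this queries the same endpoint the VRRP script probes; the ?verbose flag comes from the upstream health-check docs linked above):

```bash
# Query the API server liveness endpoint directly, skipping TLS verification
curl -sk https://127.0.0.1:443/livez?verbose
```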
Logging for Master & API Containers - This patch will start writing logs from kube-apiserver, kube-controller-manager, and kube-scheduler on master nodes to the /var/log/pf9/{kube-apiserver,kube-controller-manager,kube-scheduler} locations. Each of these directories will have .INFO, .ERROR, and .WARNING files logging out the corresponding log levels. For example, the file names for kube-controller-manager are kube-controller-manager.ERROR, kube-controller-manager.INFO, and kube-controller-manager.WARNING.
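As an illustration, assuming the file layout described above, API server errors can be followed with:

```bash
# Tail error-level logs from kube-apiserver on a master node
tail -f /var/log/pf9/kube-apiserver/kube-apiserver.ERROR
```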
Patch #6.1 is intended to be applied directly to patch #6 installs. However, once applied to the environment, the upgrade path must now be to patch #12.1, a cumulative update that will include all fixes from patches #7 - #12.
What's new 2022-04-13 Platform9 Edge Cloud 5.3 LTS Patch #12
Platform9 Kube Version: 1.20.15-pmk.2100
Airctl Release Build Version: v-5.3.0-1911578
This build resolves an issue in the previous build when upgrading from a prior patch version.
Bug Fixes
"_task pingmaker_” from muster.conf
- When keystone is enabled, the kubeconfig cannot be used if the DU is powered off.
- When keystone is disabled during cluster creation (with the "keystoneEnabled" flag), the downloaded kubeconfig works for cluster operations even when the DU is powered off. The kubeconfig will use the keystone username/password when keystone is enabled; only when it is disabled do you get a kubeconfig with just the certificates.
Fix is provided for the following use cases:
- Token
- User/Password
- Certificate based (w/RBAC)
https://platform9.com/docs/kubernetes/kubeconfig-through-api
"If force_cert_auth query param set to true, kubeconfig would contains certificate based authentication, otherwise it’ll be token based"
NOTE: See #1397831 to view the steps to manually set the expiry of the client certificate in the Kubeconfig.
How to edit limits for Calico Pods
- Create a cluster with the POST API. (The Calico-related fields in the body are included as an example.)

```
curl --request POST \
  --url https://platform9.io/qbert/v4/projectId/clusters \
  --header "X-Auth-Token: <token>" \
  --data '{
    "calicoNodeCpuLimit": "1",
    "calicoNodeMemoryLimit": "1000Mi",
    "calicoTyphaCpuLimit": "200m",
    "calicoTyphaMemoryLimit": "500Mi",
    "calicoControllerCpuLimit": "200m",
    "calicoControllerMemoryLimit": "400Mi"
  }'
```

- Update an existing cluster with the PUT API.
```
curl --request PUT \
  --url https://platform9.io/qbert/v4/project_uuid/clusters/uuid \
  --header "X-Auth-Token: <token>" \
  --data '{
    "calicoNodeCpuLimit": "1",
    "calicoNodeMemoryLimit": "1002Mi",
    "calicoTyphaCpuLimit": "202m",
    "calicoTyphaMemoryLimit": "502Mi",
    "calicoControllerCpuLimit": "202m",
    "calicoControllerMemoryLimit": "402Mi"
  }'
```

- Restart the stack on the master node.
```
sudo systemctl stop pf9-nodeletd pf9-hostagent
/opt/pf9/nodelet/nodeletd phases restart
```

For API reference: https://platform9.com/docs/v5.3/qbert/ref#putupdate-the-properties-of-a-cluster-specified-by-the-cluster-u
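Once the stack is back up, one way to confirm the new limits landed; the DaemonSet name and namespace here are assumptions (calico-node in kube-system), not from the patch notes:

```bash
# Print the resource limits currently set on the calico-node container
kubectl -n kube-system get ds calico-node \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits}'
```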
- In addition to the above, this patch includes the fix for a bug where the metrics-server does not respect Kubernetes CPU limits. To fix this issue, edit the metrics-server deployment and increase the --cpu parameter for the metrics-server-nanny container. A good value for most clusters is 100m. See the example below.
Alert on Prometheus
===================

```
ALERTS{alertname="CPUThrottlingCritical",alertstate="firing",container="metrics-server-nanny",namespace="kube-system",pod="metrics-server-v0.5.0-645fd4594c-x456b",severity="critical"}
```

Prometheus rule
===============

```
Alert: CPUThrottlingCritical
Annotations:
  Description: {{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.
  Summary: Processes experience elevated CPU throttling, more than 85%.
Expr: sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (container, pod, namespace) / sum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace) > ( 85 / 100 )
For: 15m
Labels:
  Severity: critical
```

Grafana snapshot

```
resources:
  limits:
    cpu: 100m
    memory: 300Mi
  requests:
    cpu: 5m
    memory: 50Mi
```
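One way to apply the --cpu bump described above is to edit the deployment directly; the deployment name and namespace below are assumptions inferred from the pod name in the alert:

```bash
# Open the metrics-server deployment for editing, then raise the
# --cpu=... argument on the metrics-server-nanny container (e.g. --cpu=100m)
kubectl -n kube-system edit deployment metrics-server-v0.5.0
```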
How to edit limits for Metrics-Server pod

Steps to manually change the override parameters for the metrics-server pod:
- API for patching the Cluster AddOn object of metrics-server.
export TOKEN="X-Auth-Token: <keystone token>"export HEADER="Content-Type: application/json"export ACCEPT="Accept: application/json"export HEADER_MERGE="Content-Type: application/merge-patch+json"export NAME="$CLSUUID-<type of addon>" export DU_FQDN="airctl-1-1877939-840.pf9.localnet"export TOKEN="gAAAAABiKFTKn9_5QMsGByTWYDVEcQWPC_tX8imwnFwCYeN0EtVEN4M4_pRGeixTzfOBzhkymzu8GpmJrK-2evuwT-PdfGnTMEPV4kCC2zL4sIT1CWlQBe6ciI3EEzVBnNOFp6l1UVKkkkT3ipqVrRYwe0mPiKdEHs9DbAv46S6ulsmwxQSvFn2U4rx1xKeyE47S-RDyDGOR"export PROJECT_ID="385326ea470b4ca4bbb41aafd11df6e6"export HEADER="Content-Type: application/json"export ACCEPT="Accept: application/json"export HEADER_MERGE="Content-Type: application/merge-patch+json"export CLSUUID="ddb010af-7cbb-42c5-9568-cc7f60f2f955"export NAME="$CLSUUID-metrics-server"- Create metric.json file. Example metric.json with the override parameters introduced which can be updated.
{ "kind": "ClusterAddon", "apiVersion": "sunpike.platform9.com/v1alpha2", "metadata": { "name": "8e751b29-153c-47d1-a8b6-f86d18d52ccd-metrics-server", "namespace": "default", "uid": "a10d4290-8308-42ed-a607-7ec5b9196b4a", "labels": { "sunpike.pf9.io/cluster": "8e751b29-153c-47d1-a8b6-f86d18d52ccd", "type": "metrics-server" } }, "spec":{ "clusterID":"8e751b29-153c-47d1-a8b6-f86d18d52ccd", "version":"0.5.0", "type":"metrics-server", "override":{ "params": [ { "name": "metricsCpuLimit", "value": "100m" }, { "name": "metricsMemoryLimit", "value": "300Mi" } ] }, "watch":true } } curl -X PATCH -H "X-Auth-Token: $TOKEN" -H "$HEADER_MERGE" -H "$ACCEPT" -d "@metric.json" https://$DU_FQDN/qbert/v4/$PROJECT_ID/sunpike/apis/sunpike.platform9.com/v1alpha2/namespaces/default/clusteraddons/$NAME -v --insecureUsing Kubectl
Using Kubectl

- Root login to the DU VM.
```
$ kubectl --kubeconfig /etc/sunpike/kubeconfig edit clusteraddons 8e751b29-153c-47d1-a8b6-f86d18d52ccd-metrics-server
```

- Update the metricsMemoryLimit & metricsCpuLimit accordingly.
```yaml
apiVersion: sunpike.platform9.com/v1alpha2
kind: ClusterAddon
metadata:
  creationTimestamp: "2022-02-23T08:40:07Z"
  finalizers:
  - addons.pf9.io
  labels:
    sunpike.pf9.io/cluster: 8e751b29-153c-47d1-a8b6-f86d18d52ccd
    type: metrics-server
  name: 8e751b29-153c-47d1-a8b6-f86d18d52ccd-metrics-server
  namespace: default
  resourceVersion: "207990"
  uid: a10d4290-8308-42ed-a607-7ec5b9196b4a
spec:
  clusterID: 8e751b29-153c-47d1-a8b6-f86d18d52ccd
  override:
    params:
    - name: metricsCpuLimit
      value: 100m
    - name: metricsMemoryLimit
      value: 200Mi
  type: metrics-server
  version: 0.5.0
  watch: true
status:
  healthy: true
  lastChecked: null
  phase: Installed
```

- On the master node, edit the pf9-addon-operator deployment to use the 3.2.3 addon image. After a few seconds/minutes, the metrics-server pod comes up with the desired values.
Note: Only updating the deployment to use the 3.2.3 addon image is also sufficient; steps 1 and 2 are not needed in that case, because a default value of 100m is applied to the --cpu parameter of the metrics-server-nanny container.
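For illustration, that deployment edit could look like the following; the namespace and image reference are assumptions, so check the existing deployment first:

```bash
# Find where the addon operator runs, then point it at the 3.2.3 image
kubectl get deploy -A | grep pf9-addon-operator
kubectl -n <namespace> set image deploy/pf9-addon-operator \
  pf9-addon-operator=<registry>/pf9-addon-operator:3.2.3
```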
Verified with the following API flags and YAML.
```
--audit-policy-file=/var/opt/pf9/kube/apiserver-config/audit-policy.yaml
--audit-log-path=/var/opt/pf9/kube/audit/audit.log
```

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# do not log requests to the following
- level: None
  nonResourceURLs:
  - "/healthz*"
  - "/logs"
  - "/metrics"
  - "/swagger*"
  - "/version"
# limit level to Metadata so token is not included in the spec/status
- level: Metadata
  omitStages:
  - RequestReceived
  resources:
  - group: authentication.k8s.io
    resources:
    - tokenreviews
# extended audit of auth delegation
- level: RequestResponse
  omitStages:
  - RequestReceived
  resources:
  - group: authorization.k8s.io
    resources:
    - subjectaccessreviews
# log changes to pods at RequestResponse level
- level: RequestResponse
  omitStages:
  - RequestReceived
  resources:
  - group: "" # core API group; add third-party API services and your API services if needed
    resources: ["pods"]
    verbs: ["create", "patch", "update", "delete"]
# log everything else at Metadata level
- level: Metadata
  omitStages:
  - RequestReceived
```
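A quick sanity check of the policy once the API server restarts, assuming the standard JSON-lines audit log format at the path above:

```bash
# Pod writes should be recorded at RequestResponse level
grep '"resource":"pods"' /var/opt/pf9/kube/audit/audit.log \
  | grep '"verb":"create"' | tail -n 1 | jq '.level'
```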
Enhancements & Updates

Procedure
- During cluster creation, pass an integer argument to certExpiryHrs within the cluster creation payload. Example: "certExpiryHrs": 36.
"name""na-test-01" "masterNodes""33159359-afdd-485a-9379-1cfc38fc9bbd" "allowWorkloadsOnMaster"true "containersCidr""10.20.0.0/22" "servicesCidr""10.21.0.0/22" "mtuSize"1440 "certExpiryHrs"36 "privileged"true "appCatalogEnabled"false "nodePoolUuid""88858a3a-8a6c-4bd9-a985-b7c683b31187" "kubeRoleVersion""1.20.15-pmk.2100" "calicoIpIpMode""Always" "calicoNatOutgoing"true "calicoV4BlockSize""26" "calicoIPv4DetectionMethod""first-found" "networkPlugin""calico" "runtimeConfig""" "etcdBackup" "storageType""local" "isEtcdBackupEnabled"1 "storageProperties" "localPath""/etc/pf9/etcd-backup" "intervalInMins"1440 "tags" "pf9-system:monitoring": "true" # curl -X GET -k -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" https://<DU_FQDN>/qbert/v3/<PROJECT_ID>/cloudProviders- Create a cluster using the above payload:
```
# curl -X POST -k -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -d "@<FILENAME>.json" https://<DU_FQDN>/qbert/v3/<PROJECT_ID>/clusters

# curl -X POST -k -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -d "@cluster1.json" https://airctl-test.pf9.localnet/qbert/v3/0e1a916b57794498b43d138d9e8f2f02/clusters
{"uuid":"38ccdeb2-5381-48a7-983e-b50821e2ea97"}
```

- Once the cluster is ready, generate a kubeconfig for the user with a client certificate:
```
# curl -k --url https://<DU_FQDN>/qbert/v3/<PROJECT_ID>/kubeconfig/<CLUSTER_UUID>?force_cert_auth=true --header "X-Auth-Token: $TOKEN" --header 'content-type: application/json'

# curl -k --url https://airctl-test.pf9.localnet/qbert/v3/0e1a916b57794498b43d138d9e8f2f02/kubeconfig/38ccdeb2-5381-48a7-983e-b50821e2ea97?force_cert_auth=true --header "X-Auth-Token: $TOKEN" --header 'content-type: application/json'
```

- Verify the validity of the client certificate with the following command:
```
# cat <kubeconfig.file> | yq '.users[0].user.client-certificate-data' | base64 -d | openssl x509 -text
```

Validation
- Certificate expiry obtained from the decoded PEM certificate:
```
# cat client-cert.txt | base64 -d | openssl x509 -noout -dates
notBefore=Mar 22 12:32:38 2022 GMT
notAfter=Mar 24 00:33:08 2022 GMT
```

Here notAfter is 36 hours after notBefore, matching the configured certExpiryHrs.

- Check the certExpiryHrs parameter value set for the cluster:
```
# curl -X GET -k -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" https://<DU_FQDN>/qbert/v3/<PROJECT_ID>/clusters/<CLUSTER_UUID> | jq | grep certExpiryHrs

# curl -X GET -k -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" https://airctl-test.pf9.localnet/qbert/v3/0e1a916b57794498b43d138d9e8f2f02/clusters/38ccdeb2-5381-48a7-983e-b50821e2ea97 | jq | grep certExpiryHrs
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2769  100  2769    0     0   7025      0 --:--:-- --:--:-- --:--:--  7027
    "certExpiryHrs": 36,
```

What's new 2022-01-14 Platform9 Edge Cloud 5.3 LTS Patch #11
Platform9 Kube Version: 1.20.11-pmk.2038
Airctl Release Build Version: v-5.3.0-1806225
Bug Fixes
What's new 2021-12-22 Platform9 Edge Cloud 5.3 LTS Patch #10
Platform9 Kube Version: 1.20.11-pmk.2032
Bug Fixes
With patch #10, we now deploy the autoscaler by default with a replica size of 2. This also means that we no longer deploy the CoreDNS pods directly; they are entirely managed by the DNS autoscaler.
The DNS autoscaler is deployed by default along with CoreDNS.
The default autoscaling params are as follows:
MinReplicas = 1
MaxReplicas = 10
PreventSinglePointFailure = true (on clusters with 2 or more nodes, ensures at least 2 CoreDNS replicas)
NodesPerReplica = 16
CoresPerReplica = 256
Note that the values of both coresPerReplica and nodesPerReplica are floats. The idea is that when a cluster uses nodes with many cores, coresPerReplica dominates; when a cluster uses nodes with fewer cores, nodesPerReplica dominates. The fields schedule 1 replica per X nodes (16 in the example shown above), bounded by the MinReplicas and MaxReplicas fields. So if you want 2 replicas even though you have 32 or fewer nodes (assuming NodesPerReplica is 16), set MinReplicas to 2. MaxReplicas prevents the autoscaler from scheduling too many replicas, even on a setup with very many nodes (or cores). The replica-count calculation is sketched below.

The default polling period is set to 5 minutes.
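As a minimal sketch of how these parameters interact, assuming the standard linear-mode calculation used by the cluster-proportional autoscaler (replicas = max(ceil(cores/coresPerReplica), ceil(nodes/nodesPerReplica)), clamped to [MinReplicas, MaxReplicas]):

```bash
#!/usr/bin/env bash
# Illustrative CoreDNS replica calculation with the defaults listed above.
# Usage: ./coredns-replicas.sh <total_cores> <total_nodes>
cores=${1:?total cores}; nodes=${2:?total nodes}
coresPerReplica=256; nodesPerReplica=16
minReplicas=1; maxReplicas=10

# integer ceil(a/b)
by_cores=$(( (cores + coresPerReplica - 1) / coresPerReplica ))
by_nodes=$(( (nodes + nodesPerReplica - 1) / nodesPerReplica ))

# take the larger of the two, then clamp to the min/max bounds
replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))
(( replicas < minReplicas )) && replicas=$minReplicas
(( replicas > maxReplicas )) && replicas=$maxReplicas
echo "CoreDNS replicas: $replicas"
```

For example, a 40-node cluster with 8 cores per node (320 cores total) yields max(ceil(320/256), ceil(40/16)) = max(2, 3) = 3 replicas.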
If you wish to change the CoreDNS or autoscaling params from the defaults, you can modify the CoreDNS ClusterAddon resource via the Qbert Sunpike API:
$DU_FQDN/qbert/v4/$PROJECT_ID/sunpike/apis/sunpike.platform9.com/v1alpha2/namespaces/default/clusteraddons/$CLUSTER_UUID-coredns
{ "apiVersion": "sunpike.platform9.com/v1alpha2", "kind": "ClusterAddon", "metadata": { "labels": { "sunpike.pf9.io/cluster": "<CLUSTER_UUID>", "type": "coredns" }, "name": "<CLUSTER_UUID>-coredns", "namespace": "default" }, "spec": { "clusterID": "<CLUSTER_UUID>", "override": { "params": [ { "name": "CoresPerReplica", "value": "512" }, { "name": "NodesPerReplica", "value": "64" }, { "name": "MinReplicas", "value": "3" }, { "name": "MaxReplicas", "value": "5" }, { "name": "PollPeriodSecs", "value": "300" } ] }, "type": "coredns", "version": "<COREDNS_VERSION>", "watch": false }As an example, save the above JSON example to a file "coredns.json". Please note that the above example changes the defaults of CoresPerReplica, NodesPerReplica, MinReplicas, and MaxReplicas. Do not change these fields if you wish to use the defaults.
An example PATCH call:
```
curl -k -X PATCH https://localhost/qbert/v4/f802075e3bb4464fb1072fdb3395ed3f/sunpike/apis/sunpike.platform9.com/v1alpha2/namespaces/default/clusteraddons/f2f65e9c-9e41-45c1-a536-b0333e25320e-coredns -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/strategic-merge-patch+json" -d "@coredns.json"
```

You can verify the ClusterAddon was modified successfully by making a GET call to:
https://$DU_FQDN/qbert/v4/9d75c963d90b4d659eb541a655608839/sunpike/apis/sunpike.platform9.com/v1alpha2/namespaces/default/clusteraddons
Verify that the status phase for the addon shows "Installed". You should also see a kube-dns-autoscaler Deployment in your cluster, along with the coredns pods scaled.
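For instance, an illustrative GET (variable names follow the earlier examples; the jq filter is an assumption for readability):

```bash
# Confirm the coredns addon reached the Installed phase
curl -sk -H "X-Auth-Token: $TOKEN" \
  "https://$DU_FQDN/qbert/v4/$PROJECT_ID/sunpike/apis/sunpike.platform9.com/v1alpha2/namespaces/default/clusteraddons/$CLUSTER_UUID-coredns" \
  | jq '.status.phase'
```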
What's new 2021-12-14 Platform9 Edge Cloud 5.3 LTS Patch #9
Platform9 Kube Version: 1.20.11-pmk.2013
Enhancements & Updates
As an example, the following will deploy whereabouts, along with the ip-reconciler CronJob on a 3-minute schedule, only on nodes with the label "foo=bar".
```yaml
apiVersion: plumber.k8s.pf9.io/v1
kind: NetworkPlugins
metadata:
  name: networkplugins-sample11
spec:
  # Add fields here
  plugins:
    hostPlumber: {}
    nodeFeatureDiscovery: {}
    multus: {}
    whereabouts:
      ipReconcilerSchedule: "*/3 * * * *"
      ipReconcilerNodeSelector:
        foo: bar
    sriov: {}
```
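Assuming the manifest is saved locally, applying and checking it might look like this (the file and CronJob names are assumptions):

```bash
# Apply the sample NetworkPlugins resource and confirm the reconciler schedule
kubectl apply -f networkplugins-sample.yaml
kubectl get cronjobs -A | grep -i ip-reconciler
```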
Bug Fixes

- quay.io/operator-framework/configmap-operator-registry latest c22c74d5b16d
- nfvpe/sriov-device-plugin latest d5ce5066357b
- nginx latest 7ce4f91ef623
- nfvpe/sriov-cni latest 6ac3016f3d1b
- platform9/whereabouts latest a7b49560761b
- xagent003/whereabouts latest a7b49560761b
```
ETCD_SNAPSHOT_COUNT=10000
ETCD_QUOTA_BACKEND_BYTES="6442450944"
ETCD_AUTO_COMPACTION_MODE="periodic"
ETCD_AUTO_COMPACTION_RETENTION="240h"
```
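To confirm these settings took effect on a master node, one possible check, assuming etcd runs as a Docker container named "etcd" (which may differ per deployment):

```bash
# Inspect the etcd container environment for the snapshot/quota/compaction settings
sudo docker exec etcd env | grep -E 'ETCD_(SNAPSHOT_COUNT|QUOTA_BACKEND_BYTES|AUTO_COMPACTION)'
```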
- The password needs to be passed in single quotes through the CLI.
- There is no validation of the password in airctl when it is passed through the CLI, so set it according to the instructions/validations shown in the UI.
- When airctl update-admin-password is run without a password passed in, a new random password is generated.
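For example (the flag name and password below are assumptions; check airctl's help output for the exact syntax):

```bash
# Pass the new password in single quotes to avoid shell interpolation
airctl update-admin-password --password 'S3cure!Passw0rd'

# Omit the password to have airctl generate a random one
airctl update-admin-password
```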