PMK Release 5.6 Release Notes

The Platform9 Managed Kubernetes (PMK) version 5.6 release is now available with support for Kubernetes versions 1.22 and 1.23. The 5.6 release aims to bring multifold improvements to the user experience with our brand new Cluster API (CAPI) based lifecycle management for AWS and EKS clusters (in beta). This release also adds our brand new open-source project Arlon (also in beta), which bridges the gap between workload and infrastructure management with a single unified architecture. This release continues Platform9's commitment to open source by contributing new products and features with deep Kubernetes integration and building Platform9 on open-source technologies.

All clusters running Kubernetes 1.20 must be upgraded to Kubernetes 1.21 (PMK 5.5) prior to upgrading to Kubernetes 1.22 (PMK 5.6).

Kubernetes 1.20 has reached End of Life as of 2021-02-28. New clusters should be built on 1.23.

Kubernetes 1.21 has reached End of Life as of 2022-06-28. New clusters should be built on 1.23.

PMK 5.6.0 Release Highlights (Released 2022-09-23)

AWS and EKS Cluster Lifecycle management based on Kubernetes Cluster API (In Beta)

We believe open source is the present and the future, and as part of this commitment the Platform9 5.6 release brings AWS and EKS cluster lifecycle management using Cluster API. This provides a better way to create native Kubernetes clusters on AWS EC2. You can also create, manage, update, and upgrade Kubernetes clusters in PMK using Amazon Elastic Kubernetes Service (EKS).

AWS and EKS Kubernetes cluster creation and management using Cluster API is a beta feature in PMK 5.6. We are actively working on making this feature GA over the next few releases.

With the 5.6 release we have added Worker Node Groups for CAPI-based AWS and EKS clusters. Some of the features enabled with Node Groups are:

  • Create multiple node groups.
  • Edit node groups.
  • Enable Auto-scaling.
  • Configure to use Spot Instances.
  • Select Availability Zones.
  • Add bulk labels and taints.
  • Configure node group update strategy.

Infrastructure space in App Switcher

The new Infrastructure space in the App Switcher aims to simplify infrastructure management for our users. This dedicated space helps PMK administrators create and manage different types of clusters and associated resources such as Cloud Providers, Nodes, and RBAC Profiles.

Dedicated Cluster pages for different types of clusters

New CAPI Cluster Dashboards

RHEL 8.5 and 8.6 support for BareOS clusters

PMK has added support for Red Hat Enterprise Linux (RHEL) 8.5 and 8.6 for BareOS clusters. This is supported only for 1.22 and 1.23 Kubernetes clusters.

Platform9 CLI

The 1.18 pf9ctl release is now available and can be installed by running the following command.

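A minimal sketch of the install step, assuming the standard pf9ctl installer URL (verify against the pf9ctl documentation):

```bash
# Download and run the Platform9 CLI (pf9ctl) installer.
# The installer URL is assumed from the standard pf9ctl install flow.
bash <(curl -sL https://pf9.io/get_cli)
```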

Profile Engine with Arlon (In Beta)

Starting with the PMK 5.6 release, Platform9 has expanded its Profile Engine capabilities by adding support for the open-source Arlon project. Arlon is an open-source, declarative, policy-driven framework for scalable management of Kubernetes cluster upgrades and security updates using the principles of GitOps and Infrastructure as Code (IaC). Arlon is built using open-source ArgoCD and Cluster API.

The Arlon integration into PMK is still a beta feature in PMK release 5.6. We are actively working on making this feature GA in upcoming PMK releases.

Arlon integration with PMK in 5.6 does two things:

  1. Arlon 0.3.0 is deployed as a fully managed add-on along with PMK.
  2. Platform9 has released ArlonCTL, a new command line utility to create Arlon profiles and EKS clusters using the managed Arlon add-on that ships with PMK.

For more information on Arlon and its integration with PMK, read Profile Engine with Arlon.

Enhancements & Updates

Added Added Cluster Types. PMK now supports 3 types of clusters:

  • CAPI - Clusters provisioned and managed by Platform9 Cluster API Integration.
  • Legacy - Clusters provisioned and managed by Platform9 Qbert API.
  • Imported - Clusters provisioned externally on a managed service such as EKS, AKS, or GKE and imported into the Platform9 SaaS Management Plane.

Added Added support for RHEL 8.5 and 8.6.

Added Added flags to pass etcd backup configuration options in node bootstrap command.

Added Added flag to skip node authorization in prep-node.

Added Similar to Docker setups, users can now set up a container image registry mirror for containerd-based nodes.
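
A minimal sketch of what such a mirror configuration can look like on a containerd node, assuming the default /etc/containerd/config.toml path and a placeholder mirror URL (the exact PMK mechanism may differ):

```bash
# Append a registry mirror for docker.io to the containerd CRI config
# (classic "mirrors" config; the mirror URL is a placeholder), then
# restart containerd to apply the change.
cat <<'EOF' >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://registry-mirror.example.com"]
EOF
systemctl restart containerd
```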

Added Added TLS encryption of traffic between etcd and kube-apiserver. The etcd server now serves on secure port 2379 for Kubernetes versions 1.22 and later.
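
For reference, a hedged way to confirm that etcd is serving over TLS on 2379 from a control-plane node (the certificate paths below are assumptions; use the paths present on your node):

```bash
# Check etcd health over the secure client port; cert/key paths are
# placeholders for the node's actual etcd certificates.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/path/to/etcd/ca.crt \
  --cert=/path/to/etcd/client.crt \
  --key=/path/to/etcd/client.key \
  endpoint health
```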

Enhanced Keepalived has been upgraded to a newer version. This should address the issue of the VIP dropping from the expected network interface.

Enhanced The updated Profile Engine is now enabled for all clusters by default.

Enhanced Improved error messages.

Enhanced Improved dpkg lock check.

Enhanced containerd will be the default container runtime for PMK clusters.

Note that the Docker runtime will be removed in upcoming versions; users are strongly advised to move to the containerd runtime.
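
To check which runtime a cluster's nodes are currently using, one quick option:

```bash
# The CONTAINER-RUNTIME column reports containerd:// or docker:// per node.
kubectl get nodes -o wide
```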

Bug Fixes

Fixed Resolved an issue where the etcd backup task status showed an incorrect error state.

Fixed Fixed an issue on containerd setups where not all containers were stopped when a pf9-kube stack upgrade or restart was attempted.

Fixed Resolved an issue that triggered user alerts with node names shown as undefined.

Fixed Fixed an issue causing the cluster nodes to show up as disconnected after a reboot.

Fixed Fixed a FileSystem Alert Prometheus rule that was firing a false alert.

Fixed Fixed an issue that was causing Catapult to send alarms for clusters that no longer exist.

Fixed Resolved an issue where the ECO agent deployment on AKS and GKE clusters of versions 1.21+ was failing.

Fixed Fixed an issue with nginx container image pulls when Grafana is deployed by the Platform9 monitoring feature.

Fixed Fixed an issue where debug logs were not enabled for the containerd runtime when cluster.debug is set to true.

Fixed Resolved an issue where Keepalived loses the VIP on the interface. Keepalived has been upgraded to a newer version, which should address the issue of the VIP dropping from the expected network interface.

Fixed Fixed an issue with the application switcher becoming unresponsive after being clicked multiple times.

Fixed Fixed an issue where the text box headers were hard to read because of the white highlighting and did not match the Name text box.

Fixed Fixed an issue where removing a Helm repository would prevent deployed apps from being uninstalled.

Fixed Fixed the Nodes dashboard to default to All clusters.

Fixed Fixed an issue where offline and disconnected clusters would cause errors to be displayed in the UI.

Package Updates

The following packaged components have been upgraded in the latest v1.23.8 Kubernetes version:

Component | Version
CALICO | 3.23.5
CORE-DNS | 1.8.6
METRICS SERVER | 0.5.0
METAL LB | 0.12.1
KUBERNETES DASHBOARD | 0.12.1
CLUSTER AUTO-SCALER AWS | 1.23.1
CLUSTER AUTO-SCALER AZURE | 1.13.8
CLUSTER AUTO-SCALER CAPI | 1.23.1
FLANNEL | 0.14.0
ETCD | 3.4.14
CNI PLUGINS | 0.9.0
KUBEVIRT | 0.55.0
KUBEVIRT CDI | 1.51.0
KUBEVIRT ADDON | 0.55.0
LUIGI | 0.4.0
MONITORING | 0.57.0
PROFILE AGENT | 2.0.1
METAL3 | 1.1.1

Please refer to the Managed Kubernetes Support Matrix for v5.6 to view all currently deployed or supported upstream component versions.

Known Issues

CAPI AWS & EKS clusters

Known Issue Nodes of CAPI-based clusters are not listed in the UI on the node list page, node details page, and infrastructure page.

Known Issue Reusing the name of a CAPI cluster after deletion of a previous cluster leads to errors. Users are recommended to use unique cluster names and to avoid even names that were used for clusters deleted in the past. If such name reuse cannot be avoided, contact Platform9 support for possible resolution options.

Known Issue In rare instances, CAPI-based AWS clusters can fail to be provisioned (control plane not ready after 2 hours of creation) due to AWS cloud-init issues. The current workaround is to delete the cluster and create a new one with a different name.

Known Issue Kubernetes may continue to list deleted nodes associated with the cluster even after a machinepool node group is deleted in a CAPI-based AWS cluster. Note that the EC2 instances of this node group should already be terminated. You can run kubectl delete node <node_name> to remove such orphaned node records from Kubernetes.
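
A minimal sketch of that cleanup (node names are whatever kubectl get nodes reports):

```bash
# List nodes, then remove the orphaned record left behind by the
# deleted machinepool node group.
kubectl get nodes
kubectl delete node <node_name>
```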

Known Issue For CAPI-based clusters, labels with special characters such as - cannot be associated with the cluster. A known bug causes only one part of the label to be applied to the cluster.

Known Issue For CAPI-based clusters, downloading multiple kubeconfigs for a given user will invalidate that user's previous kubeconfigs. Only the latest downloaded kubeconfig will be valid for that user account. Users are advised to use caution when sharing user accounts for such clusters.

Known Issue Users should not make changes from the AWS Console to EKS clusters created using PMK. Platform9 manages the lifecycle of the EKS control plane and EKS nodes. Making any changes from the AWS Console might result in undesired effects or render the cluster non-functional.

Known Issue The Kubernetes dashboard is not accessible for CAPI-based clusters by uploading the kubeconfig, because of an upstream issue where the dashboard does not support OIDC-based kubeconfigs.

As a workaround, authenticate with the ID Token.

  1. Download and open the kubeconfig of a cluster.
  2. Copy the value of the id_token field.
  3. In the dashboard, select "token" authentication and paste the value in the form.

Note: token refresh is not supported by the dashboard, which means you lose access after the token expires (10-20 minutes).

To refresh the ID token, simply run a kubectl command with the kubeconfig; kubectl replaces an expired ID token in the kubeconfig with a valid one. Then follow the steps above again.
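
A hedged sketch of that refresh-and-copy flow, assuming a downloaded kubeconfig file named cluster.kubeconfig (the filename is a placeholder; the field name follows step 2 above):

```bash
# Any kubectl call refreshes an expired id_token in the kubeconfig.
kubectl --kubeconfig ./cluster.kubeconfig get nodes
# Print the current id_token value to paste into the dashboard login form.
grep id_token ./cluster.kubeconfig
```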

Known Issue An EKS cluster created with MachinePool / MachineDeployment worker node groups with a desired node count can sometimes get stuck in the “provisioned” state for more than an hour after the control plane is ready.

Current workarounds:

  • Machine Deployment type Node Group: Scale down the affected node group to 0 and then scale it back up to desired count.
  • Machine Pool type Node Group: Delete the affected node group and add new node group to the cluster.
  • Create an EKS cluster with MachinePool / MachineDeployment with a replica count of 0. Once the cluster is healthy, change the replica count to the desired value.

Known Issue In some instances, a CAPI cluster deleted during the provisioning phase can get stuck in the deleting phase. Please contact Platform9 support for possible resolution options.

Other known Issues

Known Issue Calico IPAM is only supported when using Calico CNI.

Known Issue EKS, AKS, or GKE Cluster Import “401 Unauthorized” Notification and Empty Dashboards.

If an AWS Cloud Provider is configured to import clusters without the correct identity being added to the target cluster, Platform9 will be unable to access the cluster.

It's important to note that if you have used a Cloud Provider to register an EKS, AKS, or GKE cluster that was created with IAM user credentials which no longer have access to the EKS, AKS, or GKE Kubernetes cluster, Platform9 will fail with a 401 Unauthorized error until that IAM user is given access to the Kubernetes cluster.

View the EKS documentation to ensure the correct access has been provisioned for each imported cluster: https://aws.amazon.com/premiumsupport/knowledge-center/amazon-eks-cluster-access/
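
As an illustration only (the linked AWS article is authoritative), granting the IAM user access typically means mapping it into the EKS cluster's aws-auth ConfigMap; all values below are placeholders:

```bash
# Map the IAM user used by the Platform9 cloud provider into the EKS
# cluster's aws-auth ConfigMap (requires existing admin access to the cluster).
eksctl create iamidentitymapping \
  --cluster <eks-cluster-name> \
  --region <aws-region> \
  --arn arn:aws:iam::<account-id>:user/<iam-user> \
  --username <iam-user> \
  --group system:masters
```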

Known Issue Platform9 monitoring won't work on ARM-based nodes on EKS, AKS, or GKE.

Known Issue Docker cache is not deleted when PMK clusters are migrated from docker to containerd.

The current workaround is to delete /var/lib/docker manually.
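
A minimal sketch of that cleanup, assuming the Docker services are no longer needed on the node:

```bash
# Stop Docker if it is still running, then remove the leftover image cache.
systemctl stop docker docker.socket 2>/dev/null || true
rm -rf /var/lib/docker
```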

Known Issue A cluster upgrade attempt is blocked in the UI after a cluster upgrade failure caused by nodes being in a converging/not-converged state.

Known Issue Nodes in a disconnected state prevent Kubernetes batch cluster upgrades.

Known Issue Hostpath-csi-driver installs to the default namespace only.

Known Issue Kubelet authorization mode is set to AlwaysAllow instead of Webhook.

Known Issue The UI throws an error when using SSO with Azure AD and passwordless logins.

Known Issue A PMK cloud provider created directly in Sunpike cannot be used to create Qbert clusters. Qbert cloud providers can be used to create both Qbert and Sunpike clusters, but cloud providers created directly in Sunpike cannot be used to create Qbert clusters. Please use the appropriate cloud provider based on your needs.

Known Issue The Arlon command 'arlonctl clusterspec update' supports updating the 'node count' or 'kube version' for a clusterspec. However, an update to the kube version will not trigger an application sync in ArgoCD to update the clusters, since ArgoCD is configured to ignore differences in the 'kube version' field between the desired manifest and the live manifest of an app. The workaround is to manually sync the app from the ArgoCD UI.
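
If the argocd CLI is available and logged in to the same ArgoCD instance, a manual sync can also be triggered from the command line (the application name is a placeholder):

```bash
# Manually sync the affected application so the kube version change is applied.
argocd app sync <app-name>
```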

ArgoCD as a Service

ArgoCD 5.6 Release Notes

Metal³

Platform9 Managed Bare Metal With Metal³ Release Notes

KubeVirt

Platform9 Managed KubeVirt 5.6 Release Notes

PMK 5.6.8 Patch Update (Released 2023-06-01)

The upgrade path from PMK v5.6.8 to PMK v5.7.1 is not available right now and will be delivered via the PMK v5.7.2 patch soon. Users looking to upgrade to PMK v5.7.1 immediately should upgrade the DU directly from PMK v5.6.4 to PMK v5.7.1.

Added Added support for GP3-type EBS volumes with IOPS and throughput configuration on AWS Qbert clusters. With this change in place, all new AWS clusters will have a default volume type of gp3 with default Throughput: 125 and IOPS: 3000. For custom throughput/IOPS, users can use the API with the payload options ebsVolumeThroughput/ebsVolumeIops. Follow the documentation here: https://platform9.com/docs/kubernetes/migrate-aws-qbert-gp2-volumes-to-gp3
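
A hypothetical fragment of the cluster-create payload showing only the two options named above; all other required fields and the exact API endpoint are omitted (see the linked documentation):

```bash
# Illustrative payload fragment only; the field names come from the note
# above and the values are the stated gp3 defaults.
cat <<'EOF'
{
  "ebsVolumeThroughput": 125,
  "ebsVolumeIops": 3000
}
EOF
```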

Added Added changes in response to traffic redirection from the older k8s.gcr.io to registry.k8s.io. https://kubernetes.io/blog/2023/03/10/image-registry-redirect/

Upgraded Upgraded calico version to 3.23.5.

Fixed Fixed a bug in the UI which prevented self-service users with access to a particular namespace from viewing resources in that namespace.

Fixed Fixed a bug in pf9ctl which caused the check to fail when the package python3-policycoreutils already existed on the nodes.

Fixed Fixed an issue where certificate generation fails if the CA validity is less than the TTL with which the certificate is generated in Vault. A DU upgrade to PMK v5.6.8 and a cluster upgrade to the latest PMK build version are required for this fix to take effect.

Fixed Fixed a bug which caused the older static pod path to be used after upgrading a cluster from 1.22.9 to 1.23.8, causing kubelet to restart continuously.

Fixed Fixed a bug which prevented Sunpike CA rotation if the CA had an expiry of more than a year.

Fixed Fixed a bug in pf9ctl which caused an unintended node to be decommissioned due to a wrong choice of node UUID based on the IP obtained from hostname -I.

Fixed Fixed an issue that caused some pods to be stuck in a Terminating state on RHEL nodes, due to the sysctl parameter /proc/sys/fs/may_detach_mounts not being set to 1.

Fixed Fixed an issue that caused ETCD backup failure on PMK v5.6.3.

Fixed Fixed an issue which caused the Calico Typha pods on clusters to be OOM-killed due to resource constraints. Updates were made to the current limits and the Typha autoscaler ladder.

Fixed Fixed a bug which prevented login to qbert/resmgr database via mysqld-exporter pod.

Fixed Fixed a bug in the UI which prevented kube-apiserver flags from being added to the apiserver configuration during cluster creation from the UI.

Fixed Fixed an issue that caused the calicoctl binary to be non-executable.
