How to Save the Current State of a Cluster Using the kubectl cluster-info dump Command
Problem
A "cluster dump" is a collection of diagnostic information and logs gathered from a Kubernetes cluster. It is taken to troubleshoot issues, investigate problems, or give the support team detailed information for debugging, because it captures the state of the cluster at the time the problem occurs.
A cluster dump may include the following information:
- Logging: Cluster dumps often include logs from various components of the Kubernetes cluster, such as control plane components (API server, etcd), worker nodes, and other add-ons (e.g., kube-proxy, kube-dns). These logs can help identify errors, warnings, and other relevant information.
- Resource Information: Information about cluster resources, including nodes, pods, services, and configurations, may be included in a cluster dump. This can provide insights into the current state of the cluster and resource allocation.
- Configuration: The cluster dump may contain configuration files and settings for various Kubernetes components. This can help identify any discrepancies or misconfigurations that could be causing issues.
- Cluster Metrics: Cluster dumps may include metrics and statistics about cluster performance and resource usage. This information can be valuable for performance analysis and optimization.
- Network Information: Details about networking configurations, routes, and network policies may be included. Network-related issues can often be diagnosed with this information.
- Event Logs: Kubernetes events and audit logs may be part of the cluster dump. These logs can provide a historical record of cluster activity and events.
- Cluster State: Information about the current state of objects in the cluster, such as pod statuses, node conditions, and API server status, can help diagnose problems and understand the overall health of the cluster.
Environment
- Platform9 Managed Kubernetes - v5.4 and higher
Procedure
The command below saves information from all namespaces, including the customer's namespaces:
# kubectl cluster-info dump -A -o yaml --output-directory=/tmp/cluster-dump-$(date +%Y-%m-%d_%H-%M)
If sensitive or confidential applications are running in the cluster, avoid using "-A" in the above command, because it dumps every namespace, including the application namespaces.
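It is recommended to take the cluster dump only from the namespace that has the issue. To do so, replace "-A" with "-n <affected_namespace>" in the above command. When no explicit namespace list is given, kubectl also includes the kube-system namespace in the dump by default.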
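The namespace-scoped variant described above can be sketched as a small script. This is only an illustration: the variable names NAMESPACE, DUMP_DIR, DUMP_CMD and RUN_DUMP are assumptions introduced here, and the namespace value is a placeholder. The script prints the command for review and runs it only when RUN_DUMP=1 is set and kubectl is on the PATH.

```shell
# Illustrative sketch; NAMESPACE, DUMP_DIR, DUMP_CMD and RUN_DUMP are
# hypothetical names, not part of any Platform9 tooling.
NAMESPACE="${NAMESPACE:-default}"   # replace with the affected namespace
DUMP_DIR="/tmp/cluster-dump-${NAMESPACE}-$(date +%Y-%m-%d_%H-%M)"

# Same command as in the article, with "-A" replaced by "-n <namespace>".
DUMP_CMD="kubectl cluster-info dump -n $NAMESPACE -o yaml --output-directory=$DUMP_DIR"
echo "$DUMP_CMD"

# Run the dump only on explicit request and only if kubectl is installed.
if [ "${RUN_DUMP:-0}" = "1" ] && command -v kubectl >/dev/null 2>&1; then
  kubectl cluster-info dump -n "$NAMESPACE" -o yaml --output-directory="$DUMP_DIR"
fi
```

For example, saving this as a file and running it with NAMESPACE=<affected_namespace> RUN_DUMP=1 in the environment would write the dump into a directory whose name carries both the namespace and the timestamp.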
The cluster dump creates the following contents:
# ls -lsh /tmp/cluster-dump-2022-12-02_07-03
total 32K
0 drwxr-xr-x 2 root root 170 Dec 2 07:03 default
0 drwxr-xr-x 2 root root 170 Dec 2 07:03 kube-node-lease
0 drwxr-xr-x 2 root root 170 Dec 2 07:03 kube-public
0 drwxr-xr-x 4 root root 265 Dec 2 07:03 kubernetes-dashboard
4.0K drwxr-xr-x 20 root root 4.0K Dec 2 07:03 kube-system
24K -rw-r--r-- 1 root root 23K Dec 2 07:03 nodes.yaml
0 drwxr-xr-x 3 root root 212 Dec 2 07:03 pf9-addons
4.0K drwxr-xr-x 9 root root 4.0K Dec 2 07:03 pf9-monitoring
0 drwxr-xr-x 4 root root 248 Dec 2 07:03 pf9-operators
0 drwxr-xr-x 3 root root 212 Dec 2 07:03 platform9-system
# du -sch /tmp/cluster-dump-2022-12-02_07-03
16M /tmp/cluster-dump-2022-12-02_07-03
16M total
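Once the dump directory exists, it can help to check how much each namespace directory contains and to pack the whole dump into one archive before handing it to the support team. The following is a minimal sketch that relies only on standard find and tar; the function names summarize_dump and archive_dump are hypothetical helpers introduced for illustration.

```shell
# Illustrative helpers; summarize_dump and archive_dump are hypothetical names.

# Print the number of files found under each namespace directory of a dump.
summarize_dump() {
  dump_dir="$1"
  [ -d "$dump_dir" ] || { echo "dump directory not found: $dump_dir" >&2; return 1; }
  for entry in "$dump_dir"/*/; do
    [ -d "$entry" ] || continue
    count=$(find "$entry" -type f | wc -l | tr -d ' ')
    echo "$(basename "$entry"): $count files"
  done
}

# Pack the dump directory into <dump_dir>.tar.gz and print the archive path.
archive_dump() {
  dump_dir="$1"
  [ -d "$dump_dir" ] || { echo "dump directory not found: $dump_dir" >&2; return 1; }
  archive="${dump_dir}.tar.gz"
  tar -czf "$archive" -C "$(dirname "$dump_dir")" "$(basename "$dump_dir")" || return 1
  echo "$archive"
}

# Example usage (path taken from the listing above):
#   summarize_dump /tmp/cluster-dump-2022-12-02_07-03
#   archive_dump /tmp/cluster-dump-2022-12-02_07-03
```

Because the dump can contain pod logs and object definitions from every selected namespace, reviewing the summary before archiving is a simple way to confirm that no unintended namespace is included.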
Additional Information
For reference: