etcd fails to start with "failed to read WAL, cannot be repaired"

Problem

The master node is in the NotReady state because etcd fails to start.

Environment

Platform9 Managed Kubernetes - v5.6.0 and Higher

Cause

The node was built with an unsupported filesystem for the etcd data.

Resolution

Rebuild the cluster with a supported filesystem.
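Before rebuilding, it can help to confirm which filesystem backs the etcd data directory and how much space is free on it. The following is a minimal Python sketch, not a Platform9 tool. It assumes a Linux node with /proc/mounts; the helper names `find_mount` and `report` are hypothetical, and the etcd data directory must be passed as an argument because this article does not state its path. The 64000000-byte figure is the WAL preallocation size shown in the log under Additional Information.

```python
import os
import shutil
import sys

# WAL preallocation size reported in the etcd log in this article.
WAL_SEGMENT_BYTES = 64_000_000


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is the content of /proc/mounts. Later entries win on ties,
    because a later mount on the same mount point hides the earlier one.
    """
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def report(path):
    """Print the filesystem type and free space for the given directory."""
    with open("/proc/mounts") as f:
        mount = find_mount(os.path.realpath(path), f.read())
    free = shutil.disk_usage(path).free
    print(f"path:        {path}")
    print(f"mount point: {mount[0] if mount else 'unknown'}")
    print(f"filesystem:  {mount[1] if mount else 'unknown'}")
    print(f"free bytes:  {free}")
    if free < WAL_SEGMENT_BYTES:
        print("warning: less free space than one WAL preallocation (64000000 bytes)")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 check_etcd_fs.py <etcd-data-directory>
    report(sys.argv[1])
```

The script only reports what it finds. Whether the reported filesystem type is supported has to be checked against the Platform9 prerequisites for the cluster version in use.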

Additional Information

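The following errors are seen in the etcd logs. Both entries report "no space left on device": first when etcd preallocates space for a new WAL file, then when it reads the WAL.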

{"level":"warn","ts":"2023-02-26T05:48:18.728Z","caller":"wal/file_pipeline.go:79","msg":"failed to preallocate space when creating a new WAL","size":64000000,"error":"no space left on device"}
{"level":"fatal","ts":"2023-02-26T05:48:19.061Z","caller":"etcdserver/storage.go:108","msg":"failed to read WAL, cannot be repaired","error":"no space left on device","stacktrace":"go.etcd.io/etcd/etcdserver.readWAL\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/storage.go:108\ngo.etcd.io/etcd/etcdserver.restartNode\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/raft.go:533\ngo.etcd.io/etcd/etcdserver.NewServer\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/server.go:480\ngo.etcd.io/etcd/embed.StartEtcd\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/embed/etcd.go:214\ngo.etcd.io/etcd/etcdmain.startEtcd\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/etcd.go:302\ngo.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/etcd.go:144\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
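The log entries above are JSON objects, so the WAL-related failures can be picked out of a saved log programmatically. The following is a minimal Python sketch using only the standard library. The function names `iter_json_objects` and `wal_failures` are hypothetical, and the log file path is supplied by the caller. It tolerates entries that run together on one line and trailing pager text such as "(END)".

```python
import json
import sys


def iter_json_objects(text):
    """Yield each JSON value found in text, even if several share one line."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, pos = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not a valid object at this brace; skip it and keep scanning.
            pos = start + 1
            continue
        yield obj


def wal_failures(text):
    """Return (ts, level, msg, error) for warn/fatal entries that mention the WAL."""
    return [
        (e.get("ts"), e.get("level"), e.get("msg"), e.get("error"))
        for e in iter_json_objects(text)
        if isinstance(e, dict)
        and e.get("level") in ("warn", "fatal")
        and "WAL" in e.get("msg", "")
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 etcd_wal_errors.py <saved-etcd-log-file>
    with open(sys.argv[1]) as f:
        for ts, level, msg, error in wal_failures(f.read()):
            print(f"{ts} [{level}] {msg}: {error}")
```

Seeing the same `error` value on both the preallocation warning and the fatal WAL read, as in the log above, indicates that the two failures share one underlying storage problem.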