etcd fails to start with "failed to read WAL, cannot be repaired"

Problem

  • The master node is in the NotReady state because etcd fails to start.

Environment

  • Platform9 Managed Kubernetes - v5.6.0 and higher

Cause

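  • The node's storage for etcd data was built with an unsupported filesystem. When etcd tries to preallocate space for a new WAL file, the operation fails with "no space left on device" and etcd exits.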
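
To confirm the cause on an affected node, it can help to see which filesystem backs the etcd data directory and how much space is free there. The following is a minimal sketch, not a Platform9 tool: it assumes a Linux node with /proc/mounts, the function names find_mount and report are hypothetical, and the 64000000-byte figure is simply the WAL preallocation size shown in the etcd log. The etcd data directory path is not stated in this article, so it must be supplied by the caller.

```python
import os
import shutil

# Size etcd tried to preallocate for a new WAL file, taken from the log excerpt.
WAL_SEGMENT_BYTES = 64_000_000


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount that contains path.

    mounts_text is the content of /proc/mounts. Mount points containing
    escaped characters (for example spaces) are not handled in this sketch.
    """
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix):
            # Prefer the longest mount point; on ties the later entry wins,
            # because a later mount shadows an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def report(path):
    """Summarize filesystem type and free space for an existing directory."""
    real = os.path.realpath(path)
    with open("/proc/mounts") as f:
        mount = find_mount(real, f.read())
    usage = shutil.disk_usage(real)
    return {
        "path": real,
        "mount_point": mount[0] if mount else None,
        "fs_type": mount[1] if mount else None,
        "free_bytes": usage.free,
        "room_for_wal_segment": usage.free >= WAL_SEGMENT_BYTES,
    }
```

Calling report() with the node's etcd data directory returns a small dictionary. Whether the reported fs_type is supported should be checked against the Platform9 documentation, since this article does not list the supported filesystems.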

Resolution

  • Rebuild the cluster with a supported filesystem for the etcd data.

Additional Information

  • The following error appears in the etcd logs:

{"level":"warn","ts":"2023-02-26T05:48:18.728Z","caller":"wal/file_pipeline.go:79","msg":"failed to preallocate space when creating a new WAL","size":64000000,"error":"no space left on device"}
{"level":"fatal","ts":"2023-02-26T05:48:19.061Z","caller":"etcdserver/storage.go:108","msg":"failed to read WAL, cannot be repaired","error":"no space left on device","stacktrace":"go.etcd.io/etcd/etcdserver.readWAL\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/storage.go:108\ngo.etcd.io/etcd/etcdserver.restartNode\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/raft.go:533\ngo.etcd.io/etcd/etcdserver.NewServer\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdserver/server.go:480\ngo.etcd.io/etcd/embed.StartEtcd\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/embed/etcd.go:214\ngo.etcd.io/etcd/etcdmain.startEtcd\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/etcd.go:302\ngo.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/etcd.go:144\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
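
Because etcd writes these entries as one JSON object per line, the relevant fields can be pulled out without reading the full stack trace. The sketch below is illustrative only: the function name summarize is hypothetical, and it assumes the log lines have already been collected into a list of strings (how they are collected on the node is not covered here). It relies only on the "level", "ts", "msg" and "error" keys visible in the excerpt above.

```python
import json


def summarize(lines, levels=("warn", "fatal")):
    """Yield (level, ts, msg, error) for JSON log lines at the given levels.

    Lines that are not JSON objects (pager output, blank lines) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level") in levels:
            yield (
                entry.get("level"),
                entry.get("ts"),
                entry.get("msg"),
                entry.get("error"),
            )
```

For the two entries shown above, this would yield one "warn" tuple for the failed WAL preallocation and one "fatal" tuple for "failed to read WAL, cannot be repaired", both carrying the error "no space left on device".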
