Nodelet
What is a Nodelet?
A nodelet is a software agent that is installed and run on each node as a component of the Platform9 Managed Kubernetes (PMK) stack within a cluster. The nodelet agent provides multiple functions on both the Primary/Master and the worker nodes. These include installing and configuring Kubernetes services such as etcd, containerd, Docker, networking, webhooks, and various other components.
Nodelet Phases
Generate Certificates
Role:
Generates the prerequisite checks needed to install the various certificates.
Prepare Kubeconfigs
Role:
Customizes the kubeconfigs needed to start the Kubernetes cluster.
Docker Configure
Role:
Installs and configures Docker and containerd.
Docker Start
Role:
Installs and verifies running Docker containers.
Etcd Configure
Role:
Verifies, configures, and runs etcd on the primary host server's file system.
Etcd Run
Role:
Starts the etcd service and confirms that it is running in the container.
Network Configure
Role:
Ensures that the Classless Inter-Domain Routing (CIDR) configuration for flannel is up to date. It does not target other network plugins such as Calico, Canal, or Weave.
CNI Configure
Role:
Configures the Container Network Interface (CNI).
Auth Webhook
Role:
Uses bouncer as a simple webhook endpoint server to validate/authenticate images created within the Kubernetes cluster (specifically, the admission controllers GenericAdmissionWebhook and the ValidatingAdmissionWebhook).
Misc Scripts
Role:
Responsible for composing the cloud provider config on the file systems of all nodes.
Kubelet Configure/Start
Role:
Starts and manages the proper configurations on Kubelets.
Kube Proxy Start
Role:
Starts and configures the kube-proxy service.
Wait for K8s Services
Role:
Starts and pauses various K8s services to ensure availability.
Label and Taint Node
Role:
Labels each node as “master” or “worker”. Additionally, applies taints to master nodes so that workloads not allowed on a master are kept off them.
Uncordon Node
Role:
Marks nodes as schedulable using the kubectl uncordon node command.
Deploy App Catalog
Role:
Configures and deploys the Monocular and Tiller services.
Configure/Start Keepalived
Role:
Configures and starts the keepalived service.
Deploy Luigi Operator
Role:
Activates the Luigi operator.
Deploy KubeVirt
Role:
Deploys the KubeVirt operator in addition to its other custom resources.
Enable PF9 Sentry
Role:
Initiates and configures the pf9-sentry service within the platform9-system namespace.
Enable PF9 Add-on Operator
Role:
Starts and configures the pf9-addon-operator service within the pf9-addons namespace.
Drain All Pods (Stop Only)
Role:
If invoked, this task drains the node before the stop function of any other task runs. When the _pf9-kube_ service begins draining the node, it executes a priority stop function. This ensures the task is prioritized over the stop function of other tasks.
CLI
Advantages of Using the CLI
Because a CLI does not use a graphical user interface (GUI), it is often overshadowed by the more user-friendly, visual interfaces that a mouse and keyboard afford. What is not apparent is that behind the GUI are many of the same commands that drive the functionality of a program. The strengths of the CLI are speed, efficiency, and customization, with decreased memory consumption. In addition, it allows experienced users to create scripts that automate repetitive tasks and to chain commands together, achieving a greater level of customization and capability than a single mouse click provides.
Many new users cite the steeper learning curve as the primary downside of using the CLI. Additionally, there is less room for error, and the large number of available command options can be daunting. New users can be stymied when trying to remember a command, its syntax, and its available flags and options. Widely available quick reference guides provide some relief. Users often find that ongoing use of the CLI increases productivity over time.
Caution should be exercised when running commands as the root user. Running an errant or malformed command can cause severe issues and damage the system, up to and including needing a full system restore. The only time clients should run commands as the root user is when configuring the underlying file system.
Best practice dictates creating secondary users with limited permission sets. Additionally, backup copies of files or folders should be made before editing any important system configuration files or folders.
The following section specifies the nodeletd phases commands used to interact with the k8s stack via the CLI.
Nodelet CLI Syntax
/opt/pf9/nodelet/nodeletd phases [command]
Phases Help
The help flag lists the available options when running the nodeletd phases command.
/opt/pf9/nodelet/nodeletd phases --help
Commands related to phases related to bring up of k8s stack
Usage:
nodeletd phases [command]
Available Commands:
list Lists the phases and their index numbers to use with rest of commands
restart Restarts pf9 kube stack. Takes optional --phase param to allow restarting from the specific phase
start Starts pf9 kube stack. Takes optional --phase param to allow starting from the specific phase
status Checks the status of Platform9 Kube on this host. Takes optional --phase param to check the status of a specific phase
stop Stops pf9 kube stack. Takes optional --phase param to allow stopping till the specific phase
Flags:
-h, --help help for phases
Use "nodeletd phases [command] --help" for more information about a command.
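The subcommands listed in the help text can also be assembled programmatically, for example when scripting node maintenance. The following is a minimal sketch, not part of nodelet itself: the helper name build_phases_command is hypothetical, and it assumes that --phase takes the index number printed by the list command and that --verbose is only meaningful where this document shows it (status). It only builds the argument list and does not run anything.

```python
from typing import List, Optional

# Binary path as used by the commands in this document.
NODELETD = "/opt/pf9/nodelet/nodeletd"

# Subcommands named in the `nodeletd phases --help` output.
ACTIONS = {"list", "start", "stop", "restart", "status"}


def build_phases_command(action: str,
                         phase: Optional[int] = None,
                         verbose: bool = False) -> List[str]:
    """Return the argv list for a `nodeletd phases` invocation (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown phases command: {action!r}")
    if phase is not None and action == "list":
        # The help text mentions --phase only for start, stop, restart, and status.
        raise ValueError("'list' is not documented as accepting --phase")
    argv = [NODELETD, "phases", action]
    if phase is not None:
        # Assumption: the value is the index number shown by `phases list`.
        argv += ["--phase", str(phase)]
    if verbose:
        # Assumption: this document only demonstrates --verbose with `status`.
        argv.append("--verbose")
    return argv


if __name__ == "__main__":
    # Print the command that would restart the stack from phase 3.
    print(" ".join(build_phases_command("restart", phase=3)))
```

The returned list can be passed to subprocess.run on a node where nodeletd is installed; building the list separately keeps the command easy to log and review before it is executed.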
Phases List
This command lists the available phases by passing the list option.
/opt/pf9/nodelet/nodeletd phases list
INDEX NUMBER FILE NAME STATUS CHECK
1 020-gen_certs.sh Generate certs / Send signing request to CA true
2 030-prepare_kube_configs.sh Prepare configuration false
3 040-docker_configure.sh Configure Docker false
4 045-docker_start.sh Start Docker true
Phases Stop
This command stops the pf9-kube stack.
/opt/pf9/nodelet/nodeletd phases stop
Phases Start
This command starts the pf9-kube stack.
/opt/pf9/nodelet/nodeletd phases start
Phases Restart
This command restarts the pf9-kube stack.
/opt/pf9/nodelet/nodeletd phases restart
Phases Status
The verbose flag provides information on the condition and state of the pf9-kube stack.
/opt/pf9/nodelet/nodeletd phases status --verbose
INDEX NUMBER FILE NAME PHASE STATUS
1 020-gen_certs.sh Generate certs / Send signing request to CA running
2 030-prepare_kube_configs.sh Prepare configuration N/A
3 040-docker_configure.sh Configure Docker N/A
4 045-docker_start.sh Start Docker N/A
Note: The CLI output also contains information about the various phases that run before the status table is displayed. This information is also available in the /var/log/pf9/kube/kube.log file.
Node Health
The curl command below provides an overview of the health of a specific node. $TOKEN is a temporary authentication token used to verify the service user, which removes the need for an interactive authentication method.
DU refers to the deployment unit that runs Platform9's server-side components. $UUID is the universally unique identifier of the cluster; the jq filter matches it against the pf9_cluster_id field. A sample output of the command is shown below.
curl -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" https://$DU/resmgr/v1/hosts | jq '.[] | select(.extensions.pf9_kube_status.data.pf9_cluster_id |contains("'$UUID'")) | .extensions.pf9_kube_status.data.pf9_kube_node_state'
{
"pf9_kube_start_attempt": 0, // Number of start attempts till now
"last_failed_status_check": "", //*
"pf9_cluster_role": "master",
"last_failed_task": "", // The task/phase script that failed on (pf9_kube_start_attempt-1) attempt
"all_tasks": [
"Generate certs / Send signing request to CA",
"Prepare configuration",
"Configure Docker",
"Start Docker",
"Configure etcd",
"Start etcd",
"Network configuration",
"Configure CNI plugin",
"Configure and start auth web hook / pf9-bouncer",
"Miscellaneous scripts and checks",
"Configure and start kubelet",
"Configure and start kube-proxy",
"Wait for k8s services and network to be up",
"Apply and validate node taints",
"Uncordon node",
"Validate k8s DNS",
"Deploy dashboard",
"Deploy app catalog",
"Deploy metrics server",
"Configure and start Keepalived",
"Configure and start MetalLB",
"Configure and start Autoscaler",
"Configure and start pf9-sentry",
"Drain all pods (stop only operation)"
],
"pf9_kube_node_state": "ok", // **
"current_status_check": "",
"completed_tasks": [
"Generate certs / Send signing request to CA",
"Prepare configuration",
"Configure Docker",
"Start Docker",
"Configure etcd",
"Start etcd",
"Network configuration",
"Configure CNI plugin",
"Configure and start auth web hook / pf9-bouncer",
"Miscellaneous scripts and checks",
"Configure and start kubelet",
"Configure and start kube-proxy",
"Wait for k8s services and network to be up",
"Apply and validate node taints",
"Uncordon node",
"Validate k8s DNS",
"Deploy dashboard",
"Deploy app catalog",
"Deploy metrics server",
"Configure and start Keepalived",
"Configure and start MetalLB",
"Configure and start Autoscaler",
"Configure and start pf9-sentry",
"Drain all pods (stop only operation)"
],
"pf9_kube_service_state": "true",
"all_status_checks": [
"Generate certs / Send signing request to CA",
"Start Docker",
"Start etcd",
"Network configuration",
"Configure and start auth web hook / pf9-bouncer",
"Miscellaneous scripts and checks",
"Configure and start kubelet",
"Configure and start kube-proxy",
"Wait for k8s services and network to be up",
"Configure and start Keepalived"
],
"last_failed_status_time": 0, //*
"pf9_cluster_id": "37ba60bb-1a36-4f78-8b83-528adea459bf",
"current_task": "",
"status_check_timestamp": 1594197441 //*
}
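For scripts that already hold the parsed JSON response, the jq filter in the curl command can be mirrored in a few lines. The sketch below is illustrative only: the function names and the SAMPLE_HOSTS data (including the id field) are hypothetical, and the field paths are copied from the jq expression above rather than from an API reference. Unlike jq, it skips hosts that lack the expected fields instead of raising an error.

```python
from typing import Any, Dict, List


def dig(obj: Any, *keys: str) -> Any:
    """Follow a chain of dictionary keys, returning None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def node_states_for_cluster(hosts: List[Dict[str, Any]], cluster_uuid: str) -> List[Any]:
    """Select hosts whose pf9_cluster_id contains cluster_uuid and return their node state."""
    base = ("extensions", "pf9_kube_status", "data")
    states = []
    for host in hosts:
        cluster_id = dig(host, *base, "pf9_cluster_id")
        # Substring match, like jq's contains() on strings.
        if isinstance(cluster_id, str) and cluster_uuid in cluster_id:
            states.append(dig(host, *base, "pf9_kube_node_state"))
    return states


# Hypothetical, heavily trimmed stand-in for the /resmgr/v1/hosts response.
SAMPLE_HOSTS = [
    {"id": "host-a", "extensions": {"pf9_kube_status": {"data": {
        "pf9_cluster_id": "37ba60bb-1a36-4f78-8b83-528adea459bf",
        "pf9_kube_node_state": {"pf9_kube_node_state": "ok", "pf9_cluster_role": "master"},
    }}}},
    {"id": "host-b", "extensions": {"pf9_kube_status": {"data": {
        "pf9_cluster_id": "00000000-0000-0000-0000-000000000000",
        "pf9_kube_node_state": {"pf9_kube_node_state": "ok", "pf9_cluster_role": "worker"},
    }}}},
    {"id": "host-c", "extensions": {}},  # host without Kubernetes status data
]

if __name__ == "__main__":
    for state in node_states_for_cluster(SAMPLE_HOSTS, "37ba60bb-1a36-4f78-8b83-528adea459bf"):
        print(state["pf9_cluster_role"], state["pf9_kube_node_state"])
```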
Note:
(*) The last_failed_status_check field is cleared 10 minutes after the status check is successful.
(**) The pf9_kube_node_state field tries to simulate the node state as reported by the hostAgent. The values this field can report are described below.
Status | Description |
---|---|
OK | Everything is fine. |
Converging | Starting pf9-kube failed and this is the initial attempt to restart it. |
Retrying | Starting pf9-kube failed and Nodelet has tried fewer than 10 times to start pf9-kube. |
Failed | Starting pf9-kube failed and Nodelet has tried more than 10 times to start pf9-kube. |
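
The node state object shown earlier can be condensed into a one-line summary that pairs the reported state with the tasks still outstanding. This is a minimal sketch under stated assumptions: the function names are hypothetical, the descriptions are taken from the status table, and the state names are assumed to be reported in lowercase because the sample output shows "ok" (the spelling of the other three values is not confirmed by this document).

```python
from typing import Any, Dict, List

# Descriptions from the status table; lowercase keys are an assumption
# based on the sample output, which reports the state as "ok".
STATE_DESCRIPTIONS = {
    "ok": "Everything is fine.",
    "converging": "Starting pf9-kube failed and this is the initial attempt to restart it.",
    "retrying": "Starting pf9-kube failed and Nodelet has tried fewer than 10 times to start pf9-kube.",
    "failed": "Starting pf9-kube failed and Nodelet has tried more than 10 times to start pf9-kube.",
}


def pending_tasks(state: Dict[str, Any]) -> List[str]:
    """Return the entries of all_tasks that are not yet in completed_tasks, in order."""
    done = set(state.get("completed_tasks", []))
    return [task for task in state.get("all_tasks", []) if task not in done]


def summarize(state: Dict[str, Any]) -> str:
    """Build a one-line summary of a node state object (hypothetical helper)."""
    name = str(state.get("pf9_kube_node_state", "")).lower()
    description = STATE_DESCRIPTIONS.get(name, "Unknown state.")
    line = f"{name or 'unknown'}: {description} ({len(pending_tasks(state))} task(s) pending)"
    failed = state.get("last_failed_task")
    if failed:
        line += f"; last failed task: {failed}"
    return line
```

Calling summarize on the sample output above, where every task is completed and no task has failed, would report the ok state with zero pending tasks.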