Description
What happened?
When the kubelet loses its connection, the node goes into the Unknown state. The node lifecycle controller then marks the pods on that node as not ready through the MarkPodsNotReady function, because the pods' health check status can no longer be obtained through the kubelet. This only happens when the node's Ready condition transitions from True to Unknown.
However, if the node is already in a failed state (Ready is False, for example after a containerd failure) and then loses its connection, MarkPodsNotReady does not take effect.
kubernetes/pkg/controller/nodelifecycle/node_lifecycle_controller.go
Lines 883 to 888 in cac5388
    case currentReadyCondition.Status != v1.ConditionTrue && observedReadyCondition.Status == v1.ConditionTrue:
        // Report node event only once when status changed.
        controllerutil.RecordNodeStatusChange(nc.recorder, node, "NodeNotReady")
        fallthrough
    case needsRetry && observedReadyCondition.Status != v1.ConditionTrue:
        if err = controllerutil.MarkPodsNotReady(ctx, nc.kubeClient, nc.recorder, pods, node.Name); err != nil {
In this case, the pods may incorrectly remain Ready, so network traffic may still be forwarded to pods on this node.
What did you expect to happen?
Whenever a node has lost its connection for longer than the grace period, MarkPodsNotReady should take effect.
How can we reproduce it (as minimally and precisely as possible)?
- Stop containerd and wait for the node's Ready condition to become False.
- Stop the kubelet or shut down the node, and wait for the node's Ready condition to become Unknown.
- Pods on this node that have not been evicted remain Ready indefinitely.
Anything else we need to know?
In the node lifecycle controller logic, MarkPodsNotReady is triggered only when a node goes from the True state to the Unknown state. The correct behavior is to trigger it whenever the node enters the Unknown state, regardless of whether its previous state was True.
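The condition logic can be sketched in isolation. The following is a minimal, self-contained model, not the controller's actual code: `ConditionStatus` is a local stand-in for `v1.ConditionStatus`, `marksPodsNotReady` mirrors the two switch cases quoted above, and `proposedMarksPodsNotReady` is a hypothetical variant that illustrates the expected behavior. It is one possible shape of a fix, under the assumption that entering Unknown from any other state should be enough to trigger the call.

```go
package main

import "fmt"

// ConditionStatus is a local stand-in for v1.ConditionStatus so that the
// sketch runs without importing the Kubernetes modules.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// marksPodsNotReady models the two switch cases quoted in this issue.
// The first case falls through into the second case's body, so
// MarkPodsNotReady is reached when either case expression is true.
func marksPodsNotReady(observed, current ConditionStatus, needsRetry bool) bool {
	switch {
	case current != ConditionTrue && observed == ConditionTrue:
		return true
	case needsRetry && observed != ConditionTrue:
		return true
	}
	return false
}

// proposedMarksPodsNotReady is a hypothetical variant (an assumption made
// for illustration, not the upstream fix): it also fires when the node
// newly enters Unknown from any other state, including False.
func proposedMarksPodsNotReady(observed, current ConditionStatus, needsRetry bool) bool {
	if marksPodsNotReady(observed, current, needsRetry) {
		return true
	}
	return current == ConditionUnknown && observed != ConditionUnknown
}

func main() {
	// Healthy node loses its connection: True -> Unknown.
	fmt.Println("True->Unknown, current logic:",
		marksPodsNotReady(ConditionTrue, ConditionUnknown, false))

	// containerd already failed, then the kubelet stops: False -> Unknown.
	fmt.Println("False->Unknown, current logic:",
		marksPodsNotReady(ConditionFalse, ConditionUnknown, false))
	fmt.Println("False->Unknown, proposed variant:",
		proposedMarksPodsNotReady(ConditionFalse, ConditionUnknown, false))
}
```

With `needsRetry` set to false, the model reports that the current logic fires for True to Unknown but not for False to Unknown, which matches the reproduction steps, while the hypothetical variant fires in both cases.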
Kubernetes version
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-22T06:03:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
$ uname -a
5.4.119-1-tlinux4-0008 #1 SMP Fri Nov 26 11:17:45 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)