Troubleshooting Kubernetes: "zombie pods"
I recently ran into a mysterious problem while developing and testing a lab exercise to teach about Kubernetes resiliency. I sort of caused the problem myself: I had run through several scenarios with the example application, and I wanted to blow it all away and start over...so I just started deleting things. That, my friends, is a sure-fire way to break something. If you are dealing with Deployments and ReplicaSets, merely deleting a pod just causes K8s to create a replacement for it. I ended up with a handful of pods that were stuck in a state of "Terminating," and they would not die. For days.

So I asked around and tried researching the problem. A Google search for "pods stuck in terminating" gave many hits, with many different possible causes and solutions. Some issues mentioned kubelet and a hostname mismatch, but that was not it. I tried draining, cordoning, and shutting down the node. When I started it back up, the pods were still there, stil...