So I found out something fun today, and I thought I would share it with anyone else who runs into the same "issue" with Kubernetes (although the real heroes live in IRC land; the Atomic folks have been amazingly helpful).
It's well known that the Kubernetes master is treated as a unicorn, which raises the question: what happens when that master, or your whole Kubernetes cluster, suffers a power hit? It's definitely not a great situation to be in. Then again, there's not much you can do about a power hit if you're working on shit in your house, right?
I noticed that scaling out additional nodes or scaling them back produced some really strange results (both in Kubernetes and, of course, Cockpit). I quickly found out that my host had suffered a power cycle. Thank you, Duke Energy! At first glance it seems like you're in purgatory: you can't really create new pods, but you can't seem to delete the old ones either. It appears it could be an issue with etcd or Kubernetes, but let's dig deeper to find out.
You can try to find these running "ghost" containers by issuing kubectl get pods --namespace=default (the namespace is default in my lab environment at the moment), but I noticed that you don't get anything in return. Meanwhile, the Cockpit interface shows pods getting spun up over and over. So who's right? Both are, actually, and it can make you feel pretty powerless.
Deeper investigation showed that the containers were in fact being spun up by the kubectl command, but those containers were not being properly reported back to Kubernetes.
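To make that mismatch concrete, here is a minimal sketch that compares the pods Kubernetes reports with the pods Docker actually holds containers for on a node. It assumes the naming pattern k8s_<container>.<hash>_<pod>_<namespace>_<uid>_<suffix>, which is what the Docker container names in this post appear to follow. The function names and the two input files are hypothetical, made up for illustration only.

```shell
# Extract pod names from kubelet-style Docker container names on stdin.
# ASSUMPTION: names look like k8s_<container>.<hash>_<pod>_<namespace>_<uid>_<suffix>
pod_names_from_docker() {
  awk -F_ '$1 == "k8s" && NF >= 6 { print $3 }' | sort -u
}

# Print pods that Docker has containers for but Kubernetes does not list.
#   $1: file with one pod name per line (what Kubernetes reports)
#   $2: file with one Docker container name per line (from a node)
ghost_pods() {
  # grep -v drops every pod name that appears in $1 as a whole line.
  # "|| true" keeps the exit status clean when nothing is left over.
  pod_names_from_docker < "$2" | grep -vxFf "$1" || true
}

# Possible way to build the two files (column positions are assumptions):
#   on the master: kubectl get pods --namespace=default | awk 'NR > 1 { print $1 }' > k8s-pods.txt
#   on a node:     sudo docker ps -a | awk 'NR > 1 { print $NF }' > docker-names.txt
#   then:          ghost_pods k8s-pods.txt docker-names.txt
```

In my situation the Kubernetes side was empty, so every pod name found on the Docker side would have shown up as a ghost.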
The folks over on the Red Hat #atomic IRC channel really helped me out. They said that they personally remove the containers using Docker. Realizing what I was doing (basically creating more work for myself later), I stopped trying to scale out more containers, since I was going to be forced to delete them out of Docker directly anyway. Not fun, especially if you were in a "production" environment where you could have thousands of active containers. Hint: This is where good label use comes into play!
So here are the procedures you'll need to perform if you ever run into this, but only in a lab environment. I will update this over the weekend after some additional testing (read on at the bottom to find out more), because my thought is that I may be able to just delete the running kubernetes container on the master, and that may reset my declarative state for the Kubernetes cluster. I will have to dig deeper, but alas this is a lab (which I'm using for a later demonstration), so time is of the essence.
Making Kubernetes Useful Again
On each node, stop the following services:
sudo systemctl stop kube-proxy kubelet flanneld
Next, make sure that the Docker service is started, so that you can work directly with the running containers on the Atomic Kubernetes node:
sudo systemctl start docker
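If you have more than a couple of nodes, the two steps above can be looped over SSH. This is only a sketch: the node names and the centos user are taken from the prompts in my lab, the function name and variables are hypothetical, and it defaults to printing the commands instead of running them.

```shell
# ASSUMPTIONS: node names and SSH user come from the lab prompts in this post.
NODES="${NODES:-fedminion02 fedminion03}"
SSH_USER="${SSH_USER:-centos}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Stop the Kubernetes node services, then make sure Docker is up, on each node.
prep_nodes() {
  local node cmd
  cmd='sudo systemctl stop kube-proxy kubelet flanneld && sudo systemctl start docker'
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ssh ${SSH_USER}@${node} '${cmd}'"
    else
      ssh "${SSH_USER}@${node}" "$cmd"
    fi
  done
}
```

Calling prep_nodes with the defaults just prints one ssh line per node, which is a cheap way to double check the list before setting DRY_RUN=0.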
Find out which containers are running. For extra credit, also find out which images are downloaded to each of the nodes in the k8s cluster:
[centos@fedminion02 ~]$ sudo docker ps -a
CONTAINER ID   IMAGE                                  COMMAND                CREATED          STATUS                      PORTS   NAMES
4274b04c3c6e   nginx                                  "nginx -g 'daemon of   46 minutes ago   Exited (0) 46 minutes ago           k8s_nginx.d7d3eb2f_my-nginx-zvq6a_default_11523c39-5c88-11e5-833d-fa163e6fd979_33fb7971
871cc3668e87   gcr.io/google_containers/pause:0.8.0   "/pause"               46 minutes ago   Exited (0) 46 minutes ago           k8s_POD.ef28e851_my-nginx-zvq6a_default_11523c39-5c88-11e5-833d-fa163e6fd979_fc2dfa29
fb95a7269f8c   nginx                                  "nginx -g 'daemon of   50 minutes ago   Exited (0) 50 minutes ago           k8s_nginx.d7d3eb2f_my-nginx-dvped_default_8ba8e71f-5c87-11e5-833d-fa163e6fd979_30e16003
edb89618ab45   gcr.io/google_containers/pause:0.8.0   "/pause"               50 minutes ago   Exited (0) 50 minutes ago           k8s_POD.ef28e851_my-nginx-dvped_default_8ba8e71f-5c87-11e5-833d-fa163e6fd979_9731a746
5734b49d6f02   nginx                                  "nginx -g 'daemon of   20 hours ago     Exited (0) 50 minutes ago           k8s_nginx.d7d3eb2f_my-nginx-9qdsz_default_1af37908-5bef-11e5-833d-fa163e6fd979_6975d0cb
b4e969c99ba9   gcr.io/google_containers/pause:0.8.0   "/pause"               20 hours ago     Exited (0) 50 minutes ago           k8s_POD.ef28e851_my-nginx-9qdsz_default_1af37908-5bef-11e5-833d-fa163e6fd979_0138d224
36ce4d655d68   nginx                                  "nginx -g 'daemon of   20 hours ago     Exited (0) 20 hours ago             k8s_nginx.d7d3eb2f_my-nginx-pjkk9_default_16578ce0-5bef-11e5-833d-fa163e6fd979_ae73356d
2eb60a208f14   gcr.io/google_containers/pause:0.8.0   "/pause"               20 hours ago     Exited (0) 20 hours ago             k8s_POD.ef28e851_my-nginx-pjkk9_default_16578ce0-5bef-11e5-833d-fa163e6fd979_607d9c01
d7ff987e7240   nginx                                  "nginx -g 'daemon of   20 hours ago     Exited (0) 20 hours ago             k8s_nginx.d7d3eb2f_my-nginx-xsfpf_default_0cbbc5e4-5bef-11e5-833d-fa163e6fd979_ca445d38
7d814893c059   gcr.io/google_containers/pause:0.8.0   "/pause"               20 hours ago     Exited (0) 20 hours ago             k8s_POD.ef28e851_my-nginx-xsfpf_default_0cbbc5e4-5bef-11e5-833d-fa163e6fd979_100aa082
860f0e4e3f8b   nginx                                  "nginx -g 'daemon of   21 hours ago     Exited (0) 20 hours ago             k8s_nginx.d7d3eb2f_my-nginx-bpfed_default_3e0d7c10-5be5-11e5-833d-fa163e6fd979_5c30503c
f3d238100f6d   gcr.io/google_containers/pause:0.8.0   "/pause"               21 hours ago     Exited (0) 20 hours ago             k8s_POD.ef28e851_my-nginx-bpfed_default_3e0d7c10-5be5-11e5-833d-fa163e6fd979_c8d23490
[centos@fedminion02 ~]$

[centos@fedminion03 ~]$ sudo docker images -a
REPOSITORY                       TAG      IMAGE ID       CREATED        VIRTUAL SIZE
docker.io/nginx                  latest   0b354d33906d   6 days ago     132.8 MB
<none>                           <none>   cc72ca035672   6 days ago     132.8 MB
<none>                           <none>   bfd040d9aed8   6 days ago     132.8 MB
<none>                           <none>   b0d40bad0b2b   6 days ago     132.8 MB
<none>                           <none>   4f7e3e7c2546   6 days ago     132.8 MB
<none>                           <none>   3a6a1b71b6e0   6 days ago     132.8 MB
<none>                           <none>   93f81de2705a   6 days ago     125.1 MB
<none>                           <none>   4ac684e3f295   6 days ago     125.1 MB
<none>                           <none>   d6c6bbd63f57   6 days ago     125.1 MB
<none>                           <none>   426ac73b867e   6 days ago     125.1 MB
<none>                           <none>   8c00acfb0175   8 days ago     125.1 MB
<none>                           <none>   843e2bded498   8 days ago     125.1 MB
gcr.io/google_containers/pause   0.8.0    2c40b0526b63   5 months ago   241.7 kB
<none>                           <none>   56ba5533a2db   5 months ago   241.7 kB
<none>                           <none>   511136ea3c5a   2 years ago    0 B
[centos@fedminion03 ~]$
So I deleted them:
[centos@fedminion03 ~]$ sudo docker rm -f 9ef242e9f4fe 3f7902f141f8 9005e5560b46 6f645a44c13b f206cdae56ab 8e4614e3b3e1 69c1970f2899 31e362ce4fc5 d2246a05f4ab 134aef09ba5f
9ef242e9f4fe
3f7902f141f8
9005e5560b46
6f645a44c13b
f206cdae56ab
8e4614e3b3e1
69c1970f2899
31e362ce4fc5
d2246a05f4ab
134aef09ba5f
[centos@fedminion03 ~]$
[centos@fedminion03 ~]$ sudo docker rmi -f 0b354d33906d cc72ca035672 bfd040d9aed8 b0d40bad0b2b 4f7e3e7c2546 3a6a1b71b6e0 93f81de2705a 4ac684e3f295 d6c6bbd63f57 426ac73b867e 8c00acfb0175 843e2bded498 2c40b0526b63 56ba5533a2db 511136ea3c5a
Untagged: docker.io/nginx:latest
Deleted: 0b354d33906d30ac52d2817ea770ddce18c7531e58b5b3ca0ae78873f5d2e207
Deleted: cc72ca0356723a8008c5e8fe0118700aee1153930e16877560db58a0121757f0
Deleted: bfd040d9aed8c694585cfa02bb711f8a93f9fd1d6c8920b89b0030b2017c0b5b
Deleted: b0d40bad0b2b521174cf3ee2f4e74d889561a7c5e2c9c85167978de239bc1c97
Deleted: 4f7e3e7c25464989f847bb9a1d1f3d4c479d12137b44ed5c8b8d83bcf67f4d4b
Deleted: 3a6a1b71b6e021df3abda9f223b016a8bc018c27785530d4af69ada52cac404a
Deleted: 93f81de2705a3caf7ffd22347c5768b5169c6d037e3867825a4ab7cdd9a7de99
Deleted: 4ac684e3f295443f69e92c60768b1052ded4aac3a54466c2debf66de35a479e9
Deleted: d6c6bbd63f57240b4d7a3714f032004b1ed0aa1b9761778f9a2a8632e0c5efd1
Deleted: 426ac73b867e7b17208d55c2bcc6ba9bd81ae4aff1ad79106e63fc94d783f27f
Deleted: 8c00acfb017549e44d28098762c3e6296872a1ca9b90385855f1019d84bb0dac
Deleted: 843e2bded49837e4846422f3a82a67be3ccc46c3e636e03d8d946c57564468ba
Deleted: 2c40b0526b6358710fd09e7b8c022429268cc61703b4777e528ac9d469a07ca1
Deleted: 56ba5533a2dbf18b017ed12d99c6c83485f7146ed0eb3a2e9966c27fc5a5dd7b
Deleted: 511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
[centos@fedminion03 ~]$
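Copying container IDs by hand gets old fast, so here is a rough sketch of how the same cleanup could be scripted. It assumes that NAMES is the last column of docker ps -a and that every container created through Kubernetes has a name starting with k8s_, which matches the listing from my nodes. Both function names are hypothetical, and the removal step only prints the command unless DRY_RUN is set to 0.

```shell
# Read `docker ps -a` text on stdin, print the IDs of kubelet-created containers.
# ASSUMPTION: NAMES is the last column and kubelet names begin with "k8s_".
k8s_container_ids() {
  awk 'NR > 1 && $NF ~ /^k8s_/ { print $1 }'
}

# Read container IDs on stdin (one per line) and force-remove them.
# Prints the command instead of running it unless DRY_RUN=0.
remove_containers() {
  local ids
  ids="$(tr '\n' ' ' | sed 's/ *$//')"
  if [ -z "$ids" ]; then
    echo "nothing to remove"
    return 0
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sudo docker rm -f $ids"
  else
    # $ids is left unquoted on purpose so each ID becomes its own argument.
    sudo docker rm -f $ids
  fi
}

# Example (lab only):
#   sudo docker ps -a | k8s_container_ids | remove_containers
```

Filtering on the k8s_ prefix leaves any container you started by hand alone, which is about the only safety net this approach has.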
I really hope that some @kubernetesio folks comment on this one. I really believe that a better way of handling this could be erasing only the kubernetes container on the master and rebooting the minions. From what I understand, the kubernetes container helps maintain state and proxies information between the master and the minions, so it would make sense that this could resolve the issue. I will attempt to explore this option in my home lab later this week. Here's a sample of what I would be testing:
[fedora@fedmaster ~]$ kubectl get services
NAME         LABELS                                    SELECTOR   IP(S)        PORT(S)   AGE
kubernetes   component=apiserver,provider=kubernetes   <none>     10.109.0.1   443/TCP   23h
[fedora@fedmaster ~]$ kubectl delete service kubernetes
As with many folks, I am still learning Kubernetes. It's an extremely impressive project, and I'm lucky to be exploring it. I'm also thankful for these strange little scenarios, because they'll only help me learn more, and I love learning new things!