This post records the various problems I ran into while installing k8s v1.13.2, together with their solutions. It is updated from time to time for later reference. For the normal installation steps, see:
Kubernetes Practice Guide: installing a K8s v1.13.2 cluster with kubeadm - Jianshu
1. kubelet startup warning: W0203 CPUAccounting/MemoryAccounting not enabled for pid...
[root@k8s-node2 ~]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 日 2019-02-03 11:35:52 CST; 1h 49min ago
Docs: https://kubernetes.io/docs/
Main PID: 9766 (kubelet)
CGroup: /system.slice/kubelet.service
└─9766 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/con...
2月 03 13:10:53 k8s-node2 kubelet[9766]: W0203 13:10:53.182621 9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:10:53 k8s-node2 kubelet[9766]: W0203 13:10:53.182630 9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183017 9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9085
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183056 9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9085
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183156 9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:15:53 k8s-node2 kubelet[9766]: W0203 13:15:53.183161 9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184116 9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9085
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184155 9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9085
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184237 9766 container_manager_linux.go:804] CPUAccounting not enabled for pid: 9766
2月 03 13:20:53 k8s-node2 kubelet[9766]: W0203 13:20:53.184243 9766 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 9766
First, check memory usage with # free -h: there is no memory shortage. The fix is to add a configuration file that explicitly enables DefaultCPUAccounting and DefaultMemoryAccounting:
# mkdir -p /etc/systemd/system.conf.d
# cat <<EOF >/etc/systemd/system.conf.d/kubernetes-accounting.conf
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
EOF
# systemctl daemon-reload && systemctl restart kubelet
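Once systemd has reloaded its configuration, you can confirm that accounting is now enabled for the kubelet unit by querying its properties (this assumes a live node with systemd and the kubelet unit; both properties should now report yes):

```
# systemctl show kubelet -p CPUAccounting -p MemoryAccounting
```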
2. kubectl get all on a Kubernetes Node fails: The connection to the server localhost:8080 was refused.
[root@k8s-node2 ~]# kubectl get all
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Checking with netstat -ntlp shows that nothing is listening on localhost:8080. And although kubectl runs without error on the Master node, port 8080 is not being listened on there either.
In fact, kubectl manages the cluster through the kube-apiserver API. It works on the Master node because kube-apiserver is up and running there:
[root@k8s-master ~]# docker ps | grep apiserver
269a09fc31ce 177db4b8e93a "kube-apiserver --..." 20 hours ago Up 20 hours k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_e65c58fe4249c7d1554ca017bda21943_0
dcf07ff997a1 k8s.gcr.io/pause:3.1 "/pause" 20 hours ago Up 20 hours k8s_POD_kube-apiserver-k8s-master_kube-system_e65c58fe4249c7d1554ca017bda21943_0
Meanwhile, on the Node only kube-proxy and kubelet are at work:
[root@k8s-node1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fa14d993436a 142953928206 "/install-cni.sh" 20 hours ago Up 20 hours k8s_install-cni_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
4e77ea62ac14 01cfa56edcfc "/usr/local/bin/ku..." 20 hours ago Up 20 hours k8s_kube-proxy_kube-proxy-nzfvg_kube-system_ac5f6294-26d2-11e9-9274-000c29d747fb_0
2bb208e1573d e537e5882f91 "start_runit" 20 hours ago Up 20 hours k8s_calico-node_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
8490970048da k8s.gcr.io/pause:3.1 "/pause" 20 hours ago Up 20 hours k8s_POD_calico-node-clc9p_kube-system_ac5f61a7-26d2-11e9-9274-000c29d747fb_0
f8eb0bb6693b k8s.gcr.io/pause:3.1 "/pause" 20 hours ago Up 20 hours k8s_POD_kube-proxy-nzfvg_kube-system_ac5f6294-26d2-11e9-9274-000c29d747fb_0
So kubectl is not really intended for the Node hosts; it should run on a client host, for example as a non-root user on the k8s-master node. When kubeadm init succeeds, its output tells us to copy the admin.conf file onto the client host:
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 192.168.1.120:6443 --token oe50fb.0pt36rwvz2utey4d --discovery-token-ca-cert-hash sha256:60bd336002b8f5d269996f1daf324c0a71814d6a25d82ab7b1d17ddeddd68860
Let's look at the /etc/kubernetes/admin.conf file:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjR
YRFRFNU1ESXdNakE1TlRZMU5sb1hEVEk1TURFek1EQTVOVFkxTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHFwCnRScTl4Smk5cz
NTdUVsVXljNmMwcGhhWSs4OHlQcUpYQnBsZk1YOFpJcmJVWDdHTFB5ZDVzZlBrS0lrblJ6dUgKeTZxb091NUVVbWtYZ1dldlNzK1JITGdYbHNuUFBhSHhCK0o5Y1pxNjg5cnQrd3huMDl6OVpNT0ROc0ZMTHRVMgoxUEFoY3lRZ
TNOZVBPSUdseHQvckZRRlBUV05KQTErbmJCSk9sZEhlVUhmWjNaaVcwbVFHM0IrWk1SUUpWdkM0CmIrdHRVaUpaK3FQL09SaUZKR3VUYmJzS2tsUlNIaG9xMnVtSExxYmhLTVJNQXRRbTIxZWMzaXVxVVp4QWl4MlcKdnR1Uzgr
ZUV0U3lIQW8xTm00bzd2dFh3eGVrTkYzT2lVOUZ5T1VvS3NxdVRKenVhdk9UdVJoYjd1REpQZERoaApFRzZzMlZvUjZyRDB2UjFmZUZVQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZ
NQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFBSytla09IT1dQTGhsVzJva2g0bTlRNTRJY3oKOEJPU1VEYnJsSk9iSXFUaWNvWktsOGNNMjM3OTlDcXUrVDh2WHA3YXRQc0xtd2xRK2VVK2lUVUNZVGk3d013Lwo1M1
lxWjNCSHVQS2F0RDNoVGpFRlVIbzFZVHMyYmZqVHZ5Z2hLbGhDVnBGL1k4NmFHOVFUVUxmc0g5VXpwbWtjCk5DZzU3T0tUWjFNc3FQUmIrM1hRSEFCWHVaR1RNVG4zaGVZR2dnYklVaC9vdTJyM2RhdFY0ZWdTaDhveFBJcmoKa
FdhU0JOcmVaaE45a1VsVmNoT3RsZ2lvcDJzR1A0V2RLQisxc2kxU2x2YUI5aGR6VklpTHFGWnlhY3I5ZUlvaAp1ckVib2lZYXovU2hGeSs1UCs1SWViZ0h5QWtuWm5EbXFKT3ZXbjducUNhc3RmYi81bERHYVZCcmxtZz0KLS0t
LS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
server: https://192.168.1.120:6443
name: kubernetes
From this we can see that when the client runs kubectl with this config file, it fetches its data from port 6443 on the Master node (which is in LISTEN state), not from localhost:8080 (the Node failed earlier precisely because it could not find this config file). The client can also live on any other host, even a Node, as long as the config file is put in place the way the init output instructed. Here I use scp to send the file to the target host:
# scp -r .kube/ 192.168.1.110:/root   // here I simply send the whole /root/.kube directory to the target host
After that, kubectl on the target host can reach the Master:
[root@localhost .kube]# kubectl get no
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 18h v1.13.2
k8s-node1 Ready <none> 18h v1.13.2
k8s-node2 Ready <none> 18h v1.13.2
In other words, the normal workflow of registering pods with the Master is also carried out from the client host, not on a Node or on the Master itself.
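The lookup behaviour above can be sketched in miniature: kubectl takes the apiserver address from the clusters[].cluster.server field of whatever kubeconfig it finds (~/.kube/config or $KUBECONFIG), and falls back to localhost:8080 only when there is none. The path /tmp/demo-kube/config below is illustrative:

```shell
# A stripped-down kubeconfig showing where the apiserver address lives
# (the path /tmp/demo-kube/config is illustrative):
mkdir -p /tmp/demo-kube
cat > /tmp/demo-kube/config <<'EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://192.168.1.120:6443
  name: kubernetes
EOF
# kubectl would pick this file up via ~/.kube/config or the KUBECONFIG
# variable, e.g.:  KUBECONFIG=/tmp/demo-kube/config kubectl get no
grep 'server:' /tmp/demo-kube/config
```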
3. kubelet startup error: E0208 node "k8s-master" not found
2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.068126 6164 kubelet.go:2266] node "k8s-master" not found
2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.169675 6164 kubelet.go:2266] node "k8s-master" not found
2月 08 15:55:36 k8s-master kubelet[6164]: E0208 15:55:36.238707 6164 kubelet_node_status.go:94] Unable to register node "k8s-master" with API server: Post https://192.168.1.120:6443/api/v1/nodes: dial tcp 192.168.1.120:6443: connect: connection refused
kubeadm installs kubelet on the Master node too; by default it takes no workload. This error is fairly clear: the node's kubelet cannot connect to the Master's kube-apiserver. It is not that port 6443 on the Master is closed; rather, I later changed the Master node's IP address, so the old IP no longer matches. One fix is to reinstall via kubeadm reset. Here we instead try to adjust the parameters of the existing installation so that it runs correctly again.
[root@k8s-master ~]# cd /etc/kubernetes && ls -l
總用量 36
-rw------- 1 root root 5455 2月 8 16:05 admin.conf
-rw------- 1 root root 5487 2月 8 16:05 controller-manager.conf
-rw------- 1 root root 5483 2月 8 16:06 kubelet.conf
drwxr-xr-x. 2 root root 113 2月 8 16:08 manifests
drwxr-xr-x. 3 root root 4096 2月 2 17:56 pki
-rw------- 1 root root 5435 2月 8 16:08 scheduler.conf
Change the old IP address (192.168.1.120) in the conf files to the new one (192.168.111.120), save, and reload the kubelet service (note that the subdirectories contain conf files as well):
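The edit itself can be done with one sed pass over the conf files. The sketch below runs against a scratch copy under /tmp/kubeconf (an illustrative path), so it is safe to try; on the real Master the target would be /etc/kubernetes/*.conf:

```shell
# Scratch copy standing in for /etc/kubernetes (illustrative path):
mkdir -p /tmp/kubeconf
echo 'server: https://192.168.1.120:6443' > /tmp/kubeconf/kubelet.conf
# Replace the old apiserver IP with the new one in every conf file:
sed -i 's/192\.168\.1\.120/192.168.111.120/g' /tmp/kubeconf/*.conf
cat /tmp/kubeconf/kubelet.conf
# -> server: https://192.168.111.120:6443
```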
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart kubelet && journalctl -xefu kubelet
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.188505 19409 kubelet.go:2266] node "k8s-master" not found
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.290432 19409 kubelet.go:2266] node "k8s-master" not found
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.326230 19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.111.120:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.356546 19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.111.120:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-master&limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120
2月 08 16:47:04 k8s-master kubelet[19409]: E0208 16:47:04.362324 19409 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://192.168.111.120:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-master&limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 192.168.1.120, not 192.168.111.120
Judging from the error log, the kubelet on the Master, while talking to the kube-apiserver on the same node, finds that the TLS certificate the apiserver returns belongs to 192.168.1.120 rather than to an apiserver at 192.168.111.120, and therefore errors out. To fix this properly, we would need to generate new certificates for the new IP address. Reference links:
Generating the apiserver certificate (section 3.2)
stackoverflow invalid-x509
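To confirm the diagnosis, you can list the Subject Alternative Names a certificate actually covers. The sketch below generates a throwaway certificate mimicking the apiserver's old SANs (it needs OpenSSL 1.1.1+ for -addext, and uses illustrative /tmp paths); on the real Master you would instead point openssl at /etc/kubernetes/pki/apiserver.crt:

```shell
# Create a throwaway certificate whose SANs mimic the apiserver's old ones:
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/apiserver.key -out /tmp/apiserver.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=IP:10.96.0.1,IP:192.168.1.120"
# List the IPs the certificate is valid for; 192.168.111.120 is not among
# them, which is exactly what the x509 error above complains about:
openssl x509 -in /tmp/apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"
```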
(I stopped here, because too many certificates would have had to be regenerated by hand. In the end I simply reinstalled on every node.)
4. Installing k8s without turning swap off (kubeadm as the example)
1. Start kubelet with the extra flag --fail-swap-on=false and restart it (via KUBELET_EXTRA_ARGS in /etc/sysconfig/kubelet).
2. Run kubeadm init --ignore-preflight-errors=Swap (the same --ignore... flag must also be added manually when running kubeadm join).
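Step 1 boils down to a one-line fragment in the sysconfig file shipped by the kubeadm RPM packages (the path may differ on other distros), followed by a kubelet restart:

```shell
# /etc/sysconfig/kubelet  (path used by the kubeadm RPM packages)
KUBELET_EXTRA_ARGS=--fail-swap-on=false
```

Then apply it with: systemctl daemon-reload && systemctl restart kubelet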
5. unknown container "/system.slice/kubelet.service":
kubelet reports this error at runtime:
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service":
Add these flags when starting kubelet:
--runtime-cgroups=/systemd/system.slice
--kubelet-cgroups=/systemd/system.slice
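On a kubeadm install, these flags can go into the same KUBELET_EXTRA_ARGS variable used in section 4 (the sysconfig path below is the one used by the RPM packages and is an assumption for your setup; restart kubelet afterwards):

```shell
# /etc/sysconfig/kubelet  (assumed path; then systemctl daemon-reload && systemctl restart kubelet)
KUBELET_EXTRA_ARGS="--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
```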