Kube-proxy can't retrieve node info - invalid nodeIP

Problem description:

I've been trying to set up a Kubernetes cluster for a few months now, but so far I've had no luck.

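I'm trying to set it up on 4 bare-metal PCs running CoreOS. I just did a clean install of everything, but I hit the same problem as before. I'm following this tutorial. I think I've configured everything correctly, but I'm not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but I see the following errors for them when I check the service status with systemctl status: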

kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)

flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)

If I restart both services, they work, or at least they look like they work: I don't get any errors.

Everything else seems to work fine, so the only problem left (I think) is the kube-proxy service on all nodes.

If I run kubectl get pods, I see all pods running:

$ kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kube-apiserver-kubernetes-4            1/1     Running   4          6m
kube-controller-manager-kubernetes-4   1/1     Running   6          6m
kube-proxy-kubernetes-1                1/1     Running   4          18h
kube-proxy-kubernetes-2                1/1     Running   5          26m
kube-proxy-kubernetes-3                1/1     Running   4          19m
kube-proxy-kubernetes-4                1/1     Running   4          18h
kube-scheduler-kubernetes-4            1/1     Running   6          18h

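The answer to this question suggests checking whether kubectl get node returns the same names that the kubelet registered with. As far as I can tell from the logs, the nodes are registered correctly, and this is the output of kubectl get node: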

$ kubectl get node
NAME           STATUS                     AGE   VERSION
kubernetes-1   Ready                      18h   v1.6.1+coreos.0
kubernetes-2   Ready                      36m   v1.6.1+coreos.0
kubernetes-3   Ready                      29m   v1.6.1+coreos.0
kubernetes-4   Ready,SchedulingDisabled   18h   v1.6.1+coreos.0

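The tutorial I used (linked above) suggests using the --hostname-override flag, but with it I couldn't get the node info on the master node (kubernetes-4) when I tried to curl it locally. So I removed the flag, and now I can get the node info normally.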

Someone suggested it might be a flannel problem and that I should check the flannel ports. Using netstat -lntu I get the following output:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN
tcp        0      0 MASTER_IP:2379          0.0.0.0:*               LISTEN
tcp        0      0 MASTER_IP:2380          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN
tcp6       0      0 :::4194                 :::*                    LISTEN
tcp6       0      0 :::10250                :::*                    LISTEN
tcp6       0      0 :::10251                :::*                    LISTEN
tcp6       0      0 :::10252                :::*                    LISTEN
tcp6       0      0 :::10255                :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::443                  :::*                    LISTEN
udp        0      0 0.0.0.0:8472            0.0.0.0:*

So I assume the ports are fine?
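
netstat only shows what is listening at the moment it runs, so a direct connection attempt is a useful cross-check. Below is a minimal sketch in Python (standard library only), not part of my actual setup: port_is_open is a hypothetical helper, and the targets are just the local API server and etcd ports taken from the output above. It only covers TCP, so the flannel UDP port 8472 is not tested by it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed targets: insecure API server port and local etcd client port.
    targets = [("127.0.0.1", 8080), ("127.0.0.1", 2379)]
    for host, port in targets:
        state = "open" if port_is_open(host, port) else "closed or unreachable"
        print(f"{host}:{port} {state}")
```

Running this in a loop for a minute or so would show whether a port such as 8080 keeps going up and down, which a single netstat snapshot cannot show.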

Also, etcd2 works: etcdctl cluster-health shows that all nodes are healthy.

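This is the part of the cloud-config that starts etcd2 on reboot. Besides it, I only store the SSH keys and the node username/password/groups in there: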

#cloud-config

coreos:
  etcd2:
    name: "kubernetes-4"
    initial-advertise-peer-urls: "http://NODE_IP:2380"
    listen-peer-urls: "http://NODE_IP:2380"
    listen-client-urls: "http://NODE_IP,http://127.0.0.1:2379"
    advertise-client-urls: "http://NODE_IP:2379"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
    initial-cluster-state: "new"
  units:
    - name: etcd2.service
      command: start

This is the content of the /etc/flannel/options.env file:

FLANNELD_IFACE=NODE_IP
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379

The same endpoints are set in the kube-apiserver.yaml file.

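Any ideas or suggestions as to what the problem could be? Also, if there are any details you'd like to see, let me know and I'll add them to the post.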

Edit: I forgot to include the kube-proxy logs.

Master node kube-proxy log:

$ kubectl logs kube-proxy-kubernetes-4 
I0615 07:47:45.250631  1 server.go:225] Using iptables Proxier. 
W0615 07:47:45.286923  1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused 
W0615 07:47:45.303576  1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP 
W0615 07:47:45.303593  1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic 
I0615 07:47:45.303646  1 server.go:249] Tearing down userspace rules. 
E0615 07:47:45.357276  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused 
E0615 07:47:45.357278  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused 

Worker node kube-proxy log:

$ kubectl logs kube-proxy-kubernetes-1 
I0615 07:47:33.667025  1 server.go:225] Using iptables Proxier. 
W0615 07:47:33.697387  1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused 
W0615 07:47:33.712718  1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP 
W0615 07:47:33.712734  1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic 
I0615 07:47:33.712773  1 server.go:249] Tearing down userspace rules. 
E0615 07:47:33.787122  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused 
E0615 07:47:33.787144  1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused 
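
To see at a glance which endpoints kube-proxy fails to reach, logs like the ones above can be summarized with a short script. This is a minimal sketch in Python, assuming the log text has been saved from kubectl logs; refused_endpoints is a hypothetical helper, and the pattern only targets the "dial tcp HOST:PORT: ... connection refused" wording seen in these logs.

```python
import re
from collections import Counter

# Matches the "dial tcp HOST:PORT: ... connection refused" part of a log line.
REFUSED = re.compile(r"dial tcp (\S+?):(\d+): .*connection refused")


def refused_endpoints(log_text: str) -> Counter:
    """Count refused connections per HOST:PORT found in kube-proxy log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REFUSED.search(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    # One line copied from the master node log above, for illustration.
    sample = (
        "W0615 07:47:45.286923  1 server.go:469] Failed to retrieve node info: "
        "Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: "
        "dial tcp 127.0.0.1:8080: getsockopt: connection refused"
    )
    for endpoint, count in refused_endpoints(sample).items():
        print(endpoint, count)
```

For the logs in this post, every refused connection points at the API server: 127.0.0.1:8080 on the master and MASTER_IP:443 on the worker.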

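Have you tried the scripts here? They are a condensed version of the tutorial you used, and they cover various platforms. These scripts work perfectly on bare metal for k8s v1.6.4. I also have a tweaked script with better encryption.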

kube-apiserver is not running, which explains the error dial tcp 127.0.0.1:8080: getsockopt: connection refused. When I debug kube-apiserver, this is what I do on the node:

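  1. Remove /etc/kubernetes/manifests/kube-apiserver.yaml
  2. Run a hyperkube container manually. Depending on your configuration, you may have to mount additional volumes (i.e. -v) to expose files to the container. Update the image version to the one you use.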

    docker run --net=host -it -v /etc/kubernetes/ssl:/etc/kubernetes/ssl quay.io/coreos/hyperkube:v1.6.2_coreos.0

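  3. The command above starts a shell in the hyperkube container. Now start kube-apiserver with the flags from the kube-apiserver.yaml manifest. It should look similar to this example: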

    /hyperkube apiserver \
      --bind-address=0.0.0.0 \
      --etcd-cafile=/etc/kubernetes/ssl/apiserver/ca.pem \
      --etcd-certfile=/etc/kubernetes/ssl/apiserver/client.pem \
      --etcd-keyfile=/etc/kubernetes/ssl/apiserver/client-key.pem \
      --etcd-servers=https://10.246.40.20:2379,https://10.246.40.21:2379,https://10.246.40.22:2379 \
      ...
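
Copying the flags out of the manifest by hand is error-prone, so it can help to generate the command line. The following is only a sketch in Python, under the assumption that your manifest lists the container command as a sequence of arguments; to_shell_command is a hypothetical helper, and the argument list in the example is copied from the sample command above rather than read from a real manifest.

```python
import shlex
from typing import Sequence


def to_shell_command(args: Sequence[str]) -> str:
    """Join arguments into one shell command with backslash line continuations."""
    # shlex.quote leaves plain flags untouched and quotes anything unsafe.
    quoted = [shlex.quote(arg) for arg in args]
    return " \\\n  ".join(quoted)


if __name__ == "__main__":
    # Assumed argument list, mirroring the start of the example command.
    args = [
        "/hyperkube",
        "apiserver",
        "--bind-address=0.0.0.0",
        "--etcd-cafile=/etc/kubernetes/ssl/apiserver/ca.pem",
    ]
    print(to_shell_command(args))
```

The printed text can be pasted into the shell inside the hyperkube container. Running the API server in the foreground this way lets you see its startup errors directly.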

In any case, I suggest you tear down the cluster and try the scripts first. It might just work ootb.