Building a Docker Swarm Cluster on the Loongson Platform
http://ask.loongnix.org/?/article/91
This tutorial walks you through the following tasks:
- Initialize a Docker swarm cluster;
- Add nodes to the swarm;
- Deploy a swarm service;
- Manage the swarm.
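Prerequisites:
- Three hosts running Linux (physical machines, virtual machines, or Docker containers; this article uses three Loongson 3A3000 machines running Loongnix (Fedora21-20170927));
- Docker Engine 1.12 or later installed on each host;
- One host serves as the manager node, and you need to know its IP address;
- The following ports open between the hosts:
  - TCP port 2377 for cluster management communication;
  - TCP and UDP port 7946 for communication between nodes;
  - UDP port 4789 for overlay network traffic.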
These ports are usually open by default. If you are not sure, run the following commands to open them:
iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
iptables -A INPUT -p udp --dport 7946 -j ACCEPT
iptables -A INPUT -p udp --dport 4789 -j ACCEPT
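The per-host preparation can be scripted. The following is a minimal sketch. It assumes you run it as root, with iptables 1.4.11 or later (needed for the -C check), on a systemd host such as Loongnix/Fedora 21. The function names open_swarm_ports and ensure_docker_running are hypothetical helpers, not part of Docker. The script adds each rule only if it is not already present, and it starts the daemon (which the next step checks) only when systemd does not report it as active. If the INPUT chain ends with a catch-all REJECT rule, as some firewalld setups do, appended rules are never reached. In that case, inserting the rules with -I INPUT instead of -A INPUT avoids the problem.

```shell
# Per-host preparation sketch for the prerequisites above.
# Assumptions: run as root, iptables >= 1.4.11 (for -C), systemd host.

# Open the swarm ports, skipping rules that already exist.
open_swarm_ports() {
    for rule in tcp:2377 tcp:7946 udp:7946 udp:4789; do
        proto=${rule%%:*}
        port=${rule##*:}
        # iptables -C exits non-zero if the rule is not in the chain yet
        if ! iptables -C INPUT -p "$proto" --dport "$port" -j ACCEPT 2>/dev/null; then
            iptables -A INPUT -p "$proto" --dport "$port" -j ACCEPT
        fi
    done
}

# Start the Docker daemon only if systemd does not report it as active.
ensure_docker_running() {
    if systemctl is-active --quiet docker; then
        echo "docker is active"
    else
        echo "docker is not active, starting it"
        service docker start
    fi
}
```

Run open_swarm_ports && ensure_docker_running on each of the three hosts.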
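Initialize a Docker swarm cluster
First, make sure the Docker daemon is running on each host. If the service status is not active (running), run service docker start to start the Docker daemon. Now we can begin.
1. Choose one host as the manager node (manager1); its IP address is 10.20.42.45. In a terminal, run docker swarm init to initialize the swarm: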
# docker swarm init --advertise-addr 10.20.42.45
Swarm initialized: current node (892ozqeoeh6fugx5iao3luduk) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-5vs5ndm8k5idcxeckprr61kg6a7h90dp3uihdhr3kwl1ejwtwg-58jqj86p1nqfh225t51p5h8lp \
    10.20.42.45:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
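--advertise-addr sets the manager node's advertised address to 10.20.42.45. Other nodes must be able to reach this address to join the cluster. The output shows how other nodes can join the cluster as worker nodes or as manager nodes.
2. Run docker info to view the current swarm information. The key parts of the output are: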
# docker info
Containers: 10
 Running: 0
 Paused: 0
 Stopped: 10
Images: 22
Server Version: 1.12.2
...
Swarm: active
 NodeID: 250tj9l3mnrrtprdd0990b2t3
 Is Manager: true
 ClusterID: atrevada8k0amn83zdiig6qkb
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
...
3. Run docker node ls to view node information:
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
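The * marks the node you are connected to.
Add two nodes to the swarm
Choose another host as a worker node (worker1), and use the third host as a second worker node (worker2).
1. Open a terminal on worker1. When we created the swarm above, the output showed how to join the swarm as a worker: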
# docker swarm join \
    --token SWMTKN-1-5vs5ndm8k5idcxeckprr61kg6a7h90dp3uihdhr3kwl1ejwtwg-58jqj86p1nqfh225t51p5h8lp \
    10.20.42.45:2377
This node joined a swarm as a worker.
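You do not have to copy the full join command. On the manager, docker swarm join-token -q worker prints only the worker token (-q prints just the token). The helper below is a hypothetical wrapper that builds the same join command from that token. It assumes the manager address 10.20.42.45 used in this article.

```shell
# Join this host to the swarm as a worker.
# join_as_worker is a hypothetical helper; TOKEN is the value printed by
# `docker swarm join-token -q worker` on manager1.
join_as_worker() {
    token=$1
    manager_addr=${2:-10.20.42.45}   # manager address used in this article
    if [ -z "$token" ]; then
        echo "usage: join_as_worker TOKEN [MANAGER_ADDR]" >&2
        return 1
    fi
    # 2377 is the cluster management port opened in the prerequisites
    docker swarm join --token "$token" "$manager_addr:2377"
}
```

On manager1, run docker swarm join-token -q worker and copy the printed token to each worker. Then run join_as_worker followed by that token.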
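If you no longer have this command, run docker swarm join-token worker on manager1 to print it again.
2. On worker2, repeat the steps from worker1 to join the swarm as a worker.
3. Back on the manager node manager1, run docker node ls to view the status of all nodes in the swarm: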
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
a24i9nu2943niy8eq239bbpwv    worker1   Ready   Active
e0lh6c2zb57qg8db7usvg17r6    worker2   Ready   Active
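The * marks the node you are connected to. In the MANAGER STATUS column, Leader marks the manager node (the leader among managers), and an empty value marks a worker node.
Deploy a service to the swarm
To observe how the cluster orchestrates services, we run portainer on the manager node. Its swarm visualizer module shows the services on each node at a glance. Download portainer: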
# docker pull jiangxinshang/portainer
Start portainer (port 9000 must not be used by any other application):
# docker run -t -i -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock docker.io/jiangxinshang/portainer
2017/10/13 07:55:28 Starting Portainer 1.14.3 on :9000
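Open 127.0.0.1:9000 in a browser on the manager node to see the cluster diagram. The figure below shows the cluster before any service is deployed.
1. Open a terminal on manager1 and run: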
# docker service create --replicas 1 --name hello 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org"
an85njt7e5dadfpwcfyr21sfs
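docker service create creates a service. --name names the service hello. --replicas sets the desired state of the service to one running instance. The arguments 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org" define what the service runs: it creates a container from the image 10.20.42.45:5000/fedora (all three nodes must have the same pulled image) and runs /bin/bash -c "ping loongnix.org" inside the container.
2. Run the following command to view the current service status: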
# docker service ls
ID            NAME   REPLICAS  IMAGE                    COMMAND
an85njt7e5da  hello  1/1       10.20.42.45:5000/fedora  /bin/bash -c ping loongnix.org
3. View the tasks of the service:
# docker service ps hello
ID                         NAME     IMAGE                    NODE     DESIRED STATE  CURRENT STATE          ERROR
cwk9d8b67n20lmglhbz7jewne  hello.1  10.20.42.45:5000/fedora  worker2  Running        Running 5 minutes ago
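The service is currently running on worker2, but it could just as well run on worker1 or manager1, because a manager node also acts as a worker and runs services. The steps below demonstrate this in more detail. You can watch the services on the three nodes in swarm visualizer.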
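Because a task can be placed on any of the three nodes, each node needs the service image. The following minimal sketch pulls the image on every node ahead of time. It assumes each node is reachable over SSH as root under the host names used in this article (manager1, worker1, worker2). These names are hypothetical here, so substitute real addresses. Also, a registry served over plain HTTP, such as one on port 5000 without TLS, normally has to be configured as an insecure registry on each node's Docker daemon before pulls succeed.

```shell
# Pre-pull the service image on every node.
# Assumption (hypothetical): nodes are reachable as root@<name> over SSH.
IMAGE=10.20.42.45:5000/fedora
NODES="manager1 worker1 worker2"

pull_on_all_nodes() {
    for node in $NODES; do
        echo "pulling $IMAGE on $node"
        # Stop at the first node where the pull fails
        ssh "root@$node" docker pull "$IMAGE" || return 1
    done
}
```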
View service details
1. Log in to manager1 and run docker service inspect --pretty <SERVICE-ID> to print the service details in a readable format:
# docker service inspect --pretty hello
ID:             an85njt7e5dadfpwcfyr21sfs
Name:           hello
Mode:           Replicated
 Replicas:      1
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         10.20.42.45:5000/fedora
 Args:          /bin/bash -c ping loongnix.org
Resources:
2. Without --pretty, the output is formatted as JSON:
# docker service inspect hello
[
    {
        "ID": "an85njt7e5dadfpwcfyr21sfs",
        "Version": {
            "Index": 230
        },
        "CreatedAt": "2017-10-13T02:14:33.706598Z",
        "UpdatedAt": "2017-10-13T02:14:33.706598Z",
        "Spec": {
            "Name": "hello",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "10.20.42.45:5000/fedora",
                    "Args": [
                        "/bin/bash",
                        "-c",
                        "ping",
                        "loongnix.org"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {}
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]
Scale the service
1. On manager1, run docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS> to change the number of running tasks, for example:
# docker service scale hello=5
hello scaled to 5
2. Check the number of running tasks. REPLICAS is now 5/5:
# docker service ls
ID            NAME   REPLICAS  IMAGE                    COMMAND
an85njt7e5da  hello  5/5       10.20.42.45:5000/fedora  /bin/bash -c ping loongnix.org
3. See how the tasks of the hello service are distributed across the nodes:
# docker service ps hello
ID                         NAME     IMAGE                    NODE      DESIRED STATE  CURRENT STATE                    ERROR
cwk9d8b67n20lmglhbz7jewne  hello.1  10.20.42.45:5000/fedora  worker2   Running        Running 7 minutes ago
9wyodmvvf77ucbums23ortuj0  hello.2  10.20.42.45:5000/fedora  worker1   Running        Running 41 seconds ago
19ev4tcj0be2t1xbdkf7evp4f  hello.3  10.20.42.45:5000/fedora  manager1  Running        Running 44 seconds ago
1tm1tc2r8xdgdry2hu0pjilmy  hello.4  10.20.42.45:5000/fedora  manager1  Running        Running less than a second ago
cfqh34h9e6jv7l0iiwau8jhsf  hello.5  10.20.42.45:5000/fedora  worker2   Running        Running less than a second ago
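To read the distribution without counting rows by hand, you can summarize the docker service ps output per node. This awk sketch assumes the Docker 1.12 column layout shown above and counts only tasks whose desired state is Running. History rows, which begin with \_ after the ID, shift the columns by one. tasks_per_node is a hypothetical helper name.

```shell
# Count tasks whose desired state is Running on each node, reading
# `docker service ps` output (Docker 1.12 layout) from stdin.
tasks_per_node() {
    awk 'NR > 1 {
        # history rows have "\_" after the ID, shifting columns by one
        off = ($2 == "\\_") ? 1 : 0
        node = $(4 + off); desired = $(5 + off)
        if (desired == "Running") count[node]++
    }
    END { for (n in count) print n, count[n] }' | sort
}
```

Pipe docker service ps hello into tasks_per_node. For the output above, this gives manager1 2, worker1 1 and worker2 2.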
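The same distribution can be viewed graphically in swarm visualizer.
The swarm spreads the service's tasks across the nodes (here two on manager1, one on worker1, and two on worker2).
4. On manager1 we can also see the two corresponding containers: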
# docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS         PORTS   NAMES
2f6be7870988   10.20.42.45:5000/fedora:latest   "/bin/bash -c 'ping l"   5 minutes ago   Up 5 minutes           hello.4.1tm1tc2r8xdgdry2hu0pjilmy
a2da3bb3c4f0   10.20.42.45:5000/fedora:latest   "/bin/bash -c 'ping l"   5 minutes ago   Up 5 minutes           hello.3.19ev4tcj0be2t1xbdkf7evp4f
On worker1, the same command shows one corresponding container:
# docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS         PORTS   NAMES
954af2601ac2   10.20.42.45:5000/fedora:latest   "/bin/bash -c 'ping l"   6 minutes ago   Up 6 minutes           hello.2.9wyodmvvf77ucbums23ortuj0
The output on worker2 is omitted.
Delete the running service
1. On the manager node manager1, delete the service with docker service rm <SERVICE-ID>:
# docker service rm hello
hello
2. Querying the service on the manager node now returns an error:
# docker service ps hello
Error: No such service: hello
3. On the worker nodes, the containers have been stopped as well:
# docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
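Perform a rolling update of a service
In this part we deploy a service that creates containers from the fedora:21.0 image, and then demonstrate a rolling update that upgrades the containers to the fedora:21.1 image.
1. First, deploy the service on manager1 with a swarm update delay of 10 seconds: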
# docker service create --replicas 5 --name fedora --update-delay 10s fedora:21.0 ping loongnix.org
des35b3cu097uelo8n8zv5gez
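The rolling update policy is specified when the service is deployed. --update-delay sets the delay between updates of one task or one group of tasks. The delay is written as numbers with units: s for seconds, m for minutes, and h for hours. For example, 10m30s is a delay of 10 minutes 30 seconds. By default, the scheduler updates one task at a time; the --update-parallelism flag sets the maximum number of tasks the scheduler updates at the same time. By default, when an updated task returns RUNNING, the scheduler moves on to the next one until all tasks have been updated. If an updated task returns FAILED, the scheduler pauses the update. The --update-failure-action flag of docker service create or docker service update sets what happens when an update fails.
2. Check how the service's tasks are distributed across the nodes of the cluster.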
3. Start updating the fedora image. The swarm manager applies the update according to the update policy configured above:
# docker service update --image fedora:21.1 fedora
fedora
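The scheduler performs the rolling update as follows:
- Stop the first task.
- Schedule the update for the stopped task.
- Start the container for the updated task.
- If the updated task returns RUNNING, wait for the specified delay and then start updating the next task; if any stage of updating a task returns FAILED, pause the update.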
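You can model the update settings and the loop above in a few lines of shell. This is a toy sketch, not how the swarm scheduler is implemented. duration_to_seconds converts an --update-delay value to seconds. It handles only the h, m and s units described earlier, while Docker accepts the full Go duration syntax. simulate_rolling_update walks through the tasks PARALLELISM at a time, waits DELAY seconds between batches, and pauses at a simulated failure. Both function names and the FAIL_AT argument are hypothetical.

```shell
# Toy model of the rolling-update settings and loop described above.
# These are hypothetical helpers, not Docker commands.

# Convert an --update-delay value such as 10m30s into seconds (h, m, s only).
duration_to_seconds() {
    rest=$1 total=0
    while [ -n "$rest" ]; do
        digits=$(printf '%s\n' "$rest" | sed 's/^\([0-9]*\).*/\1/')
        unit=$(printf '%s\n' "$rest" | sed -n 's/^[0-9]*\([hms]\).*/\1/p')
        if [ -z "$digits" ] || [ -z "$unit" ]; then
            echo "unsupported duration: $1" >&2
            return 1
        fi
        # Strip leading zeros so that e.g. 08 is not read as octal
        value=$(printf '%s\n' "$digits" | sed 's/^0*\([0-9]\)/\1/')
        case $unit in
            h) total=$((total + value * 3600)) ;;
            m) total=$((total + value * 60)) ;;
            s) total=$((total + value)) ;;
        esac
        rest=${rest#"$digits$unit"}
    done
    echo "$total"
}

# Update TASKS tasks PARALLELISM at a time, waiting DELAY seconds between
# batches. FAIL_AT is a task number whose update "fails" (0 means none);
# the loop then pauses, like the default failure action.
simulate_rolling_update() {
    tasks=$1 parallelism=$2 delay=$3 fail_at=${4:-0}
    waited=0 i=1
    while [ "$i" -le "$tasks" ]; do
        last=$((i + parallelism - 1))
        if [ "$last" -gt "$tasks" ]; then last=$tasks; fi
        t=$i
        while [ "$t" -le "$last" ]; do
            echo "task $t: stop, update, start"
            if [ "$t" -eq "$fail_at" ]; then
                echo "task $t: FAILED, update paused"
                return 1
            fi
            t=$((t + 1))
        done
        i=$((last + 1))
        if [ "$i" -le "$tasks" ]; then
            echo "waiting ${delay}s before the next batch"
            waited=$((waited + delay))
        fi
    done
    echo "all $tasks tasks updated after at least ${waited}s of delay"
}
```

Running simulate_rolling_update 5 1 "$(duration_to_seconds 10s)" mirrors the fedora service above: 5 replicas, the default parallelism of 1, and a 10s delay. It reports at least 40s spent in delays, because there are four gaps between five batches. Actual wall time is longer, because each new task also has to start.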
4. The update can be followed in real time in portainer.
5. Run docker service ps <SERVICE-ID> to observe the rolling update:
# docker service ps fedora
ID                         NAME          IMAGE        NODE      DESIRED STATE  CURRENT STATE           ERROR
6t6p5r9ujrsq0e1kf3k0yta6n  fedora.1      fedora:21.1  worker1   Running        Running 8 minutes ago
2vctv5ay741c7z6vlx2kqdwje   \_ fedora.1  fedora:21.0  worker2   Shutdown       Shutdown 8 minutes ago
8vrfziykhn7ar7b7zyhxlsarr  fedora.2      fedora:21.1  manager1  Running        Running 6 minutes ago
cy0zvhqawk1j9g0jb112sq482   \_ fedora.2  fedora:21.0  worker1   Shutdown       Shutdown 6 minutes ago
0yojbjr1omf46qycc2c6at4b7  fedora.3      fedora:21.1  worker2   Running        Running 7 minutes ago
cisqv46430hy7p3kj5xaefgd8   \_ fedora.3  fedora:21.0  manager1  Shutdown       Shutdown 7 minutes ago
17ailb36wdeg4zvkl9mytoxhu  fedora.4      fedora:21.1  manager1  Running        Running 6 minutes ago
cvpl0svmlyj1ozdb6r0eiovts   \_ fedora.4  fedora:21.0  manager1  Shutdown       Shutdown 6 minutes ago
4lt4ax7ezyqid6ndk89ksngc5  fedora.5      fedora:21.1  worker2   Running        Running 7 minutes ago
8sbzju1llogo7d6s9t4h0e1ia   \_ fedora.5  fedora:21.0  worker2   Shutdown       Shutdown 7 minutes ago
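You can also check from a script whether the rolling update has reached every replica. The sketch below is a hypothetical helper that assumes the same docker service ps layout as above. It succeeds only if every task whose desired state is Running uses the target image, and it ignores Shutdown history rows.

```shell
# Succeed only if every task whose desired state is Running uses TARGET,
# reading `docker service ps` output (Docker 1.12 layout) from stdin.
update_complete() {
    target=$1
    awk -v target="$target" 'NR > 1 {
        # history rows have "\_" after the ID, shifting columns by one
        off = ($2 == "\\_") ? 1 : 0
        image = $(3 + off); desired = $(5 + off)
        if (desired == "Running" && image != target) stale++
    }
    END { if (stale > 0) exit 1; exit 0 }'
}
```

For example, pipe docker service ps fedora into update_complete fedora:21.1 and chain it with && echo "rolling update finished".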
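The docker service ps output above shows that all tasks have been updated.
Drain a node
In all the previous steps, every node was running with availability ACTIVE. The swarm manager assigns tasks to ACTIVE nodes, so until now every node could receive tasks. Sometimes, for example during a planned maintenance window, you need to set a node's availability to DRAIN. A DRAIN node does not receive new tasks from the swarm manager. The manager stops the tasks running on the DRAIN node and launches replacement tasks on other ACTIVE nodes.
1. Before starting, confirm that every node in the cluster is ACTIVE: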
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
a24i9nu2943niy8eq239bbpwv    worker1   Ready   Active
e0lh6c2zb57qg8db7usvg17r6    worker2   Ready   Active
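On a larger cluster, a filter over docker node ls makes checking availability easier. This sketch assumes the Docker 1.12 layout shown above, where * marks the current node and workers have an empty MANAGER STATUS column. node_summary is a hypothetical helper that prints the host name, availability, and role of each node.

```shell
# Summarize `docker node ls` output (Docker 1.12 layout) as
# "HOSTNAME AVAILABILITY ROLE". A non-empty MANAGER STATUS column
# (Leader, Reachable, ...) means manager; an empty one means worker.
node_summary() {
    awk 'NR > 1 {
        sub(/ \* /, " ")   # drop the "*" that marks the current node
        role = (NF >= 5) ? "manager" : "worker"
        print $2, $4, role
    }'
}
```

Piping docker node ls through node_summary and then awk '$2 != "Active"' prints only the nodes that are not Active. No output means every node can receive tasks.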
2. Redeploy the earlier service, this time with 3 tasks, so that each node receives a task:
# docker service create --replicas 3 --name helloagain 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org"
0rxfbv9fwrs0fv06hfnp8j5je
# docker service ps helloagain
ID                         NAME          IMAGE                    NODE      DESIRED STATE  CURRENT STATE            ERROR
8xszceewi2dj9xhbce53rkq1a  helloagain.1  10.20.42.45:5000/fedora  worker1   Running        Preparing 6 seconds ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2  10.20.42.45:5000/fedora  worker2   Running        Running 1 seconds ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3  10.20.42.45:5000/fedora  manager1  Running        Running 5 seconds ago
3. Run docker node update --availability drain <NODE-ID> to drain a node that is running a task:
# docker node update --availability drain worker1
worker1
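For a maintenance window, you can wrap draining, doing the work, and reactivating into one step. The sketch below uses only docker node update --availability, which accepts active, pause, or drain. with_node_drained is a hypothetical helper, and the command you pass to it stands in for whatever maintenance you perform. The node is set back to active even if that command fails. Note that the drain only updates the node's desired availability, so tasks may still be moving to other nodes when the maintenance command starts.

```shell
# Drain NODE, run the given maintenance command, then reactivate NODE.
# The maintenance command itself is whatever you pass in (hypothetical).
with_node_drained() {
    node=$1; shift
    docker node update --availability drain "$node" || return 1
    "$@"
    status=$?
    # Reactivate even if the maintenance command failed
    docker node update --availability active "$node"
    return "$status"
}
```

For example, run with_node_drained worker1 echo "maintenance on worker1".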
4. Inspect the drained node. Availability shows Drain:
# docker node inspect --pretty worker1
ID:                     a24i9nu2943niy8eq239bbpwv
Hostname:               worker1
Joined at:              2017-10-13 01:16:01.272489 +0000 utc
Status:
 State:                 Ready
 Availability:          Drain
Platform:
 Operating System:      linux
 Architecture:          mips64
Resources:
 CPUs:                  4
 Memory:                7.598 GiB
Plugins:
  Network:              bridge, host, null, overlay
  Volume:               local
Engine Version:         1.12.2
5. View the current task placement of the service:
# docker service ps helloagain
ID                         NAME              IMAGE                    NODE      DESIRED STATE  CURRENT STATE            ERROR
e6mh01ekb4dcpnnu3yez7qqdx  helloagain.1      10.20.42.45:5000/fedora  worker2   Running        Running 4 minutes ago
8xszceewi2dj9xhbce53rkq1a   \_ helloagain.1  10.20.42.45:5000/fedora  worker1   Shutdown       Shutdown 4 minutes ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2      10.20.42.45:5000/fedora  worker2   Running        Running 10 minutes ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3      10.20.42.45:5000/fedora  manager1  Running        Running 10 minutes ago
The task on worker1 has been shut down and rescheduled on worker2.
6. Change worker1's availability from DRAIN back to ACTIVE, and observe again:
# docker node update --availability active worker1
worker1
# docker node inspect --pretty worker1
ID:                     a24i9nu2943niy8eq239bbpwv
Hostname:               worker1
Joined at:              2017-10-13 01:16:01.272489 +0000 utc
Status:
 State:                 Ready
 Availability:          Active
Platform:
 Operating System:      linux
 Architecture:          mips64
Resources:
 CPUs:                  4
 Memory:                7.598 GiB
Plugins:
  Network:              bridge, host, null, overlay
  Volume:               local
Engine Version:         1.12.2
# docker service ps helloagain
ID                         NAME              IMAGE                    NODE      DESIRED STATE  CURRENT STATE            ERROR
e6mh01ekb4dcpnnu3yez7qqdx  helloagain.1      10.20.42.45:5000/fedora  worker2   Running        Running 8 minutes ago
8xszceewi2dj9xhbce53rkq1a   \_ helloagain.1  10.20.42.45:5000/fedora  worker1   Shutdown       Shutdown 8 minutes ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2      10.20.42.45:5000/fedora  worker2   Running        Running 13 minutes ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3      10.20.42.45:5000/fedora  manager1  Running        Running 13 minutes ago
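As shown, worker1's availability is back to Active and its status is Ready. Because no tasks have changed, it has not been assigned a task yet. A node whose availability is Active can receive new tasks when:
- a service is scaled;
- tasks are updated in a rolling update;
- another node is set to Drain;
- a task fails to start on another Active node.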
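To check whether the reactivated worker1 picks up work after one of these events, for example after scaling helloagain up with docker service scale helloagain=4, you can filter the task list by node. running_tasks_on is a hypothetical helper that assumes the docker service ps layout shown above. The scheduler decides which node receives the new task, so worker1 may or may not appear.

```shell
# Print the names of tasks whose desired state is Running on NODE,
# reading `docker service ps` output (Docker 1.12 layout) from stdin.
running_tasks_on() {
    awk -v node="$1" 'NR > 1 {
        # history rows have "\_" after the ID, shifting columns by one
        off = ($2 == "\\_") ? 1 : 0
        if ($(4 + off) == node && $(5 + off) == "Running") print $(2 + off)
    }'
}
```

For the output in step 5, piping docker service ps helloagain into running_tasks_on worker1 prints nothing, while running_tasks_on worker2 prints helloagain.1 and helloagain.2.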