
Force Docker Swarm splitting

DevOps Docker Docker-Swarm

Imagine we have a cluster like this:

root@ip-172-16-14-154:~# docker node ls
ID                            HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS      ENGINE VERSION
9e0rtzevshatwop54579tjnwx     new-master     Ready   Active        Leader              19.03.8
ddlmhk2i6eyi1s9d5dt32i2da *   old-master-01  Ready   Active        Reachable           19.03.8
lxj15zgc75os1bjpcwd8who6m     old-master-02  Ready   Active        Reachable           19.03.8

and we want to split it into two parts: the old cluster with the two manager nodes ddlmhk2i and lxj15zgc, and a new cluster on the node 9e0rtzev. After the split, both clusters should have the same list of services, and we will manually shut down the unused ones.

Stop the Docker daemon on the node that should be cut off from the cluster:

root@new-master:~# systemctl stop docker

At this point we can remove this node from the old cluster:

root@old-master-01:~# docker node demote 9e0rtzev
root@old-master-01:~# docker node rm 9e0rtzev
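Since the next steps edit files under /var/lib/docker/swarm by hand, it may be worth taking a copy of that directory first. The following is a minimal sketch, not part of the original procedure; the function name `backup_swarm_dir` and the `/root` backup location are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_swarm_dir(swarm_dir: str, backup_root: str) -> Path:
    """Copy the swarm state directory into backup_root under a timestamped name."""
    src = Path(swarm_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree refuses to overwrite an existing destination, so an earlier
    # backup with the same timestamp is never clobbered.
    shutil.copytree(src, dest)
    return dest


# Hypothetical usage, as root on new-master while the Docker daemon is stopped:
#   backup_swarm_dir("/var/lib/docker/swarm", "/root")
```

If the hand edits go wrong, the copy can be moved back into place while the daemon is stopped.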

Now it is time for some hacking. Let’s edit two files on the new-master node. The first one is /var/lib/docker/swarm/state.json:

[
  {"node_id":"9e0rtzevshatwop54579tjnwx","addr":"172.16.14.14:2377"},
  {"node_id":"ddlmhk2i6eyi1s9d5dt32i2da","addr":"172.16.14.154:2377"},
  {"node_id":"lxj15zgc75os1bjpcwd8who6m","addr":"172.16.14.247:2377"}
]

We should remove the two old nodes from this file, so that it looks like this:

[{"node_id":"9e0rtzevshatwop54579tjnwx","addr":"172.16.14.14:2377"}]
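The same edit can be scripted instead of done in a text editor. This is a rough sketch that assumes state.json is a JSON array of objects with `node_id` and `addr` keys, as in the listing above; the helper names `keep_only_node` and `prune_state_file` are made up for this example.

```python
import json


def keep_only_node(entries: list, node_id: str) -> list:
    """Return only the entry whose node_id matches; fail if it is not unique."""
    kept = [e for e in entries if e.get("node_id") == node_id]
    if len(kept) != 1:
        raise ValueError(f"expected exactly one entry for {node_id}, found {len(kept)}")
    return kept


def prune_state_file(path: str, node_id: str) -> None:
    """Rewrite state.json in place so it lists only the given node."""
    with open(path) as f:
        entries = json.load(f)
    kept = keep_only_node(entries, node_id)
    with open(path, "w") as f:
        # Compact separators mirror the single-line form shown above.
        json.dump(kept, f, separators=(",", ":"))


# Hypothetical usage on new-master, with the Docker daemon stopped:
#   prune_state_file("/var/lib/docker/swarm/state.json", "9e0rtzevshatwop54579tjnwx")
```

Raising an error when the node ID is missing avoids writing an empty list by accident, for example after a typo in the ID.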

The second file is /var/lib/docker/swarm/docker-state.json:

{
  "LocalAddr":"",
  "RemoteAddr":"172.16.14.154:2377",
  "ListenAddr":"0.0.0.0:2377",
  "AdvertiseAddr":"",
  "DataPathAddr":"",
  "DefaultAddressPool":null,
  "SubnetSize":0,
  "DataPathPort":0,
  "JoinInProgress":false
}

Your values may differ, but the task is simply to clear the RemoteAddr value and write this node’s IP into the LocalAddr value, something like this:

{
  "LocalAddr":"172.16.14.14",
  "RemoteAddr":"",
  "ListenAddr":"0.0.0.0:2377",
  "AdvertiseAddr":"",
  "DataPathAddr":"",
  "DefaultAddressPool":null,
  "SubnetSize":0,
  "DataPathPort":0,
  "JoinInProgress":false
}
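This change can also be applied with a few lines of code that touch only the two keys in question and leave everything else as it was. The sketch below assumes the file is a single JSON object like the one shown; `make_standalone` and `rewrite_docker_state` are illustrative names, not part of any Docker tooling.

```python
import json


def make_standalone(state: dict, local_ip: str) -> dict:
    """Return a copy of the docker-state.json content that points only at this node."""
    updated = dict(state)
    updated["RemoteAddr"] = ""       # clear the address of the old manager
    updated["LocalAddr"] = local_ip  # this node's own IP, without a port
    return updated


def rewrite_docker_state(path: str, local_ip: str) -> None:
    """Apply make_standalone to the file at path, keeping all other keys untouched."""
    with open(path) as f:
        state = json.load(f)
    with open(path, "w") as f:
        json.dump(make_standalone(state, local_ip), f)


# Hypothetical usage on new-master, with the Docker daemon stopped:
#   rewrite_docker_state("/var/lib/docker/swarm/docker-state.json", "172.16.14.14")
```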

Now we can start the Docker daemon:

root@ip-172-16-14-14:~# systemctl start docker
root@ip-172-16-14-14:~# docker info
...
 Swarm: pending
  Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
  Is Manager: true
  Node Address: 172.16.14.14
  Manager Addresses:
   172.16.14.14:2377

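We can see that this node no longer knows anything about the old nodes, but it still expects to see other managers before it will operate properly. We can solve this with the following command, which forces the creation of a new single-manager cluster from the current state: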

root@ip-172-16-14-14:~# docker swarm init --force-new-cluster
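The error in the docker info output asks for "more than half of the managers" to be online, which is the usual majority rule for a Raft-based manager set. The arithmetic can be sketched as follows, under the assumption that the node's stored cluster state still counted three managers before `--force-new-cluster` was run; the function names are illustrative.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that is more than half of the total."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def has_quorum(reachable: int, managers: int) -> bool:
    """True if the reachable managers form a majority."""
    return reachable >= quorum(managers)


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


# One reachable manager out of three recorded: no majority, so no leader.
print(has_quorum(1, 3))       # False
# After forcing a new single-manager cluster: one out of one is a majority.
print(has_quorum(1, 1))       # True
# The old cluster is left with two managers and cannot lose either of them.
print(tolerated_failures(2))  # 0
```

The last line is worth keeping in mind for the old cluster: with two managers it keeps working, but it may be sensible to add a third manager so that one can fail.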

Now we have two clusters with the same state, and both of them can be operated independently. Just don’t forget to remove the old nodes and unused services from both of them.
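To double-check each cluster after the split, the Swarm section of docker info can be inspected programmatically. The sketch below works on the parsed output of `docker info --format '{{json .Swarm}}'` and assumes it contains the `LocalNodeState`, `ControlAvailable`, `Error` and `RemoteManagers` fields described in the Docker Engine API; verify the field names against your Docker version. The helper names are hypothetical.

```python
import json


def is_working_manager(swarm: dict) -> bool:
    """True if the node reports an active swarm, manager control and no error."""
    return (
        swarm.get("LocalNodeState") == "active"
        and bool(swarm.get("ControlAvailable"))
        and not swarm.get("Error")
    )


def unexpected_managers(swarm: dict, expected_addrs: set) -> list:
    """Return manager addresses the node knows about that are not expected."""
    remote = swarm.get("RemoteManagers") or []
    return sorted(m["Addr"] for m in remote if m.get("Addr") not in expected_addrs)


# Hypothetical usage on new-master:
#   import subprocess
#   raw = subprocess.check_output(["docker", "info", "--format", "{{json .Swarm}}"])
#   swarm = json.loads(raw)
#   print(is_working_manager(swarm), unexpected_managers(swarm, {"172.16.14.14:2377"}))
```

On the new cluster, the list of unexpected managers should be empty; any address from the old cluster showing up there suggests the split did not fully take effect.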

Author: @soar, Senior SRE/DevOps engineer
