May 14, 2020, 6:52 p.m.
Posted by soar

Force Docker Swarm splitting

Imagine that we have the following cluster:

root@ip-172-16-14-154:~# docker node ls
ID                            HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS      ENGINE VERSION
9e0rtzevshatwop54579tjnwx     new-master     Ready   Active        Leader              19.03.8
ddlmhk2i6eyi1s9d5dt32i2da *   old-master-01  Ready   Active        Reachable           19.03.8
lxj15zgc75os1bjpcwd8who6m     old-master-02  Ready   Active        Reachable           19.03.8

and we want to split it into two parts: the old cluster with the two manager nodes ddlmhk2i and lxj15zgc, and a new one on node 9e0rtzev. Right after the split, both clusters will have the same list of services, and we will manually shut down the ones we don't need.

Stop the Docker daemon on the node that should be cut off from the cluster:

root@new-master:~# systemctl stop docker

Now we can remove this node from the old cluster. Run these commands on one of the remaining managers:

root@old-master-01:~# docker node demote 9e0rtzev
root@old-master-01:~# docker node rm 9e0rtzev
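Why does the old cluster still accept these commands while new-master is down? Swarm managers use the Raft consensus algorithm, which needs a majority of all managers, N / 2 + 1 with integer division, to be reachable before a leader can be elected. The sketch below only does that arithmetic for our three-manager setup; quorum_size and has_quorum are made-up names for illustration.

```shell
# Raft, which Swarm managers use, needs a strict majority of all managers
# to be reachable before a leader can be elected: N / 2 + 1 (integer division).

# Print how many managers out of $1 are needed for quorum.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed (exit status 0) if $2 reachable managers out of $1 are enough.
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}

# Old cluster while new-master is stopped: 2 of 3 managers are up.
if has_quorum 3 2; then echo "old cluster: quorum"; else echo "old cluster: no quorum"; fi
# new-master on its own, still counting 3 managers: only 1 of 3 is up.
if has_quorum 3 1; then echo "new-master: quorum"; else echo "new-master: no quorum"; fi
```

With three managers the quorum is two, so the old pair keeps working, while new-master alone cannot elect a leader. This is the same condition behind the "does not have a leader" error that new-master reports once it is restarted.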

Now it's time for some hacking. We need to edit two files on the new-master node. The first one is /var/lib/docker/swarm/state.json:

[
  {"node_id":"9e0rtzevshatwop54579tjnwx","addr":"172.16.14.14:2377"},
  {"node_id":"ddlmhk2i6eyi1s9d5dt32i2da","addr":"172.16.14.154:2377"},
  {"node_id":"lxj15zgc75os1bjpcwd8who6m","addr":"172.16.14.247:2377"}
]

Remove the two old nodes from this file, so that it looks like this:

[{"node_id":"9e0rtzevshatwop54579tjnwx","addr":"172.16.14.14:2377"}]
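If you prefer not to edit the file by hand, the same change can be sketched as a small shell function. It assumes each node entry is a flat {"node_id":"...","addr":"..."} object with no spaces around the colons, as in the file above; keep_only_node is a hypothetical name. Run it only while the Docker daemon is stopped.

```shell
# Hypothetical helper: keep only one node's entry in a Swarm state.json.
# Assumes entries are flat objects like {"node_id":"...","addr":"..."}
# (no nested braces, no spaces around the colons).
keep_only_node() {
  local file="$1" node_id="$2" entry
  # Extract the single object whose node_id matches exactly.
  entry=$(grep -o "{\"node_id\":\"${node_id}\"[^}]*}" "$file") || {
    echo "node ${node_id} not found in ${file}" >&2
    return 1
  }
  # Rewrite the file as a one-element JSON array.
  printf '[%s]\n' "$entry" > "$file"
}
```

On new-master this would be called as `keep_only_node /var/lib/docker/swarm/state.json 9e0rtzevshatwop54579tjnwx`, which leaves exactly the one-line array shown above.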

The second file is /var/lib/docker/swarm/docker-state.json:

{
  "LocalAddr":"",
  "RemoteAddr":"172.16.14.154:2377",
  "ListenAddr":"0.0.0.0:2377",
  "AdvertiseAddr":"",
  "DataPathAddr":"",
  "DefaultAddressPool":null,
  "SubnetSize":0,
  "DataPathPort":0,
  "JoinInProgress":false
}

Its contents may differ on your node, but the task is the same: clear the RemoteAddr value and put this node's IP address into LocalAddr, like this:

{
  "LocalAddr":"172.16.14.14",
  "RemoteAddr":"",
  "ListenAddr":"0.0.0.0:2377",
  "AdvertiseAddr":"",
  "DataPathAddr":"",
  "DefaultAddressPool":null,
  "SubnetSize":0,
  "DataPathPort":0,
  "JoinInProgress":false
}
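The same edit can be sketched with sed. This assumes GNU sed (for `-i` without a backup suffix) and that each key sits next to its string value as "Key":"value" with no spaces, as in the file above; set_local_addr is a hypothetical name. Other fields are left untouched.

```shell
# Hypothetical helper: clear RemoteAddr and set LocalAddr in a Swarm
# docker-state.json. Requires GNU sed (-i edits the file in place).
set_local_addr() {
  local file="$1" ip="$2"
  sed -i \
    -e 's/"RemoteAddr":"[^"]*"/"RemoteAddr":""/' \
    -e "s/\"LocalAddr\":\"[^\"]*\"/\"LocalAddr\":\"${ip}\"/" \
    "$file"
}
```

For our node: `set_local_addr /var/lib/docker/swarm/docker-state.json 172.16.14.14`.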

Now we can start the Docker daemon:

root@ip-172-16-14-14:~# systemctl start docker
root@ip-172-16-14-14:~# docker info
...
 Swarm: pending
  Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
  Is Manager: true
  Node Address: 172.16.14.14
  Manager Addresses:
   172.16.14.14:2377

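We can see that this node no longer knows anything about the old nodes, but it still waits for more than half of the managers to be online before it can elect a leader. We can solve this with the following command, which tells the node to start a new cluster from its current state: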

root@ip-172-16-14-14:~# docker swarm init --force-new-cluster
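To confirm that the node has taken over, one can check the local Swarm state, which `docker info` exposes as `.Swarm.LocalNodeState` (one of inactive, pending, active, error or locked). A minimal sketch that polls until it reads active; the function name, retry count and delay are arbitrary choices.

```shell
# Poll the local node's Swarm state until docker info reports "active".
# $1 = number of attempts (default 30), $2 = seconds between them (default 2).
wait_for_swarm_active() {
  local tries="${1:-30}" delay="${2:-2}" state i
  i=0
  while [ "$i" -lt "$tries" ]; do
    state=$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)
    if [ "$state" = "active" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "swarm is still '${state}' after ${tries} attempts" >&2
  return 1
}
```

Running `wait_for_swarm_active && docker node ls` on new-master waits for the new cluster before listing its nodes.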

Now we have two clusters that start from the same state, and each of them can be operated independently. Just don't forget to remove the old nodes from the new cluster and the unused services from both of them.
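Removing the stale nodes from the new cluster can be scripted too. This is a sketch under assumptions: it expects the old managers to show up with the status Down in `docker node ls` on new-master, and down_node_ids and remove_nodes are hypothetical names. Each node is demoted first, the same way we did in the old cluster, because `docker node rm` does not remove managers. Services are still best reviewed and removed by hand.

```shell
# Hypothetical helper: read "ID STATUS" lines on stdin, as printed by
#   docker node ls --format '{{.ID}} {{.Status}}'
# and print only the IDs of nodes whose status is Down.
down_node_ids() {
  awk '$2 == "Down" { print $1 }'
}

# Hypothetical helper: demote and then remove every node ID read from stdin.
# If demoting fails, the node is left in place.
remove_nodes() {
  local id
  while read -r id; do
    [ -n "$id" ] || continue
    docker node demote "$id" && docker node rm "$id"
  done
}

# Usage on new-master:
#   docker node ls --format '{{.ID}} {{.Status}}' | down_node_ids | remove_nodes
```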
