How to completely remove a node from a Consul cluster


This Consul Server node, in another DC, keeps rejoining some time after I remove it.

The goal:

A cluster of 5 Consul Servers in DC alpha0, whose KV store an alpha0 Vault cluster uses as its storage backend:

  • alpha0consulserver1.alpha0
  • alpha0consulserver2.alpha0
  • alpha0consulserver3.alpha0
  • alpha0consulserver4.alpha0
  • alpha0consulserver5.alpha0

A cluster of 5 Consul Servers in DC prd0, whose KV store a prd0 Vault cluster uses as its storage backend:

  • prd0consulserver1.prd0
  • prd0consulserver2.prd0
  • prd0consulserver3.prd0
  • prd0consulserver4.prd0
  • prd0consulserver5.prd0

The WAN connection between the DCs is fine. But I am concerned that if the two clusters sync their KV stores, this may affect the two separate HashiCorp Vault clusters that each use one of them as a storage backend.

The problem:

A poorly tested Puppet script I wrote resulted in one Consul node, prd0consulserver5, joining a node in a different DC, alpha0consulserver1.

I have completely purged and re-installed Consul for prd0consulserver5, but alpha0consulserver1 keeps connecting to it.
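
For what it's worth, the cross-DC link shows up in the WAN gossip pool, which is how I've been confirming that prd0consulserver5 is still attached. A quick diagnostic from alpha0consulserver1 (just a check, not a fix):

$ consul members -wan
# every server from both DCs is listed here as <node>.<datacenter>;
# prd0consulserver5 keeps reappearing in this list on the alpha0 side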

Here is an example of one of the configuration files, specifically the one for alpha0consulserver1.alpha0:

nathan-basanese-zsh8 % sudo cat /etc/consul/config.json
{
    "bind_addr": "192.176.100.1",
    "client_addr": "0.0.0.0",
    "data_dir": "/opt/consul",
    "domain": "consul.basanese.com",
    "bootstrap_expect": 5,
    "enable_syslog": true,
    "log_level": "DEBUG",
    "datacenter": "bts0",
    "node_name": "alpha0consulserver1",
    "ports": {
        "http": 8500,
        "https": 8501
    },
    "recursors": ["192.176.176.240", "192.176.176.241"],
    "server": true,
    "retry_join": ["192.176.100.3", "192.176.100.2", "192.176.100.1"]
}
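
Note that there is no WAN join setting in there, which is part of what puzzles me. My working guess (an assumption; the path below is just the default layout under my data_dir) is that the WAN membership is being remembered in the Serf snapshots under /opt/consul, so it survives restarts on the alpha0 side even after I purge the prd0 box:

$ ls /opt/consul/serf/
# expecting local.snapshot (LAN gossip pool) and remote.snapshot (WAN gossip pool);
# if the remembered peers live in remote.snapshot, clearing it with the agent
# stopped might be part of what a clean separation actually requires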

Here are some relevant logs from prd0consulserver5, but I can post more upon request:

2017/05/26 23:38:00 [DEBUG] memberlist: Stream connection from=192.176.100.1:47239
2017/05/26 23:38:00 [INFO] serf: EventMemberJoin: alpha0consulserver2.alpha0 192.176.100.2
2017/05/26 23:38:00 [INFO] serf: EventMemberJoin: alpha0consulserver1.alpha0 10.240.112.3
2017/05/26 23:38:00 [INFO] consul: Handled member-join event for server "alpha0consulserver2.bts0" in area "wan"
2017/05/26 23:38:00 [INFO] serf: EventMemberJoin: alpha0consulserver3.alpha0 192.176.100.3
2017/05/26 23:38:00 [INFO] consul: Handled member-join event for server "alpha0consulserver1.bts0" in area "wan"
2017/05/26 23:38:00 [INFO] consul: Handled member-join event for server "alpha0consulserver3.bts0" in area "wan"

Eventually, I get to this:

2017/05/26 23:39:02 [DEBUG] memberlist: Initiating push/pull sync with: 192.176.100.2

I shut down the node, as I don't want the keys I write to the KV store on alpha0 nodes to appear on prd0 nodes.

What I have tried so far:

First, the graceful leave and shutdown described here:

https://www.consul.io/api/agent.html#graceful-leave-and-shutdown
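
Concretely, that was the CLI on prd0consulserver5 plus the equivalent HTTP call against its local agent (the curl line assumes the default HTTP port 8500 from the config above):

$ consul leave
$ curl -X PUT http://127.0.0.1:8500/v1/agent/leave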

I didn't try force-leave since it doesn't work on nodes outside of the configured DC.

I've also tried deregistering ALL prd0 hosts from the alpha0 hosts:

https://www.consul.io/api/catalog.html#deregister-entity
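
That was a call along these lines for each prd0 node (the Datacenter/Node values follow my naming above; the node reappears shortly afterwards, which I assume is its agent re-registering itself via anti-entropy):

$ curl -X PUT http://127.0.0.1:8500/v1/catalog/deregister \
      -d '{"Datacenter": "prd0", "Node": "prd0consulserver5"}'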

I'm at my wit's end here and can't seem to find a way to keep these clusters apart.

I've searched on search engines, using this query and many similar ones: https://duckduckgo.com/?q=totally+deregister+consul+node&t=hc&ia=software

The following two results describe slightly similar problems, but nothing as simple as keeping one cluster of 5 Consul Servers separate from another cluster of 5 Consul Servers:

  • https://github.com/hashicorp/consul/issues/1188
  • https://groups.google.com/forum/#!msg/consul-tool/bvJeP1c3Ujs/EvSZoYiZFgAJ

I think this may be handled by the WAN join settings ("start_join_wan"/"retry_join_wan" in the config, or "consul join -wan" on the command line), but there doesn't seem to be a way to explicitly turn WAN joining off. Plus, that seems like a hacky way to fix this problem.
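
For reference, this is the kind of thing I have been looking for (and not finding) in the rendered configs; as far as I know a WAN join would normally come from either a one-off CLI call or config keys like these (addresses purely illustrative):

$ consul join -wan 192.176.100.1
# or, in config.json on a prd0 server:
#   "start_join_wan": ["192.176.100.1"],
#   "retry_join_wan": ["192.176.100.1"]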

I've also considered blocking the traffic between the DCs with iptables.
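
The sketch I had in mind was dropping Serf WAN gossip (port 8302 by default, TCP and UDP) from the other DC's subnet on each prd0 server; the 192.176.100.0/24 source range is an assumption based on the addresses in the config above:

$ sudo iptables -A INPUT -p tcp -s 192.176.100.0/24 --dport 8302 -j DROP
$ sudo iptables -A INPUT -p udp -s 192.176.100.0/24 --dport 8302 -j DROP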

Anyway, I feel like there's something missing. I've started digging into the Raft protocol, but I feel like maybe I've gone off on a tangent in my search. Any guidance is appreciated, be it a comment or an answer.

More precisely: how do I keep the prd0 Consul Server nodes, with their own separate KV store and Consul leader, isolated from the alpha0 Consul Server nodes?

Best Answer

Only after trying the standard removal processes:

$ consul leave
$ consul force-leave <node>

should you move on to the command below, which will completely remove the node from the cluster:

$ consul operator raft remove-peer -address="<server-ip>:8300"
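
The -address argument comes from the current Raft peer list; run list-peers first and copy the stale server's Address. As far as I can tell, this only touches the Raft peer set within a single datacenter, so treat it as cleanup for a dead or stale server rather than a way to un-federate two DCs:

$ consul operator raft list-peers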