Configure Consul cluster with ACL enabled

access-control-list cluster consul

Hello everyone and thanks for reading.

I'm quite new to Consul. I've been reading the documentation and practicing for a while, and I've been able to configure Consul properly on a few nodes.

Now I want to enable ACLs so I can manage the security of my Consul cluster, but I can't get this working. I'm following this guide: https://learn.hashicorp.com/consul/security-networking/production-acls#create-the-agent-policy.

My scenario:

  • Node 1: the 'bootstrap' node. IP: 172.20.10.41.
  • Node 2: the 'slave' node. IP: 172.20.10.40

What I expect:

  • To have Consul up and running, using ACLs to control which processes/nodes can connect to the cluster and read/write information.

I'm able to enable ACLs on one Consul agent by running it with the following command:

consul agent -server -bootstrap -config-dir=/etc/consul/conf.d/agent.json -data-dir=/tmp/consul/ -ui -client=0.0.0.0

Here is my agent.json file:

{
  "primary_datacenter": "dc1",
  "acl" : {
    "enabled": true,
    "default_policy": "allow",
    "down_policy": "extend-cache"
  }
}

Once Consul is up and running, I run

# consul acl bootstrap

which gives me

AccessorID:   3c354e3c-2d1c-24b1-41ce-0645fdd6c3e7
SecretID:     1e026ae6-8902-eae2-6a18-6b0fb36bbed4
Description:  Bootstrap Token (Global Management)
Local:        false
Create Time:  2019-05-03 12:41:18.038389106 -0300 -03
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

I create a policy and a token to allow all node things:

# consul acl policy create -name "Agent-write-policy" -description "Policy for generating agents write permissions" -rules @agent_write_policy.hcl -token "1e026ae6-8902-eae2-6a18-6b0fb36bbed4"

And

# consul acl token create -description "Agent write token" -policy-name "Agent-write-policy" -token "1e026ae6-8902-eae2-6a18-6b0fb36bbed4"

AccessorID:   7324d2d0-f82f-cea8-44d1-82c2d07cd35a
SecretID:     11dfcacf-7eae-a286-f108-990c1963fb29
Description:  Agent write token
Local:        false
Create Time:  2019-05-03 12:30:11.292590345 -0300 -03
Policies:
   0171cfc2-06f3-6702-9c46-df117eb1bd53 - Agent-write-policy
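
The contents of agent_write_policy.hcl are not shown here. As a rough sketch only, a rules file for this kind of agent policy might look like the following, assuming the Consul 1.4+ ACL rule syntax. The specific rules are an assumption for illustration, not the file actually used:

```hcl
# agent_write_policy.hcl (hypothetical contents; the original file is not shown)

# Allow agents holding this token to register and update any node entry.
# Node registration is the kind of operation reported as
# "Node info update blocked by ACLs" when permission is missing.
node_prefix "" {
  policy = "write"
}

# Allow agents holding this token to read any service entry.
service_prefix "" {
  policy = "read"
}
```

An empty prefix matches every node or service name, so in a real cluster the prefixes would usually be narrowed to the names each agent actually needs.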

Then I go to my second server node and start Consul:

# consul agent -server -data-dir=/tmp/consul -config-dir=/etc/consul/conf.d/agent.json

My agent.json file:

{
  "primary_datacenter": "dc1",
  "acl" : {
    "enabled": true,
    "default_policy": "allow",
    "down_policy": "extend-cache",
    "tokens": {
      "default": "11dfcacf-7eae-a286-f108-990c1963fb29"
    }
  }
}

With my second node up, I run

# consul join 172.20.10.41

Error joining address '172.20.10.41': Unexpected response code: 403 (ACL not found)
Failed to join any nodes.

I also tried adding -token="" to the join command.

If I disable ACLs on Node 2, I'm able to join the cluster, but node/service information won't sync.

2019/05/03 12:35:26 [WARN] agent: Node info update blocked by ACLs
2019/05/03 12:35:51 [WARN] agent: Coordinate update blocked by ACLs

What am I doing wrong?

Maybe there are a lot of things that I'm doing wrong. If any of you have a beginner-proof guide for me, I'll be very grateful.

Thanks for your time (and sorry for my bad English).

Greetings.

Best Answer

It seems that both the primary_datacenter and agent_recovery keys must be present in the agent config in order to even join the cluster with ACLs on. This config worked for me:

primary_datacenter = "infra2-lab"
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  down_policy    = "extend-cache"
  tokens = {
    default = "same-token"
    master = "same-token"
    agent_recovery = "same-token"
  }
}

Here same-token is the global management token I got when bootstrapping ACLs. It suddenly started working when I added the agent_recovery key. Nonetheless, I am still just experimenting with Consul ACLs myself and find them just as confusing as you do.
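
One possible explanation, offered as an assumption rather than a confirmed diagnosis: consul join talks to the local agent's HTTP API, and a node that has not yet joined cannot ask the servers to validate a normal token, whereas the agent_recovery token (named agent_master before Consul 1.11) is checked locally. Under that assumption, a less privileged variant of the config above might look like the sketch below. The placeholder values are hypothetical, and retry_join is swapped in here in place of the manual join command:

```hcl
primary_datacenter = "dc1"

# Join from the config instead of running "consul join" by hand.
# The address is the bootstrap node from the question.
retry_join = ["172.20.10.41"]

acl = {
  enabled                  = true
  default_policy           = "deny"
  down_policy              = "extend-cache"
  enable_token_persistence = true

  tokens = {
    # Token the agent uses for its own node and coordinate updates.
    # Placeholder: the SecretID of a token created with an agent policy.
    agent = "<agent-token-secret-id>"

    # Locally validated token for agent endpoints, usable even when no
    # server is reachable. Placeholder: a secret chosen by the operator.
    # On Consul versions before 1.11 this key is agent_master.
    agent_recovery = "<locally-chosen-secret>"
  }
}
```

This keeps the bootstrap (global management) token out of every agent's config file, which limits the damage if a single node's config is leaked.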
