Why can’t I create this Gluster volume?

glusterfs ubuntu-12.04

I'm setting up my first Gluster 3.4 install and all is good up until I want to create a distributed replicated volume.

I have 4 servers: 192.168.0.11, 192.168.0.12, 192.168.0.13 and 192.168.0.14.

From 192.168.0.11 I ran:

gluster peer probe 192.168.0.12
gluster peer probe 192.168.0.13
gluster peer probe 192.168.0.14

On each server I have a storage volume mounted at /export/brick1.

I then ran on 192.168.0.11:

gluster volume create gv0 replica 2 192.168.0.11:/export/brick1 192.168.0.12:/export/brick1 192.168.0.13:/export/brick1 192.168.0.14:/export/brick1

But I get the error:

volume create: gv0: failed: Host 192.168.0.11 is not in 'Peer in Cluster' state

If you run

gluster peer status

on 192.168.0.11, it shows the 3 other hosts as connected peers, i.e.

Number of Peers: 3

Hostname: 192.168.0.12
Port: 24007
Uuid: bcea6044-f841-4465-88e4-f76a0c8d5198
State: Peer in Cluster (Connected)

Hostname: 192.168.0.13
Port: 24007
Uuid: 3b5c188e-9be8-4d0f-a7bd-b738a88f2199
State: Peer in Cluster (Connected)

Hostname: 192.168.0.14
Port: 24007
Uuid: f6f326eb-0181-4f99-8072-f27652dab064
State: Peer in Cluster (Connected)

But from 192.168.0.12, the same command also shows 3 hosts, and 192.168.0.11 is one of them, i.e.

Number of Peers: 3

Hostname: 192.168.0.11
Port: 24007
Uuid: 09a3bacb-558d-4257-8a85-ca8b56e219f2
State: Peer in Cluster (Connected)

Hostname: 192.168.0.13
Uuid: 3b5c188e-9be8-4d0f-a7bd-b738a88f2199
State: Peer in Cluster (Connected)

Hostname: 192.168.0.14
Uuid: f6f326eb-0181-4f99-8072-f27652dab064
State: Peer in Cluster (Connected)

So 192.168.0.11 is definitely part of the cluster.

The question is: why can't I create the volume when I run the gluster command on the first Gluster server? Is this normal behaviour or some sort of bug?

Best Answer

I was seeing an obscure error message about an unconnected socket with peer 127.0.0.1.

[2013-08-16 00:36:56.765755] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1022)

It turns out the problem I was having was due to NAT. I was trying to create gluster servers that were behind a NAT device and use the public IP to resolve the names. This is just not going to work properly for the local machine.
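A rough way to spot this situation is to check whether a node's own name resolves to an address that is actually configured on that node. The sketch below is only an illustration: is_local_addr is a hypothetical helper, not part of Gluster, and the sample addresses in the usage note are placeholders.

```shell
# Hypothetical helper: succeed if the first argument equals one of the
# remaining arguments (the addresses considered local to this machine).
is_local_addr() {
    addr="$1"; shift
    for local_ip in "$@"; do
        [ "$addr" = "$local_ip" ] && return 0
    done
    return 1
}
```

On a Linux node it could be used along the lines of `is_local_addr "$(getent hosts gluster1 | awk '{print $1}')" 127.0.0.1 $(hostname -I)`. Behind NAT the public address never appears on a local interface, so the check would fail for the node's own name.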

What I had on each node was a hosts file containing something like the following:

192.168.0.11  gluster1
192.168.0.12  gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4

The fix was to detach the trusted peers first:

sudo gluster peer detach gluster2
sudo gluster peer detach gluster3
sudo gluster peer detach gluster4

Then change the hosts file on each machine so that the machine's own name points to 127.0.0.1:

# Gluster1
127.0.0.1     gluster1
192.168.0.12  gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4


# Gluster2
192.168.0.11  gluster1
127.0.0.1     gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4

and so on for gluster3 and gluster4.
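The per-node hosts entries follow one pattern, so they can be generated rather than typed by hand. This is a minimal sketch, assuming the four names and addresses shown above; gluster_hosts is a hypothetical helper name, and the output still has to be merged into /etc/hosts manually.

```shell
# Hypothetical helper: print the Gluster hosts entries for one node.
# The node's own name maps to 127.0.0.1; every other name keeps its address.
gluster_hosts() {
    self="$1"   # name of the local node, e.g. gluster2
    while read -r ip name; do
        if [ "$name" = "$self" ]; then
            printf '127.0.0.1     %s\n' "$name"
        else
            printf '%-13s %s\n' "$ip" "$name"
        fi
    done <<EOF
192.168.0.11 gluster1
192.168.0.12 gluster2
192.168.0.13 gluster3
192.168.0.14 gluster4
EOF
}
```

Running `gluster_hosts gluster2` prints the four lines listed under "# Gluster2" above.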

Then peer probe again, and finally create the volume, which then succeeded.
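For reference, the probe and create steps by hostname might look like the sketch below, run from gluster1. It assumes the volume name gv0, the replica count and the brick path /export/brick1 from the question. RUN is a hypothetical dry-run switch added for illustration: it defaults to echo, so the commands are only printed.

```shell
# Print (or, with RUN set to an empty string, execute) the peer probe and
# volume create commands from gluster1, using hostnames instead of IPs.
create_gv0() {
    peers="gluster2 gluster3 gluster4"
    bricks=""
    for host in gluster1 $peers; do
        bricks="${bricks:+$bricks }$host:/export/brick1"
    done

    run="${RUN-echo}"   # hypothetical dry-run switch, defaults to echo
    for host in $peers; do
        $run sudo gluster peer probe "$host"
    done
    # With replica 2, consecutive bricks in the list form a replica pair.
    $run sudo gluster volume create gv0 replica 2 $bricks
}

create_gv0
```

Once the printed commands look right, calling it as `RUN= create_gv0` would execute them for real.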

I doubt that using IP addresses (the public ones) will work in this case. It should work if you use the private addresses behind your NAT. In my case, each server was behind a NAT in the AWS cloud.
