Repcached doesn't work when replicating between 2 servers

memcached

I have compiled memcached-repcached with the --enable-replication option and started it as shown below:

On server 1:

# /usr/local/memcached/bin/memcached -v -d -u memcached -l 192.168.7.106 -p 11216 -c 2048 -m 512 -P /usr/local/memcached/var/run/test.pid -x 192.168.3.82 -X 11216
# replication: connect (peer=192.168.3.82:11216)
replication: marugoto copying
replication: close
replication: failed to initialize replication server socket

On server 2:

# /usr/local/memcached/bin/memcached -v -d -u memcached -l 192.168.3.82 -p 11216 -c 2048 -m 512 -P /usr/local/memcached/var/run/test.pid -x 192.168.7.106 -X 11216
replication: connect (peer=192.168.7.106:11216)
replication: marugoto copying

But it seems that server 2 only connects to the memcached instance on server 1; it doesn't listen on port 11216:

# lsof -i :11216
COMMAND     PID      USER   FD   TYPE   DEVICE SIZE NODE NAME
memcached 12786 memcached    6u  IPv4 55213579       TCP 192.168.3.82:56176->192.168.7.106:11216 (ESTABLISHED)

Therefore I cannot telnet to server 2:

# telnet 192.168.3.82 11216
Trying 192.168.3.82...
telnet: connect to address 192.168.3.82: Connection refused
telnet: Unable to connect to remote host: Connection refused

But if I replicate between 2 memcached instances on a single server, using different ports, it works fine:

$ telnet 192.168.7.106 11216
Trying 192.168.7.106...
Connected to 192.168.7.106.
Escape character is '^]'.
set foo 0 0 3
bar
STORED

$ telnet 192.168.7.106 11217
Trying 192.168.7.106...
Connected to 192.168.7.106.
Escape character is '^]'.
marugoto_end
rep foo 0 0 3 1
bar
get foo
VALUE foo 0 3
bar
END

Did I miss something?

Best Answer

It looks like you mismatched the port settings:

-x 192.168.3.82 -X 11216

and

-x 192.168.7.106 -X 11216

use the same port for replication (which is OK, since you use different machines).

But why did you start these daemons with different ports:

-l 192.168.7.106 -p 11217

and

-l 192.168.3.82 -p 11216

I suggest changing 11217 to 11216; that should start working right away. I guess you already started with the default settings to see repcached basically working, right? The default settings worked fine for me, as described here: http://marcusspiegel.de/2010/05/02/howto-install-memcached-with-repcached-build-in-server-side-replication-on-debian-lenny

Output from a working example:

See how I started memcached (truncated from pstree):

  |-memcached,2915 -m 64 -p 11211 -u root -P /var/run/memcachedrep.pid -d -x 192.168.18.11

and on the other node:

  |-memcached,2965 -m 64 -p 11211 -u root -P /var/run/memcachedrep.pid -d -x 192.168.18.10

Ports used:

tcp        0      0 192.168.18.11:54122     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.11:54133     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.11:54130     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.11:54125     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.11:52466     192.168.18.10:11211     TIME_WAIT   -               
tcp6       0      0 192.168.18.11:11212     192.168.18.10:37881     ESTABLISHED 2965/memcached

and on the other node:

tcp        0      0 192.168.18.10:57768     192.168.18.10:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:45406     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:45412     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:56134     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:40624     192.168.18.10:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:37881     192.168.18.11:11212     ESTABLISHED 2915/memcached
tcp        0      0 192.168.18.10:57750     192.168.18.10:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:45428     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:45419     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:45410     192.168.18.11:11211     TIME_WAIT   -               
tcp        0      0 192.168.18.10:57766     192.168.18.10:11211     TIME_WAIT   -       

Keep in mind that the port used for replication (11212) is different from the port used for serving the cache (11211)!
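As a minimal sketch of that point, the shell function below prints a start command in which the serving port and the replication port are spelled out and kept distinct. The function name `repcached_cmd` and its defaults are assumptions for illustration (11211 and 11212 are simply the ports seen in the working example above); `-x` and `-X` are the peer and replication-port options already used in the question.

```shell
# Hypothetical helper: print the memcached start command for one node.
# Arguments: local IP, peer IP, serving port (optional), replication port (optional).
repcached_cmd() {
    local_ip=$1
    peer_ip=$2
    serve_port=${3:-11211}   # port that cache clients connect to (-p)
    repl_port=${4:-11212}    # port used between the two daemons (-X)

    # Assumption based on the note above: the two ports should not collide.
    if [ "$serve_port" = "$repl_port" ]; then
        echo "serving port and replication port must differ" >&2
        return 1
    fi

    printf 'memcached -d -u memcached -l %s -p %s -x %s -X %s\n' \
        "$local_ip" "$serve_port" "$peer_ip" "$repl_port"
}

# Print the command for each node of the working example.
repcached_cmd 192.168.18.10 192.168.18.11
repcached_cmd 192.168.18.11 192.168.18.10
```

The function only prints the command, so the result can be reviewed before it is run on either node. Options such as `-m`, `-c` and `-P` are left out and would need to be added as in the original commands.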

The replication port has to be the same on both machines and must be reachable from each other's interfaces. The serving port is set to the same value on both nodes because both nodes also connect to that port as clients. This mimics a setup like master-master replication in MySQL.
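One way to check that two nodes really behave like such a pair is to write a key through one node and read it back through the other, much as the question does by hand with telnet. The sketch below builds the memcached text-protocol requests and pipes them through `nc`. The function names and the `NODE_A`/`NODE_B` variables are hypothetical, the serving port 11211 is taken from the working example, and netcat options differ between variants, so `-w 2` may need adjusting.

```shell
# Hypothetical helpers: build memcached text-protocol requests.
# mc_set_request KEY VALUE  -> "set KEY 0 0 <length>", the value, then "quit"
mc_set_request() {
    printf 'set %s 0 0 %s\r\n%s\r\nquit\r\n' "$1" "${#2}" "$2"
}

# mc_get_request KEY  -> "get KEY", then "quit"
mc_get_request() {
    printf 'get %s\r\nquit\r\n' "$1"
}

# Only talk to the network when both node addresses are provided,
# e.g. NODE_A=192.168.18.10 NODE_B=192.168.18.11
if [ -n "${NODE_A:-}" ] && [ -n "${NODE_B:-}" ]; then
    # Store the key on the first node ...
    mc_set_request foo bar | nc -w 2 "$NODE_A" 11211
    # ... and read it back from the second node.
    mc_get_request foo | nc -w 2 "$NODE_B" 11211
fi
```

If replication is working, the second command should return the value stored through the first node; swapping `NODE_A` and `NODE_B` tests the opposite direction.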
