Tomcat 6 session replication does not work with HAProxy

haproxy tomcat6

I have an HAProxy load balancer in front of two Tomcat backend servers. HAProxy is configured with cookie-based persistence, and Tomcat is configured with SimpleTcpCluster according to the documentation. Multicast between the two Tomcat servers is enabled. However, session replication does not work: every time I shut down the server that holds a session, the users are logged out. In catalina.out I can see that the servers are communicating with each other. For example, when I take one backend down:

May 8, 2014 11:00:25 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector performBasicCheck
INFO: Suspect member, confirmed dead.[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 2, 1, 69}:5000,{10, 2, 1, 69},5000, alive=931801,id={-18 123 59 -88 -95 20 78 -34 -83 31 -43 73 -64 -71 42 -62 }, payload={}, command={}, domain={}, ]]

And when I bring the backend back up:

WARNING: Manager [webservice#], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 2, 1, 69}:5000,{10, 2, 1, 69},5000, alive=672675,id={-18 123 59 -88 -95 20 78 -34 -83 31 -43 73 -64 -71 42 -62 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
May 8, 2014 10:54:21 AM org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor report
INFO: ThroughputInterceptor Report
Tx Msg:1 messages
Sent:0.00 MB (total)
Sent:0.00 MB (application)
Time:0.01 seconds
Tx Speed:0.04 MB/sec (total)
TxSpeed:0.04 MB/sec (application)
Error Msg:0
Rx Msg:0 messages
Rx Speed:0.00 MB/sec (since 1st msg)
Received:0.00 MB]

May 8, 2014 10:54:21 AM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [webservice#]; session state send at 5/8/14 10:54 AM received in 111 ms.

So clustering and multicast appear to be working.
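As a side note, the member address in these entries can be decoded mechanically, which helps when matching log lines to nodes. The following is only a minimal Python sketch: the function name `parse_member` is made up for illustration, the regular expression assumes the `tcp://{a, b, c, d}:port` layout seen in the excerpts above, and it assumes octets above 127 are printed as negative signed bytes, as the `id` field suggests.

```python
import re

# Matches the member address as it appears in the catalina.out excerpts,
# e.g. "tcp://{10, 2, 1, 69}:5000". Other Tomcat versions may print it
# differently; this pattern is an assumption based on the logs above.
MEMBER_RE = re.compile(r"tcp://\{([-\d,\s]+)\}:(\d+)")


def parse_member(log_text):
    """Return (ip, port) for the first member address in log_text, or None."""
    # Pasted log lines are often wrapped, so collapse whitespace runs first.
    flat = " ".join(log_text.split())
    match = MEMBER_RE.search(flat)
    if match is None:
        return None
    # Java bytes are signed; mask each value to get the usual 0-255 octet.
    octets = [int(part) & 0xFF for part in match.group(1).split(",")]
    return ".".join(str(o) for o in octets), int(match.group(2))


line = "MemberImpl[tcp://{10, 2, 1, 69}:5000,{10, 2, 1, 69},5000, alive=931801"
print(parse_member(line))
```

For the line above this yields 10.2.1.69 and port 5000, which is node01 in the HAProxy config.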

Here is the HAProxy backend config:

backend BE-tomcat_http
mode            http
cookie SERVERID insert indirect nocache
balance         leastconn
timeout connect     30000
timeout server      30000
retries         3
option          httpchk OPTIONS /
option          redispatch
option          http-server-close
option          http-pretend-keepalive
server          node01 10.2.1.69:80 cookie node01 check inter 1000
server          node02 10.2.1.90:80 cookie node02 check inter 1000
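To make the intended behaviour explicit, here is a rough Python model of how I expect this backend to route requests: stick to the server named in the SERVERID cookie while it is up, otherwise fall back to the healthy server with the fewest connections. This is not HAProxy code, only a sketch of my understanding; `pick_server` and the `servers` dictionary are hypothetical names and the connection counts are invented.

```python
def pick_server(servers, cookie=None):
    """Return the name of the server that should get the request, or None."""
    # Persistence: honour the SERVERID cookie while that server is up.
    if cookie in servers and servers[cookie]["up"]:
        return cookie
    # Otherwise choose among healthy servers by fewest active connections,
    # which is how I understand "balance leastconn".
    healthy = [name for name, state in servers.items() if state["up"]]
    if not healthy:
        return None
    return min(healthy, key=lambda name: servers[name]["conns"])


# Invented state for the two nodes defined in the backend above.
servers = {
    "node01": {"up": True, "conns": 12},
    "node02": {"up": True, "conns": 30},
}
print(pick_server(servers, cookie="node02"))  # stays on node02 while it is up
servers["node02"]["up"] = False
print(pick_server(servers, cookie="node02"))  # lands on node01 after node02 goes down
```

The second case is the one that matters here: the request arrives on node01 carrying a session that was created on node02, so it only survives if node01 already holds a replicated copy.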

Here is the relevant part of the Tomcat server.xml:

    <Engine name="Catalina" defaultHost="localhost" jvmRoute="node01">
      <Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/>

      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8">
        <Manager className="org.apache.catalina.ha.session.DeltaManager"
                 expireSessionsOnShutdown="false"
                 notifyListenersOnReplication="true"
                 mapSendOptions="8"/>
        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
          <Membership className="org.apache.catalina.tribes.membership.McastService"
                      address="228.0.0.4"
                      port="45564"
                      frequency="500"
                      dropTime="3000"/>
          <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                    address="auto"
                    port="5000"
                    selectorTimeout="500"
                    minThreads="2"
                    maxThreads="6"/>
          <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
            <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
          </Sender>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
        </Channel>
        <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
               filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
        <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
        <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
        <!-- <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
                       tempDir="/tmp/war-temp/"
                       deployDir="/tmp/war-deploy/"
                       watchDir="/tmp/war-listen/"
                       watchEnabled="false"/> -->
      </Cluster>

I can see that cookie persistence works, because once users are logged in they stick to one backend server for as long as the session is valid. However, when I shut down the server that holds the session, users are kicked out, even though the log file shows that the other server noticed the shutdown.

Also, the application's web.xml has the distributable element set.

Any ideas?

Thanks

Best Answer

I cannot see an issue with the config you provided. Here are a few suggestions.

  1. You can confirm that sessions replicate to each node in the cluster by opening the manager on one node (http://node01:80/manager/html) and checking that each of its sessions also appears in the other node's manager.

    I suspect the sessions are not being replicated at all, because losing a node should not kill a replicated session.

  2. Check your firewall rules for port 5000 and for the multicast address 228.0.0.4.

    Most of the issues we ran into came down to firewall configuration!
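A quick way to test the TCP half of the second suggestion is to try opening a connection to the other node's receiver port from each Tomcat host. The snippet below is only a sketch under a few assumptions: the addresses and port are copied from the question, the helper names `can_connect` and `check_peers` are made up, and it covers only the TCP receiver on port 5000, not the UDP multicast membership traffic.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Node addresses and receiver port as given in the question.
PEERS = {"node01": "10.2.1.69", "node02": "10.2.1.90"}
RECEIVER_PORT = 5000


def check_peers(peers=PEERS, port=RECEIVER_PORT):
    """Map each peer name to whether its receiver port is reachable."""
    return {name: can_connect(ip, port) for name, ip in peers.items()}
```

Running `print(check_peers())` on node01 and then on node02 should report True for the other node in both directions. A False in either direction would point at a firewall rule or at the receiver binding to an unexpected address.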