Magento Enterprise 1.13.1 Redis cluster with failover automation

Tags: cache, magento-enterprise, performance, redis

One of the answers above mentions the use of:

Cache hosts (by parhamr)

There are two hosts running Redis in a master-slave configuration with automated failover. Three Redis instances are used to increase throughput and provide fine-tuning of persistence behaviors.

I can't figure out a way to add more than one Redis instance using the native Magento Enterprise 1.13.1 integration. How do you set this up in the config files to fail over from one Redis instance to another, or do you do read/write separation? I don't see a way short of a dynamic DNS entry or an additional load balancer.

Dynamic DNS won't do any good if one of the instances goes down; a load balancer, on the other hand, will continuously monitor the instances and route only to "active" ones. But is there a solution at the code/config level that I can use out of the box?

Thank you.

Best Answer

There are two things going on here: 1) division of Redis functionality across instances, and 2) failover of Redis through Sentinel. My team uses load balancers specifically to support item 2.

Here’s how we did this for a production 1.12 cluster in mid-2013:

Multiple Redis Instances

Edit local.xml (Example config) to point <redis_session />, <cache />, and <full_page_cache /> at three different Redis instances. My team has chosen to run sessions on port 6382 (32 GB limit), backend cache on port 6383 (48 GB limit), and full page cache on port 6384 (12 GB limit).
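A local.xml fragment along these lines would implement that split; treat this as a sketch rather than our exact file, and note that the IP address is a placeholder for whichever address your Redis instances (or, as described below, your load-balanced vservers) answer on:

```xml
<!-- Sketch: app/etc/local.xml, inside <config><global> -->
<config>
  <global>
    <!-- Sessions on port 6382 -->
    <redis_session>
      <host>10.0.1.80</host>
      <port>6382</port>
      <db>0</db>
      <timeout>2.5</timeout>
    </redis_session>
    <!-- Backend cache on port 6383 -->
    <cache>
      <backend>Cm_Cache_Backend_Redis</backend>
      <backend_options>
        <server>10.0.1.80</server>
        <port>6383</port>
        <database>0</database>
      </backend_options>
    </cache>
    <!-- Full page cache on port 6384 -->
    <full_page_cache>
      <backend>Cm_Cache_Backend_Redis</backend>
      <backend_options>
        <server>10.0.1.80</server>
        <port>6384</port>
        <database>0</database>
      </backend_options>
    </full_page_cache>
  </global>
</config>
```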

This architecture lets us fine-tune memory limits and RDB configurations for each cache type, and it also lets us scale Redis to higher aggregate throughput because Redis is single threaded. Provisioning this Ubuntu 12.04 LTS server required duplicating the /etc/redis/*.conf configuration files and the /etc/init.d/redis* scripts, then calling update-rc.d $name defaults, so that each Redis instance has its own log files, can be signaled independently, and is started on system boot.
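A rough sketch of that duplication for one extra instance follows; the instance name, paths, and config keys match the stock Debian/Ubuntu redis-server packaging, but verify them against your own files before running anything like this:

```shell
# Sketch: provision a second Redis instance on port 6383.
# "redis-cache" is a hypothetical instance name; adjust to taste.
name=redis-cache
port=6383

# Duplicate the stock config, then point it at its own port, pidfile, and logfile
sudo cp /etc/redis/redis.conf /etc/redis/${name}.conf
sudo sed -i \
  -e "s/^port .*/port ${port}/" \
  -e "s|^pidfile .*|pidfile /var/run/redis/${name}.pid|" \
  -e "s|^logfile .*|logfile /var/log/redis/${name}.log|" \
  /etc/redis/${name}.conf

# Duplicate the init script so the instance can be signaled independently,
# rewriting its references to the config and pid files
sudo cp /etc/init.d/redis-server /etc/init.d/${name}
sudo sed -i \
  -e "s|redis\.conf|${name}.conf|" \
  -e "s|redis-server\.pid|${name}.pid|" \
  /etc/init.d/${name}

# Start the instance now and on every boot
sudo update-rc.d ${name} defaults
sudo service ${name} start
```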

Load Balancing

The production cluster has one primary server (cache01) and one failover server (cache02) with identical system specifications and Redis configurations (3 instances per server; see above).

My team had two highly available NetScaler appliances on which each Redis instance was defined as a service. We defined vservers for each instance on each host, e.g. production_cache_6382_primary. Each vserver uses a virtual interface with its own allocated private IP address, and those are the addresses local.xml points to. The request path looks like this:

  1. Magento application sends request through PHP Redis client
  2. PHP Redis Client connects to IP address and port configured in local.xml
  3. Request is routed through LACP bonded switches
  4. The switches point to an HA pair of NetScalers
  5. NetScaler primary points to production_cache_6382_primary
  6. The request reaches the configured primary Redis instance

Redis Sentinel

My team has configured Redis Sentinel to use host cache02 as a secondary failover for host cache01. The NetScaler vserver production_cache_6382_primary is set to use the Redis service on port 6382 of host cache02 as its failover. Since the NetScaler TCP uptime check runs every 6 seconds, Sentinel has a generous window in which to promote the secondary service to the primary role. This environment lets Magento’s local.xml keep static, hardcoded IP addresses for the Redis instances while our systems automatically detect and switch which cache server is in service as primary. Here’s how this process works:

  1. NetScaler performs a TCP monitoring operation against the Redis service on port 6382 of host cache01, finds it working
  2. All client requests pointing to the production_cache_6382_primary vserver are sent to the above host
  3. Redis instance 6382 as primary on host cache01 is stopped
  4. Client connections to production_cache_6382_primary start experiencing connection and request failures
  5. Redis Sentinel independently detects the outage of instance 6382 on host cache01 and promotes the secondary instance on cache02 as primary
  6. Six seconds after step 1, the NetScaler performs a TCP monitoring operation against the Redis service on port 6382 of host cache01, finds it down
  7. The NetScaler performs its failover routine and starts sending client requests to the 6382 service on host cache02 instead of cache01
  8. Client connections and requests are still pointing at the same vserver IP address and start to experience successful reads and writes
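The Sentinel monitoring that drives steps 5 and onward might be configured roughly like this per instance; the master name, addresses, quorum, and timeouts here are illustrative assumptions, not our production values:

```
# Sketch: sentinel.conf entry watching the 6382 (session) instance on cache01.
# "session_6382" is a hypothetical master name; 10.0.0.11 stands in for cache01.
sentinel monitor session_6382 10.0.0.11 6382 2
sentinel down-after-milliseconds session_6382 5000
sentinel failover-timeout session_6382 60000
sentinel parallel-syncs session_6382 1
```

The down-after-milliseconds value should be comfortably shorter than the load balancer's check interval so that, as in step 5 above, Sentinel has already promoted the secondary by the time the NetScaler marks the primary service down.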