There are two main options for controller redundancy in Cisco's current wireless offerings: Backup Controllers or High Availability. Which one fits depends on the firmware level of your 5508s, your acceptable failover time, and your budget.
Based on your question, we're working with the following topology:
Traditionally, using Backup Controllers was the main way to provide redundancy for a WLC failure. For Zone A, you could just select the Wireless LAN Controller at Zone E and assign it as the Secondary Controller for each AP as desired. You can set the Primary and Secondary controllers for the AP on the controller via the GUI, the CLI, or even SNMP, as pointed out by Mike Pennington's answer to this question. With Backup Controllers, in the case of a WLC failure the APs begin to search for their Secondary Controller and re-establish their CAPWAP tunnel. The obvious downside is the outage that occurs from the client perspective while the AP drops its tunnel and builds it again to the Secondary Controller.
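To make the Backup Controllers behaviour concrete, here is a minimal sketch of the selection logic: the AP walks its configured controller list in order and joins the first one it can reach. This is only an illustrative model, not Cisco's actual join algorithm; the function name `select_controller` and the controller names `WLC-ZoneA` and `WLC-ZoneE` are made up for the example.

```python
from typing import Optional, Sequence, Set


def select_controller(preferred: Sequence[str], reachable: Set[str]) -> Optional[str]:
    """Return the first controller in the AP's preference order that is reachable.

    Simplified model (assumption): real APs also run discovery and CAPWAP join
    steps, which is where the client-visible outage comes from.
    """
    for name in preferred:
        if name in reachable:
            return name
    return None  # no configured controller is reachable


# Hypothetical AP in Zone A: primary is the local WLC, secondary is Zone E's.
ap_preference = ["WLC-ZoneA", "WLC-ZoneE"]

print(select_controller(ap_preference, {"WLC-ZoneA", "WLC-ZoneE"}))  # normal operation
print(select_controller(ap_preference, {"WLC-ZoneE"}))               # Zone A WLC has failed
```

In the second call the AP falls through to its Secondary Controller, which corresponds to the tunnel rebuild described above.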
In response to the need for a better failover scenario, Cisco introduced High Availability in WLC firmware 7.3. In this scenario, you purchase a second WLC and license it specifically to serve as a standby. You place it adjacent to your existing WLC, and it shares an IP address and session/config/AP information with the main controller. In the event of a WLC failure, the failover from the AP perspective is intended to be transparent. (As with all new features, and wireless in general in my experience, your mileage will vary. However, in some of the newer code revisions, the failover is supposed to be getting nearly transparent to both the AP and the client.)
Now with all of that said, which is better?
Backup Controllers is the cheaper way if your existing 5508s at Zone A and Zone E each have enough available capacity to carry the load of both sites, and your business can tolerate a few minutes of downtime in the event of a WLC failure. In that case, simply configure your secondary controllers on each Access Point, and off you go. Note that Backup Controllers carry some extra management overhead: you will have to configure VLANs/interfaces for all of your SSIDs in each zone, create your AP Groups on each controller if you use them, etc. When the WLC goes down at their site, users will experience some downtime as the APs migrate, but at least they're not down hard.
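The capacity condition above can be checked with simple arithmetic. The following is a rough sketch with entirely hypothetical AP counts and license sizes: for two controllers to back each other up, each one must be licensed for the combined AP count of both zones.

```python
from typing import Dict


def can_back_each_other_up(ap_counts: Dict[str, int], licensed_aps: Dict[str, int]) -> bool:
    """True if every controller is licensed for the APs of all zones combined.

    Assumption: any single controller may have to carry every AP after a
    failure, so it needs capacity for the total, not just its own zone.
    """
    total_aps = sum(ap_counts.values())
    return all(licensed_aps[wlc] >= total_aps for wlc in ap_counts)


# Hypothetical numbers, not taken from the question.
ap_counts = {"ZoneA": 40, "ZoneE": 35}      # APs normally joined to each WLC
licensed_aps = {"ZoneA": 100, "ZoneE": 50}  # AP licenses installed on each WLC

print(can_back_each_other_up(ap_counts, licensed_aps))
```

With these sample numbers the Zone E controller is licensed for 50 APs but would need to carry 75, so the check fails and you would be looking at additional licenses or the High Availability option.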
High Availability becomes a reasonable solution here if you don't have capacity on your existing 5508s and/or your business can't tolerate the failover time of the Backup Controllers method. In other words, if your existing controllers don't have the capacity to back each other up and you're going to have to spend some money anyway, I would recommend looking into the High Availability solution and its pricing.
Also, I wanted to clarify some of what you mentioned at the end of your question.
Mobility Groups can be used over an L3 connection, but Cisco's Mobility feature is for allowing seamless client roaming across APs connected to different controllers, not for giving you redundancy in the event of a WLC failure.
Flexconnect (formerly HREAP) is not really intended to help you with WLC failure, but with loss of network connectivity to a remote site. The end result is the same (the remote AP can't connect to its controller), but it is important to think of the failure scenario it was created for. If you're only trying to solve the WLC failure scenario, you don't need Flexconnect at your remote sites; you can just use either Backup Controllers or High Availability. However, if you ARE trying to solve the network isolation failure scenario, by all means, check out Flexconnect.
EDIT: In response to your two questions in the comments below:
1) You are correct, assuming there is complete L3 reachability between Zone A's subnets and Zone E's subnets. For example, if you have configured Backup Controllers for the APs in Zone A and an AP loses its connection to the WLC in Zone A, it will look for the WLC in Zone E directly across the L3 connection. Upon reaching it, it will begin the CAPWAP join process.
2) You are correct that the process involves:
- Creating a trunk port between the WLC and the connected switch
- Adding/Creating new VLAN(s) on the connected switch
- Adding the new VLAN(s) to the trunk port
- Adding interface(s) for the new VLAN(s) to the WLC
- Associating those VLAN interfaces on the WLC to an SSID (either on the main WLAN or under an AP Group)
Now, if you've done that at Zone A and everything is working fine there, you can either repeat that whole process on the WLC and switch in Zone E, or use your existing VLANs/interfaces in Zone E.
Either way, the VLANs/IP addressing will be different between the two zones, but as long as the SSIDs/AP Groups correspond between the two controllers, it doesn't matter to the WLC/AP/client what VLAN the traffic uses. (It might matter to you, for management/traffic separation/routing purposes, but the WLC doesn't care what VLAN it puts traffic out onto from a particular SSID.)
Edit 2: Attempting to clarify the VLAN assignment per WLC in each Zone per comments below:
Per your example in the comments, imagine that the WLC in Zone A is serving two VLANs/SSIDs to your wireless clients:
- VLAN 10 - Office - 10.0.0.0/24
- VLAN 20 - Guest - 10.0.1.0/24
And the WLC in Zone E is also serving two VLANs/SSIDs:
- VLAN 11 - Office - 10.1.0.0/24
- VLAN 22 - Guest - 10.1.1.0/24
Now, also assume that you have configured the APs in Zone A to use the WLC in Zone E as their secondary WLC, and vice versa for the APs in Zone E.
Under normal circumstances, clients connecting to the Office SSID at Zone A will get addresses in the 10.0.0.0/24 subnet. However, if the WLC in Zone A fails and the APs re-register to the WLC in Zone E, clients will begin to get IP addresses from the 10.1.0.0/24 subnet.
Now, depending on your network configuration, this could be perfectly acceptable. It might not impact your users at Zone A at all to have an address from Zone E, as long as the resources they normally connect to in Zone A, or wherever on your network, are reachable from the 10.1.0.0/24 subnet over in Zone E. However, if you have anything configured on your network (ACLs, web proxies, etc.) that requires the clients at Zone A to have an address in the 10.0.0.0/24 subnet, then we have a problem.
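To illustrate the problem case, here is a small sketch using Python's standard `ipaddress` module. It assumes a hypothetical ACL that only permits the Zone A Office subnet; the client addresses and the `is_permitted` helper are invented for the example.

```python
import ipaddress

# Hypothetical ACL: only the Zone A Office subnet may reach some resource.
ALLOWED_SOURCE = ipaddress.ip_network("10.0.0.0/24")


def is_permitted(client_ip: str) -> bool:
    """Return True if the client's address falls inside the permitted subnet."""
    return ipaddress.ip_address(client_ip) in ALLOWED_SOURCE


# Normal operation: the client gets its lease via the Zone A WLC (VLAN 10).
print(is_permitted("10.0.0.57"))  # True

# After failover: the same client gets its lease via the Zone E WLC (VLAN 11).
print(is_permitted("10.1.0.57"))  # False
```

The same physical client at the same desk is now rejected purely because its address comes from Zone E's Office subnet.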
At this point, you would have several options. You could look at allowing the 10.1.0.0/24 Zone E subnet to access those resources (but if there are restrictions in place, they are likely there for some infrastructure or security reason). Or you could extend some sort of L2 tunneling across your MPLS connection so that VLANs 10 and 20 from Zone A can also live in Zone E. This creates its own problems, and wherever possible I try to refrain from tunneling L2 connectivity all over creation.
My recommendation, however, should you find yourself in the scenario where users from Zone A MUST have addresses in the 10.0.0.0/24 subnet, would be to look into simply placing a second WLC in Zone A. It is, by far, the simplest solution to that particular problem.
Best Answer
There are a few inconsistencies between your diagram and configurations. In the diagram it is AS 207, in the configuration it is AS 209, and in the final configuration it is AS 207.
If the provider AS is the same in both MPLS networks then:
I'll call your original network CUG-1 and your new one CUG-2.
There won't be any problems with advertising the routes to CUG-2 as it will be in a different VPN to CUG-1. If you don't do any filtering then the routes will be automatically advertised to CUG-1 via the core (provided you aren't separating them with VRFs on that device). You will need to rewrite the AS number before advertising it out to the spokes from your core and vice versa.
If you don't do this then:
Backup DC routes will be dropped by the CUG-1 PE router due to the BGP loop prevention mechanism i.e. routers in AS207 will see their own AS in the AS-Path ( e.g. AS 65002 -> AS 207 -> AS 65001 -> AS 207 DROP).
Spoke routes will be dropped by the CUG-2 PE router for the same reason (AS 65004 -> AS 207 -> AS 65001 -> AS 207 DROP).
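The loop prevention check behind both drops can be sketched in a few lines. This is a simplified model of the rule (reject an update if your own AS already appears in the AS-Path), not a router implementation; the `accepts_route` helper is hypothetical, and the paths are listed in the same traversal order as written above.

```python
from typing import Sequence


def accepts_route(local_as: int, as_path: Sequence[int]) -> bool:
    """Simplified eBGP loop prevention: reject the route if the receiving
    AS already appears anywhere in the AS-Path of the update."""
    return local_as not in as_path


# Backup DC route arriving at the CUG-1 PE (AS 207):
# AS 65002 -> AS 207 -> AS 65001 -> AS 207
print(accepts_route(207, [65002, 207, 65001]))  # False, route is dropped

# Spoke route arriving at the CUG-2 PE (AS 207):
# AS 65004 -> AS 207 -> AS 65001 -> AS 207
print(accepts_route(207, [65004, 207, 65001]))  # False, route is dropped

# The core (AS 65001) receiving the Backup DC route is fine:
print(accepts_route(65001, [65002, 207]))  # True
```

Both drops happen at the second pass through AS 207, which is why the AS number has to be rewritten before the core re-advertises the routes.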
If you use BGP AS-override then the AS-Path will look something like this, e.g. from Backup DC to Spoke Site: AS 65001 -> AS 65001 -> AS 65001 -> AS 207 -> AS 65004.
Because you will be circumventing the BGP loop prevention mechanism, ensure that the Core DC is the only location that is peering with both CUGs.
Wrote this on mobile so let me know if it is a bit disjointed and I'll clean it up later.