How far should we take the N+N redundancy craziness

hardware redundancy

The industry standard when it comes to redundancy is quite high, to say the least. To illustrate my point, here is my current setup (I'm running a financial service).

Each server has a RAID array in case something goes wrong on one hard drive

… and in case something goes wrong on the server, it's mirrored by another identical spare server

… and both servers cannot go down at the same time, because I've got redundant power, redundant network connectivity, etc.

… and my hosting center itself has dual electricity connections to two different energy providers, and redundant network connectivity, and redundant toilets in case the two security guards (sorry, four) need to use them at the same time

… and in case something goes wrong anyway (a nuke? can't think of anything else), I've got another identical hosting facility in another country with the exact same setup.

Cost of reputational damage if down: very high
Probability of a hardware failure with my setup: <<1%
Probability of a hardware failure with a less paranoid setup: <<1% AS WELL
Probability of a software failure in our application code: >>1% (if your software is never down because of bugs, then I suggest you double-check that your reporting/monitoring system is not down. Even SQL Server, which is arguably developed and tested by clever people with a strong methodology, is sometimes down)

In other words, I feel like I could host a cheap laptop in my mother's flat, and the human/software problems would still be my biggest risk.

Of course, there are other things to take into consideration, such as:

scalability
data security
the clients' expectation that you meet the industry standard

But still, hosting two servers in two different data centers (without extra spare servers, nor doubled network equipment apart from the one provided by my hosting facility) would provide me with the scalability and the physical security I need.
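To put a rough number on the two-data-center idea, here is a minimal sketch of the usual parallel-availability formula. It assumes the sites fail independently (which shared software bugs would break), and the 99.9% per-site figure is purely hypothetical.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when any one is enough."""
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - single) ** copies


# Hypothetical figure: each site is up 99.9% of the time on hardware grounds.
site = 0.999
for n in (1, 2, 3):
    print(f"{n} site(s): {parallel_availability(site, n):.7%}")
```

Under those assumptions, a second independent site already takes hardware availability from three nines to six, which is why extra spares inside each site add comparatively little.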

I feel like we're reaching a point where redundancy is just a communication tool. Honestly, what's the difference between 99.999% uptime and 99.9999% uptime when you know you'll be down 1% of the time because of software bugs?

How far do you push your redundancy craziness?
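The arithmetic behind that question can be sketched in a few lines. This assumes hardware and software failures are independent and that the service needs both to be up, so the availabilities multiply; the 1% software figure is the rough estimate from the question, not a measurement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*parts: float) -> float:
    """Availability of a chain where every part must be up (assumed independent)."""
    result = 1.0
    for part in parts:
        result *= part
    return result


software = 0.99  # "down 1% of the time because of software bugs"
for hardware in (0.99999, 0.999999):
    total = serial_availability(hardware, software)
    print(
        f"hardware {hardware:.4%}: "
        f"hardware alone {downtime_minutes_per_year(hardware):.2f} min/year, "
        f"with software {downtime_minutes_per_year(total):.0f} min/year"
    )
```

Five nines is about 5.3 minutes a year and six nines about half a minute, while a 1% software outage rate is about 87.6 hours. The extra nine saves under five minutes out of several thousand.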

Best Answer

When the cost of the redundancy is higher than the cost of being down while whatever is broken is being replaced, it's too much redundancy.
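One way to apply that rule is to compare the yearly cost of the redundancy with the expected yearly cost of the outages it prevents. The sketch below is an assumption on my part, not part of the answer: it weights the downtime cost by how often the failure is expected to happen, and every figure in it is hypothetical.

```python
def expected_downtime_cost(
    failures_per_year: float, hours_to_repair: float, cost_per_hour: float
) -> float:
    """Expected yearly cost of being down while the broken part is replaced."""
    return failures_per_year * hours_to_repair * cost_per_hour


def redundancy_pays_off(
    redundancy_cost_per_year: float,
    failures_per_year: float,
    hours_to_repair: float,
    cost_per_hour: float,
) -> bool:
    """True when the redundancy costs less than the downtime it avoids."""
    avoided = expected_downtime_cost(failures_per_year, hours_to_repair, cost_per_hour)
    return redundancy_cost_per_year < avoided


# Hypothetical: one failure every 10 years, 24 h to replace, 10,000 per hour down.
print(redundancy_pays_off(5_000, 0.1, 24, 10_000))   # spare costs 5,000 a year
print(redundancy_pays_off(50_000, 0.1, 24, 10_000))  # spare costs 50,000 a year
```

Reputational damage is hard to price, so in practice `cost_per_hour` is the number to argue about, not the formula.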

Related Solutions

Questions about single point of failure for small operations

This all boils down to risk management. Doing a proper cost/risk analysis of your IT systems will help you figure out where to spend the money and what risks you can or have to live with. There's a cost associated with everything, including HA and downtime.

I work at a small place, so I understand this struggle. The IT geek in me wants no single points of failure anywhere, but the cost of doing that at every level is not realistic. Here are a few things that I've been able to do without a huge budget. This doesn't always mean removing the single point of failure, though.

Network Edge: We have two internet connections: a T1 and Comcast Business. We are planning to move our firewall over to a pair of old computers running pfSense, using CARP for HA.

Network: Getting a couple of managed switches for the network core and using bonding to split the critical servers between the two switches prevents a switch failure from taking out the entire data closet.

Servers: All servers have RAID and redundant power supplies.

Backup Server: I have an older system that isn't as powerful as the main file server, but it has a few large SATA drives in RAID 5 and takes hourly snapshots of the main file server. I have scripts set up so that it can switch roles and become the primary file server should the main one go down.

Offsite Backup Server: Similar to the onsite backup, we do nightly backups over a VPN tunnel to a server at one of the owners' houses.

Virtual Machines: I have a pair of physical servers that run a number of services inside virtual machines using Xen. These run off an NFS share on the main file server, and I can do live migration between the physical servers if the need arises.

Cisco – Redundant links from Router to Switches

You are going to have to set up dot1q trunking between the switches and the router (BTW did you mean 3825?) and then create a VLAN interface on the router. You will not be able to have two router interfaces within the same IP address subnet otherwise.
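The subnet restriction is easy to check before touching the router. This is a minimal sketch using Python's standard `ipaddress` module; the function name and the 192.168.x.x addresses are illustrative assumptions, and it only tests for overlap, not for anything else the router validates.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """True if two interface definitions such as '192.168.10.2/24' overlap."""
    net_a = ipaddress.ip_interface(a).network
    net_b = ipaddress.ip_interface(b).network
    return net_a.overlaps(net_b)


# Two routed interfaces in 192.168.10.0/24 would clash on one router.
print(same_subnet("192.168.10.2/24", "192.168.10.3/24"))
# Interfaces in different subnets are fine.
print(same_subnet("192.168.10.2/24", "192.168.20.2/24"))
```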

You may need a switching module in the router for this to function as desired -- such as the NME-16ES-1G.

[edit / additional information]

You will not be able to have two router interfaces in the same subnet unless you either: (a) use a BVI interface as Vatine suggested (there are performance and other considerations using them however) or (b) put the two physical interfaces into a vlan (see example below).

!
interface FastEthernet0/3/0
 switchport access vlan 10
 switchport mode access
!
interface FastEthernet0/3/1
 switchport access vlan 10
 switchport mode access
!
interface FastEthernet0/3/2
 switchport mode access
 shutdown
!
interface FastEthernet0/3/3
 switchport mode access
 shutdown
!
interface Vlan10
 description Server_Vlan
 ip address 192.168.10.1 255.255.255.0
!

If you have two routers, then you could provide IP address/gateway redundancy for the servers by using HSRP, VRRP or GLBP.

[edit / additional information (HSRP example)]

interface Vlan10
 description Server_Vlan
 ip address 192.168.10.2 255.255.255.0
 standby ip 192.168.10.1
 standby priority 150
 standby preempt
!

For your second router, change Vlan10 to ip address 192.168.10.3 and a priority of 140. Use the command "show standby brief" on both routers to confirm HSRP operation.
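To see why the priorities 150 and 140 decide which router answers for 192.168.10.1, here is a small model of the HSRP election rule: highest priority wins, and a tie goes to the highest interface IP address. It is a sketch of the steady state with preempt enabled, not a protocol implementation, and the router names R1 and R2 are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    ip: str
    priority: int = 100  # HSRP default priority


def elect_active(routers):
    """Return the router that ends up active among those currently alive."""
    # Compare by priority first, then by interface IP as the tie-breaker.
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))


r1 = Router("R1", "192.168.10.2", priority=150)
r2 = Router("R2", "192.168.10.3", priority=140)

print(elect_active([r1, r2]).name)  # both routers up
print(elect_active([r2]).name)      # first router has failed
```

Without `standby preempt`, a recovered higher-priority router would stay in standby until the active one fails, which this model does not capture.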
