This has nothing to do with Dell specifically; it's down to the optimal triple-channel memory configuration of the Xeon 55xx/56xx-series chips. And no, it's not a BIOS thing: you need all three memory channels in play for what Dell calls its 'optimised' mode. It's a 'using all available memory bandwidth' thing.
Unless you need the extra memory, go for 12GB (3 x 4GB) over 16GB (4 x 4GB), or stump up for 24GB (6 x 4GB or 3 x 8GB) if you have the moolah :)
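A quick way to see why 3 and 6 DIMMs are the 'optimised' counts is to check whether the DIMMs divide evenly across the three channels. A minimal sketch (the channel count is the only real input; the candidate configs are the ones above):

```python
# A single Xeon 55xx/56xx CPU has three memory channels; "optimised"
# mode needs every channel populated with the same number of DIMMs.
CHANNELS = 3

def channels_balanced(dimm_count):
    """True when the DIMMs divide evenly across all three channels."""
    return dimm_count > 0 and dimm_count % CHANNELS == 0

for dimms, size_gb in [(3, 4), (4, 4), (6, 4), (3, 8)]:
    balanced = channels_balanced(dimms)
    print(f"{dimms} x {size_gb}GB = {dimms * size_gb}GB -> balanced: {balanced}")
```

4 x 4GB gives you more capacity but leaves one channel with an extra DIMM, which is why 12GB beats 16GB here unless you actually need the space.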
RAM for servers comes with a few common metrics that specify its capacity and its ability to work in a particular configuration. To confuse matters, there are different names for what is essentially the same thing, and the "standard" name changes depending on which type of RAM you're using.
Capacity (1GB, 4GB, 32GB, etc)
This is easy enough; everyone should already be familiar with the concept that RAM comes in different capacities. The particular type of RAM determines the maximum size a single stick can be, but in practice that matters less than the limits of the actual implementation (ie, check the documentation for your system to see what capacity it supports).
RAM's capacity can be organized in different configurations. Usually there's just one standard configuration for RAM of a certain size. If you're buying ultra-cheap RAM off the Internet be warned that it may be non-standard (especially if they mention the organization) and not supported by your server.
Speed (1600MHz, etc)
For the purposes of this answer, you want the speed of the RAM to match the maximum speed of the system. RAM that is one or sometimes two "speeds" faster will work as well, though it will run at the system's lower speed. Similarly, RAM that is one or two "speeds" slower will work, with everything running at that slower speed.
Integrity Protection (ECC or Non-ECC)
ECC is the most common form of integrity protection (ie, making sure cosmic rays didn't flip any bits and none of the memory locations are going bad). In most systems the RAM must be either ECC or non-ECC, whichever the system requires. Occasionally this is called 72-bit memory (a name left over from 64-bit memory data channels getting 8 bits of ECC alongside the data bus).
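The "72-bit" figure falls out of the standard SECDED (single-error-correct, double-error-detect) Hamming construction: 64 data bits need 8 check bits. A small sketch of that arithmetic (just the bit-count math, not how a memory controller actually implements it):

```python
def secded_check_bits(data_bits):
    """Check bits for a SECDED code: the smallest r with
    2**r >= data_bits + r + 1 (plain Hamming single-error correction),
    plus one overall parity bit for double-error *detection*."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1

# A 64-bit data bus needs 8 check bits: 64 + 8 = a 72-bit module.
print(secded_check_bits(64))  # 8
```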
When RAM has ECC, that protection information can be checked at a variety of times. The most basic protection reads and checks the ECC data only when that memory location is read. More advanced options allow the system to check ECC regularly. Most frequently I've seen this called "memory scrubbing"; it works much like disk array scrubbing, and like disk array scrubbing you should have it enabled unless there's a good reason to disable it.
ECC is also one of the steps that reduce the impact of the Row Hammer bug.
Bus Electrical Capacity (Unbuffered or Registered)
We're not electrical engineers, so all you really need to know is that Buffered or Registered RAM allows more RAM in a system than Unbuffered RAM does. Like ECC, this is something that must be supported by the system. Unlike ECC, many new servers support both Unbuffered/Unregistered and Buffered/Registered RAM; older servers tended to support only one or the other. Registers are a type of buffer, but the terms are used interchangeably when applied to RAM. I have never seen a system that can mix Unbuffered and Registered modules at the same time.
When you see UDIMM, the "U" is for "Unbuffered". The "R" in RDIMM is "Registered".
Ranks
Registered RAM has well-defined electrical "usage" characteristics, metered in "ranks". Each RAM channel (or bus) in a system can support only so many ranks at each speed it supports. Typically systems are rated at two speeds (ie, the channel runs at speed X normally with up to A ranks populated, drops to speed Y above that, and supports no more than B ranks in total).
RAM is available with the same capacity and speed but taking up different numbers of ranks. Typically, the higher the capacity, the more ranks a module takes up. Low-voltage modules may take up fewer ranks (per the module's specifications).
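The two-speed rating described above can be sketched as a simple lookup. The speeds and rank limits below are invented for illustration, so substitute the figures from your system's documentation:

```python
# Invented figures for illustration only; real limits come from your
# system's documentation.
FULL_SPEED_MHZ, FULL_SPEED_MAX_RANKS = 1333, 4   # "X speed, up to A ranks"
REDUCED_MHZ, ABSOLUTE_MAX_RANKS = 1066, 8        # "Y speed, up to B ranks"

def channel_speed(total_ranks):
    """Operating speed of one channel given the ranks populated on it."""
    if total_ranks <= FULL_SPEED_MAX_RANKS:
        return FULL_SPEED_MHZ
    if total_ranks <= ABSOLUTE_MAX_RANKS:
        return REDUCED_MHZ
    raise ValueError("more ranks than the channel supports")

# Two dual-rank DIMMs stay at full speed; a third drops the channel.
print(channel_speed(2 + 2))      # 1333
print(channel_speed(2 + 2 + 2))  # 1066
```

This is why two high-rank modules can end up slower than three low-rank ones of the same total capacity.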
Footnotes
There are a variety of configuration options unrelated to what physical RAM you need to buy for your server. These include mirroring the RAM (just like RAID 1, but for RAM), sparing (literally spare RAM: if a module goes bad, the spare replaces it), and timing and related optimizations.
Modern servers typically have the memory controller(s) integrated into the CPU instead of in a separate north bridge chip. This means systems that support multiple CPUs must have the CPU socket populated that corresponds to a memory slot in order to use that slot. Similarly, some CPUs require memory to be populated in their slots for the system to work. See the system's documentation for details.
Modern servers typically have more than one memory channel. These channels operate mostly independently, which will allow greater memory bandwidth in memory-intensive usage scenarios. Generally you should plan on distributing memory across all channels on all populated CPUs as evenly as is realistic to ensure the best performance.
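The "spread it evenly" advice above can be sketched as a round-robin placement across populated CPUs and their channels (the counts are hypothetical; real population rules come from the system's documentation):

```python
# Sketch: spread N identical DIMMs as evenly as possible across
# every channel of every populated CPU, round-robin.
def distribute(dimm_count, cpus, channels_per_cpu):
    slots = [[0] * channels_per_cpu for _ in range(cpus)]
    for i in range(dimm_count):
        cpu = i % cpus                          # alternate CPUs first
        channel = (i // cpus) % channels_per_cpu  # then walk the channels
        slots[cpu][channel] += 1
    return slots

# 6 DIMMs, 2 CPUs, 3 channels each -> one DIMM per channel everywhere.
print(distribute(6, 2, 3))  # [[1, 1, 1], [1, 1, 1]]
```

With 6 DIMMs on a dual-socket triple-channel box, every channel on every CPU gets exactly one DIMM, which is the configuration that keeps all the memory bandwidth available.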
Best Answer
It does make a difference, but it will only make sense if you require the RAS (Reliability, Availability, and Serviceability) features on x4 or x8 devices and understand the trade-offs for your needs. More details are explained in the Dell white paper Dell™ PowerEdge™ Servers 2009 - Memory.
Also, configuration and layout details specific to the R710 are available in the Technical Guidebook for the PowerEdge R710 (Google this, because I don't have the reputation to post a link).
The important issue to note is the difference between ECC on the chip and the "Advanced ECC" provided by Dell's BIOS for Single Device Data Correction (SDDC). Both have a performance impact. Plain ECC will recover from single-bit errors in the chip. SDDC goes a step further and organizes the bits so that an entire chip can fail and the data still be recoverable. See the SDDC E7500 Chipset document for an example and details.
The issue is whether performance and/or reliability is of the utmost concern in your specific usage of the machine. If a chip failure would cause a loss of critical data or usage on this machine, and the implementation is non-redundant, Advanced ECC may be a great way to go. However, it comes at a performance cost, which may matter more to you.
I've implemented both in the field on Dell PowerEdge servers for single Microsoft SQL Server implementations. If I can be of more help, just comment to let me know.
Hope that helps.
EDIT: Coverage gap / ECC implementations
Yes, there is a coverage gap even if you implement both. Since you are specifically using a cluster of high-availability servers, IMHO you should use Advanced ECC. The performance impact is minimal compared to the benefits for the clustered devices: according to Crucial, ECC memory in general costs only about a 2% decrease in performance.
The gap is more specific to the types of errors that occur and how each scheme handles them. In your specific situation it shouldn't translate to data loss, since an enterprise DBMS manages errors, concurrency issues, etc. at the software level precisely to prevent data loss. A properly configured DBMS keeps a detailed history of changes, and the software that uses it can typically be set up to roll back the transaction if a severe error occurs.
ECC Implementations
ECC will attempt to correct any bit errors on memory read/write. However, if the error is more significant, then not even ECC will be able to recover, causing potential loss of data. There is more discussion of ECC at ServerFault/What is ECC ram and why is it better?
See also the Wikipedia article on ECC memory.
SDDC
The E7500 chipset document above describes SDDC and how it's made possible (note: the equivalent 55xx/56xx documents from Intel require a login/partnership, which is why I didn't link them originally). Basically, it uses a technique for organizing the words written to memory that ensures any single chip failure leaves at most a single-bit error in each word, i.e. each word remains recoverable via single-bit correction (as above). Since that's per word, it can potentially recover from up to 4 bit errors on x4 devices (1 per word) and up to 8 bit errors on x8 devices (still 1 per word) by error-correcting each word.
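A toy model of that interleaving idea, with simplified sizes (not the actual E7500 scheme): each x4 chip contributes one bit to four different ECC words, so a whole-chip failure shows up as at most one bit error per word, which ordinary single-bit correction can fix.

```python
# Simplified sizes: 18 x4 chips supply 72 bits per transfer; we
# interleave so a chip never puts two of its bits in the same word.
CHIPS, BITS_PER_CHIP = 18, 4
WORDS = BITS_PER_CHIP  # spread each chip's 4 bits across 4 words

def errors_per_word(failed_chip):
    """Bit errors each word sees when one whole chip fails."""
    errors = [0] * WORDS
    for bit in range(BITS_PER_CHIP):
        word = (failed_chip + bit) % WORDS  # distinct word per bit
        errors[word] += 1
    return errors

# Whichever chip dies, no word ever sees more than one bad bit.
print(errors_per_word(failed_chip=7))  # [1, 1, 1, 1]
```

Four single-bit errors spread across four words is exactly the "up to 4-bit errors on x4 devices (1 per word)" case described above.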
Additional errors, more bit errors, total memory failure, channel failure, bus failure, etc. can still all cause horrible problems but that's why you have a cluster and an Enterprise DBMS.
In short, if you have everything enabled and there are still too many bit errors for the error-correction algorithms to handle, you will get an error anyway, i.e. the error coverage gap. Such cases can be exceptionally rare, though.