Electronic – Single Bit Error Correction & Double Bit Error Detection

error-correction, parity

Can someone explain, in their own words, what Double Bit Error Detection is and how to derive it? An example of corrupted data and how to detect the double bit would be appreciated.

I can do Single Bit Error Correction using parity bits as well as correct the flipped bit. Now when I reach Double Bit Error Detection I understand there is an extra DED bit, which is somehow related to the even or odd parity of the bit sequence. However, I am lost.

What I read:
http://en.wikipedia.org/wiki/Error_detection_and_correction

Video on Hamming Code:
http://www.youtube.com/watch?v=JAMLuxdHH8o

Best Answer

A Hamming code is a particular kind of error-correcting code (ECC) that allows single-bit errors in code words to be corrected. Such codes are used in data transmission or data storage systems in which it is not feasible to use retry mechanisms to recover the data when errors are detected. This type of error recovery is also known as forward error correction (FEC).

Constructing a Hamming code to protect, say, a 4-bit data word

Hamming codes are relatively easy to construct because they're based on parity logic. Each check bit is a parity bit for a particular subset of the data bits, and they're arranged so that the pattern of parity errors directly indicates the position of the bit error.

It takes three check bits to protect four data bits (the reason for this will become apparent shortly), giving a total of 7 bits in the encoded word. If you number the bit positions of an 8-bit word in binary, you see that there is one position that has no "1"s in its column, three positions that have a single "1" each, and four positions that have two or more "1"s.

If the four data bits are called A, B, C and D, and our three check bits are X, Y and Z, we place them in the columns such that the check bits are in the columns with one "1" and the data bits are in the columns with more than one "1". The bit in position 0 is not used.

Bit position:  7  6  5  4  3  2  1  0
   in binary:  1  1  1  1  0  0  0  0
               1  1  0  0  1  1  0  0
               1  0  1  0  1  0  1  0
         Bit:  A  B  C  X  D  Y  Z  -

The check bit X is set or cleared so that all of the bits with a "1" in the top row (A, B, C and X) have even parity. Similarly, the check bit Y is the parity bit for all of the bits with a "1" in the second row (A, B and D), and the check bit Z is the parity bit for all of the bits with a "1" in the third row (A, C and D).
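The encoding step can be sketched in a few lines of Python. This is only an illustration: the function name `hamming74_encode` and the choice to return a list indexed by bit position (with the unused position 0 left at zero) are assumptions, not part of any standard.

```python
def hamming74_encode(a, b, c, d):
    """Encode data bits A, B, C, D using the layout in the table above.

    Returns a list indexed by bit position: word[7] is A, word[1] is Z.
    Position 0 is unused and always 0. Even parity is assumed throughout.
    """
    x = a ^ b ^ c  # parity over the bits with a 1 in the top row
    y = a ^ b ^ d  # parity over the bits with a 1 in the second row
    z = a ^ c ^ d  # parity over the bits with a 1 in the third row
    #       pos: 0  1  2  3  4  5  6  7
    return      [0, z, y, d, x, c, b, a]


# Example: data bits A B C D = 1 0 1 1
print(hamming74_encode(1, 0, 1, 1))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

Indexing the list by bit position keeps the code lined up with the table; reordering to A B C D X Y Z for transmission would be a separate step.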

Now all seven bits — the codeword — are transmitted (or stored), usually reordered so that the data bits appear in their original sequence: A B C D X Y Z. When they're received (or retrieved) later, the data bits are put through the same encoding process as before, producing three new check bits X', Y' and Z'. If the new check bits are XOR'd with the received check bits, an interesting thing occurs. If there's no error in the received bits, the result of the XOR is all zeros. But if there's a single bit error in any of the seven received bits, the result of the XOR is a nonzero three-bit number called the "syndrome" that directly indicates the position of the bit error as defined in the table above. If the bit in this position is flipped, then the original 7-bit codeword is perfectly reconstructed.

A couple of examples will illustrate this. Let's assume that the data bits are all zero, which also means that all of the check bits are zero as well. If bit "B" is set in the received word, then the recomputed check bits X'Y'Z' (and the syndrome) will be 110, which is the bit position for B. If bit "Y" is set in the received word, then the recomputed check bits will be "000", and the syndrome will be "010", which is the bit position for Y.
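The two examples above can be reproduced with a small decoder sketch. The names `check_bits` and `hamming74_correct` are made up for illustration, and the word is assumed to be a list indexed by bit position (index 0 unused), matching the table.

```python
def check_bits(a, b, c, d):
    """Recompute X', Y', Z' from the data bits (even parity assumed)."""
    return a ^ b ^ c, a ^ b ^ d, a ^ c ^ d


def hamming74_correct(word):
    """Return (syndrome, corrected_word) for a list indexed by bit position."""
    a, b, c, x = word[7], word[6], word[5], word[4]
    d, y, z = word[3], word[2], word[1]
    x2, y2, z2 = check_bits(a, b, c, d)
    # XOR the received check bits with the recomputed ones; read as a
    # binary number (X is the most significant bit) this is the syndrome.
    syndrome = ((x ^ x2) << 2) | ((y ^ y2) << 1) | (z ^ z2)
    fixed = list(word)
    if syndrome:
        fixed[syndrome] ^= 1  # flip the bit the syndrome points at
    return syndrome, fixed


# All-zero codeword with bit B (position 6) set in transit
received = [0] * 8
received[6] = 1
print(hamming74_correct(received))  # (6, [0, 0, 0, 0, 0, 0, 0, 0])

# All-zero codeword with check bit Y (position 2) set in transit
received = [0] * 8
received[2] = 1
print(hamming74_correct(received))  # (2, [0, 0, 0, 0, 0, 0, 0, 0])
```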

Hamming codes get more efficient with larger codewords. Basically, you need enough check bits to enumerate all of the data bits plus the check bits plus one (the "no error" case); that is, with r check bits and k data bits you need 2^r >= k + r + 1. Therefore, four check bits can protect up to 11 data bits, five check bits can protect up to 26 data bits, and so on. Eventually you get to the point where if you have 8 bytes of data (64 bits) with a parity bit on each byte, you have enough parity bits to do ECC on the 64 bits of data instead.
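The counting argument is easy to check numerically. The helper names below (`max_data_bits`, `check_bits_needed`) are hypothetical and only restate the rule that the syndrome must be able to name every bit position plus the "no error" case.

```python
def max_data_bits(check_bits):
    """Largest number of data bits that `check_bits` check bits can protect."""
    # 2**r syndromes, minus one for "no error", minus the r check bits.
    return 2 ** check_bits - 1 - check_bits


def check_bits_needed(data_bits):
    """Smallest number of check bits for single-error correction."""
    r = 1
    while max_data_bits(r) < data_bits:
        r += 1
    return r


for r in (3, 4, 5, 7):
    print(r, "check bits protect up to", max_data_bits(r), "data bits")
print("64 data bits need", check_bits_needed(64), "check bits")
```

By this count, 64 data bits need seven check bits, which fits within the eight bits that per-byte parity would use.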

Different (but equivalent) Hamming codes

Given a specific number N of check bits, there are 2^N equivalent Hamming codes that can be constructed by arbitrarily choosing each check bit to have either "even" or "odd" parity within its group of data bits. As long as the encoder and the decoder use the same definitions for the check bits, all of the properties of the Hamming code are preserved.

Sometimes it's useful to define the check bits so that an encoded word of all-zeros or all-ones is always detected as an error.

What happens when multiple bits get flipped in a Hamming codeword

Multiple bit errors in a Hamming code cause trouble. A two-bit error will always be detected as an error, but the wrong bit will get flipped by the correction logic, resulting in gibberish. If there are more than two bits in error, the received codeword may appear to be a valid one (but different from the original), which means that the error may or may not be detected.

In any case, the error-correcting logic can't tell the difference between single bit errors and multiple bit errors, and so the corrected output can't be relied on.
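Here is a small sketch of a double error fooling the correction logic in the 7-bit code from earlier. It computes the syndrome as the XOR of the positions of all set bits, which for even parity gives the same result as XORing the received and recomputed check bits. The function names are illustrative, and the word is again a list indexed by bit position.

```python
def syndrome(word):
    """XOR together the positions (1..7) of every set bit."""
    s = 0
    for position in range(1, 8):
        if word[position]:
            s ^= position
    return s


def hamming_correct(word):
    """Blindly apply single-error correction; return (syndrome, result)."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s] ^= 1
    return s, fixed


sent = [0] * 8       # the all-zero codeword
received = list(sent)
received[6] ^= 1     # bit B flipped in transit
received[3] ^= 1     # bit D flipped in transit

s, fixed = hamming_correct(received)
print(s)                # 5: the syndrome points at bit C, which was fine
print(fixed)            # [0, 0, 0, 1, 0, 1, 1, 0]
print(syndrome(fixed))  # 0: the result looks like a valid codeword
```

The decoder flips C, and the output is a valid codeword for the data bits A B C D = 0 1 1 1, not the 0 0 0 0 that was sent.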

Extending a Hamming code to detect double-bit errors

Any single-error correcting Hamming code can be extended to reliably detect double bit errors by adding one more parity bit over the entire encoded word. This type of code is called a SECDED (single-error correcting, double-error detecting) code. It can always distinguish a double bit error from a single bit error, and it detects more types of multiple bit errors than a bare Hamming code does.

It works like this: All valid code words are (a minimum of) Hamming distance 3 apart. The "Hamming distance" between two words is defined as the number of bits in corresponding positions that are different. Any single-bit error is distance one from a valid word, and the correction algorithm converts the received word to the nearest valid one.
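The distance claim can be checked by brute force over all 16 codewords of the 7-bit code described earlier. The sketch below also appends an overall even parity bit, which raises the minimum distance to 4. The function names and the A B C D X Y Z bit order are choices made for illustration.

```python
from itertools import combinations, product


def encode(a, b, c, d, extended=False):
    """Return the codeword as [A, B, C, D, X, Y, Z], plus P if extended."""
    word = [a, b, c, d, a ^ b ^ c, a ^ b ^ d, a ^ c ^ d]
    if extended:
        word.append(sum(word) % 2)  # overall even parity bit
    return word


def min_distance(extended=False):
    """Smallest Hamming distance between any two distinct codewords."""
    words = [encode(*bits, extended=extended)
             for bits in product((0, 1), repeat=4)]
    return min(sum(p != q for p, q in zip(u, v))
               for u, v in combinations(words, 2))


print(min_distance())               # 3
print(min_distance(extended=True))  # 4
```

With distance 4, a word that is two bit flips away from a valid codeword can never be just one flip away from another, which is why a double error can be told apart from a single error.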

If a double error occurs, the overall parity of the word is not affected, but the correction algorithm still "corrects" the received word, which is distance two from the original valid word but distance one from some other valid (but wrong) word. It does this by flipping one bit, which may or may not be one of the erroneous bits. The word now differs from the original in either one or three bit positions, so its overall parity is wrong, and the parity checker flags the original double error.

Note that this works even when the parity bit itself is involved in a single-bit or double-bit error. It isn't hard to work out all the combinations.
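To make this concrete, here is a sketch of a SECDED encoder and decoder built on the 7-bit code from earlier. It assumes the extra parity bit P is stored in the otherwise unused position 0 and that P gives the whole 8-bit word even parity; both are choices made for this example. Instead of correcting first and checking parity afterwards, it checks the overall parity of the received word directly. That is equivalent, because a correction always flips exactly one bit. The function names and status strings are made up.

```python
def secded_encode(a, b, c, d):
    """Return an 8-bit word indexed by position; word[0] is the parity bit P."""
    word = [0, a ^ c ^ d, a ^ b ^ d, d, a ^ b ^ c, c, b, a]
    word[0] = sum(word) % 2  # make the overall parity even
    return word


def secded_decode(word):
    """Return (status, word) where status is 'ok', 'corrected' or 'double error'."""
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position
    parity_ok = sum(word) % 2 == 0
    fixed = list(word)
    if syndrome == 0 and parity_ok:
        return "ok", fixed
    if not parity_ok:
        # Odd number of flips: assume a single error and correct it.
        # A syndrome of 0 here means P itself was the flipped bit.
        fixed[syndrome] ^= 1
        return "corrected", fixed
    # Nonzero syndrome but even overall parity: two bits were flipped.
    return "double error", fixed


sent = secded_encode(1, 0, 1, 1)

one = list(sent)
one[6] ^= 1                 # single error in bit B
print(secded_decode(one))   # ('corrected', [0, 1, 0, 1, 0, 1, 0, 1])

two = list(sent)
two[6] ^= 1                 # double error in bits B and D
two[3] ^= 1
print(secded_decode(two)[0])  # double error
```

In the double-error case the decoder leaves the word alone, since it has no way to tell which two bits were flipped.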