Electronic – Calculating the number of errors undetected by a CRC using a particular polynomial

Tags: channel-coding, crc

My goal is to determine the optimal generator polynomial and the size of the Frame Check Sequence (CRC field) for a protocol that I am developing. Reading the paper "Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks" by Koopman and Chakravarty gave me a solid understanding of the topic and proved to be a valuable resource.

Especially interesting to me is Table 1 (reproduced below), "Example Hamming weights for data word size 48 bits". My question is: how can I produce this table on my own? That is, how do I calculate the number of undetectable errors, given a generator polynomial and a data word size? I am not sure whether this number also depends on the bit error ratio (BER).

"Example Hamming weights for data word size 48 bits" from "Cyclic Redundacy Code (CRC) Polynomial Selection For Embedded Networks" by Koopman and Chakravarty

Specifically, I want to write a script to calculate these values, as the data provided by the authors of the paper does not cover the polynomials and data lengths I have at hand; a rough sketch of what I have in mind is shown below.
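My working assumption (again, correct me if this is wrong) is that an error pattern is undetectable exactly when the error polynomial is divisible by the generator, so for a given weight I can enumerate all bit-position combinations over the full codeword (data word plus CRC field) and count the combinations whose remainder is zero. A brute-force sketch in Python, with names of my own choosing and not yet checked against the paper's numbers:

```python
from itertools import combinations

def poly_mod(value, gen, degree):
    """Remainder of value(x) divided by gen(x) over GF(2).
    gen must include the leading x^degree term (e.g. 0x11021 for CRC-16-CCITT)."""
    for shift in range(value.bit_length() - 1, degree - 1, -1):
        if (value >> shift) & 1:
            value ^= gen << (shift - degree)
    return value

def undetected_count(data_bits, gen, degree, weight):
    """Count error patterns of the given Hamming weight over the whole
    codeword (data word + CRC field) that leave a zero remainder,
    i.e. that the CRC cannot detect."""
    n = data_bits + degree              # codeword length in bits
    count = 0
    for positions in combinations(range(n), weight):
        error = 0
        for p in positions:
            error |= 1 << p
        if poly_mod(error, gen, degree) == 0:
            count += 1
    return count

# Example: CRC-16-CCITT, x^16 + x^12 + x^5 + 1, 48-bit data word, weight-4 errors.
print(undetected_count(48, 0x11021, 16, 4))
```

This clearly only scales to small weights and short words, since the number of combinations explodes quickly, which is part of why I am asking for pointers to a better approach.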

Any reference to material where I could get the necessary knowledge to come up with such an algorithm is welcome (preferably without delving into field theory).

Best Answer

I would regard papers like this with great scepticism.

First of all, the polynomials listed in that paper are apparently the "reversed reciprocal" of the actual standardized polynomials. The authors don't explain why, which is confusing. The polynomial for CRC-16-CCITT, for example, is 0x1021, or if you will x^16 + x^12 + x^5 + 1. Wikipedia apparently makes some effort to reconcile this particular paper's notation with the standard one.
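For what it is worth, the mapping between the standard notation and that "reversed reciprocal" (Koopman) form is just a one-bit shift of the full polynomial, so both describe the same divisor; a minimal sketch (function names are my own):

```python
def normal_to_koopman(normal, degree):
    """Normal form (implicit x^degree term, e.g. 0x1021 for x^16 + x^12 + x^5 + 1)
    to reversed-reciprocal / Koopman form (implicit x^0 term, e.g. 0x8810)."""
    full = (1 << degree) | normal   # restore the implicit leading term
    return full >> 1                # drop the always-present x^0 term

def koopman_to_normal(koopman, degree):
    """Inverse conversion: restore the implicit +1, then drop the leading term."""
    full = (koopman << 1) | 1
    return full & ((1 << degree) - 1)

print(hex(normal_to_koopman(0x1021, 16)))   # 0x8810
print(hex(koopman_to_normal(0x8810, 16)))   # 0x1021
```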

The polynomials were designed to resemble the most probable error patterns of the intended application as little as possible. Meanwhile, the whole idea of Hamming Distance as some sort of ideal in data communication is often pulled out of thin air. To quote the paper:

For embedded networks, the property of interest is usually the Hamming Distance (HD), which is the minimum possible number of bit inversions that must be injected into a message to create an error that is undetectable by that message's CRC-based Frame Check Sequence.

This already makes some rather odd assumptions, chief among them that generic bit inversions are the most probable errors. They base the whole paper on that assumption.

But in a whole lot of cases, this only holds true when a low-voltage bit gets turned into a high-voltage bit by EMI; the other way around is far less likely. So if you set up some theoretical test where you flip random bits in a generic data packet and then reason about Hamming Distance from there, you have already gone wrong and left scientific method behind.

In order to make any sense of such assumptions, we must first define the communication medium. For example, they mention the polynomial used by CAN, which is a differential signal: when an error is caused by EMI, the chance of both lines flipping is essentially non-existent, and most of the error detection happens at the hardware level, not in the CRC. In fact, CAN has such good hardware-level error detection, carefully sampling every bit, that the CRC is mostly there for show.

And of course the probable error patterns on a CAN bus will be very different from the error patterns in other applications. Take flash memory, for example, where the main culprit is data retention over time, or possibly exceeding the number of write cycles - not EMI.

Therefore, to boldly claim that a polynomial with a particular "Hamming weight" is more or less suitable, without taking any specific application into account, is unscientific hogwash.