Interesting.
I don't think I've ever seen this anomaly before.
It's often convenient to think of a SAR ADC as if it samples the input analog voltage at some instant in time.
In practice, there is a narrow window of time where changes in the input analog voltage -- or noise on the analog voltage reference, or noise on the GND or other power pins of the ADC -- can affect the output digital value.
If the input voltage is slowly rising during that window, then the less-significant bits of the SAR output will be all-ones.
If the input voltage is slowly falling during that window, then the less-significant bits of the SAR output will be all-zeros.
A very narrow noise pulse at the "wrong" time during conversion can have a similar effect.
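To make the mechanism concrete, here is a minimal Python sketch of an MSB-first successive approximation with an input that keeps rising between bit decisions; the 12-bit resolution and the 1-LSB-per-bit drift are made-up values, exaggerated so the trailing-ones effect is obvious:

```python
# Minimal sketch (not any particular ADC): a 12-bit SAR conversion in which
# the input keeps rising while the bit decisions are being made.
# The 1 LSB-per-bit drift is exaggerated for illustration.

N_BITS = 12
DRIFT_PER_BIT = 1.0            # input rise (in LSBs) during each bit decision

def sar_convert(v_in):
    """MSB-first successive approximation with a drifting input (in LSBs)."""
    code = 0
    for bit in range(N_BITS - 1, -1, -1):
        trial = code | (1 << bit)
        if v_in >= trial:          # comparator: input vs. internal DAC level
            code = trial           # keep the bit
        v_in += DRIFT_PER_BIT      # input creeps upward before the next decision
    return code

print(format(sar_convert(1000.0), "012b"))   # 001111101111 -- trailing ones
```

Once a bit decision is rejected while the input keeps rising past the internal DAC level, every later (less-significant) decision tends to come out as 1, which is where the all-ones tail comes from.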
Right now my best guess is that you're using some sort of analog switches or op amps that don't work quite as well (higher resistance or something) near the high and low power rails as they do near mid-scale, which somehow lets in one of the above kinds of noise and causes the less-significant bits to come out all-ones or all-zeros.
I've seen some sigma-delta ADCs and sigma-delta DACs that have good resolution at mid-scale, but worse resolution near the rails -- but the effect looks different than what you show.
The "plot of the difference between one sample and the next sample over the entire full scale range" is fascinating.
If I were you, I would make a similar plot, except that instead of making the X value the difference between one sample and the next, I would make the X value the least-significant 6 bits of the raw ADC output sample.
That would quickly show if the "stuck" values are mostly lots of 1s in the least-significant bits (maybe input is slowly rising?) or lots of 0s in the least-significant bits (maybe input is slowly falling?).
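Something along these lines would make that plot; this is only a rough matplotlib sketch, and "adc_samples.txt" is a stand-in for however you captured the raw output words:

```python
# Rough sketch of the suggested plot: a histogram of the least-significant
# 6 bits of each raw ADC sample.  The capture file name is hypothetical.
import numpy as np
import matplotlib.pyplot as plt

raw_codes = np.loadtxt("adc_samples.txt", dtype=int)   # raw ADC output words

low6 = raw_codes & 0x3F                    # keep only the least-significant 6 bits
plt.hist(low6, bins=np.arange(65) - 0.5)   # one bin per possible 6-bit value
plt.xlabel("least-significant 6 bits of raw ADC code")
plt.ylabel("count")
plt.show()
# A pile-up at 63 (0b111111) suggests a rising input / trailing ones;
# a pile-up at 0 suggests a falling input / trailing zeros.
```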
I am sampling "pulsed" DC voltages. That means that for each measurement I put a voltage on the DAC, let it settle for at least 100 times its settling time, then tell the ADC to convert - and when conversion is finished, I put the DAC back to 0 V.
My understanding is that when ADC manufacturers say "no missing codes", the test they use involves several capacitors adding up to a huge capacitance connected directly to the ADC input, and some circuit driving a large resistor into that capacitance to charge or discharge it very slowly -- slowly enough that the ADC is expected to see exactly "the same" voltage (within 1/2 LSB) for several conversion cycles before it sees "the next" voltage (incremented by 1 LSB going up, decremented by 1 LSB going down).
If I were you, I would see if such a "continuous slope" test gives the same weird "stuck code" symptoms as the "pulsed test".
Perhaps that would give more clues as to exactly what component(s) are causing this problem.
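If you do capture such a slow ramp, a quick way to look for missing or suspiciously "sticky" codes is something like the sketch below; the file name, the 16-bit resolution, and the 10x-median threshold are all arbitrary assumptions:

```python
# Sketch of a check on slow-ramp data: which codes never appear ("missing"),
# and which codes occur far more often than their neighbours ("stuck")?
# File name, resolution and the 10x-median threshold are arbitrary choices.
import numpy as np

N_BITS = 16
codes = np.loadtxt("ramp_capture.txt", dtype=int)     # raw samples from the ramp

counts = np.bincount(codes, minlength=1 << N_BITS)
lo, hi = codes.min(), codes.max()                     # only look inside the ramp range
inside = counts[lo:hi + 1]

missing = np.flatnonzero(inside == 0) + lo
stuck = np.flatnonzero(inside > 10 * np.median(inside)) + lo

print("missing codes:", missing)
print("suspiciously frequent codes:", stuck)
```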
Please tell us if you ever figure out what caused these symptoms.
Best Answer
You have to assume certain things just work, even in a world with error checking. Why pick on IIC or SPI when there are usually many more digital signals on a board? You seem to be OK with assuming those will all be interpreted as intended.
A properly designed circuit on a properly designed board should be reliable. Think of a CMOS output driving a CMOS input across a board. Other than outright component failure (which is a whole different problem from occasional data corruption), think about what can actually go wrong. At the driving end, you've got a FET with some maximum guaranteed on resistance connecting the line to either Vdd or ground. What exactly do you imagine can cause that not to produce the right level at the receiving end?
Initially the state can be undetermined while whatever capacitance is on the line charges or discharges. Then there can be ringing in the short trace. However, we can calculate maximum worst-case times for all of this to settle and for the line to be reliably across some threshold at the other end.
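As an illustration of how easy that bound is to compute, here is a back-of-the-envelope sketch treating the driver on-resistance and the trace-plus-input capacitance as a simple RC; the 100 ohm, 10 pF, and 70%-of-Vdd threshold figures are made-up example values, and ringing and logic propagation delay would be added on top:

```python
# Back-of-the-envelope worst-case settling for a CMOS output driving a CMOS
# input, modelled as a simple RC.  All numbers are made-up example values.
import math

R_ON = 100.0          # ohms: worst-case guaranteed driver on-resistance (assumed)
C_LOAD = 10e-12       # farads: trace plus receiver input capacitance (assumed)
VDD = 3.3             # volts
V_TH = 0.7 * VDD      # threshold the receiver must see crossed (assumed 70% of Vdd)

tau = R_ON * C_LOAD
# V(t) = VDD * (1 - exp(-t/tau))  ->  solve for the time at which V(t) = V_TH
t_settle = -tau * math.log(1.0 - V_TH / VDD)

print(f"tau = {tau * 1e9:.2f} ns, time to cross 70% of Vdd = {t_settle * 1e9:.2f} ns")
```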
Once this time has been reached and we've waited for whatever the worst-case propagation delay of the logic is, there is little to change the signal. You may be thinking noise from other parts of the board can couple onto the signal. Yes, that can happen, but we can also design for it. The amount of noise in another part of the board is generally known. If not, then it's coming from elsewhere, and in a proper design it would be clamped or otherwise limited to some maximum dV/dt and other characteristics. These things can all be designed for.
External noise can in theory upset traces on a board, but the field strength would need to be unreasonably large for a properly designed board. High noise environments do exist, but are limited to known locations. A board may not work 10 meters from a 10 kW transmitter, but even that can be designed for.
So the answer is basically that digital signals on the same board, if designed properly, can be considered absolutely reliable for most ordinary uses. In special cases where the cost of failure is very high, like space and some military applications, other strategies are used. These usually include redundant subsystems. You still consider individual signals on a board reliable, but assume boards or subsystems as a whole may occasionally err. Note also that these systems cost much more, and such a cost burden would make most ordinary systems, like personal computers for example, useless by being too expensive.
That all said, there are cases where error detection and correction is employed even in ordinary consumer electronics. This is usually because the process itself has a certain error probability and because limits are being pushed. High-speed main memory for computers often does include extra bits for error detection and/or correction. It's cheaper to get the performance and ultimate error rate by pushing limits and adding resources for error correction than to slow things down and use more silicon to make everything inherently more reliable.