A delta-sigma modulator, used in both ADCs and DACs, comprises a difference (delta) circuit that measures the error between the input signal and the feedback signal, followed by an integrator (sigma), a quantizer (often just a comparator that yields one bit of information) and a time-domain sampler. The output of the sampler is fed back to the difference circuit through a suitable inverse quantizer (i.e., a 1-bit DAC).
The trick to understanding noise shaping is to model the quantizer as a linear summing circuit that adds a "quantization noise" signal to the output of the integrator. You can then use superposition to evaluate the effect of the circuit on the original signal and on the quantization noise "signal" separately.
For the original signal, the integrator is in the feed-forward path, and as you might expect, it acts as a low-pass filter for that signal — high frequencies have lower gain than low frequencies.
However, for the quantization noise signal, the integrator is in the feedback path, which means that overall, the circuit functions as a high-pass filter for the noise, reducing its gain in the low frequencies (where the desired signal is) and increasing it at the higher frequencies, where it will be subsequently removed by another filter.
It is this noise shaping that accounts for the reduction in noise density in the final passband.
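To make the loop concrete, here is a minimal discrete-time sketch of a first-order, 1-bit modulator. The function names, the zero initial conditions and the test signals are my own assumptions for illustration, not any particular chip's implementation. In this model the running sum of (output minus input) stays bounded, which is the time-domain way of saying the quantization error has been pushed away from low frequencies.

```python
import math

def delta_sigma_first_order(samples):
    """Hypothetical first-order, 1-bit delta-sigma modulator (discrete-time model).

    samples: input values, assumed to lie in [-1, 1].
    Returns the output bitstream as a list of +1.0 / -1.0 values.
    """
    integrator = 0.0
    feedback = 0.0  # 1-bit DAC output; assumed to start at zero
    bits = []
    for x in samples:
        integrator += x - feedback                # difference (delta), then integrate (sigma)
        bit = 1.0 if integrator >= 0.0 else -1.0  # comparator used as a 1-bit quantizer
        bits.append(bit)
        feedback = bit                            # 1-bit DAC feeds the decision back
    return bits

def max_running_error(samples, bits):
    """Largest magnitude reached by the running sum of (output - input)."""
    total = 0.0
    worst = 0.0
    for x, y in zip(samples, bits):
        total += y - x
        worst = max(worst, abs(total))
    return worst

# DC input: the average of the bitstream tracks the input level.
dc_input = [0.3] * 10_000
dc_bits = delta_sigma_first_order(dc_input)
print("mean of bitstream:", sum(dc_bits) / len(dc_bits))

# Slow sine input: each output is a crude +/-1, yet the accumulated error
# does not grow, so the error carries essentially no low-frequency content.
sine_input = [0.5 * math.sin(2 * math.pi * n / 500) for n in range(10_000)]
sine_bits = delta_sigma_first_order(sine_input)
print("max running error:", max_running_error(sine_input, sine_bits))
```

Averaging (low-pass filtering) the bitstream recovers the input, while the large sample-to-sample error sits at high frequencies, where the later filter removes it.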
Do digital anti-aliasing filters exist for traditional ADCs?
Not in the sense you are discussing. There are other forms of "aliasing", but you seem to be considering only analog to digital conversion. If the signal is already digital (so that you can filter it digitally), then it's already been aliased. It's too late.
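A small sketch of why it is too late, assuming a 10 kHz sample rate and two test tones that I picked for illustration: a 7 kHz cosine and a 3 kHz cosine produce the same samples, so no digital filter applied afterwards can tell them apart.

```python
import math

FS = 10_000.0  # assumed sample rate in Hz (illustrative only)

def sample_cosine(freq_hz, n_samples, fs=FS):
    """Return n_samples of a unit-amplitude cosine at freq_hz, sampled at fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

tone_7k = sample_cosine(7_000.0, 50)  # above the Nyquist frequency FS / 2
tone_3k = sample_cosine(3_000.0, 50)  # its alias, FS - 7 kHz

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(tone_7k, tone_3k))
print("largest sample difference:", max_diff)
```

Once the samples are identical, the information about which tone was present is gone, which is why the anti-aliasing filter has to act on the analog signal before the sampler.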
Where do people usually put these filters? On the IC for the ADC itself?
I'm sure you can find ADCs with integrated filters, but it's not the norm. Different designs have different filtering requirements. You may need a linear phase filter, or you may not. You may need very good performance, or you may need very low cost. You might not even need filtering at all, if you know what frequencies your analog signal can contain.
Are physically big filters (e.g. through hole capacitors, inductors and resistors) the norm?
Not really, for reasons of cost. You need bigger components if you need to handle more power, and high power is not usually something you need to drive an ADC. A particular design may also require a capacitance or inductance so high that it's attainable only with large components, but a good engineer will avoid that if possible. It's much better to use a 2-cent SMT capacitor than a 20-cent through-hole electrolytic wherever possible.
This is not what a matched filter does. If the pulse shape is \$h(n)\$ then the matched filter's impulse response is \$h(-n)\$ (conjugated if the pulse is complex, plus a delay to make it causal). This means that the total phase response of the pulse followed by the matched filter is linear (a pure delay). It can be shown that this choice maximizes the SNR at the sampling instant if the noise is white.
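As a quick check, here is a sketch using a short, deliberately asymmetric real pulse. The pulse values and the helper `convolve` are made up for illustration. Convolving the pulse with its time-reversed copy gives a symmetric response (hence linear phase), with a peak equal to the pulse energy.

```python
def convolve(a, b):
    """Plain full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

pulse = [1.0, 2.0, 3.0, 1.0]  # hypothetical real pulse h(n), not symmetric
matched = pulse[::-1]         # h(-n), delayed by len(pulse) - 1 to be causal

combined = convolve(pulse, matched)  # response of pulse followed by matched filter
energy = sum(p * p for p in pulse)

print("combined response:", combined)
print("symmetric:", combined == combined[::-1])
print("peak equals pulse energy:", max(combined) == energy)
```

The peak lands at index `len(pulse) - 1`, which is the delay added to make the filter causal and is also the instant at which the output should be sampled.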