The difference between the filters you name is not that each newer one made a closer approximation to the ideal filter, but that each one optimizes a different characteristic. Because these characteristics trade off against one another, each filter type makes that trade-off in a different way.
As Andy said, the Butterworth filter has maximal flatness in the pass-band. The Chebychev filter accepts ripple in the pass-band in exchange for a steeper roll-off between the pass-band and stop-band. For a given order and pass-band ripple, it has the fastest roll-off of any all-pole filter.
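To see that trade-off numerically, here is a minimal sketch in plain Python that evaluates the textbook magnitude responses of a 5th-order Butterworth and a 5th-order Chebychev type I low-pass. The order, the 1 dB ripple and the frequency points are arbitrary choices for illustration. Note that the normalizations differ: the Butterworth is scaled to be -3 dB at \$\omega = 1\$, while the Chebychev is scaled so that its ripple band ends at \$\omega = 1\$.

```python
import math

def butterworth_mag(w: float, n: int) -> float:
    """|H(jw)| of an n-th order Butterworth low-pass with its -3 dB point at w = 1."""
    return 1.0 / math.sqrt(1.0 + w ** (2 * n))

def chebyshev_poly(n: int, x: float) -> float:
    """Chebyshev polynomial T_n(x), valid for x >= 0."""
    if x <= 1.0:
        return math.cos(n * math.acos(x))   # stays between -1 and 1: pass-band ripple
    return math.cosh(n * math.acosh(x))     # grows rapidly (about 2**(n-1) * x**n): steep roll-off

def chebyshev1_mag(w: float, n: int, ripple_db: float) -> float:
    """|H(jw)| of an n-th order Chebyshev type I low-pass; the ripple band ends at w = 1."""
    eps2 = 10 ** (ripple_db / 10) - 1       # epsilon squared, set by the allowed ripple
    t = chebyshev_poly(n, w)
    return 1.0 / math.sqrt(1.0 + eps2 * t * t)

def db(mag: float) -> float:
    """Convert a magnitude ratio to decibels."""
    return 20 * math.log10(mag)

n = 5
for w in (0.5, 0.9, 1.0, 2.0, 3.0):
    print(f"w = {w:.1f}: Butterworth {db(butterworth_mag(w, n)):7.2f} dB, "
          f"Chebyshev (1 dB ripple) {db(chebyshev1_mag(w, n, 1.0)):7.2f} dB")
```

At \$\omega = 2\$ the Butterworth is about 30 dB down and the Chebychev about 45 dB down, while at \$\omega = 0.5\$ the Butterworth is within about 0.005 dB of 0 dB and the Chebychev is about 0.27 dB down: flatness versus roll-off.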
The elliptic (Cauer) filter lets you choose the balance between pass-band and stop-band ripple, and gives the fastest possible roll-off for the ripple specifications you chose.
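For reference, the standard magnitude-squared responses of the three low-pass prototypes make the trade-offs explicit. Here \$n\$ is the order, \$\omega_c\$ the -3 dB frequency, \$\omega_p\$ the pass-band edge, \$\varepsilon\$ sets the pass-band ripple, \$T_n\$ is the Chebychev polynomial of degree \$n\$, and \$R_n(\xi, \cdot)\$ is the elliptic rational function, whose selectivity factor \$\xi\$ (together with \$\varepsilon\$ and \$n\$) sets the transition width and the stop-band level:

$$|H_{\text{Butterworth}}(j\omega)|^2 = \frac{1}{1 + (\omega/\omega_c)^{2n}}$$

$$|H_{\text{Chebychev}}(j\omega)|^2 = \frac{1}{1 + \varepsilon^2 T_n^2(\omega/\omega_p)}$$

$$|H_{\text{elliptic}}(j\omega)|^2 = \frac{1}{1 + \varepsilon^2 R_n^2(\xi, \omega/\omega_p)}$$

The Butterworth denominator has all its derivatives at \$\omega = 0\$ equal to zero up to order \$2n-1\$ (maximal flatness); \$T_n\$ oscillates in the pass-band (ripple) and grows fast outside it; \$R_n\$ also oscillates in the stop-band, which is what places zeros there and buys the extra roll-off.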
Now if I were to take my 5th-order structure and could simulate every possible inductor and capacitor value, would I find a combination that gives the best possible / closest model to ideal, one that beats all previously known filter types?
It depends on what you mean by "best possible" or "closest model". If you mean the one with the flattest response in the pass-band, you'd end up with the Butterworth filter. If you mean the best possible roll-off given a fixed ripple in the pass-band, you'd end up with the Chebychev design, etc.
If you chose some other criterion to optimize (like mean-square error between the filter characteristic and the boxcar ideal, for example), you could end up with a different design.
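As a rough illustration of "simulate every component value and pick the best under some criterion", here is a minimal sketch that brute-forces a much simpler case than the question's 5th-order ladder: a 2nd-order low-pass \$H(s) = 1/(s^2 + s/Q + 1)\$, where sweeping \$Q\$ stands in for sweeping component values (in an RLC realization, \$Q\$ is set by the component values). The criterion is mean-square magnitude error against the ideal boxcar, measured on a uniform grid from 0 to an assumed \$\omega_{max} = 5\$; the grid, range and \$Q\$ sweep are all illustrative assumptions, and a different range or weighting would change the answer.

```python
import math

def lowpass2_mag(w: float, q: float) -> float:
    """|H(jw)| for H(s) = 1 / (s^2 + s/q + 1): unity DC gain, natural frequency 1."""
    return 1.0 / math.sqrt((1.0 - w * w) ** 2 + (w / q) ** 2)

def boxcar(w: float, cutoff: float = 1.0) -> float:
    """Ideal ('brick-wall') low-pass magnitude."""
    return 1.0 if w < cutoff else 0.0

def mse_vs_boxcar(q: float, w_max: float = 5.0, points: int = 4000) -> float:
    """Mean-square magnitude error versus the boxcar on a midpoint grid over [0, w_max]."""
    step = w_max / points
    total = 0.0
    for i in range(points):
        w = (i + 0.5) * step
        err = lowpass2_mag(w, q) - boxcar(w)
        total += err * err
    return total / points

q_butterworth = 1 / math.sqrt(2)   # this Q gives |H|^2 = 1 / (1 + w^4), the 2nd-order Butterworth
# Brute-force sweep of Q from 0.5 to 2.0, plus the Butterworth value for comparison.
candidates = [0.5 + 0.01 * k for k in range(151)] + [q_butterworth]
best_q = min(candidates, key=mse_vs_boxcar)

print(f"Butterworth Q = {q_butterworth:.3f}, MSE = {mse_vs_boxcar(q_butterworth):.5f}")
print(f"MSE-optimal Q = {best_q:.3f}, MSE = {mse_vs_boxcar(best_q):.5f}")
```

Whatever \$Q\$ the sweep lands on is "best" only for this particular error measure; it is not a better Butterworth, just the optimum of a different criterion.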
Do mathematicians / engineers know of a "best" filter response that is physically possible for a given order, but that they do not yet know how to build?
The filters you named (Butterworth, Chebychev, Cauer) are the best, for the different definitions of "best" that define those filters.
If you had some other definition of "best" in mind, you could certainly design a filter to optimize that, with existing technology. Andy's answer names a couple of other criteria and the filters that optimize them, for example.
Let me add one other question you might ask as a follow-up:
Why don't we in practice design filters to optimize the mean-square error between the filter characteristic and the boxcar ideal?
Probably because mean-square error doesn't capture well the design impact of "errors" in the pass-band and stop-band response. Because the ideal response has zero magnitude in the stop-band, it's hard to define a "relative response" measurement that weights both regions equally.
For example, in some designs an error of -40 dB (0.01 V/V) relative to the ideal 0 V/V response in the stop-band would be much worse than an error of 0.01 V/V in the pass-band.
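The arithmetic behind that example can be checked in a few lines. This sketch assumes the same absolute error of 0.01 V/V in each band: mean-square error scores the two identically, yet in decibels one is a 40 dB stop-band leak and the other is a pass-band deviation of under 0.1 dB.

```python
import math

def to_db(ratio: float) -> float:
    """Magnitude ratio (V/V) to decibels."""
    return 20 * math.log10(ratio)

error = 0.01  # V/V, the same absolute magnitude error in both bands

stop_actual = 0.0 + error   # stop-band: the ideal is 0 V/V
pass_actual = 1.0 + error   # pass-band: the ideal is 1 V/V

stop_sq_err = (stop_actual - 0.0) ** 2
pass_sq_err = (pass_actual - 1.0) ** 2

# Identical contributions to a mean-square error ...
print(f"squared error: stop-band {stop_sq_err:.1e}, pass-band {pass_sq_err:.1e}")
# ... but very different meanings on a dB scale.
print(f"stop-band response {to_db(stop_actual):.1f} dB, "
      f"pass-band response {to_db(pass_actual):+.3f} dB")
```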
Do digital anti-aliasing filters exist for traditional ADCs?
Not in the sense you are discussing. There are other forms of "aliasing", but you seem to be considering only analog-to-digital conversion. If the signal is already digital (so that you can filter it digitally), then it has already been sampled, and any aliasing has already happened. It's too late.
Where do people usually put these filters? On the IC for the ADC itself?
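A quick way to see why it's too late: after sampling, a tone above half the sample rate produces exactly the same samples as a tone below it. In this sketch the 1 kHz sample rate and the 300 Hz / 700 Hz tones are arbitrary illustrative values; since \$\cos(2\pi (f_s - f) k / f_s) = \cos(2\pi f k / f_s)\$ for integer \$k\$, no digital filter acting on the samples can tell the two apart.

```python
import math

fs = 1000.0           # sample rate in Hz (illustrative)
f_in = 300.0          # tone below fs/2
f_alias = fs - f_in   # 700 Hz tone above fs/2, folds onto 300 Hz

def sample(freq: float, n_samples: int = 32) -> list[float]:
    """Ideal samples of cos(2*pi*freq*t) taken at t = k / fs."""
    return [math.cos(2 * math.pi * freq * k / fs) for k in range(n_samples)]

a = sample(f_in)
b = sample(f_alias)
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(f"largest sample difference: {max_diff:.2e}")  # only floating-point rounding remains
```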
I'm sure you can find ADCs with integrated filters, but it's not the norm. Different designs have different filtering requirements. You may need a linear-phase filter, or you may not. You may need very good performance, or you may need very low cost. You might not even need filtering at all, if you know what frequencies your analog signal can contain.
Are physically big filters (e.g. through-hole capacitors, inductors and resistors) the norm?
Not really, for reasons of cost. You need bigger components if you need to handle more power, and high power is not usually something you need to drive an ADC. A particular design might also require a high capacitance or high inductance that's attainable only with large components, but a good engineer will avoid that if possible. It's much better to use a 2-cent SMT capacitor than a 20-cent through-hole electrolytic wherever you can.
No, not if the filter is linear (and time-invariant). Almost by definition, the output of a linear time-invariant filter contains no frequency that isn't present in the input.
If the input is sinusoidal and the output is not, even slightly, the filter is not linear: Fourier analysis shows that a non-sinusoidal periodic signal necessarily contains multiple sinusoidal components at different, harmonically related frequencies.
Thus, making the sinusoid slightly triangular requires adding frequency components that are not present in the input signal, i.e., adding harmonic distortion.
In summary, if the filter is linear, a sinusoidal input of (angular) frequency \$\omega\$ results in a sinusoidal output of the same frequency \$\omega\$, with at most a modified amplitude and phase:
$$v_I(t) = \cos\omega t $$
$$v_O(t) = |H(\omega)|\cos[\omega t + \phi(\omega)] $$
where \$|H(\omega)|\$ and \$\phi(\omega)\$ are the magnitude and phase of the filter's frequency response at \$\omega\$.
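A small numerical check of this, as a sketch: a cosine placed exactly on DFT bin 4 of a 64-sample window is passed through a linear filter and, separately, through a mild nonlinearity, and we list which DFT bins carry energy. The 3-tap filter and the cubic term are arbitrary illustrative choices; circular convolution is used so the filter output is its steady-state response to the periodic input.

```python
import cmath
import math

N = 64  # samples per analysis window
K = 4   # input tone sits exactly on DFT bin K (a whole number of cycles)

x = [math.cos(2 * math.pi * K * n / N) for n in range(N)]

def dft_mags(signal: list[float]) -> list[float]:
    """Naive O(N^2) DFT magnitudes |X[k]|; fine for a short demo."""
    size = len(signal)
    return [abs(sum(signal[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)))
            for k in range(size)]

def circular_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Linear time-invariant FIR filter applied by circular convolution."""
    size = len(signal)
    return [sum(h * signal[(n - i) % size] for i, h in enumerate(taps)) for n in range(size)]

def occupied_bins(signal: list[float], tol: float = 1e-9) -> list[int]:
    """DFT bins whose magnitude is above rounding noise."""
    return [k for k, m in enumerate(dft_mags(signal)) if m > tol * len(signal)]

linear_out = circular_fir(x, [0.25, 0.5, 0.25])  # illustrative 3-tap smoothing filter
nonlinear_out = [v + 0.2 * v ** 3 for v in x]    # mild cubic distortion

print("linear filter:", occupied_bins(linear_out))         # [4, 60]: only the input frequency
print("cubic distortion:", occupied_bins(nonlinear_out))   # [4, 12, 52, 60]: a 3rd harmonic appears
```

Bins 4 and 60 are the positive- and negative-frequency images of the input tone; the extra bins 12 and 52 from the nonlinear case are its third harmonic, since \$\cos^3\theta = (3\cos\theta + \cos 3\theta)/4\$.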