The difference between the filters you name is not that each newer one comes closer to the ideal filter, but that each one optimizes a different characteristic. These characteristics trade off against one another, and each design makes that trade-off in a different way.
As Andy said, the Butterworth filter has maximal flatness in the passband. The Chebychev (type I) filter has the fastest roll-off between the passband and stop-band of any all-pole filter of the same order, at the cost of ripple in the passband.
The Elliptic filter (Cauer filter) parameterizes the balance between pass-band and stop-band ripple, with the fastest possible roll-off given the chosen ripple characteristics.
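To make the trade-off concrete, here is a minimal sketch that evaluates the textbook magnitude formulas for a 5th-order Butterworth and a 5th-order type I Chebychev low-pass. The 1 dB ripple, the normalized band edge at 1 rad/s, and the test frequencies are assumptions picked for illustration. The two responses are also normalized differently: the Butterworth is 3 dB down at the band edge, the Chebychev is down by its ripple. The elliptic filter is left out because its response needs elliptic functions.

```python
import math

def butterworth_mag(w, n):
    """Magnitude of an n-th order Butterworth low-pass, -3 dB at w = 1 rad/s."""
    return 1.0 / math.sqrt(1.0 + w ** (2 * n))

def cheby_poly(n, x):
    """Chebyshev polynomial of the first kind T_n(x), for x >= 0."""
    if x <= 1.0:
        return math.cos(n * math.acos(x))
    return math.cosh(n * math.acosh(x))

def cheby1_mag(w, n, ripple_db):
    """Magnitude of an n-th order type I Chebychev low-pass, band edge at w = 1."""
    eps = math.sqrt(10 ** (ripple_db / 10) - 1)
    return 1.0 / math.sqrt(1.0 + (eps * cheby_poly(n, w)) ** 2)

def to_db(mag):
    """Convert a voltage ratio to decibels."""
    return 20 * math.log10(mag)

ORDER = 5        # matches the 5th-order structure in the question
RIPPLE_DB = 1.0  # assumed passband ripple, chosen only for illustration

# Two passband frequencies and one stop-band frequency (all assumed).
for w in (0.3, 0.5, 2.0):
    b = to_db(butterworth_mag(w, ORDER))
    c = to_db(cheby1_mag(w, ORDER, RIPPLE_DB))
    print(f"w = {w}: Butterworth {b:8.3f} dB, Chebychev {c:8.3f} dB")
```

With these assumptions the Butterworth stays very close to 0 dB at the two passband frequencies while the Chebychev dips by up to its ripple, and at w = 2 the Chebychev is attenuated more than the Butterworth.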
Now if I was to take my 5th order structure and was able to simulate for every possible inductor value and capacitor value would I find a combination that would give me the best possible / closest model to ideal, that beats all previously known filter types?
It depends what you mean by "best possible" or "closest model". If you mean the one with the flattest response in the pass-band, you'd end up with the Butterworth filter. If you mean the best possible roll-off given a fixed ripple in the pass-band, you'd end up with the Chebychev design, etc.
If you chose some other criterion to optimize (like mean-square error between the filter characteristic and the boxcar ideal, for example), you could end up with a different design.
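As a toy version of that idea, the sketch below keeps the 5th-order Butterworth shape and sweeps only its cutoff frequency, scoring each candidate by its mean-square magnitude error against a boxcar with its edge at 1 rad/s. The frequency range (0 to 3 rad/s), the grid, the unweighted error, and the range of candidate cutoffs are all assumptions. A real search over every inductor and capacitor value would have more free parameters than this one.

```python
import math

def butterworth_mag(w, cutoff, n=5):
    """Magnitude of an n-th order Butterworth low-pass, -3 dB at w = cutoff."""
    return 1.0 / math.sqrt(1.0 + (w / cutoff) ** (2 * n))

def boxcar(w, edge=1.0):
    """Ideal brick-wall low-pass: gain 1 up to the edge, 0 above it."""
    return 1.0 if w <= edge else 0.0

def mse_vs_boxcar(cutoff, w_max=3.0, points=3000):
    """Mean-square magnitude error against the boxcar on a midpoint grid."""
    total = 0.0
    for i in range(points):
        w = (i + 0.5) * w_max / points
        err = butterworth_mag(w, cutoff) - boxcar(w)
        total += err * err
    return total / points

# Candidate cutoffs from 0.80 to 1.20 rad/s in steps of 0.01 (assumed range).
cutoffs = [0.80 + 0.01 * k for k in range(41)]
best_cutoff = min(cutoffs, key=mse_vs_boxcar)

print(f"MSE with cutoff at the boxcar edge: {mse_vs_boxcar(1.0):.5f}")
print(f"lowest MSE in the sweep: {mse_vs_boxcar(best_cutoff):.5f} "
      f"at cutoff {best_cutoff:.2f}")
```

Under this criterion the best cutoff in the sweep falls below the boxcar edge, so even within one filter family a different definition of "best" picks a different design.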
Do mathematicians / engineers know of a "best" filter response that is physically possible for a given order but so far do not know how to create it?
The filters you named (Butterworth, Chebychev, Cauer) are the best, for the different definitions of "best" that define those filters.
If you had some other definition of "best" in mind, you could certainly design a filter to optimize that, with existing technology. Andy's answer names a couple of other criteria and the filters that optimize them, for example.
Let me add one other question you might ask as a follow-up:
Why don't we in practice design filters to optimize the mean-square error between the filter characteristic and the boxcar ideal?
Probably because the mean-square error doesn't capture the design impact of "errors" in the pass-band and stop-band response well. Because the ideal response has 0 magnitude in the stop-band, it's hard to define a "relative response" measurement that has equal weight in both regions.
For example, in some designs an error of -40 dB (0.01 V/V) relative to the ideal 0 V/V response in the stop-band would be much worse than an error of 0.01 V/V in the passband, which changes the passband gain by less than 0.1 dB.
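The arithmetic behind that example can be checked with a short sketch. The helper names are hypothetical. The same 0.01 V/V absolute error is applied once to the ideal passband gain of 1 V/V and once to the ideal stop-band gain of 0 V/V.

```python
import math

def db_to_ratio(db):
    """Convert decibels to a voltage ratio (V/V)."""
    return 10 ** (db / 20)

def ratio_to_db(ratio):
    """Convert a voltage ratio (V/V) to decibels."""
    return 20 * math.log10(ratio)

abs_error = db_to_ratio(-40)      # the 0.01 V/V error from the example

passband_gain = 1.0 - abs_error   # ideal gain 1 V/V, off by 0.01 V/V
stopband_gain = 0.0 + abs_error   # ideal gain 0 V/V, off by 0.01 V/V

# Both contribute the same squared error to a mean-square criterion.
squared_error = abs_error ** 2

print(f"passband gain: {ratio_to_db(passband_gain):.3f} dB")
print(f"stop-band gain: {ratio_to_db(stopband_gain):.1f} dB")
print(f"squared error in either band: {squared_error:.1e}")
```

The mean-square criterion counts both errors equally, even though one is a passband deviation of under 0.1 dB and the other limits the stop-band attenuation to 40 dB.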
Do digital Anti Aliasing filters exist for traditional ADCs?
Not in the sense you are discussing. There are other forms of "aliasing", but you seem to be considering only analog to digital conversion. If the signal is already digital (so that you can filter it digitally), then it's already been aliased. It's too late.
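A small sketch of why it is too late: once sampled, a tone above half the sample rate produces the same numbers as a tone below it, so no digital filter can tell them apart. The 10 kHz sample rate and the 1 kHz and 9 kHz tones are assumed values chosen for illustration.

```python
import math

FS = 10_000.0        # assumed sample rate in Hz
F_LOW = 1_000.0      # tone below the Nyquist frequency FS / 2
F_HIGH = FS - F_LOW  # 9 kHz tone, above the Nyquist frequency

def sample_tone(freq, fs, count):
    """Return `count` samples of a unit cosine at `freq`, sampled at `fs`."""
    return [math.cos(2 * math.pi * freq * n / fs) for n in range(count)]

low = sample_tone(F_LOW, FS, 20)
high = sample_tone(F_HIGH, FS, 20)

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(f"largest difference between the two sample sets: {max_diff:.2e}")
```

This is why the anti-aliasing filter has to act on the analog signal before the converter samples it.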
Where usually do people put these filters? On the IC for the ADC itself?
I'm sure you can find ADCs with integrated filters, but it's not the norm. Different designs have different filtering requirements. You may need a linear phase filter, or you may not. You may need very good performance, or you may need very low cost. You might not even need filtering at all, if you know what frequencies your analog signal can contain.
Are physically big filters (e.g. through hole capacitors, inductors and resistors) the norm?
Not really, for reasons of cost. You need bigger components if you need to handle more power, and high power is not usually something you need to drive an ADC. It may also be that a particular design requires a high capacitance or high inductance that's attainable only through large components, but a good engineer will avoid it if possible. It's much better to use a 2-cent SMT capacitor than a 20-cent through-hole electrolytic wherever possible.
Best Answer
Since no one reading this was sure of the answer, I contacted the manufacturer of the drivers and asked for their assistance. The best solution here is a separate filter for each driver. Even better if it's a pi filter, as Matt Young proposed. The values don't change for any number of drivers. It is possible to design a common filter, but harder to do so.
Thanks to everyone who contributed!