Going back to basic theory, we have a carrier signal of the form
\$E_c\cos\phi_c\$
and a sinusoidal modulating signal of the form
\$E_m\cos(\omega_mt)\$
and if we let the frequency deviation be proportional to the modulation amplitude, so
\$\Delta\omega\propto E_m\$
the instantaneous frequency is given by
\$\dot{\phi_c}=\omega_c+\Delta\omega\cos(\omega_mt)\$
Integrating this to get the instantaneous phase gives
\$\phi_c=\omega_ct+\dfrac{\Delta\omega}{\omega_m}\sin(\omega_mt)\$
So the modulated output is
\$E_c\cos\Big[\omega_ct+\dfrac{\Delta\omega}{\omega_m}\sin(\omega_mt)\Big]\$
As you say, the modulation index \$\dfrac{\Delta\omega}{\omega_m}\$ depends on \$\omega_m\$, so the relative amplitudes of the spectral components vary with \$\omega_m\$. But the modulation index is also the peak phase deviation, so if you want the spectral amplitudes to be independent of \$\omega_m\$ you must have \$\Delta\omega\propto\omega_m\$ (with the peak phase deviation \$\dfrac{\Delta\omega}{\omega_m}\propto E_m\$), i.e. phase modulation.
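To put numbers on this: the modulated output above has carrier and sideband amplitudes proportional to the Bessel functions \$J_n(\beta)\$, where \$\beta=\dfrac{\Delta\omega}{\omega_m}\$ is the modulation index (a standard result, not derived in this answer). The sketch below is only an illustration; the function names, the example deviations and the numerical evaluation of Bessel's integral are assumptions made for the example.

```python
import math

def bessel_j(n, beta, steps=2000):
    """Approximate J_n(beta) with the trapezoidal rule applied to Bessel's integral:
    J_n(beta) = (1/pi) * integral from 0 to pi of cos(n*t - beta*sin(t)) dt."""
    h = math.pi / steps
    # Endpoint values of the integrand: cos(0) = 1 at t = 0 and cos(n*pi) at t = pi.
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for k in range(1, steps):
        t = k * h
        total += math.cos(n * t - beta * math.sin(t))
    return total * h / math.pi

def sideband_amplitudes(delta_f, f_m, n_max=5):
    """Relative amplitudes of the carrier (n = 0) and sidebands n = 1..n_max."""
    beta = delta_f / f_m  # modulation index = peak phase deviation in radians
    return [bessel_j(n, beta) for n in range(n_max + 1)]

# Assumed example values: (peak deviation in Hz, modulating frequency in Hz).
for delta_f, f_m in [(5e3, 1e3), (5e3, 2e3), (10e3, 2e3)]:
    amps = sideband_amplitudes(delta_f, f_m)
    print(delta_f, f_m, [round(a, 4) for a in amps])
```

The first and third cases share \$\beta=5\$ and give identical amplitudes, while the second has \$\beta=2.5\$ and a different spectrum, even though its deviation is the same as in the first case.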
One technique for producing phase modulation is to use a frequency modulator with pre-emphasis of the modulating signal, so that the amplitude applied to the modulator is proportional to the modulating frequency.
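A minimal sketch of why this works, assuming the pre-emphasis is an ideal differentiator with gain \$\omega_m\tau\$ and the modulator has a sensitivity `k_f` in rad/s per volt (`k_f`, `tau` and the amplitudes are hypothetical values chosen for illustration):

```python
import math

def modulation_index(e_m, f_m, k_f, tau=None):
    """Peak phase deviation (rad) of a frequency modulator driven by a sine of
    amplitude e_m (V) and frequency f_m (Hz).
    k_f : modulator sensitivity in rad/s per volt (hypothetical value).
    tau : time constant of an ideal differentiator used as pre-emphasis;
          None means the modulating signal is applied directly."""
    omega_m = 2.0 * math.pi * f_m
    gain = 1.0 if tau is None else omega_m * tau  # |j*omega*tau| of a differentiator
    delta_omega = k_f * e_m * gain                # peak frequency deviation, rad/s
    return delta_omega / omega_m

K_F, TAU, E_M = 2.0 * math.pi * 5e3, 1e-4, 1.0    # assumed example values
for f_m in (500.0, 1e3, 2e3):
    plain = modulation_index(E_M, f_m, K_F)
    pre = modulation_index(E_M, f_m, K_F, TAU)
    print(f"f_m={f_m:6.0f} Hz  FM index={plain:6.3f}  with pre-emphasis={pre:6.3f}")
```

Without pre-emphasis the index falls as the modulating frequency rises; with it, the index stays fixed and scales only with \$E_m\$.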
So I've done some research, since I have a similar problem, and everything leads me to fully integrated quadrature demodulators like the LT5517, with a good NCO if you need AFC. All-digital systems with direct sampling might have even better noise performance, especially if you use oversampling (source), but they have a detection delay, so your application has to tolerate this delay if you want to use this method. Search for FPGA or DSP FM demodulators; there are plenty of articles.
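As one illustration of the all-digital approach, here is a sketch of a quadrature (phase-difference) FM discriminator working on complex baseband I/Q samples. It is not taken from the LT5517 or from any particular FPGA design; the function names, sample rate and deviation are assumptions made for the example.

```python
import cmath
import math

def fm_baseband(message, fs, deviation_hz):
    """Generate unit-amplitude complex baseband FM samples from a message in [-1, 1]."""
    phase = 0.0
    iq = []
    for m in message:
        iq.append(cmath.exp(1j * phase))
        phase += 2.0 * math.pi * deviation_hz * m / fs  # integrate the frequency
    return iq

def fm_discriminate(iq, fs):
    """Return the instantaneous frequency (Hz) between consecutive I/Q samples.
    The phase step is the angle of cur * conj(prev); valid while |f| < fs / 2."""
    out = []
    for prev, cur in zip(iq, iq[1:]):
        dphi = cmath.phase(cur * prev.conjugate())
        out.append(dphi * fs / (2.0 * math.pi))
    return out

FS = 48000.0                       # assumed sample rate, Hz
tone = [math.sin(2.0 * math.pi * 1000.0 * n / FS) for n in range(480)]
recovered = fm_discriminate(fm_baseband(tone, FS, 5000.0), FS)
print(round(max(recovered)), round(min(recovered)))
```

Each output sample needs the previous input sample, and any filtering or decimation placed ahead of the discriminator adds further latency, which is where the detection delay comes from.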
The best solution for data transmission I've found so far is a specialized transceiver with interference resistance, like the AD9364, but those come in a 144-LFBGA package and cost $210 per chip. See this article for more info.
If you need a simple and decent demodulator, a conventional quadrature demodulator has the best noise performance, provided you use high-quality parts for it.
Best Answer
The formula is derived from practical experience and not from mathematical first principles. It can't be proved other than by being practical and thinking about what a diode detector has to achieve.
Firstly, the formula states that RC has to be equal to or greater than \$\dfrac{1}{\omega_c}\$.
If the RC time constant were too short, there would be a significant level of carrier-frequency ripple on the output. This is not what is wanted from a diode detector (or an AC rectifier in a power supply), BUT it is never going to be a perfect brick-wall filter, so some carrier ripple has to be accepted.
Personally, I would like to see the RC time constant 5 times greater than \$\dfrac{1}{\omega_c}\$.
At the other end of the scale, RC cannot be too big or it will start to significantly attenuate the high frequencies in the "detected" analogue waveform; that limit is represented by \$\dfrac{1}{\omega_m}\$.
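As a rough worked example of the two limits, the sketch below computes the RC window for an assumed 455 kHz carrier and a 5 kHz maximum modulating frequency (both figures are illustrative, not taken from the question), using the factor-of-5 margin on the carrier side:

```python
import math

def rc_window(f_c, f_m, margin=5.0):
    """Return (rc_min, rc_max) in seconds for a diode envelope detector.
    rc_min = margin / omega_c keeps the carrier ripple down;
    rc_max = 1 / omega_m avoids attenuating the highest modulating frequency."""
    rc_min = margin / (2.0 * math.pi * f_c)
    rc_max = 1.0 / (2.0 * math.pi * f_m)
    return rc_min, rc_max

rc_min, rc_max = rc_window(455e3, 5e3)   # assumed example frequencies
print(f"RC between {rc_min * 1e6:.2f} us and {rc_max * 1e6:.2f} us")
```

With these numbers the window runs from roughly 1.75 µs to 31.8 µs, so something like 10 kΩ with 1 nF (10 µs) would sit comfortably inside it.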
Here is a picture that hopefully explains:
This picture was taken from here and is basically saying that, if the modulation index is too high for the value of RC chosen, there will come a point in the detection of the signal where the RC time constant is too long for the output to follow the falling envelope.
You should also note that as the modulation index approaches 1, the RC time constant theoretically has to be very small, and this is likely to clash with the requirement for it to be significantly greater than \$\dfrac{1}{\omega_c}\$.
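The formula under discussion is not reproduced in this answer, so as an assumption the sketch below uses the commonly quoted textbook condition for avoiding diagonal clipping, \$RC\le\dfrac{\sqrt{1-m^2}}{m\,\omega_m}\$ with \$m\$ the modulation index, and checks it against the factor-of-5 carrier margin. The frequencies and function names are illustrative.

```python
import math

def rc_upper_bound(m, f_m):
    """Largest RC (s) that still follows the envelope, for modulation index 0 < m <= 1.
    Assumed textbook condition: RC <= sqrt(1 - m^2) / (m * omega_m)."""
    if not 0.0 < m <= 1.0:
        raise ValueError("modulation index must be in (0, 1]")
    return math.sqrt(1.0 - m * m) / (m * 2.0 * math.pi * f_m)

def rc_lower_bound(f_c, margin=5.0):
    """Smallest RC (s) that keeps carrier ripple acceptable: margin / omega_c."""
    return margin / (2.0 * math.pi * f_c)

def window_exists(m, f_m, f_c, margin=5.0):
    """True if some RC satisfies both the ripple and the clipping requirement."""
    return rc_lower_bound(f_c, margin) <= rc_upper_bound(m, f_m)

for m in (0.3, 0.5, 0.9, 0.999):        # assumed example modulation indices
    upper_us = rc_upper_bound(m, 5e3) * 1e6
    print(f"m={m}: RC <= {upper_us:.2f} us, feasible={window_exists(m, 5e3, 455e3)}")
```

With a 455 kHz carrier and 5 kHz modulation, the window is still open at \$m=0.5\$ but has closed by \$m=0.999\$, where the upper limit drops below \$\dfrac{5}{\omega_c}\$.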