The purpose of the low pass filter in a phase locked loop is often poorly understood.
When you wrap a phase feedback loop around a VCO, the integration of frequency to phase causes the open loop gain to fall at 20dB per decade with frequency. It's this falling gain that gives you an unavoidable low pass behaviour for the phase locked loop as a whole. You set the loop bandwidth, the frequency where the loop gain passes through unity, by choosing the loop gain. At modulation frequencies below the loop bandwidth, the PLL output follows the input; at modulation frequencies above it, the PLL rejects them.
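As a rough numerical sketch of that behaviour, the snippet below models the bare loop as an ideal integrator and evaluates the open and closed loop gain. The function names are made up for illustration, and the 50kHz unity-gain frequency is borrowed from the example later in this answer.

```python
import math

def open_loop_gain(f_hz, f_unity_hz):
    """Bare loop: an ideal integrator whose gain magnitude is 1 at f_unity_hz."""
    return f_unity_hz / (1j * f_hz)

def closed_loop_gain(f_hz, f_unity_hz):
    """Phase transfer from input to output with unity feedback."""
    g = open_loop_gain(f_hz, f_unity_hz)
    return g / (1 + g)

def db(x):
    return 20 * math.log10(abs(x))

f_unity = 50e3  # assumed loop bandwidth, Hz
for f in (5e3, 50e3, 500e3):
    print(f"{f/1e3:6.0f} kHz  open {db(open_loop_gain(f, f_unity)):7.2f} dB"
          f"  closed {db(closed_loop_gain(f, f_unity)):7.2f} dB")
```

The open loop gain drops 20dB for each decade of frequency, and the closed loop response is flat well below the unity-gain frequency, 3dB down at it, and falling above it.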
Notice I've not yet mentioned any low pass filter. It's not needed for operation. It's not needed for stability. It is needed, however, to get higher performance: better following of the input within the loop bandwidth, and better rejection of the input and of the doubled-frequency output of the phase sensitive detector (PSD) outside the loop bandwidth.
The loop filter is best designed as something you add on, after you've designed a working stable loop of the right bandwidth. It must have unity gain at the loop bandwidth, or it will change the loop bandwidth. It must have low phase shift, much less than 90 degrees, in the few octaves around the loop bandwidth, or it will make the PLL unstable.
With that introduction, your PSD and filter...
If you want to demodulate FM with modulation frequencies up to 25kHz, then your PLL loop bandwidth must be significantly more than this, say 50kHz. Given your VCO and PSD gains, choose an amplifier gain in your loop to get unity gain at 50kHz.
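A minimal sketch of that gain calculation, assuming the PSD gain is expressed in volts per radian and the VCO gain in Hz per volt, with no divider in the loop. The function name and the example gain values are hypothetical, not taken from any particular circuit.

```python
def amplifier_gain_for_bandwidth(f_unity_hz, k_psd_v_per_rad, k_vco_hz_per_v):
    """Amplifier gain that puts the open loop unity-gain point at f_unity_hz.

    Open loop gain magnitude at frequency f is
        k_psd * k_amp * (2*pi*k_vco) / (2*pi*f) = k_psd * k_amp * k_vco / f
    so setting it to 1 at f_unity gives the expression below.
    """
    return f_unity_hz / (k_psd_v_per_rad * k_vco_hz_per_v)

# Hypothetical example values: 0.5 V/rad detector, 100 kHz/V VCO
k_amp = amplifier_gain_for_bandwidth(50e3, 0.5, 100e3)
print(f"amplifier gain needed: {k_amp:.2f}")
```

If your gains are quoted in other units (V/cycle, rad/s/V), convert them first, or the factor of \$2\pi\$ will put the bandwidth in the wrong place.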
If you want your resulting loop to be stable, then any phase shift from a low pass filter must be small at 50kHz. If you only use a single pole lowpass filter, then a 100kHz break frequency gives you about 27 degrees phase shift at 50kHz, while a 50kHz break would give you 45 degrees. Beware going too close to the loop bandwidth. All circuits have extra lowpass poles from finite opamp GBW, stray capacitances etc, which will increase the phase shift at high frequencies. You must still be well clear of 90 degrees when all these unwanted extras are added to your explicit lowpass filter.
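The arithmetic behind those phase figures is just the arctangent of the frequency ratio for each pole. The sketch below also adds up the lag from several poles; the 500kHz "parasitic" pole is an invented example, not a measured value.

```python
import math

def pole_phase_lag_deg(f_hz, f_break_hz):
    """Phase lag of a single real pole at f_break_hz, evaluated at f_hz."""
    return math.degrees(math.atan(f_hz / f_break_hz))

def type1_phase_margin_deg(f_loop_hz, pole_breaks_hz):
    """The VCO integrator already uses 90 degrees; the poles eat into the rest."""
    total_lag = sum(pole_phase_lag_deg(f_loop_hz, fb) for fb in pole_breaks_hz)
    return 90.0 - total_lag

print(pole_phase_lag_deg(50e3, 100e3))  # about 26.6 degrees
print(pole_phase_lag_deg(50e3, 50e3))   # 45 degrees
# Explicit 100 kHz filter plus a hypothetical 500 kHz parasitic pole
print(type1_phase_margin_deg(50e3, [100e3, 500e3]))
```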
If you use a type 2 loop, with an extra integrator at low frequency, then you should push the breaks further away, with a lowpass break at 160kHz, and a broken integrator break at 16kHz, a decade in frequency between them, geometrically centred on your loop bandwidth.
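A quick check of those numbers, as a sketch that assumes the "broken integrator" is an ideal integrator with a zero at 16kHz and the lowpass is a single pole at 160kHz. With two integrators in the loop, the phase margin is the lead from the zero minus the lag from the pole.

```python
import math

f_zero = 16e3    # broken integrator break, Hz
f_pole = 160e3   # lowpass break, Hz

# Geometric centre of the two breaks
f_centre = math.sqrt(f_zero * f_pole)

def type2_phase_margin_deg(f_loop_hz, f_zero_hz, f_pole_hz):
    """Open loop phase is -180 + atan(f/f_zero) - atan(f/f_pole) degrees."""
    lead = math.atan(f_loop_hz / f_zero_hz)
    lag = math.atan(f_loop_hz / f_pole_hz)
    return math.degrees(lead - lag)

print(f"geometric centre: {f_centre/1e3:.1f} kHz")
print(f"phase margin at 50 kHz: {type2_phase_margin_deg(50e3, f_zero, f_pole):.1f} degrees")
```

The centre comes out at about 50.6kHz, and the phase margin at 50kHz is roughly 55 degrees under these idealised assumptions.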
You can use higher than first order for the lowpass filter. The main criteria are that it should reject the doubled output from the PSD, while still having << 90 degrees phase shift at your loop bandwidth. It would be as well to reject the fundamental 1MHz at the PSD too, as real PSDs won't have infinite rejection of it. This means a higher order filter with a cutoff around 500kHz would be reasonable. Do a design and use a simulator to check phase shift at the loop bandwidth.
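Before reaching for a simulator, a back-of-envelope check is possible. The sketch below assumes an ideal third order Butterworth lowpass at 500kHz (the filter family and order are my assumptions for illustration, not a recommendation) and evaluates its phase at the 50kHz loop bandwidth and its rejection at 1MHz and 2MHz.

```python
import cmath
import math

def butterworth_response(f_hz, f_cutoff_hz, order):
    """Complex response of an ideal Butterworth lowpass, built from its poles."""
    s = 1j * f_hz / f_cutoff_hz  # normalised complex frequency
    h = 1 + 0j
    for k in range(1, order + 1):
        # Poles lie on the unit circle in the left half plane
        pole = cmath.exp(1j * math.pi * (2 * k + order - 1) / (2 * order))
        h /= (s - pole)
    return h

def phase_deg(h):
    return math.degrees(cmath.phase(h))

def gain_db(h):
    return 20 * math.log10(abs(h))

f_cut, order = 500e3, 3  # assumed filter
print(f"phase at 50 kHz: {phase_deg(butterworth_response(50e3, f_cut, order)):.1f} degrees")
print(f"gain at 1 MHz:   {gain_db(butterworth_response(1e6, f_cut, order)):.1f} dB")
print(f"gain at 2 MHz:   {gain_db(butterworth_response(2e6, f_cut, order)):.1f} dB")
```

Under these assumptions the filter costs only about 11 degrees at the loop bandwidth while giving roughly 18dB of rejection at 1MHz and 36dB at 2MHz. A real filter with component tolerances and opamp limits will differ, so the simulator check still stands.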
You'll recognise that this isn't the way we're usually taught to design PLLs. In the usual method, you start with the loop filter in place, and position it so that the damping factor is right. However, my design method tells you more about the fundamentals. And if the phase shift from the loop filter is much less than 90 degrees, say in the 30-50 degree region, the damping factor will be reasonable. Nobody fields a PLL without testing it, and during the testing you have the opportunity to tweak the damping while observing its transient behaviour. Bear in mind that many RF VCOs have a volts-to-frequency gain that varies by 2:1, even 3:1, over their range, which changes the loop gain, and so bandwidth, and so damping. PLL design becomes a compromise between stability and transient behaviour over the range, and coming at it from the damping factor direction is not going to help you. Interestingly the type 2 loop, because the phase shift from the integrator flattens the phase shift curve from the low pass filter, results in a smaller change of damping factor with varying VCO gain than a type 1 loop. Now who would have spotted that from damping factor equations?
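That last claim can be checked numerically. The sketch below uses phase margin as a stand-in for damping, and compares an idealised type 1 loop (single pole at 100kHz) with an idealised type 2 loop (zero at 16kHz, pole at 160kHz) while the loop gain is swept over a 3:1 range centred on its nominal value. The break frequencies come from the examples above; everything else (function names, the sweep, the bisection search) is my own scaffolding.

```python
import cmath
import math

F0 = 50e3  # nominal loop bandwidth, Hz

def type1(f, f_pole=100e3):
    """VCO integrator plus a single pole lowpass (unnormalised)."""
    return 1 / (1j * f) / (1 + 1j * f / f_pole)

def type2(f, f_zero=16e3, f_pole=160e3):
    """VCO integrator, broken integrator with a zero, single pole lowpass."""
    return 1 / (1j * f) * (1 + f_zero / (1j * f)) / (1 + 1j * f / f_pole)

def phase_margin_deg(loop, k):
    """Phase margin when the nominal gain (unity at F0) is multiplied by k."""
    scale = k / abs(loop(F0))
    lo, hi = 1e2, 1e8  # gain magnitude falls monotonically with f, so bisect
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if abs(scale * loop(mid)) > 1:
            lo = mid
        else:
            hi = mid
    crossover = math.sqrt(lo * hi)
    return 180 + math.degrees(cmath.phase(loop(crossover)))

def margin_spread_deg(loop, ratio=3.0, steps=21):
    """Max minus min phase margin as gain sweeps ratio:1 around nominal."""
    ks = [ratio ** (i / (steps - 1) - 0.5) for i in range(steps)]
    margins = [phase_margin_deg(loop, k) for k in ks]
    return max(margins) - min(margins)

print(f"type 1 spread: {margin_spread_deg(type1):.1f} degrees")
print(f"type 2 spread: {margin_spread_deg(type2):.1f} degrees")
```

With these particular break frequencies the type 2 loop's phase margin moves by only a few degrees over the sweep, against roughly twenty for the type 1 loop. Different filter choices will give different numbers.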
Best Answer
The key here is this statement: "My understanding is that a PLL is used to demodulate in situations when the demodulator knows the carrier frequency but does not know the phase."
There is only one small insight that you are lacking:
Let's say we have an input waveform \$ \sin ( \omega t - \phi_1) \$ and the output from the VCO, which is frequency locked but out of phase, \$ \sin ( \omega t - \phi_2) \$. Suppose that in the second waveform I have a way of changing the phase (we'll ignore HOW for now), so that it becomes \$ \sin ( \omega t - \phi_2 +At) \$. Tidying up that expression gives us \$ \sin ( (\omega+A)t - \phi_2) \$. From this you can see that a steadily changing phase is actually a change in frequency. Conversely, you can also express the difference in frequency between two waveforms as two waveforms at the same frequency, one of which has a time varying phase.
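To spell that step out, here is a short sketch using the standard definition of instantaneous frequency as the time derivative of the total phase. The symbol \$\theta(t)\$ is introduced here purely for illustration.

$$ \theta(t) = \omega t - \phi_2 + A t, \qquad \frac{d\theta}{dt} = \omega + A $$

So a phase that ramps at \$A\$ rad/s is indistinguishable from a frequency offset of \$A\$ rad/s, while the constant \$\phi_2\$ drops out of the derivative and has no effect on the frequency.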
Frequency and phase are really just two sides of the same coin. Clearly if the frequencies are far apart it doesn't make sense to talk about differing phase. Also once the frequencies are close together or even locked then it does not make sense to talk about differing frequency.
However, the modulator/LPF combination is a phase detector that behaves well (i.e. it gives an error voltage of the right sign) and allows the VCO to slew in frequency until it gets close. In short, it can't know the frequency without knowing the phase.
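To see why a multiplier followed by a lowpass filter acts as a phase detector, here is a small sketch. It assumes an ideal multiplier and replaces the LPF with a plain average over whole cycles; the function name and sample counts are arbitrary. The product of two equal-frequency sines contains a DC term that depends only on the phase difference, plus a term at twice the frequency that the averaging removes.

```python
import math

def multiplier_lpf_output(phase_diff_rad, cycles=10, samples_per_cycle=1000):
    """Average of sin(wt) * sin(wt - phase_diff) over a whole number of cycles.

    sin(a) * sin(a - d) = 0.5 * [cos(d) - cos(2a - d)], so the average
    is 0.5 * cos(d): the double-frequency term averages to zero.
    """
    n = cycles * samples_per_cycle
    total = 0.0
    for i in range(n):
        wt = 2 * math.pi * i / samples_per_cycle
        total += math.sin(wt) * math.sin(wt - phase_diff_rad)
    return total / n

for deg in (0, 60, 90, 120, 180):
    out = multiplier_lpf_output(math.radians(deg))
    print(f"phase difference {deg:3d} degrees -> output {out:+.3f}")
```

The output passes through zero at 90 degrees and changes sign either side of it, which is what gives the loop an error voltage of the right direction around its lock point.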
A good phase detector will have a sigmoid shaped response curve. It will saturate high when the VCO frequency is way too low, and it will saturate low when the VCO frequency is way too high. At some point in between, when it is close in frequency and it makes sense to be talking about phase, it should have a nice linear curve that is an odd function. You could view it as the curve being what changes the modulator/LPF combo from a frequency detector to a phase detector.