Basically, the circuit works by using the C2-R3 combination as a model (or analog) for the charging of C1. Instead of sensing the current in C1 directly, R3 senses the current in C2, and it is assumed that this value is proportional to the current in C1, which is true as long as the voltage across R3 is a small fraction of the total.
Whenever there is a drop across R3 because of current through C2, the drive to the pass transistor is reduced.
The Thevenin equivalent of the base drive to Q1 is 1.5 V in series with 120 kΩ, so if the voltage across R3 ever rises as high as 1.5 V - 0.6 V = 0.9 V, Q1 is cut off altogether, removing the drive from M1 as well. This would occur at a C2 current of \$\frac{0.9 V}{10 k\Omega} = 90 \mu A\$, which would correspond to a current in C1 of \$90 \mu A \cdot\frac{40000\mu F}{12\mu F} = 300 mA\$.
By adjusting the resistor and capacitor values, you can change that limiting current value. Note that the total current through M1 (and the power supply) is the C1 charging current plus the rising load current through R7, so pick the limit value accordingly.
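To try other component values, the arithmetic above can be sketched as a small Python helper. The function and parameter names are my own, and the 0.6 V base-emitter drop is the same rough figure used above, so treat the result as an estimate rather than a precise limit.

```python
def c1_current_limit(v_thevenin, v_be, r3, c1, c2):
    """Estimate the C2 and C1 currents at which Q1 is cut off.

    Assumes the voltage across R3 is small enough that both
    capacitors see essentially the same dV/dt.
    """
    v_r3_cutoff = v_thevenin - v_be   # drop across R3 that removes Q1's base drive
    i_c2 = v_r3_cutoff / r3           # current through C2 (and R3) at that point
    i_c1 = i_c2 * (c1 / c2)           # scaled by the capacitance ratio
    return i_c2, i_c1


# Values from the circuit discussed above
i_c2, i_c1 = c1_current_limit(1.5, 0.6, 10e3, 40000e-6, 12e-6)
print(f"C2 current at cutoff: {i_c2 * 1e6:.0f} uA")
print(f"C1 current limit:     {i_c1 * 1e3:.0f} mA")
```

Remember that this is only the C1 share of the current; the load current through R7 adds to what M1 carries.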
C3 just serves to filter noise at the base of Q1. I would be inclined to make sure that the time constant associated with this (C3 times the Thevenin resistance calculated above) is less than 10% of the C2-R3 time constant. In other words, I would reduce C3 to 0.1 µF or less.
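As a quick check of that 10% rule of thumb, here is a minimal Python sketch. The function name and the `fraction` parameter are hypothetical; the component values are the ones quoted above. It only reproduces this rule of thumb, and the discussion further down revisits whether a larger C3 is needed in practice.

```python
def max_c3(r_thevenin, r3, c2, fraction=0.1):
    """Largest C3 whose time constant with the Thevenin resistance
    stays below `fraction` of the C2-R3 time constant."""
    tau_model = r3 * c2                      # C2-R3 time constant, in seconds
    return fraction * tau_model / r_thevenin


c3_limit = max_c3(r_thevenin=120e3, r3=10e3, c2=12e-6)
print(f"C2-R3 time constant: {10e3 * 12e-6 * 1e3:.0f} ms")
print(f"Largest C3 under the 10% rule: {c3_limit * 1e6:.2f} uF")
```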
I am not sure why we can relate C2 and R3 to the current through C1.
The basic equation of a capacitor says that the current through a capacitor is proportional to the rate of change of the voltage across it, and also to its capacitance:
$$i(t) = C \frac{dV(t)}{dt}$$
As I said, the assumption is that the voltage across R3 is "small", which means that V(t) is essentially the same for both capacitors. This means that the current through each is directly proportional to its capacitance.
In this specific case, the voltage across R3 can be as high as 0.9 V, which is 10% of the supply voltage, so the proportionality isn't as precise as it could be, but it's good enough for this purpose.
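Written out, under the idealized assumption that both capacitors see exactly the same V(t), the ratio follows directly from the capacitor equation:

$$\frac{i_{C1}(t)}{i_{C2}(t)} = \frac{C_1 \frac{dV(t)}{dt}}{C_2 \frac{dV(t)}{dt}} = \frac{C_1}{C_2} = \frac{40000\ \mu F}{12\ \mu F} \approx 3333$$

The drop across R3 makes the real ratio deviate somewhat from this figure, which is why the 300 mA limit should be read as approximate.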
Regarding C3, it seems to be important for preventing an initial inrush peak caused by Q1 switching "fully" on immediately. I have to choose a value above 1 µF to prevent that.
Yes. I neglected to take into account the very high resistances connected to the gate of M1. These are needed in order to keep the maximum current through Q1 (which also flows through R3) to a small fraction of the C2 current, but they severely limit M1's bandwidth (its ability to switch quickly). In particular, it turns off very slowly because the gate charge must be dissipated through R1 (220 kΩ) alone.
Therefore, it makes sense that having the voltage reference (R4, R5, R6 and C3) start up slowly would help avoid that initial spike in current.
Regarding the choice of M1: as long as it can handle the voltage and the current, and you control the maximum VGS it sees by appropriate selection of R1 and R2, there's really nothing special required.
Pay attention to the SOA (safe operating area) diagram in the datasheet. You can plot some sample voltage and current values from the simulation in order to make sure it stays in the safe area.
Also, be sure to give it an adequate way to dissipate the pulse of heat associated with the charging surge of C1 without having its temperature rise too high.
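For a rough idea of the size of that heat pulse, the sketch below uses the standard result that charging a capacitor from a fixed voltage source through a dissipative series element loses \$\frac{1}{2} C (V_{supply} - V_{initial})^2\$ in that element, regardless of how the current is shaped. The 9 V supply is my assumption, inferred from the statement that 0.9 V is 10% of the supply; the function names are hypothetical, and the estimate ignores the extra dissipation from the load current through R7.

```python
def charging_energy_in_pass_element(c, v_supply, v_initial=0.0):
    """Energy (J) dissipated in the series element while charging c
    from v_initial up to v_supply from a fixed-voltage supply."""
    return 0.5 * c * (v_supply - v_initial) ** 2


def approx_charge_time(c, v_supply, i_limit):
    """Rough charge time (s) if the current sat at i_limit throughout.
    This is a lower bound; the real charge takes longer."""
    return c * v_supply / i_limit


C1 = 40000e-6      # farads, from the circuit above
V_SUPPLY = 9.0     # volts, ASSUMED (0.9 V was said to be 10% of the supply)
I_LIMIT = 0.3      # amps, the limit estimated earlier

energy = charging_energy_in_pass_element(C1, V_SUPPLY)
duration = approx_charge_time(C1, V_SUPPLY, I_LIMIT)
print(f"Energy dissipated in M1: {energy:.2f} J")
print(f"Approximate charge time: {duration:.1f} s")
print(f"Average power over that time: {energy / duration:.2f} W")
```

Numbers like these give the pulse energy and duration to compare against the SOA curves and the transient thermal impedance in the datasheet.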
Best Answer
The problem with a transformer/rectifier is that it's a low impedance voltage output power supply. Connected to a discharged capacitor, a very high current could flow. You can take the sting out of this by using a series resistor. However, this is not a 'nice' solution as it (a) wastes power and (b) provides a very low charge rate as the capacitor gets to a similar voltage to the supply.
Charging a capacitor is really a job for a current source. At low power, a conventional linear regulator configured for output current control would do. At high power, you need some sort of inductive energy storage somewhere.
One solution is a boost converter, configured for current output control. Related to this is a flyback. In both cases, you store energy in an inductor, and it feeds this out to a high voltage at a controlled current. They differ in isolation, and the degree of step-up that they can achieve efficiently.
An alternative is the semi-resonant voltage doubler. It's closely related to the boost converter, but it operates automatically.
(Schematic created using CircuitLab; the components referenced below are SW1, L1, D1 and C1.)
Assume C1 starts off uncharged. Close SW1. The supply voltage appears across the inductor, and current starts to build. If you want to describe the waveforms for the current and C1 voltage, then L1 and C1 are most easily understood as a resonant circuit. After one quarter cycle of the L1C1 resonance, C1 voltage is equal to the supply, and the current has stopped increasing. The next quarter cycle of the resonance sees L1 continuing to push current into C1, whose voltage rises to twice that of the supply, while L1 current falls back to zero.
As L1 current tries to go negative, it's blocked by D1. The charging cycle is now finished. No energy has been dissipated. The peak current is controlled and predictable. The charging time is predictable and takes exactly one half cycle (hence the name semi-resonant) of the L1C1 natural frequency.
If C1 starts off with some initial voltage, then its voltage doesn't rise to double the supply voltage; it increases by only twice the difference between the supply voltage and its initial voltage.
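The peak current, charging time and final voltage described above can be estimated with a short Python sketch. It assumes ideal, lossless components (no winding resistance, no diode drop) and an initial capacitor voltage below the supply. The function name and the example component values are hypothetical, chosen only for illustration.

```python
import math


def semi_resonant_charge(l, c, v_supply, v_initial=0.0):
    """Ideal half-cycle resonant charge of c through l and a series diode.

    Assumes lossless parts and v_initial < v_supply.
    """
    z0 = math.sqrt(l / c)            # characteristic impedance of L1-C1
    dv = v_supply - v_initial        # voltage initially across the inductor
    return {
        "peak_current": dv / z0,                     # reached after a quarter cycle
        "charge_time": math.pi * math.sqrt(l * c),   # half of the resonant period
        "final_voltage": v_initial + 2 * dv,         # rises by twice the difference
    }


# Hypothetical example values: 10 mH, 1000 uF, 12 V supply, C1 initially empty
result = semi_resonant_charge(l=10e-3, c=1000e-6, v_supply=12.0)
print(f"Peak current:  {result['peak_current']:.2f} A")
print(f"Charge time:   {result['charge_time'] * 1e3:.2f} ms")
print(f"Final voltage: {result['final_voltage']:.1f} V")
```

Choosing a larger L1 lowers the peak current and lengthens the charge time, so the inductor value is the main knob for trading one against the other.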