Ditch R1. Yes, 0.001\$\Omega\$ resistors exist, but what would you do with one? At 2A it will drop only 2mV. The collector current is set by the base current, so there's no need to limit it this way (if that was the purpose).
You don't necessarily need a MOSFET to switch 2A, but if you use a BJT it will probably have to be a Darlington. OTOH a MOSFET is much faster than a BJT, so better suited for PWM work.
Then the flyback diode. It's polarized the wrong way, but you mention that in the question, so I won't say anything about that.
If the solenoid is a 24V/2A type its resistance will be 12\$\Omega\$, not 5.6.
I missed the line that says you'll be driving it from a microcontroller. The following assumes you drive it from 24V, like in the schematic. Later I'll make a note about the microcontroller.
Then R2. Assuming a solenoid of 12\$\Omega\$, and an \$H_{FE}\$ for your Darlington of 100, then from the base this will look like a 1200\$\Omega\$ resistance. You'll need 20mA of base current. With a voltage of 22V (24V minus a couple of BE junctions) that would mean you should have a maximum of 1100\$\Omega\$ for R2 + \$H_{FE}\$ \$\times\$R3. So even without R2 you won't get the 2A. You'll need a transistor with a higher \$H_{FE}\$.
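The numbers in the paragraph above can be checked with a rough sketch. The function and variable names are made up for illustration, and the 2V allowance for the two BE junctions is the rounded figure used in the text, not a datasheet value.

```python
def follower_base_budget(v_supply, v_be_total, i_load, h_fe, r_load):
    """Return (base current needed, max resistance seen from the base, reflected load)."""
    i_base = i_load / h_fe               # base current needed for the load current
    v_available = v_supply - v_be_total  # voltage left after the BE junctions
    r_max = v_available / i_base         # largest resistance that still allows i_base
    r_reflected = h_fe * r_load          # emitter load as seen from the base
    return i_base, r_max, r_reflected

# Values from the text: 24V supply, about 2V of BE drop, 2A load, H_FE = 100, 12 ohm solenoid
i_base, r_max, r_reflected = follower_base_budget(24.0, 2.0, 2.0, 100, 12.0)

print(f"base current needed:      {i_base * 1e3:.0f} mA")
print(f"max resistance from base: {r_max:.0f} ohm")
print(f"reflected solenoid load:  {r_reflected:.0f} ohm")
print("2A reachable:", r_reflected <= r_max)
```

The reflected load alone is already larger than the allowed maximum, which is why the 2A cannot be reached at this gain even with R2 removed.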
But even then R2 won't be necessary. With an \$H_{FE}\$ of 1000, if the base current would be higher than 2mA the transistor will saturate and the solenoid will limit the collector current to 2A.
Important notice on the common collector configuration you're using. Even if you drive it from 24V the emitter voltage won't be 24V, but 22V. The base voltage will be 24V maximum, and if the emitter were to rise above 24V minus 2 BE junctions, no base current would flow anymore.
If you drive it from a 5V microcontroller the emitter voltage won't go higher than 3V! Again, if it were higher, no base current would flow. You might use a common collector with a 24V input, but not with 5V.
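To put numbers on the emitter follower ceiling, here is a small sketch. It assumes roughly 1V per BE junction (the same rounding as in the text) and the 12\$\Omega\$ solenoid from earlier; the function name is hypothetical.

```python
def max_emitter_voltage(v_base, n_junctions=2, v_be=1.0):
    """Highest emitter voltage a Darlington follower can reach for a given base voltage.

    v_be = 1.0 is a deliberately rounded assumption matching the text, not a datasheet value.
    """
    return v_base - n_junctions * v_be

R_SOLENOID = 12.0  # ohm, from the 24V/2A rating

for v_drive in (24.0, 5.0):
    v_emitter = max_emitter_voltage(v_drive)
    i_solenoid = v_emitter / R_SOLENOID  # best-case solenoid current at that voltage
    print(f"{v_drive:.0f}V drive -> emitter at most {v_emitter:.0f}V, "
          f"solenoid current at most {i_solenoid:.2f}A")
```

With a 5V drive the solenoid sees about 3V at best, so it gets only a fraction of its rated 2A.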
Usually you'll use a common emitter configuration, where the solenoid takes the place of R1. In that case you'll need R2. If your microcontroller runs at 5V and you're using the KSD1222 (see below), you'll have a voltage drop of 5V - 2V = 3V across R2. You'll need at least 2mA, but let's play safe and give it 10mA. Then R2 should be maximum 3V/10mA = 300\$\Omega\$.
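The R2 calculation can be sketched as follows. The helper names are hypothetical, and rounding down to a standard E12 value is my own addition for picking a real part, not something the numbers above require.

```python
# E12 preferred values for one decade (10 to 82), kept as integers to avoid float rounding
E12 = (10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82)

def max_base_resistor(v_drive, v_be_total, i_base):
    """Largest R2 that still delivers i_base from v_drive."""
    return (v_drive - v_be_total) / i_base

def largest_e12_not_above(r_max):
    """Largest E12 value (10 ohm and up) that does not exceed r_max."""
    candidates = [m * 10 ** d for d in range(6) for m in E12]
    return max(r for r in candidates if r <= r_max)

r2_max = max_base_resistor(5.0, 2.0, 0.010)  # 5V drive, 2V BE drop, 10mA base current
r2_pick = largest_e12_not_above(r2_max)

print(f"R2 max: {r2_max:.0f} ohm, nearest E12 value at or below: {r2_pick} ohm")
```

Rounding down rather than up keeps the base current at or above the 10mA target.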
If you want to use a MOSFET the Si2318DS is suitable. It's a 40V FET which can drive 3A at less than 4V \$V_{GS}\$. \$R_{DS(ON)}\$ is 45m\$\Omega\$, so at 2A it will only dissipate 180mW. That sounds safe, but when you're going to PWM this will rise due to switching losses. At 300Hz this will not really be a problem, however.
If you would want to use the Darlington, the KSD1222 is also a 40V type, with \$H_{FE}\$ of minimum 1000. Can drive 3A. But here saturation voltage can be as high as 1.5V. At 2A this means the transistor will dissipate 3W, so you'll need a heatsink. The MOSFET is the better solution.
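A quick side-by-side of the two dissipation figures, as a sketch. The function names are illustrative; the 45m\$\Omega\$ and 1.5V values are the ones quoted above, and switching losses are not included.

```python
def mosfet_conduction_loss(i_load, r_ds_on):
    """Static loss in a fully-on MOSFET: I^2 * R_DS(on)."""
    return i_load ** 2 * r_ds_on

def bjt_saturation_loss(i_load, v_ce_sat):
    """Static loss in a saturated BJT/Darlington: I * V_CE(sat)."""
    return i_load * v_ce_sat

I_LOAD = 2.0  # A, solenoid current

p_fet = mosfet_conduction_loss(I_LOAD, 0.045)   # Si2318DS, 45 mOhm
p_darlington = bjt_saturation_loss(I_LOAD, 1.5)  # KSD1222, worst-case 1.5V saturation

print(f"MOSFET:     {p_fet * 1e3:.0f} mW")
print(f"Darlington: {p_darlington:.1f} W")
```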
Let's first rule out static losses as the cause for your troubles: Your MOSFETs have an on-resistance of approx. 100 m\$\Omega\$ (or something much lower). With a load current of not more than 4 A, the power dissipation for a full (100 %) duty cycle should not be more than
\$P_{V,max} = R_{DS,on} \cdot I^2\$
\$P_{V,max} = 100\ \text{m}\Omega \cdot (4\ \text{A})^2\$
\$P_{V,max} = 1.6\ \text{W}\$
To address your question #1: Don't try to use a MOSFET with a super low \$R_{DS,on}\$ when you don't have to. The low on-resistance comes at the price of a larger gate charge, making it harder for your MOSFET driver to switch it fast. Also, a DPAK should be able to handle the static losses with a PCB like yours (your question #2).
Having checked this, and reading your note on not being able to use more than 40 Hz as a PWM frequency, I suspect something is wrong about getting a clean signal from your µC board to the power PCB (question #3). It could happen that every time you switch on the MOSFET, the ground voltages of your power circuit and your small-signal circuit bounce with regard to each other, causing your MOSFET to switch quite a number of times whenever it should just switch once. How long is the connection between the microcontroller and the MOSFET driver's input? How does the overall supply wiring look?
Edit: Now that things are a bit clearer after you have added your schematic, I feel that your input side (driver IC and MOSFET gate) is in danger. The flyback energy released by the solenoid after switching off needs a place to go. Your paralleled 1 µF and 100 nF capacitors may not be enough, and the voltage may rise beyond the max. voltage allowed as VDD for the IC or as VGS for the MOSFET. It is not clear how long the wire from the next stiff source (read: good capacitor) to your board's input is, and I strongly recommend a large, local electrolytic capacitor (1000 µF, 35 V).
The internal construction of a mosfet is different and you need different voltage levels to switch it on: a gate voltage higher than the source for N channel and lower than the source for P channel. As you will be switching a 25V load from a 5V microcontroller, choose an N channel logic level mosfet.
It's the maximum voltage which the mosfet can withstand without letting current flow through it.
As a rule of thumb you should double the rating to get a reliably working system. So, look for a mosfet with Vds in the range of 50V-60V. It would be OK to use a 25V mosfet but you usually don't want to operate near the maximum ratings.
Again - double it.
Yes, a mosfet dissipates the least power when it's either fully on or off. Look at the graphs in the datasheet that specify Rdson depending on Vg - you want Rdson as small as possible, so you want to drive the gate well above Vgth. But note that there is a maximum value that can be safely applied to a gate - Vgsmax. You should be safe driving it with a microcontroller, just a point to note.
No, power dissipated by a mosfet would be I*I*Rdson - that's why you want as little Rdson as possible.
When a mosfet is on, it's not an ideal conductor with no resistance. Rdson is the resistance of the mosfet in that state and depends on several factors; datasheets usually give graphs of how Rdson changes with different parameters.
You don't have to deal with gate charge and input capacitance in your application, as fast (submillisecond) switching is not required. A mosfet gate looks like a capacitor to the driving circuitry, and since a capacitor takes time to charge, the mosfet takes time to turn on. That's why high-speed applications use special mosfet driver ICs that force high currents into the gate to charge this capacitance as quickly as possible.
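To get a feel for the time scales, here is a back-of-the-envelope sketch that treats the gate as being charged at a constant current. All three numbers (gate charge, pin current, driver current) are hypothetical placeholders, not values for any particular part; take the real ones from the datasheets.

```python
def gate_charge_time(q_gate, i_drive):
    """Approximate time to deliver the total gate charge at a constant drive current."""
    return q_gate / i_drive

Q_GATE = 10e-9   # assumed total gate charge: 10 nC (hypothetical)
I_PIN = 20e-3    # assumed microcontroller pin current: 20 mA (hypothetical)
I_DRIVER = 2.0   # assumed peak current of a driver IC: 2 A (hypothetical)

t_pin = gate_charge_time(Q_GATE, I_PIN)
t_driver = gate_charge_time(Q_GATE, I_DRIVER)

print(f"from a microcontroller pin: {t_pin * 1e6:.2f} us")
print(f"from a driver IC:           {t_driver * 1e9:.1f} ns")
```

Under these assumptions even the bare microcontroller pin turns the mosfet on in well under a millisecond, which fits the point that a dedicated driver only matters for fast switching.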
You can find cheaper mosfets with lower Rdson, just use the parametric search on digikey. Pay attention to the graph that displays Rdson against Vgs - sometimes manufacturers claim 4V Vgth and 4mOhm Rdson, but when you look at the graph you see that at 4V it's 20mOhm and you need to get to 9V to get the advertised 4mOhm Rdson.
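What that difference means for heat can be sketched with the I*I*Rdson formula from above. The 2A load current is an assumption chosen only for illustration; the two Rdson values are the ones from the example in the previous paragraph.

```python
def conduction_loss(i_load, r_ds_on):
    """Power dissipated in a fully-on mosfet: I * I * Rdson."""
    return i_load * i_load * r_ds_on

I_LOAD = 2.0              # assumed load current in A (hypothetical)
RDSON_ADVERTISED = 0.004  # 4 mOhm, reached only at about 9V gate drive
RDSON_AT_4V = 0.020       # 20 mOhm, what the graph shows at 4V gate drive

p_advertised = conduction_loss(I_LOAD, RDSON_ADVERTISED)
p_at_4v = conduction_loss(I_LOAD, RDSON_AT_4V)

print(f"advertised Rdson: {p_advertised * 1e3:.0f} mW")
print(f"Rdson at 4V gate: {p_at_4v * 1e3:.0f} mW")
print(f"ratio: {p_at_4v / p_advertised:.1f}x")
```

Since the loss scales linearly with Rdson, the ratio stays the same at any load current; only the absolute numbers change.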