I'm not sure why you think BJTs are significantly slower than power MOSFETs; that's certainly not an inherent characteristic. But there's nothing wrong with using FETs if that's what you prefer.
And MOSFET gates do indeed need significant amounts of current, especially if you want to switch them quickly, to charge and discharge the gate capacitance — sometimes up to a few amps! Your 10K gate resistors are going to significantly slow down your transitions. Normally, you'd use resistors of just 100Ω or so in series with the gates, for stability.
If you really want fast switching, you should use special-purpose gate-driver ICs between the PWM output of the MCU and the power MOSFETs. For example, International Rectifier has a wide range of driver chips, and there are versions that handle the details of the high-side drive for the P-channel FETs for you.
Additional:
How fast do you want the FETs to switch? Each time one switches on or off, it's going to dissipate a pulse of energy during the transition, and the shorter you can make this, the better. This pulse, multiplied by the PWM cycle frequency, is one component of the average power the FET needs to dissipate, and often the dominant one. Other components include the on-state power (I_D² × R_DS(ON), multiplied by the PWM duty cycle) and any energy dumped into the body diode in the off state.
One simple way to model the switching losses is to assume that the instantaneous power is roughly a triangular waveform whose peak is (V_CC/2) × (I_D/2) and whose base is equal to the transition time T_RISE or T_FALL. Each triangle has an area of ½ × base × peak, so the two transitions together give the total switching energy dissipated during each full PWM cycle: (T_RISE + T_FALL) × V_CC × I_D / 8. Multiply this by the PWM cycle frequency to get the average switching-loss power.
The main thing that dominates the rise and fall times is how fast you can move the gate charge on and off the gate of the MOSFET. A typical medium-size MOSFET might have a total gate charge on the order of 50-100 nC. If you want to move that charge in, say, 1 µs, you need a gate driver capable of at least 50-100 mA. If you want it to switch twice as fast, you need twice the current.
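As a rough sketch of that charge-over-time arithmetic (the function name is made up for illustration, and it assumes the driver delivers a constant current for the whole transition, which a real driver only approximates):

```python
def gate_drive_current(gate_charge_c, switch_time_s):
    """Average current (A) needed to move gate_charge_c coulombs in switch_time_s seconds.

    Assumes constant current over the whole transition (I = Q / t).
    """
    return gate_charge_c / switch_time_s


# The 50-100 nC range quoted above, moved in 1 us and then in 0.5 us.
for q_nc in (50, 100):
    for t_us in (1.0, 0.5):
        amps = gate_drive_current(q_nc * 1e-9, t_us * 1e-6)
        print(f"{q_nc} nC in {t_us} us needs about {amps * 1e3:.0f} mA")
```

Halving the switching time doubles the required current, which is why dedicated gate drivers are rated in amps rather than milliamps.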
If we plug in all the numbers for your design, with a combined rise and fall time of 2 µs, we get: 12 V × 3 A × 2 µs / 8 × 32 kHz = 0.288 W (per MOSFET). If we assume R_DS(ON) of 20 mΩ and a duty cycle of 50%, then the I²R losses will be (3 A)² × 0.02 Ω × 0.5 = 90 mW (again, per MOSFET). Together, the two active FETs at any given moment are going to be dissipating about 3/4 watt of power (2 × 0.378 W ≈ 0.76 W), most of it because of the switching.
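If you want to try other rise times or PWM frequencies, the triangular-pulse model is easy to script. This is only a sketch of the estimate above: the function names are invented for illustration, and the 1 µs + 1 µs split of the 2 µs total transition time is an assumption.

```python
def switching_loss_w(v_supply, i_load, t_rise, t_fall, f_pwm):
    """Average switching loss (W) using the triangular-pulse approximation."""
    energy_per_cycle = (t_rise + t_fall) * v_supply * i_load / 8  # joules
    return energy_per_cycle * f_pwm


def conduction_loss_w(i_load, r_ds_on, duty):
    """Average on-state (I^2 * R) loss (W) at the given duty cycle."""
    return i_load ** 2 * r_ds_on * duty


# Figures from the worked example: 12 V, 3 A, 32 kHz, 20 mOhm, 50% duty.
# Assumption: the 2 us total is split evenly between rise and fall.
p_sw = switching_loss_w(12.0, 3.0, 1e-6, 1e-6, 32e3)
p_cond = conduction_loss_w(3.0, 0.020, 0.5)
p_total_two_fets = 2 * (p_sw + p_cond)

print(f"switching loss per FET:  {p_sw:.3f} W")
print(f"conduction loss per FET: {p_cond:.3f} W")
print(f"two active FETs total:   {p_total_two_fets:.3f} W")
```

Cutting the transition times by a factor of ten (a job for a proper gate driver) scales the switching term down by the same factor, at which point the conduction loss becomes the larger of the two.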
Ultimately, it's a tradeoff between how efficient you want the circuit to be and how much effort you want to put into optimizing it.
Check out this device.
http://www.linear.com/product/LT1910
It's a high side N-channel MOSFET driver. I think that it has a little boost converter internally. We've just started playing with these and it will drive the MOSFET gate to 20 VDC with a 12 VDC supply. Nifty huh?!?
The device isn't fast, relatively speaking, though, so you would be better off doing the PWM for speed control on the low-side MOSFET. You can get a regular FET driver to do this with your 5 V input.
Best Answer
Let's start with what's known. If the P MOSFETs (Q5 and Q6) are repeatedly failing, the circuit is most likely exceeding the V-gs (max) rating for the parts. For these MOSFETs, V-gs (max) is +/-16V. For your circuit (without R9, R10), with the sources at +24V, that means the gate is seeing over 40V (not likely) or less than 8V.
Notice that Q9, Q10 are configured as switches (good), whereas Q1, Q2 are configured as emitter followers (bad!). When Q3, Q4 are OFF, there is no path for base current for Q1, Q2, so they're OFF; the gates of Q5, Q6 are pulled to +24V through R7, R8, V-gs = 0, and Q5, Q6 are OFF. So far so good.
However, when Q3, Q4 turn ON, this pulls the base of Q1, Q2 toward ground, forward biasing the collector-base junction. I'm not certain, but I suspect this zaps Q1, Q2, leaving only the emitter-base junction intact, looking like a diode between R1-R7, R2-R8. So you now have a resistor divider between +24 and ground consisting of R7, R1, the Q2 forward biased base-emitter junction, and the Vce-sat of Q4. If you do the arithmetic, this pulls the gate of Q5 down to almost 8V, which from the MOSFET spec is VERY BAD.
If the above is correct, the reason the circuit works (even with Q1, Q2 zapped) with the addition of R9, R10 is that they create another divider with R1, R2 that limits the minimum gate voltage on Q5, Q6 to something around 20V rather than 8V, well within spec.
One solution would be to reconfigure Q1, Q2 to be switches like Q9, Q10, with the emitters tied directly to +24V; R7, R8 connected between the collectors and +12V; and R9, R10 left in the circuit like R14. With this configuration, the gates of Q5, Q6 would switch between +24V and +12V, a 12 volt swing, well within the V-gs (max) spec.
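To sanity-check gate levels against the rating, something like the following works. It's a minimal sketch that assumes the P-channel sources sit at +24V and uses the +/-16V limit quoted above; the function name and the list of gate voltages are just for illustration (0V stands for a hypothetical gate pulled all the way to ground).

```python
VGS_MAX = 16.0    # volts, the +/- V-gs (max) figure quoted above for Q5, Q6
V_SOURCE = 24.0   # volts, assumes the P-channel sources are tied to +24V


def vgs_margin(gate_v, source_v=V_SOURCE, vgs_max=VGS_MAX):
    """Volts of headroom left before |V-gs| reaches the rating (negative = exceeded)."""
    return vgs_max - abs(gate_v - source_v)


# 24V: FET off; 20V: roughly the level with R9, R10 fitted;
# 12V: proposed switch configuration; 8V: the rating limit; 0V: gate grounded.
for gate_v in (24.0, 20.0, 12.0, 8.0, 0.0):
    margin = vgs_margin(gate_v)
    status = "ok" if margin > 0 else "no margin" if margin == 0 else "EXCEEDED"
    print(f"gate {gate_v:4.1f} V -> V-gs {gate_v - V_SOURCE:6.1f} V, margin {margin:5.1f} V ({status})")
```

The +24V to +12V swing leaves 4V of headroom, whereas a gate sitting near 8V has essentially none.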
One other point to keep in mind is the sequence you use with your PIC microcontroller to change motor direction. Be careful not to allow both MOSFETs on one side of the H-bridge to be on simultaneously, even for a short time, since this creates a short between +24V and ground. The sequence of instructions should be: Q5,Q8 ON, Q6,Q7 OFF -> Q5 OFF -> Q7 ON -> insert braking delay -> Q8 OFF -> Q6 ON.
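Here is a small simulation of that sequence that flags any step where both FETs in one leg are on. It's a sketch, not PIC code: the leg pairing (Q5 over Q7, Q6 over Q8) is inferred from the sequence itself rather than from the schematic, and all the names are made up for illustration.

```python
# Assumed half-bridge legs (high side, low side), inferred from the sequence above.
LEGS = (("Q5", "Q7"), ("Q6", "Q8"))

# Direction-reversal steps; "delay" is a placeholder for the braking delay.
REVERSE_SEQUENCE = [
    ("Q5", False),
    ("Q7", True),
    ("delay", None),
    ("Q8", False),
    ("Q6", True),
]


def shoot_through(state):
    """True if the high and low FET of any leg are on at the same time."""
    return any(state[high] and state[low] for high, low in LEGS)


def run_sequence(state, steps):
    """Apply steps one at a time, raising ValueError on any shoot-through state."""
    state = dict(state)
    history = [dict(state)]
    for name, on in steps:
        if name == "delay":
            continue  # real firmware would wait here while the motor brakes
        state[name] = on
        if shoot_through(state):
            raise ValueError(f"shoot-through after setting {name}={on}: {state}")
        history.append(dict(state))
    return history


start = {"Q5": True, "Q8": True, "Q6": False, "Q7": False}
history = run_sequence(start, REVERSE_SEQUENCE)
final = history[-1]
print(final)
```

Swapping the first two steps (turning Q7 on before Q5 is off) makes `run_sequence` raise, which is exactly the +24V-to-ground short the ordering is meant to avoid.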
Hope this helps.