PNP transistors work the same way as NPNs do, but all voltages and currents are reversed. You connect the emitter to the higher potential and draw current out of the base; the main current flows into the emitter and exits through the collector.
\$V_\rm{BE}\$ will be about \$-0.7\,\rm{V}\$, but its magnitude should be the same in both PNP and NPN if you use complementary parts.
The main division is between BJTs and FETs, with the big difference being the former are controlled with current and the latter with voltage.
If you're building small quantities of something and aren't very familiar with the various choices and how you can use the characteristics to advantage, it's probably simpler to stick mostly with MOSFETs. They tend to be more expensive than equivalent BJTs, but are conceptually easier to work with for beginners. If you get "logic level" MOSFETs, then it becomes particularly simple to drive them. You can drive an N channel low side switch directly from a microcontroller pin. IRLML2502 is a great little FET for this as long as you aren't exceeding 20 V.
Once you get familiar with simple FETs, it's worth it to get used to how bipolars work too. Being different, they have their own advantages and disadvantages. Having to drive them with current may seem like a hassle, but can be an advantage too. They basically look like a diode across the B-E junction, so this never goes very high in voltage. That means you can switch 100s of Volts or more from low voltage logic circuits. Since the B-E voltage is fixed to a first approximation, it allows for topologies like emitter followers. You can use a FET in source follower configuration, but generally the characteristics aren't as good.
Another important difference is in full on switching behaviour. BJTs look like a fixed voltage source, usually 200 mV or so at full saturation to as high as a Volt in high current cases. MOSFETs look more like a low resistance. This allows lower voltage across the switch in most cases, which is one reason you see FETs in power switching applications so much. However, at high currents the fixed voltage of a BJT is lower than the current times the Rdson of the FET. This is especially true when the transistor has to be able to handle high voltages. BJTs generally have better characteristics at high voltages, hence the existence of IGBTs. An IGBT is really a FET used to turn on a BJT, which then does the heavy lifting.
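To make the "drive with current" point concrete, here is a rough sketch of sizing a base resistor for an NPN low side switch driven from a logic pin. The function name, the 0.7 V B-E drop, and the forced gain of 10 (a common rule of thumb for driving a BJT well into saturation) are my assumptions for illustration, not values from any particular datasheet.

```python
def base_resistor_ohms(v_logic, i_collector, forced_beta=10.0, v_be=0.7):
    """Estimate the base resistor for a saturated NPN switch.

    v_logic     -- logic high voltage driving the resistor, in volts
    i_collector -- load current to be switched, in amps
    forced_beta -- assumed Ic/Ib ratio used to guarantee saturation
    v_be        -- assumed B-E diode drop, in volts
    """
    if v_logic <= v_be:
        raise ValueError("logic voltage must exceed the B-E drop")
    if i_collector <= 0 or forced_beta <= 0:
        raise ValueError("collector current and forced beta must be positive")
    i_base = i_collector / forced_beta   # base current needed for saturation
    return (v_logic - v_be) / i_base     # the resistor drops the remaining voltage


# Example: 3.3 V logic switching a 100 mA load comes out at roughly 260 ohms.
r_base = base_resistor_ohms(3.3, 0.100)
```

Note that the voltage being switched on the collector side does not appear anywhere in the calculation. That is the point made above: the logic only ever sees the diode-like B-E junction.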
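As a rough illustration of that comparison, the sketch below treats the saturated BJT as a constant voltage and the FET as a pure resistance, then finds the current where the two drops are equal. The function names and the example values (200 mV, 50 mΩ) are made up for illustration. In reality the saturation voltage rises with current and Rdson rises with temperature, so treat the result as a ballpark only.

```python
def fet_drop_volts(current_a, rds_on_ohm):
    """Voltage across a fully on FET, modeled as a plain resistor."""
    return current_a * rds_on_ohm


def crossover_current_amps(vce_sat_v, rds_on_ohm):
    """Current above which the FET drops more voltage than the BJT.

    Assumes a constant BJT saturation voltage (a simplification).
    """
    return vce_sat_v / rds_on_ohm


def lower_drop_device(current_a, vce_sat_v, rds_on_ohm):
    """Return 'FET' or 'BJT', whichever has the lower on-state drop."""
    if fet_drop_volts(current_a, rds_on_ohm) < vce_sat_v:
        return "FET"
    return "BJT"


# Illustrative values only: 200 mV saturation voltage, 50 milliohm Rdson.
i_cross = crossover_current_amps(0.2, 0.05)   # about 4 A
low = lower_drop_device(1.0, 0.2, 0.05)       # FET wins at 1 A
high = lower_drop_device(10.0, 0.2, 0.05)     # BJT wins at 10 A
```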
There are many many more things that could be said. I've listed only a few to get things started. The real answer would be a whole book, which I don't have time for.
Best Answer
There are various choices that can be made in the design of transistors, with some tradeoffs being better for switching applications and others for "linear" applications.
Switches are intended to spend most of their time fully on or fully off. The on and off states are therefore important with the response curve of the in-between states being not too relevant.
For most applications, the off state leakage current of most transistors is low enough to not matter. For switching applications, one of the most important parameters is how "on" on is, as quantified by Rdson in FETs and the saturation voltage and current in bipolars. This is why switching FETs will have Rdson specs, not only to show how good they are at being fully on, but because this is also important for designers of the circuit to know how much voltage they will drop and heat they will dissipate.
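As a quick worked example of what those specs tell the circuit designer (the symbols and numbers here are mine, picked only for illustration), the on-state drop and dissipation of a FET carrying load current \$I\$ are roughly

$$V_\mathrm{DS} \approx I \cdot R_\mathrm{DS(on)}, \qquad P \approx I^2 \cdot R_\mathrm{DS(on)}$$

while for a saturated bipolar the dissipation is roughly

$$P \approx V_\mathrm{CE(sat)} \cdot I_\mathrm{C}$$

ignoring base drive power. With an assumed \$I = 2\,\rm{A}\$ and \$R_\mathrm{DS(on)} = 50\,\rm{m\Omega}\$, the FET drops about \$0.1\,\rm{V}\$ and dissipates about \$0.2\,\rm{W}\$. A bipolar with an assumed \$V_\mathrm{CE(sat)} = 0.2\,\rm{V}\$ at the same current would dissipate about \$0.4\,\rm{W}\$.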
Transistors used as general purpose amplifiers operate in the "linear" region. They may not be all that linear in their characteristics, but this is the name used in the industry to denote the in-between range where the transistor is neither fully on nor fully off. In fact, for amplifier use you want to never quite hit either of the limit states. The Rdson is therefore not that relevant since you plan to never be in that state. You do however want to know how the device reacts to various combinations of gate voltage and drain voltage because you plan to use it across a wide continuum of those.
There are tradeoffs the transistor designer can make that favor a more proportional response to gate voltage versus the best fully on effective resistance. This is why some transistors are promoted as switches versus for linear operations. The datasheets then also focus on the specs most relevant to the circuit designer for the intended use.