The main division is between BJTs and FETs, with the big difference being the former are controlled with current and the latter with voltage.
If you're building small quantities of something and aren't very familiar with the various choices and how you can use their characteristics to advantage, it's probably simpler to stick mostly with MOSFETs. They tend to be more expensive than equivalent BJTs, but are conceptually easier to work with for beginners. If you get "logic level" MOSFETs, then it becomes particularly simple to drive them. You can drive an N-channel low-side switch directly from a microcontroller pin. The IRLML2502 is a great little FET for this as long as you aren't exceeding 20V.
Once you get familiar with simple FETs, it's worth getting used to how bipolars work too. Being different, they have their own advantages and disadvantages. Having to drive them with current may seem like a hassle, but it can be an advantage too. The B-E junction basically looks like a diode, so the base voltage never goes very high. That means you can switch 100s of Volts or more from low voltage logic circuits. Since the B-E voltage is fixed to a first approximation, it allows for topologies like emitter followers. You can use a FET in source follower configuration, but generally the characteristics aren't as good.
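To make "driving with current" concrete, here is a minimal sketch of sizing the base resistor for an NPN low-side switch driven from a logic pin. The numbers are assumptions chosen for illustration, not values from any datasheet: a 3.3V drive, a B-E drop of about 0.7V, and a forced beta of 10, which is a common rule of thumb for hard saturation.

```python
def base_resistor(v_drive, i_collector, forced_beta=10.0, v_be=0.7):
    """Return the base resistor in ohms for a saturated NPN switch.

    forced_beta and v_be are assumed rule-of-thumb values; check the
    datasheet of the actual transistor before relying on them.
    """
    i_base = i_collector / forced_beta   # base current needed for saturation
    return (v_drive - v_be) / i_base     # resistor drops whatever is above V_BE


# Hypothetical example: 3.3 V logic pin switching a 100 mA load.
r_b = base_resistor(v_drive=3.3, i_collector=0.1)
print(f"Base resistor: about {r_b:.0f} ohm")
```

Because the B-E junction clamps the base voltage, the same calculation holds regardless of how many volts the collector is switching.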
Another important difference is in full-on switching behaviour. A saturated BJT looks like a fixed voltage source, usually 200mV or so, rising to as much as a Volt in high current cases. MOSFETs look more like a low resistance. This allows lower voltage across the switch in most cases, which is one reason you see FETs in power switching applications so much. However, at high currents the fixed voltage of a BJT is lower than the current times the Rdson of the FET. This is especially true when the transistor has to be able to handle high voltages. BJTs generally have better characteristics at high voltages, hence the existence of IGBTs. An IGBT is really a FET used to turn on a BJT, which then does the heavy lifting.
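The crossover between the two behaviours can be estimated with a few lines of arithmetic. This is a rough sketch with assumed values (a 0.2V saturation voltage and a 50 milliohm Rdson). It ignores base drive power and the way the saturation voltage creeps up with current, so treat the result as an order-of-magnitude guide only.

```python
def bjt_loss(i, v_ce_sat=0.2):
    """Conduction loss in watts of a BJT modelled as a fixed voltage drop."""
    return v_ce_sat * i


def fet_loss(i, r_ds_on=0.05):
    """Conduction loss in watts of a MOSFET modelled as a fixed resistance."""
    return i * i * r_ds_on


def crossover_current(v_ce_sat=0.2, r_ds_on=0.05):
    """Current in amps at which both simplified models drop the same voltage."""
    return v_ce_sat / r_ds_on


# Assumed example values: 0.2 V saturation, 50 milliohm Rdson.
i_x = crossover_current()
for i in (1.0, i_x, 10.0):
    print(f"{i:5.1f} A: BJT {bjt_loss(i):.2f} W, FET {fet_loss(i):.2f} W")
```

Below the crossover current the FET dissipates less, and above it the fixed drop of the BJT wins. A high-voltage FET has a much larger Rdson, which pushes the crossover down to lower currents.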
There are many many more things that could be said. I've listed only a few to get things started. The real answer would be a whole book, which I don't have time for.
Emitter followers are perilously close to being UHF oscillators; it is quite possible for them to start oscillating during only a small part of the wanted signal cycle; this appears as periodic noise and may also cause the observed distortion. (It is usually too high frequency to observe on a scope!)
The usual cure is a small-value resistor (experiment with 22R and 47R) in series with the base (aka a "base stopper"), placed as close to the base pin as possible.
What happens is that the device capacitances in conjunction with the inductance of the base connection cause a parasitic series-resonant circuit, and Cbe provides positive feedback, the emitter current providing the power. The base stopper lowers the Q of the resonant circuit to prevent oscillation.
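The effect of the base stopper on Q can be estimated with a deliberately simplified model that treats the parasitics as a plain series RLC circuit. The component values below are assumptions for illustration (10nH of lead inductance, 5pF of device capacitance, 2 ohms of stray resistance), not measurements of any particular circuit.

```python
import math


def resonant_frequency(l, c):
    """Resonant frequency in Hz of a series LC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


def series_q(r, l, c):
    """Quality factor of a series RLC circuit: Q = sqrt(L/C) / R."""
    return math.sqrt(l / c) / r


# Assumed parasitics: 10 nH base lead, 5 pF device capacitance, 2 ohm stray R.
L_BASE = 10e-9
C_DEV = 5e-12
R_STRAY = 2.0

f0 = resonant_frequency(L_BASE, C_DEV)
q_bare = series_q(R_STRAY, L_BASE, C_DEV)
q_stopped = series_q(R_STRAY + 47.0, L_BASE, C_DEV)  # with a 47R base stopper

print(f"f0 = {f0 / 1e6:.0f} MHz")
print(f"Q without stopper = {q_bare:.1f}, with 47R stopper = {q_stopped:.2f}")
```

With these assumed values the resonance lands in the UHF range and the stopper takes Q from well above 1 to below 1, which is consistent with why a few tens of ohms is usually enough.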
(It is also possible, as Andy AKA is hinting, that R3 is too high in value and this is clipping the negative peaks.)
It depends on the crystalline structure of the epitaxial wafer, the junction geometry, and the square of the bias current in both conductor and dielectric. Flicker noise is random pink (1/f) noise, usually measured below 100Hz in \$A^2/Hz\$, but it also contributes to phase noise at RF.
The explanation is simple but hard to visualize.
Imagine a small low-leakage cap on a unijunction gate or a DIAC, with a small bias current charging it until the semiconductor breaks down, shorts out the cap, and lets it charge up again. This is a fixed-frequency relaxation oscillator. Now imagine that certain semiconductor crystals have a higher leakage current (Early effect) with a greater breakdown voltage (BDV), generating bigger partial discharges between charged molecules before they break down under the electric field (at the nano scale). Then imagine millions of relaxation oscillators with random pulse rates, low-pass filtered by the dielectric between the charged conductor atoms. So this RC time constant affects the relaxation rate, while the high series leakage and shunt capacitance low-pass filter these broadband pulses or "flickers".
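One way to check the "many low-pass filtered random pulse trains" picture numerically is to add up a large number of single-pole (Lorentzian) spectra, each with its own RC time constant. The sketch below assumes the time constants are spread evenly on a logarithmic scale from 1 microsecond to 100 seconds. That distribution is my assumption for illustration and is not stated above.

```python
import math


def summed_spectrum(f, taus):
    """Sum of single-pole spectra tau / (1 + (2*pi*f*tau)^2) at frequency f."""
    w = 2.0 * math.pi * f
    return sum(tau / (1.0 + (w * tau) ** 2) for tau in taus)


def log_spaced(lo_exp, hi_exp, per_decade=20):
    """Time constants from 10**lo_exp to 10**hi_exp, evenly spaced in log."""
    n = (hi_exp - lo_exp) * per_decade
    return [10.0 ** (lo_exp + k / per_decade) for k in range(n + 1)]


# Assumed spread of RC time constants: 1 us to 100 s.
taus = log_spaced(-6, 2)

# Log-log slope of the summed spectrum between 1 Hz and 100 Hz (two decades).
s_lo = summed_spectrum(1.0, taus)
s_hi = summed_spectrum(100.0, taus)
slope = (math.log10(s_hi) - math.log10(s_lo)) / 2.0
print(f"log-log slope between 1 Hz and 100 Hz: {slope:.2f}")
```

Each individual spectrum is flat below its own corner frequency and falls off above it, yet the sum falls at close to one decade per decade across the band covered by the time constants, which is the 1/f shape.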
So in my theory, it is the device with the highest RC leakage time constant and highest BDV where the product produces a corner frequency at some \$A^2/Hz\$ with applied bias voltage. I call this flicker, random partial discharge (PD), or partial (nano-crystalline) discharge, for your understanding and consideration.
Which device? That depends on conduction power, input bias leakage current, dielectric doping levels, crystalline epi-wafer nano-structure, and the device structure, such as FET, BJT or HJT.
But we do know for sure how to rank conductors, as I did above, so we generally use metal film (MF) resistors exclusively where low noise matters, and carbon or wirewound (WW) where high current noise does not matter.
I don't know how to tell you which device has lowest random flicker noise for ANY random design.
... but you may look at GaAs FETs and compare.