The main division is between BJTs and FETs, with the big difference being that the former are controlled with current and the latter with voltage.
If you're building small quantities of something and aren't very familiar with the various choices and how you can use the characteristics to advantage, it's probably simpler to stick mostly with MOSFETs. They tend to be more expensive than equivalent BJTs, but are conceptually easier to work with for beginners. If you get "logic level" MOSFETs, then it becomes particularly simple to drive them. You can drive an N-channel low-side switch directly from a microcontroller pin. The IRLML2502 is a great little FET for this as long as you aren't exceeding 20V.
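As a rough sanity check for a low-side switch like this, you can estimate the voltage drop and heat in the FET from its on-resistance. This is only a minimal Python sketch: the 45 mOhm on-resistance and the 1 A load are illustrative assumptions, not figures from this answer, so take the real Rds(on) at your actual gate voltage from the datasheet.

```python
def mosfet_on_state(load_current_a, rds_on_ohm):
    """Return (voltage drop in V, dissipation in W) for a fully-on MOSFET."""
    v_drop = load_current_a * rds_on_ohm        # V = I * R
    p_loss = load_current_a ** 2 * rds_on_ohm   # P = I^2 * R
    return v_drop, p_loss


# Assumed example values: 1 A load, 45 mOhm on-resistance.
v_drop, p_loss = mosfet_on_state(1.0, 0.045)
print(f"drop = {v_drop * 1000:.0f} mV, loss = {p_loss * 1000:.0f} mW")
```

If the dissipation comes out at more than a small fraction of a watt for a SOT-23 sized part, that is a hint to look for a lower-resistance FET or a bigger package.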
Once you get familiar with simple FETs, it's worth it to get used to how bipolars work too. Being different, they have their own advantages and disadvantages. Having to drive them with current may seem like a hassle, but can be an advantage too. They basically look like a diode across the B-E junction, so this never goes very high in voltage. That means you can switch 100s of Volts or more from low voltage logic circuits. Since the B-E voltage is fixed at first approximation, it allows for topologies like emitter followers. You can use a FET in source follower configuration, but generally the characteristics aren't as good.
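To make the "drive with current" point concrete, here is a small Python sketch of sizing a base resistor. It treats the B-E junction as a fixed 0.7 V drop, which is the usual first approximation; the 3.3 V logic level and 5 mA base current are assumed values for illustration only.

```python
def base_resistor(v_drive, i_base, v_be=0.7):
    """Resistor that sets the base current from a logic-level drive voltage.

    v_be is the assumed B-E diode drop (first-order approximation).
    """
    if v_drive <= v_be:
        raise ValueError("drive voltage must exceed the B-E drop")
    return (v_drive - v_be) / i_base


# Assumed example: 3.3 V logic pin supplying 5 mA of base current.
r_b = base_resistor(3.3, 0.005)
print(f"Rb = {r_b:.0f} ohm")
```

Note that the voltage being switched on the collector side does not appear anywhere in the calculation, which is why the same low-voltage drive can control a much higher voltage load.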
Another important difference is in full-on switching behaviour. BJTs look like a fixed voltage source, usually 200mV or so at full saturation to as high as a Volt in high current cases. MOSFETs look more like a low resistance. This allows lower voltage across the switch in most cases, which is one reason you see FETs in power switching applications so much. However, at high currents the fixed voltage of a BJT is lower than the current times the Rdson of the FET. This is especially true when the transistor has to be able to handle high voltages. BJTs generally have better characteristics at high voltages, hence the existence of IGBTs. An IGBT is really a FET used to turn on a BJT, which then does the heavy lifting.
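The crossover between the two behaviours can be sketched numerically. The Python below models the BJT as a constant 0.2 V drop and the FET as a constant 50 mOhm resistance; both numbers are assumptions chosen for illustration, and a real Vce(sat) rises with current as described above, so treat the result as a first-order estimate only.

```python
def bjt_loss(current_a, v_ce_sat=0.2):
    """On-state loss of a saturated BJT modelled as a fixed voltage drop."""
    return current_a * v_ce_sat


def mosfet_loss(current_a, rds_on_ohm=0.05):
    """On-state loss of a MOSFET modelled as a fixed resistance."""
    return current_a ** 2 * rds_on_ohm


def crossover_current(v_ce_sat=0.2, rds_on_ohm=0.05):
    """Current at which I * Rds(on) equals Vce(sat)."""
    return v_ce_sat / rds_on_ohm


for amps in (1.0, 4.0, 10.0):
    print(f"{amps:5.1f} A: BJT {bjt_loss(amps):.2f} W, "
          f"MOSFET {mosfet_loss(amps):.2f} W")
print(f"crossover at about {crossover_current():.1f} A")
```

Below the crossover the FET drops less voltage and runs cooler; above it the fixed-drop device wins. High-voltage FETs tend to have much higher Rds(on), which moves the crossover to lower currents.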
There are many many more things that could be said. I've listed only a few to get things started. The real answer would be a whole book, which I don't have time for.
Read the datasheet more closely: there will be conditions attached to that spec for a minimum hFE. One of those conditions will be that Vce is greater than some voltage, probably 2V. (Just checked the datasheet; actually Vce=5V). When you are trying to saturate the transistor, those conditions obviously no longer apply. Instead, as Vce approaches Vce(sat), hFE decreases quite dramatically.
Note that the actual base current and hFE at which a given transistor reaches Vce(sat) will vary. You might find hFE=25 at that point for a particular transistor and a particular current Ic, or alternatively at hFE=10 you might see Vce lower than the rated Vce(sat). The datasheet's Figure 2 shows Vce(sat) around 0.05V for a useful range of currents!
But these are the figures guaranteed by the makers for all their transistors.
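The practical consequence for a switch design can be shown with a short Python sketch. It compares the base current you would get from the active-region hFE with the one you get from a forced beta of 10, the ratio mentioned above. The 100 mA collector current and the minimum hFE of 100 are assumed example values, not figures taken from the datasheet under discussion.

```python
def base_current(i_collector, beta):
    """Base current needed to support i_collector at a given Ic/Ib ratio."""
    return i_collector / beta


i_c = 0.100                 # assumed load current, A
hfe_active_min = 100        # assumed datasheet minimum hFE (valid at Vce = 5 V)
forced_beta = 10            # Ic/Ib ratio used when driving into saturation

ib_active = base_current(i_c, hfe_active_min)
ib_saturated = base_current(i_c, forced_beta)

print(f"from active-region hFE: Ib = {ib_active * 1000:.1f} mA")
print(f"with forced beta of 10: Ib = {ib_saturated * 1000:.1f} mA")
```

Sizing the drive from the active-region hFE would leave the transistor short of base current by a factor of ten in this example, so it might never reach the rated Vce(sat).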
Best Answer
In their book "Feedback Amplifiers - Theory and Design" (Kluwer, 2002), Gaetano Palumbo and Salvatore Pennisi describe their circuits using a generic transistor that represents BJTs, HBTs, MOSFETs and MESFETs.
The generic device of a given polarity is represented by this symbol (actually two, to consider the possibility of a substrate)
and is used in conjunction with "a generic small-signal model applicable to a variety of different transistor types operating in the active region" that is this one:
This 'unified' device is introduced
and
The first appearance ought to be in the paper
G. Palumbo, J. Choma Jr., “An Overview of Analog Feedback Part I: Basic Theory,” Analog Integrated Circuits and Signal Processing, Vol. 17, No. 3, pp. 175-194, Nov. 1998.