The first key, so they say, to understanding BJT behaviour is to understand that it's driven by minority-carrier behaviour. In an NPN device, that means that electrons in the p-type base region control the behaviour.
I think you captured that in your description, but most of the rest of what you wrote doesn't fit the usual way of describing the physics.
Since the base is very thin in relation to the collector and emitter, ... there are not many holes available to be recombined with emitter electrons. The emitter on the other hand is a heavily doped N+ material with many, many electrons in the conduction band.
This is the only part of what you wrote that makes sense. The forward bias on the b-e junction injects excess electrons (minority carriers) into the base region. There are not enough holes to recombine with those electrons instantaneously, so the region of excess electrons extends some distance from the edge of the depletion region associated with the b-e junction. If it extends far enough, it will reach the opposite depletion region (for the c-b junction). Any electrons that get to that depletion region are quickly swept away by the electric field there, and that creates the collector current.
OK, so how is entropy involved?
A key point is that the spread of excess electrons away from the b-e junction is described by diffusion. And diffusion is, in some sense, a process that takes a low-entropy situation (a large number of particles segregated in one part of a volume) and turns it into a high-entropy situation (particles spread evenly across a volume).
So when you talk about "a high entropy of electrons", you have it backwards: diffusion acts to increase entropy, not reduce it.
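To make the entropy point concrete, here is a minimal sketch (a toy model, not a device simulation) that starts with all particles in one bin of a 1-D box, applies a simple explicit diffusion step repeatedly, and tracks the Shannon entropy of the distribution. The bin count, step rate, and particle number are arbitrary values chosen for illustration.

```python
import math

def shannon_entropy(counts):
    """Entropy (in nats) of a distribution given as non-negative bin weights."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def diffuse(counts, rate=0.25):
    """One explicit diffusion step with closed (reflecting) ends.

    Each pair of neighbouring bins exchanges a fraction `rate` of their
    difference, so the total number of particles is conserved.
    """
    new = list(counts)
    for i in range(len(counts) - 1):
        flow = rate * (counts[i] - counts[i + 1])  # net flow from bin i to bin i+1
        new[i] -= flow
        new[i + 1] += flow
    return new

# All particles start segregated in the first bin (a low-entropy state).
bins = [1000.0] + [0.0] * 9
entropies = [shannon_entropy(bins)]
for _ in range(200):
    bins = diffuse(bins)
    entropies.append(shannon_entropy(bins))

print(f"initial entropy: {entropies[0]:.4f} nats")
print(f"final entropy:   {entropies[-1]:.4f} nats (uniform limit is ln 10)")
```

In this toy model the entropy never decreases from one step to the next, and it approaches the value for an even spread across all bins.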
The idea that excess electrons are "effectively doping and shrinking the base/collector depletion region into N-type material" also doesn't make any sense. The excess carriers don't affect the extent of the c-b depletion region much. Electrons that reach the c-b depletion region are simply swept through by the electric field.
Given that \$\alpha\$ and \$\beta\$ are related by \$\alpha = \frac{\beta}{1+\beta}\$ as stated in the wiki article, obviously you can do your sums with either.
However, which is going to be easier to use? I personally always use \$\beta\$, regardless of the transistor configuration.
In common emitter, \$I_c = \beta\times I_b\$, so I can say 'to control a collector current \$I_c\$, I need at least \$\frac{I_c}{\beta}\$ of base current'.
But as \$\beta \gg 1\$ (for most transistors), \$\alpha \approx 1\$, and \$I_c \approx I_e\$. You may object to the approximation, but given the way that \$\beta\$ varies with temperature, with \$I_c\$, and between transistors of the same type, that is a far, far better approximation than insisting that \$\beta\$ is constant. Any good transistor design will allow for operation with a range of \$\beta\$, at least \$2:1\$, preferably more.
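As a rough numerical illustration of why \$I_c \approx I_e\$ is the safer assumption, the sketch below takes a \$2:1\$ spread in \$\beta\$ and compares the resulting spread in base current with the spread in \$\alpha\$. The collector current and the \$\beta\$ limits are hypothetical values picked for the example, not data for any particular transistor.

```python
def alpha_from_beta(beta):
    """alpha = beta / (1 + beta), the relation quoted above."""
    return beta / (1.0 + beta)

def base_current(i_c, beta):
    """Base current needed to support collector current i_c at a given beta."""
    return i_c / beta

I_C = 10e-3                         # hypothetical target collector current, 10 mA
BETA_MIN, BETA_MAX = 100.0, 200.0   # hypothetical 2:1 spread in beta

# Design for the worst case: the lowest beta needs the most base current.
worst_case_ib = base_current(I_C, BETA_MIN)
best_case_ib = base_current(I_C, BETA_MAX)

alpha_min = alpha_from_beta(BETA_MIN)
alpha_max = alpha_from_beta(BETA_MAX)
alpha_spread = (alpha_max - alpha_min) / alpha_min

print(f"base current: {best_case_ib * 1e6:.0f} to {worst_case_ib * 1e6:.0f} uA")
print(f"alpha: {alpha_min:.6f} to {alpha_max:.6f} (spread {alpha_spread:.3%})")
```

With these numbers the base current changes by a factor of two, while \$\alpha\$ (and so the ratio \$I_c / I_e\$) moves by only about half a percent.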
Once you have made the approximation \$I_c \approx I_e\$, then common collector operation is given by 'I need to allow for a base current of \$\frac{I_c}{\beta}\$ to flow in the base circuit, without upsetting operation'.
With a common base stage, you say much the same thing, allowing for an amount of base current. However, you also say that the emitter-to-collector current gain is slightly less than \$1\$: it falls short by roughly \$\frac{1}{\beta}\$ (exactly \$\frac{1}{1+\beta}\$). That shortfall will usually be a smaller error than resistor tolerances and other sources of gain error.
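A quick sanity check of that claim, as a sketch: the shortfall of the common base current gain from \$1\$ is \$1 - \alpha = \frac{1}{1+\beta}\$, which can be compared against a resistor tolerance. The \$\beta\$ values and the 1% and 5% tolerances below are assumptions chosen for illustration.

```python
def common_base_gain_error(beta):
    """Shortfall of the common-base current gain from unity: 1 - alpha = 1 / (1 + beta)."""
    return 1.0 / (1.0 + beta)

# Hypothetical betas and resistor tolerances, for comparison only.
for beta in (50.0, 100.0, 200.0, 400.0):
    err = common_base_gain_error(beta)
    print(
        f"beta = {beta:5.0f}: gain error = {err:.3%}, "
        f"below 1% tolerance: {err < 0.01}, below 5% tolerance: {err < 0.05}"
    )
```

For a \$\beta\$ of 100 or more the shortfall is already under 1%, so it sits below the tolerance of common resistors.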
Given that you can write an equation for \$\alpha\$, does that mean that you need to? For most practical engineering designs, the answer is no. If you are in college, and the tutor really likes to use \$\alpha\$, then the answer is yes.
It's better to go back to the original papers to understand something well. In this case, "Effects of Space-Charge Layer Widening in Junction Transistors," by J. M. Early, 1952, from the Proceedings of the I.R.E.
Here's the first diagram from that paper (illustrating an NPN BJT):
While Shockley\$^1\$, in 1949, and then Shockley, Sparks, and Teal\$^2\$, in 1951, showed that an increase in the potential across a barrier would increase the barrier thickness, they also assumed that such changes in collector- and emitter-barrier thicknesses would not affect the base-layer thickness. The assumption turned out to be false, so Early\$^3\$ addressed it by realizing that an increase in the barrier thickness, \$x_m\$ in the above diagram, would spread in both directions and therefore reduce the base thickness, \$w\$ in the above diagram.
This decrease in \$w\$ has two important effects.
Perhaps the key insight here, one that wasn't reached before 1952 (as Shockley et al. failed to recognize it in their earlier papers), is that an increase in \$x_m\$ spreads in both directions and that the two principal impacts each tend to increase \$\beta\$.
The reality for an active-mode NPN is that the depletion region at the BE junction shrinks (the Late Effect, which wasn't accounted for until the Gummel-Poon model; as Early's paper puts it: ".. it is very thin and may be neglected .."), while the depletion region at the BC junction widens (the Early Effect). Keep in mind that the base itself is made quite thin and is lightly doped.
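In circuit work, the net consequence of this base narrowing is often summarized to first order by an Early voltage \$V_A\$, with \$I_c \approx I_S\, e^{V_{BE}/V_T}\left(1 + \frac{V_{CE}}{V_A}\right)\$. That expression is the common textbook approximation rather than something taken from the passages of Early's paper quoted here, and the parameter values in the sketch below (\$I_S\$, \$V_A\$, \$V_T\$) are hypothetical.

```python
import math

def collector_current(v_be, v_ce, i_s=1e-14, v_a=100.0, v_t=0.02585):
    """First-order collector current including the Early effect.

    i_s (saturation current), v_a (Early voltage) and v_t (thermal voltage
    near room temperature) are illustrative assumptions, not device data.
    """
    return i_s * math.exp(v_be / v_t) * (1.0 + v_ce / v_a)

V_BE = 0.65  # hypothetical fixed base-emitter voltage, in volts

i_low = collector_current(V_BE, v_ce=1.0)
i_high = collector_current(V_BE, v_ce=10.0)
ratio = i_high / i_low

print(f"Ic at Vce = 1 V:  {i_low * 1e3:.3f} mA")
print(f"Ic at Vce = 10 V: {i_high * 1e3:.3f} mA")
print(f"ratio: {ratio:.4f}")
```

With \$V_{BE}\$ held fixed, raising \$V_{CE}\$ widens the BC depletion region, narrows the base, and raises \$I_c\$; in this model the increase is linear in \$V_{CE}\$.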
\$^1\$ W. Shockley, "The theory of p-n junctions in semiconductors and p-n junction transistors," Bell Sys. Tech. Jour., vol. 28, p. 435; July, 1949.
\$^2\$ W. Shockley, M. Sparks, G. K. Teal, "The p-n junction transistors," Phys. Rev., vol. 83, p. 151; July, 1951.
\$^3\$ J. M. Early, "Effects of Space-Charge Layer Widening in Junction Transistors," Proc. of the I.R.E., vol. 40, pp. 1401-1406; November, 1952.