The first key, so they say, to understanding BJT behaviour is to understand that it's driven by minority-carrier behaviour. In an NPN device, that means that electrons in the p-type base region control the behaviour.
I think you captured that in your description, but most of the rest of what you wrote doesn't fit the usual way of describing the physics.
Since the base is very thin in relation to the collector and emitter, ... there are not many holes available to be recombined with emitter electrons. The emitter on the other hand is a heavily doped N+ material with many, many electrons in the conduction band.
This is the only part of what you wrote that makes sense. The forward bias on the b-e junction injects excess electrons (minority carriers) into the base region. There are not enough holes to recombine with those electrons instantaneously, so the region of excess electrons extends some distance from the edge of the depletion region associated with the b-e junction. If it extends far enough, it will reach the opposite depletion region (for the c-b junction). Any electrons that get to that depletion region are quickly swept away by its electric field, and that creates the collector current.
OK, so how is entropy involved?
A key point is that the spread of excess electrons away from the b-e junction is described by diffusion. And diffusion is, in some sense, a process that takes a low-entropy situation (a large number of particles segregated in one part of a volume) and turns it into a high-entropy situation (particles spread evenly across a volume).
So when you talk about "a high entropy of electrons", you actually have it backwards. Diffusion actually acts to increase entropy, not reduce it.
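To make the direction of the entropy change concrete, here is a toy sketch (not a BJT model). It assumes a closed 1-D box with no-flux walls, whereas the real base loses electrons to the collector on one side. The grid size, the initial profile, and the `alpha` step parameter are illustrative assumptions. Particles start bunched up at one end, diffuse, and the Shannon entropy of their spatial distribution is tracked.

```python
import math

def shannon_entropy(density):
    """Entropy of the normalised spatial distribution (in nats)."""
    total = sum(density)
    return -sum((n / total) * math.log(n / total) for n in density if n > 0)

def diffuse(density, steps, alpha=0.25):
    """Explicit finite-difference diffusion with no-flux (reflecting) ends.

    alpha stands for D*dt/dx**2 and must stay <= 0.5 for stability.
    """
    n = list(density)
    for _ in range(steps):
        new = n[:]
        for i in range(len(n)):
            left = n[i - 1] if i > 0 else n[i]
            right = n[i + 1] if i < len(n) - 1 else n[i]
            new[i] = n[i] + alpha * (left - 2 * n[i] + right)
        n = new
    return n

# Assumed starting point: all particles packed into the first 5 of 50 cells.
initial = [1000.0] * 5 + [0.0] * 45
step_counts = (0, 10, 100, 1000, 10000)
entropy_history = [shannon_entropy(diffuse(initial, s)) for s in step_counts]

for s, h in zip(step_counts, entropy_history):
    print(f"after {s:>5} steps: entropy = {h:.3f} nats")
```

The entropy starts at ln(5) (five equally filled cells) and climbs towards ln(50) as the profile flattens, which is the "low entropy to high entropy" direction described above.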
The idea that excess electrons are "effectively doping and shrinking the base/collector depletion region into N-type material" also doesn't make any sense. The excess carriers don't affect the extent of the c-b depletion region much. Electrons that reach the c-b depletion region are simply swept through by the electric field.
The key to it all is the minority carriers in the base.
Your suspicion is correct that if all you had was the CB junction it would just be a diode. Reverse biasing this diode gives you essentially no current (only a tiny leakage current). The p-base of an npn is full of holes and the n-collector has lots of electrons. In reverse bias the majority carriers move away from the junction on both sides and you do not get any significant current, just like a normal diode.
The tricky part happens when you forward bias the base-emitter junction. The holes in the p-base move towards the BE junction and the electrons in the emitter also move towards the junction. Some of them recombine, but because the emitter is doped much more heavily than the base, a lot of the electrons from the emitter pop through into the base! Once there, they can keep diffusing through the base to the collector and you get the collector-emitter current that you were hoping for.
You should take a long look at the diagram labelled Lecture 7 - Slide 12:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-012-microelectronic-devices-and-circuits-fall-2009/lecture-notes/MIT6_012F09_lec07.pdf Holes are green and electrons are blue.
Best Answer
What is the Early effect?
Indeed, it has to do with the influence of Vce on the Base region. A higher Vce increases the size of the Base-Collector depletion region. This depletion region partly expands into the Base region, making the effective (neutral) Base narrower.
See this illustration: (a) shows the transistor in forward mode with a certain Vce, (b) shows the same but with a higher Vce.
This results in a narrower Base region. The minority-carrier gradient across a narrower Base is steeper, so the collector current increases.
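The usual first-order way to capture this in a hand calculation is the Early voltage V_A, with Ic = Is·exp(Vbe/Vt)·(1 + Vce/V_A). Here is a small sketch of that model. The values of `i_s`, `v_a`, and the bias point are made-up illustrative numbers, not data for any particular transistor.

```python
import math

def collector_current(v_be, v_ce, i_s=1e-15, v_a=50.0, v_t=0.02585):
    """First-order forward-active model including the Early effect.

    i_s (saturation current, A) and v_a (Early voltage, V) are assumed values.
    """
    return i_s * math.exp(v_be / v_t) * (1.0 + v_ce / v_a)

def output_resistance(v_be, v_ce, i_s=1e-15, v_a=50.0, v_t=0.02585):
    """Small-signal r_o = dVce/dIc for the model above: (V_A + Vce) / Ic."""
    return (v_a + v_ce) / collector_current(v_be, v_ce, i_s, v_a, v_t)

V_BE = 0.65  # assumed bias point
ic_low = collector_current(V_BE, 1.0)
ic_high = collector_current(V_BE, 10.0)
ratio = ic_high / ic_low

print(f"Ic at Vce = 1 V : {ic_low:.3e} A")
print(f"Ic at Vce = 10 V: {ic_high:.3e} A")
print(f"increase: {100 * (ratio - 1):.1f} %")
print(f"r_o at Vce = 1 V: {output_resistance(V_BE, 1.0):.3e} ohm")
```

With the same Vbe, the collector current still changes with Vce. A larger V_A means a flatter Ic-versus-Vce curve and a higher output resistance.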
What can we do against this?
We cannot prevent the BC depletion region from expanding, but we can make the effect smaller by increasing the doping of the Base and Collector regions. A property of a PN junction is that the depletion region decreases in size as the doping levels increase. So indeed, higher doping levels can help; however, that has other implications.
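As a rough illustration of the doping dependence, the textbook abrupt-junction (depletion approximation) formula is W = sqrt(2·eps·(Vbi + Vr)/q · (1/Na + 1/Nd)). The sketch below evaluates it for silicon. The doping levels and the 5 V reverse bias are assumed example numbers, and real transistor profiles are not abrupt, so treat this as a trend rather than a prediction.

```python
import math

Q = 1.602e-19             # electron charge, C
EPS_SI = 11.7 * 8.854e-14  # permittivity of silicon, F/cm
N_I = 1.0e10               # intrinsic carrier density of Si near 300 K, cm^-3 (approximate)
V_T = 0.02585              # thermal voltage near 300 K, V

def depletion_widths(n_a, n_d, v_reverse):
    """Return (total width, p-side extent, n-side extent) in cm for an abrupt junction."""
    v_bi = V_T * math.log(n_a * n_d / N_I**2)
    total = math.sqrt(2 * EPS_SI * (v_bi + v_reverse) / Q * (1 / n_a + 1 / n_d))
    x_p = total * n_d / (n_a + n_d)  # part extending into the p-type Base
    x_n = total * n_a / (n_a + n_d)  # part extending into the n-type Collector
    return total, x_p, x_n

# Assumed example: Base Na and Collector Nd, then both raised tenfold.
w_low, xp_low, xn_low = depletion_widths(1e17, 1e16, 5.0)
w_high, xp_high, xn_high = depletion_widths(1e18, 1e17, 5.0)

print(f"lower doping : W = {w_low * 1e4:.3f} um, Base side = {xp_low * 1e4:.3f} um")
print(f"higher doping: W = {w_high * 1e4:.3f} um, Base side = {xp_high * 1e4:.3f} um")
```

Note that the depletion region extends mostly into the more lightly doped side, so the part that eats into the Base also depends on the ratio of Collector to Base doping.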
The current amplification beta of a BJT is determined by the ratio of the dopings. If the doping levels of Base and Collector are increased, the Emitter doping must be increased by the same factor to maintain the same beta. Also, the Emitter must have the highest doping level. If the Collector had a higher doping level than the Emitter, it would become the Emitter, and the Emitter would be the Collector (or the "reverse beta" would become higher than the "forward beta", and that would be silly).
Increasing doping levels has another disadvantage: as the depletion regions become smaller, the maximum Vce before breakdown decreases.
You could make the Base region wider, but that will decrease beta, as the minority carriers in the Base will have more chance to recombine. A high-beta BJT relies on having a short Base region.
So it is a compromise: high-beta transistors suffer more from the Early effect than low-beta transistors. You could also give up some beta in exchange for less Early effect.
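To put a number on the Base-width trade-off, here is a sketch that looks only at recombination in the Base. It uses the textbook base transport factor alpha_T = 1/cosh(W/L), where W is the neutral Base width and L is the minority-carrier diffusion length, and it assumes a perfect emitter (no back-injection of holes), which real devices do not have. The W/L ratios are arbitrary example values.

```python
import math

def recombination_limited_beta(width_over_length):
    """Beta if Base recombination were the only loss mechanism.

    alpha_T = 1 / cosh(W/L), so beta = alpha_T / (1 - alpha_T) = 1 / (cosh(W/L) - 1).
    For W << L this is approximately 2 * (L/W)**2.
    """
    return 1.0 / (math.cosh(width_over_length) - 1.0)

ratios = (0.05, 0.1, 0.2, 0.4)  # assumed W/L values
betas = [recombination_limited_beta(r) for r in ratios]

for r, b in zip(ratios, betas):
    print(f"W/L = {r:<4}: beta limit = {b:8.1f}  (2(L/W)^2 = {2 / r**2:8.1f})")
```

Doubling the Base width cuts this beta limit by roughly a factor of four. On the other hand, with a wider Base a given depletion-region encroachment is a smaller fraction of the Base width, which is the compromise described above.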
What else can we do?
As an IC designer I cannot do any of the things I mention above; the fabrication process for the NPNs I have available is fixed. The NPNs are what they are and I have to deal with it. So I have to use circuit solutions.
Like cascoding:
[Schematic: cascode of Q1 and Qcasc, created using CircuitLab]
Here Qcasc takes up most of the Vce, so it is the one that suffers from the Early effect. However, Qcasc does not set Ic; Ic is set by Q1, and Q1 has a nice fixed Vce of around 0.7 V.
A disadvantage of cascoding is, of course, that the minimum Vce will be larger than when using only Q1, so cascoding is not always an option.
An advantage of cascoding is that the Miller effect becomes much less dominant, as the voltage gain from the base to the collector of Q1 is only about 1 (one). For high-frequency (RF) amplifiers this advantage might be the only reason why cascoding is done. Then the Early effect isn't really a problem, but the bandwidth of Q1 is. Cascoding is needed to get and use the full available bandwidth.
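A quick back-of-the-envelope sketch of the Miller argument: the base-collector capacitance C_mu appears at the input multiplied by (1 + |Av|), where Av is the voltage gain from base to collector of Q1. The 0.5 pF capacitance and the gain of 100 for the stage without cascode are assumed example numbers.

```python
def miller_input_capacitance(c_mu, gain_magnitude):
    """Effective input capacitance contributed by C_mu via the Miller effect."""
    return c_mu * (1.0 + gain_magnitude)

C_MU = 0.5e-12  # assumed base-collector capacitance, F

c_in_plain = miller_input_capacitance(C_MU, 100.0)   # assumed gain without cascode
c_in_cascode = miller_input_capacitance(C_MU, 1.0)   # gain of about 1 with cascode

print(f"without cascode: {c_in_plain * 1e12:.1f} pF")
print(f"with cascode   : {c_in_cascode * 1e12:.1f} pF")
print(f"reduction      : {c_in_plain / c_in_cascode:.1f}x")
```

With a gain of about 1 across Q1, the Miller term is only about twice C_mu, instead of C_mu multiplied by roughly the full stage gain.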