Your question is about the fundamental operation of a BJT. Surely there is much written about that out there.
Briefly, C-B-E is a sandwich of three semiconductor regions with alternating doping polarity. However, what makes a BJT more than just two diodes in series pointing in opposite directions is that the base region is so thin that the depletion region of each junction extends to the other junction. The collector would be within reach of the emitter were it not for the base region in between, with all its carriers depleted. A little bit of externally applied base current injects carriers into the base region, which then allows current to flow across it between collector and emitter.
Due to a whole bunch of semiconductor physics you should look up elsewhere, a few carriers in the base go a long way. This is where the transistor gets its gain from. You inject a few carriers into the base (provide a small base current) and that lets a lot of carriers conduct (allows a larger current to flow) across the otherwise depleted base region between collector and emitter.
In a nutshell, bipolar junction transistors work because of the physical geometry of the two junctions. The base layer is very thin, and the charge carriers flowing from the emitter into the base do not recombine right away. Most of them pass right through the base and enter the depletion region of the reverse-biased base-collector junction. Once this happens, the strong field in this region quickly sweeps them the rest of the way to the collector terminal, where they become the collector current.
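To put a rough number on "most of them pass right through", here is a small worked relation. The symbol \$\alpha\$ (the fraction of emitter carriers that reach the collector) is introduced here for illustration and is not used in the answer above; the value 0.99 is an assumed example, not a measured figure.

$$I_C = \alpha I_E, \qquad I_B = I_E - I_C = (1-\alpha) I_E, \qquad \beta = \frac{I_C}{I_B} = \frac{\alpha}{1-\alpha}$$

For \$\alpha = 0.99\$ this gives \$\beta = 0.99 / 0.01 = 99\$: only about one carrier in a hundred has to be supplied through the base terminal, which is where the current gain comes from.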
Best Answer
The Ebers-Moll model actually considers this issue.
Although a transistor cannot really be modelled as two diodes, it can be modelled as two modes of operation of the same transistor, forward and reverse, superimposed.
If you wish for a full canonical answer I can provide it, but I will try to stay intuitive at this point.
Start in the normal operating mode, where \$V_{bc} \le 0\$ (reverse biased) and \$V_{be}\$ is present and above the threshold; the transistor is then in the active region, with its normal forward current gain.
Now reverse the situation such that \$V_{bc}\$ is present (collector-base junction forward biased) and \$V_{be}\$ is 0. This reverses the transistor and swaps emitter and collector, but due to the doping levels of a standard transistor, the current gain is much lower in this mode. (The gain depends on the doping levels, and the emitter is more heavily doped than the collector.)
When the two modes are superimposed, the normal forward gain is still larger than the reverse current gain, so the overall current gain still has the sign of the normal forward mode, but at a much lower value. This is why \$\beta\$ is very low at low \$V_{ce}\$, and therefore why \$I_c\$ is very low at low \$V_{ce}\$: at low \$V_{ce}\$, \$V_b \gt V_c\$ for an NPN device, so the base-collector junction is forward biased.
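The superposition described above can be sketched numerically. The following is a minimal illustration, assuming the transport form of the Ebers-Moll equations for an NPN device; the function names and the parameter values (`i_s`, `beta_f`, `beta_r`, `V_T`, and the fixed \$V_{be}\$ of 0.65 V) are illustrative assumptions, not values from the answer.

```python
import math

V_T = 0.02585  # assumed thermal voltage near 300 K, in volts


def ebers_moll(v_be, v_bc, i_s=1e-14, beta_f=100.0, beta_r=1.0):
    """Return (i_c, i_b) for an NPN device, transport-form Ebers-Moll.

    i_s, beta_f and beta_r are assumed, illustrative parameters.
    """
    fwd = math.exp(v_be / V_T)  # forward (base-emitter) exponential
    rev = math.exp(v_bc / V_T)  # reverse (base-collector) exponential
    # Collector current: forward transport minus the reverse contribution.
    i_c = i_s * (fwd - rev) - (i_s / beta_r) * (rev - 1.0)
    # Base current: sum of the forward and reverse base components.
    i_b = (i_s / beta_f) * (fwd - 1.0) + (i_s / beta_r) * (rev - 1.0)
    return i_c, i_b


def effective_beta(v_ce, v_be=0.65, **params):
    """Ratio I_c / I_b at a given V_ce, holding V_be fixed."""
    i_c, i_b = ebers_moll(v_be, v_be - v_ce, **params)
    return i_c / i_b


if __name__ == "__main__":
    for v_ce in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
        ratio = effective_beta(v_ce)
        print(f"V_ce = {v_ce:4.2f} V  ->  I_c/I_b = {ratio:8.3f}")
```

With these assumed parameters the ratio \$I_c / I_b\$ approaches `beta_f` once \$V_{ce}\$ is large enough to reverse bias the base-collector junction, and collapses as \$V_{ce}\$ falls towards zero and the reverse term grows.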
The overall large signal current gain (and therefore the effective direction of current) is strictly given by:
The first term describes the first situation (normal forward bias outside of saturation) and the second term the reverse situation (\$V_b \gt V_c\$ for NPN); \$\beta_R\$ is the reverse current gain.
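One common textbook way to write a two-term expression of this kind is the transport form of the Ebers-Moll collector current. The saturation current \$I_S\$ and the thermal voltage \$V_T\$ are symbols assumed here rather than taken from the answer, and the notation may differ from the expression the answer had in mind:

$$I_C = I_S\left(e^{V_{be}/V_T} - e^{V_{bc}/V_T}\right) - \frac{I_S}{\beta_R}\left(e^{V_{bc}/V_T} - 1\right)$$

With \$V_{bc} \le 0\$ the exponentials in \$V_{bc}\$ are negligible and the first term dominates. As \$V_{bc}\$ becomes positive, both the \$e^{V_{bc}/V_T}\$ part of the first term and the whole second term subtract from \$I_C\$.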
There is an excellent thorough analysis available.