First, let's look at why there's only a very small current in a reverse-biased pn-junction diode. The junction doesn't block all current when reverse biased. The electric field in the junction opposes the majority-carrier current whether the junction is forward-biased or reverse-biased, but quickly sweeps any available minority carriers (electrons in the p-region, holes in the n-region) across it. In forward bias, minority carriers are being continuously injected from the contacts, so there is a sustained current. In reverse bias, there are very few minority carriers available, so the junction carries all the available carriers away in a very short time, and there are no more carriers available to sustain a current.
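To put rough numbers on "very small current", here is a minimal sketch using the ideal (Shockley) diode equation. That equation is a standard textbook model rather than something stated above, and the saturation current and thermal voltage below are assumed, illustrative values.

```python
import math

def diode_current(v, i_s=1e-12, v_t=0.02585):
    """Ideal (Shockley) diode current in amperes for a junction voltage v in volts.

    i_s (saturation current) and v_t (thermal voltage near 300 K) are
    assumed values chosen only for illustration.
    """
    return i_s * (math.exp(v / v_t) - 1.0)

# Reverse bias: the current flattens out at about -i_s, however large the voltage.
# Forward bias: the current grows exponentially with voltage.
for v in (-5.0, -1.0, -0.2, 0.6):
    print(f"V = {v:+.1f} V  ->  I = {diode_current(v):.3e} A")
```

In this model the reverse current is set by the supply of minority carriers (lumped into `i_s`), not by how hard the junction is reverse biased, which is the point made above.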
So what happens in a forward-active BJT is that the forward-biased base-emitter junction creates a large number of minority carriers in the base region. The (reverse-biased) collector-base junction then has no problem "finding" carriers to create a current, and so you can have a large collector current.
It is not correct that the depletion regions of the two junctions overlap. If that happens, you have a condition called "punch-through" where there is no gain in the device.
I found a slide set that gives a very quick explanation of BJT operation here. In particular, note that current in the depletion regions is mainly caused by "drift" (carriers being pushed around by electric fields), but in the bulk regions it is mainly caused by diffusion, that is, carriers moving around randomly so that the net movement is from areas of high concentration to areas of low concentration. Finally, remember that the important currents are the minority-carrier currents.
Edit
My explanation of forward-biased operation was not right. Let's try again: whether the junction is forward or reverse biased, the electric field in the depletion region (the area right around the junction) opposes majority carriers crossing the junction and helps minority carriers across. In forward bias, the height of this barrier is reduced to the point where some fraction of the majority carriers have enough thermal energy to overcome it. But anyway, the operation in reverse bias is more important to answering your question.
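As a rough illustration of the "enough thermal energy" argument, the sketch below assumes the carriers follow Boltzmann statistics, so that lowering the barrier by V multiplies the fraction able to cross by exp(V / V_T). The thermal voltage value is an assumption (room temperature), and `crossing_gain` is just a name made up for this example.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts (assumed)

def crossing_gain(v_forward):
    """Factor by which the fraction of majority carriers able to cross grows
    when the barrier is lowered by v_forward volts (Boltzmann statistics)."""
    return math.exp(v_forward / V_T)

# Each extra ~60 mV of forward bias gives roughly ten times more carriers
# with enough energy to get over the barrier.
for v in (0.0, 0.06, 0.3, 0.6):
    print(f"barrier lowered by {v:.2f} V -> x{crossing_gain(v):.3g}")
```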
Rule 1 isn't a "good idea", it isn't a "guideline", it is a fundamental tenet of transistor physics. If for any reason (during normal usage) it cannot hold, then the circuit will not operate.
As for the lamp, it is a purely resistive element. It should have 10V across it, but thanks to the transistor it won't. So the transistor gets 0.2V and the lamp gets 9.8V and reality is saved.
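The arithmetic can be sketched as a simple series loop. The 10 V supply and the 0.2 V across the transistor come from the text above; the lamp resistance is a hypothetical value added only so that a current and a power figure can be computed.

```python
V_SUPPLY = 10.0    # supply voltage from the answer, volts
V_CE_SAT = 0.2     # drop across the saturated transistor from the answer, volts
R_LAMP = 100.0     # lamp resistance, ohms (hypothetical value for illustration)

def split_voltage(v_supply, v_ce_sat, r_lamp):
    """Return (lamp voltage, loop current, transistor power) for the series loop."""
    v_lamp = v_supply - v_ce_sat      # Kirchhoff's voltage law around the loop
    i_loop = v_lamp / r_lamp          # Ohm's law for the resistive lamp
    p_transistor = v_ce_sat * i_loop  # power dissipated in the transistor
    return v_lamp, i_loop, p_transistor

v_lamp, i_loop, p_q = split_voltage(V_SUPPLY, V_CE_SAT, R_LAMP)
print(f"lamp: {v_lamp:.1f} V, current: {i_loop * 1e3:.0f} mA, "
      f"transistor: {p_q * 1e3:.1f} mW")
```

With the assumed 100 ohm lamp, nearly all of the supply voltage and nearly all of the power end up in the lamp, and the transistor only has to dissipate a small fraction.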
Best Answer
Short answer: For the same reason that the strongly reverse-biased BC diode passes a large current when the npn is in active mode, even though a reverse-biased diode should carry negligible current!
Long answer: Imagine the same npn transistor in common-emitter configuration: the emitter is connected to ground, the collector to Vcc = 10V through a resistor R, and the base to a variable voltage source. When Vb is below the threshold voltage Vt, say 0.5V, the transistor is off, so Ic ~ Ie ~ 0 and there is no voltage drop across the resistor R, so Vc ~ Vcc ~ 10V. Then gradually increase Vb. The BE diode becomes forward biased and, since an npn is an electron-driven device, BE draws a large number of electrons from ground (moving opposite to the conventional direction of current) and provides them to the p end of the BC diode, which is currently reverse-biased. However, BC is not an isolated diode here, so do not expect it to act as one. The junction field in the BC sweeps all the provided electrons across the junction, generating a large downward collector-emitter current while the BC is still reverse biased. Obviously this is contrary to what a normal diode should do, but then again BC is not a normal diode; you can think of it as "hacked" in a sense. For clarity, the E-field in the BC diode always points from n (collector) to p (base).
Still on the same story: as Vb increases, Ic increases and the voltage drop across R increases, so Vc decreases until it reaches Vb from above. Now you would expect Ic to be zero, because the potential difference across CB is zero, but then again, we are dealing with a hacked diode. The same E-field across its junction (which has weakened but has not changed direction) sweeps away all the provided electrons, continuously maintaining the downward current.
Continuing on, a further increase in Vb drags Vc lower than Vb, but the internal E-field still does not change direction (although it wanes in magnitude), so it acts just as above. The back-to-back diode picture should never be taken literally, i.e. you cannot make a transistor by wiring up two discrete diodes back to back, because you cannot keep the bipolar nature of the charge carriers (electrons and holes) using discrete components: only electrons flow in the wire connecting the diodes, so minority carriers injected by one junction never reach the other.
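The walk-through above can be sketched numerically. This is only a rough model under stated assumptions: the collector current follows the ideal exponential law Ic = Is * exp(Vb / V_T), and saturation is approximated by a crude clamp that stops Vc from falling below an assumed 0.2 V. The resistor value, `I_S`, `V_T` and `V_CE_SAT` are illustrative values not given in the answer; only Vcc = 10 V comes from the text.

```python
import math

V_CC = 10.0      # supply voltage from the answer, volts
R = 1_000.0      # collector resistor, ohms (assumed value)
I_S = 1e-14      # transistor saturation current, amperes (assumed value)
V_T = 0.02585    # thermal voltage near 300 K, volts (assumed value)
V_CE_SAT = 0.2   # crude floor on Vc in deep saturation, volts (assumed value)

def collector_state(v_b):
    """Return (Ic, Vc) for a grounded-emitter npn with base voltage v_b."""
    i_c = I_S * math.exp(v_b / V_T)         # electrons injected by BE and swept across BC
    i_c = min(i_c, (V_CC - V_CE_SAT) / R)   # the resistor limits how far Ic can rise
    return i_c, V_CC - i_c * R              # Vc drops as the resistor takes more voltage

for v_b in (0.50, 0.60, 0.65, 0.70, 0.75):
    i_c, v_c = collector_state(v_b)
    relation = "Vc > Vb" if v_c > v_b else "Vc <= Vb"
    print(f"Vb = {v_b:.2f} V  Ic = {i_c * 1e3:7.3f} mA  Vc = {v_c:5.2f} V  ({relation})")
```

In this sketch the collector current keeps flowing in the same direction through all three stages (Vc above Vb, Vc near Vb, Vc below Vb), which is the behaviour the answer attributes to the junction field never reversing.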
As for the current flowing in the opposite direction to the voltage, here you are dealing with an active device as opposed to a passive one. Active devices can exhibit negative conductances.