It is the clock frequency (i.e. the switching frequency) and the number of transistors actually switching at the same time that determine the power consumed, and consequently the heat generated, not the total number of transistors.
For example, there could be a large number of transistors in an area of the die that aren't being used. If they are held in the off (open) state, they draw essentially no current apart from a small leakage current, so they consume little energy and produce little heat.
However, if some active area is switching at a higher clock frequency, its transistors charge and discharge the circuit capacitances more times per second, so on average they draw more current and therefore consume more energy and generate more heat.
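To put rough numbers on this, a standard first-order model for CMOS switching power (a textbook approximation, not part of the original answer) is P ≈ α·C·V²·f, where α is the activity factor (the fraction of the capacitance that actually switches each cycle), C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below uses made-up values and ignores leakage:

```python
def dynamic_power(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = alpha * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Made-up example values: 1 nF of total switchable capacitance on a 1.0 V supply.
C_TOTAL = 1e-9
VDD = 1.0

idle_block = dynamic_power(0.0, C_TOTAL, VDD, 1e9)  # transistors present but not switching
busy_1ghz = dynamic_power(0.2, C_TOTAL, VDD, 1e9)   # 20% of the capacitance switches each cycle
busy_2ghz = dynamic_power(0.2, C_TOTAL, VDD, 2e9)   # same activity at twice the clock

print(f"idle: {idle_block:.2f} W, 1 GHz: {busy_1ghz:.2f} W, 2 GHz: {busy_2ghz:.2f} W")
```

Doubling f doubles the switching power, while a block with α = 0 adds nothing to the switching term no matter how many transistors it contains.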
Hope that helps.
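The original 8051 used so-called pseudo-bidirectional I/O ports (open-drain outputs with pull-ups), so there was really no port direction setting: to use a pin as an input, you write a 1 to it, which turns off the pull-down transistor and leaves only the pull-up, and an external signal can then pull the pin low.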
For modern, truly bidirectional ports, however, it's better to write a known value to the output latch before enabling the pin as an output; otherwise the pin may briefly drive whatever stale value the latch holds, and that transient could do something undesirable.
See my answer here, for example.
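The ordering issue can be illustrated with a toy model of a true bidirectional pin. The `Pin` class and its method names are hypothetical (no real register map is implied), and the latch is assumed to hold a stale 1 after reset:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pin:
    """Toy model of a true bidirectional I/O pin (hypothetical, not a real register map)."""

    latch: int = 1                 # assumed stale/unknown value left in the output latch
    output_enabled: bool = False   # False = input (high impedance), True = output
    history: List[Optional[int]] = field(default_factory=list)  # level after each write; None = high-Z

    def level(self) -> Optional[int]:
        return self.latch if self.output_enabled else None

    def write_latch(self, value: int) -> None:
        self.latch = value
        self.history.append(self.level())

    def set_output(self) -> None:
        self.output_enabled = True
        self.history.append(self.level())


def drove_unwanted_level(pin: Pin, wanted: int) -> bool:
    """True if the pin was ever actively driven to a level other than `wanted`."""
    return any(lvl is not None and lvl != wanted for lvl in pin.history)


# Wrong order: enable the output first, then set the value.
wrong = Pin()
wrong.set_output()    # immediately drives the stale 1: a glitch
wrong.write_latch(0)

# Right order: set the value while the pin is still an input, then enable the output.
right = Pin()
right.write_latch(0)  # pin still high impedance, nothing driven yet
right.set_output()    # first driven level is already the wanted 0

print(wrong.history, right.history)  # [1, 0] [None, 0]
```

In the wrong order the pin is actively driven high for a moment before going low; in the right order the first level it ever drives is the intended one.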
Edit: Here is the I/O pin structure for a (relatively) modern CMOS microcontroller:
TRIS (tri-state) is called DDR (Data Direction Register) in many other micros. In this case, if the TRIS latch output is high, both output transistors are off (the pin is high-impedance), but the port can still be read.
Here is a slightly more complex I/O pin structure for a newer Microchip micro.
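Again, the TRIS latch disables the output. This one adds a LAT (output latch) register that helps avoid read-modify-write issues: an instruction that changes one bit of PORT reads back the actual pin levels, so a pin that has not yet reached its intended level (because of a heavy load, for example) can have the wrong value written back to its latch. On PICs that have LAT registers, you should write outputs to LAT only (and read inputs from PORT).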
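The read-modify-write problem can be sketched with a toy 8-bit port. The `Port` class, its `slow_mask` parameter and `settle()` are hypothetical illustrations, not a real PIC register interface; the model assumes that setting a bit through PORT reads the pin levels and writes the result to the output latch, which is the behavior that LAT avoids:

```python
class Port:
    """Toy 8-bit PIC-style port (hypothetical model, not a real register interface).

    lat:  output latch contents, i.e. what the output drivers are trying to drive.
    pins: actual pin levels. Pins in `slow_mask` have heavy external capacitance and
          keep their old level after a latch write until settle() is called.
    """

    def __init__(self, slow_mask: int = 0) -> None:
        self.lat = 0
        self.pins = 0
        self.slow_mask = slow_mask

    def _write_lat(self, value: int) -> None:
        self.lat = value & 0xFF
        # Fast pins follow the latch at once; slow pins have not changed yet.
        self.pins = (self.lat & ~self.slow_mask) | (self.pins & self.slow_mask)

    def settle(self) -> None:
        self.pins = self.lat

    def set_bit_via_port(self, bit: int) -> None:
        # Read-modify-write through PORT reads the *pin* levels, not the latch.
        self._write_lat(self.pins | (1 << bit))

    def set_bit_via_lat(self, bit: int) -> None:
        # Read-modify-write through LAT reads the latch, so pending outputs are kept.
        self._write_lat(self.lat | (1 << bit))


# Bit 0 drives a heavily loaded line; set bit 0, then bit 1 straight away.
via_port = Port(slow_mask=0b01)
via_port.set_bit_via_port(0)
via_port.set_bit_via_port(1)   # reads bit 0's pin as 0 and writes that back to the latch
via_port.settle()

via_lat = Port(slow_mask=0b01)
via_lat.set_bit_via_lat(0)
via_lat.set_bit_via_lat(1)
via_lat.settle()

print(f"{via_port.pins:08b} {via_lat.pins:08b}")  # 00000010 00000011
```

Setting bit 1 through PORT before bit 0's heavily loaded pin has risen writes bit 0 back as 0; going through LAT keeps both bits set.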
Here is the internal circuitry of the classic I/O port pin for the original 8051 and the CMOS 8051 (from this source):
There's a bit of extra complexity: a speed-up transistor in parallel with the pull-up is briefly turned on when the output changes from 0 to 1, to overcome external capacitance. As you can see, there is no TRIS/DDR control at all. The pull-up MOSFETs used in normal operation are 'weak': they are small enough (low Idss) that an external output connected to the pin can pull the pseudo-bidirectional port line low.
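The behavior of a pseudo-bidirectional pin can be summarized with a small truth-table sketch. The function name and the boolean `external_pulls_low` input are hypothetical, and the model ignores the speed-up transistor as well as the case of an external device driving high against a latched 0:

```python
def quasi_bidir_level(latch: int, external_pulls_low: bool) -> int:
    """Level of a toy 8051-style pseudo-bidirectional pin (hypothetical model).

    latch == 0: the strong internal pull-down is on, so the pin is low whatever
                is connected outside.
    latch == 1: only the weak pull-up is on, so an external output can pull the
                pin low; otherwise the pull-up holds it high.
    The speed-up transistor and any external device driving high against a
    latched 0 (contention) are not modeled.
    """
    if latch == 0:
        return 0
    return 0 if external_pulls_low else 1


for latch in (0, 1):
    for external_pulls_low in (False, True):
        level = quasi_bidir_level(latch, external_pulls_low)
        print(f"latch={latch} external_pulls_low={external_pulls_low} -> pin reads {level}")
```

This is why a 1 must be written to such a pin before it is read as an input: with a 0 in the latch, the pin reads 0 regardless of the external signal.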
The Atom processor is based on an older architecture (x86), and it also has to carry the baggage of PC compatibility. So it is more complicated to implement and therefore requires significantly more transistors for similar capabilities. I believe the term RISC was coined after x86 was already well established.