In some device technologies, registers are connected to a bus using three-state outputs. Such an approach does have some advantages, but it generally either requires that there be some "dead time" between the moment one register releases the bus and the moment another register starts driving it, or else runs the risk that a device might start driving the bus before the previous device has fully released it.
In other technologies, this approach is avoided in favor of using nested multiplexers. If there are 64 registers that can output to a bus bit, the device might have eight 8-way multiplexers, each of which selects one of eight registers, and one more 8-way multiplexer which selects the output of one of the first eight. While this may use slightly more circuitry than the bus-based approach, it has the advantage that every signal throughout the system will be driven by exactly one device at all times.
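As a rough illustration of the two-level selection described above, here is a small Python model. The function names and the split of the register index into a low and a high 3-bit field are my own assumptions for the sketch, not something taken from a particular device.

```python
def mux8(inputs, select):
    """8-way multiplexer: pass through exactly one of eight inputs."""
    assert len(inputs) == 8 and 0 <= select < 8
    return inputs[select]


def read_bus_bit(register_bits, reg_index):
    """Pick one of 64 register bits using nine 8-way multiplexers."""
    assert len(register_bits) == 64 and 0 <= reg_index < 64
    low = reg_index & 0b111           # select lines shared by the eight first-level muxes
    high = (reg_index >> 3) & 0b111   # select lines for the final mux
    # First level: each mux looks at its own group of eight registers.
    first_level = [mux8(register_bits[8 * g:8 * g + 8], low) for g in range(8)]
    # Second level: choose which first-level mux reaches the bus.
    return mux8(first_level, high)


bits = [0] * 64
bits[37] = 1
print(read_bus_bit(bits, 37), read_bus_bit(bits, 36))
```

Every intermediate value here has exactly one source, which is the property the multiplexer approach is after.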
Practically, what limits CPU speed is both the heat generated and the gate delays, but usually heat becomes a far greater issue before the gate delays kick in.
Recent processors are manufactured using CMOS technology. Every time a CMOS gate switches, power is dissipated, and gates switch on every clock cycle. Therefore, higher processor speeds mean more heat dissipation.
http://en.wikipedia.org/wiki/CMOS
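The usual first-order model for CMOS switching power is P = a * C * V^2 * f (activity factor, switched capacitance, supply voltage, clock frequency). Here is a minimal sketch of that formula. All the numbers are hypothetical and chosen only to show the trend, and the model ignores leakage.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical values, for illustration only.
base = dynamic_power(0.1, 1e-7, 1.0, 3.0e9)
faster = dynamic_power(0.1, 1e-7, 1.0, 4.0e9)       # higher clock, same voltage
faster_hot = dynamic_power(0.1, 1e-7, 1.2, 4.0e9)   # higher clock that also needs more voltage

print(f"{base:.1f} W, {faster:.1f} W, {faster_hot:.1f} W")
```

At a fixed voltage the power grows in proportion to the clock, but reaching a higher clock often needs a higher supply voltage as well, and that term is squared.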
Here are some figures:
Core i7-860 (45 nm) 2.8 GHz 95 W
Core i7-965 (45 nm) 3.2 GHz 130 W
Core i7-3970X (32 nm) 3.5 GHz 150 W
You can really see how the CPU power climbs faster than the clock speed: going from 2.8 GHz to 3.2 GHz is about 14% more clock, but about 37% more power.
Also, there are some quantum effects which kick in as the size of transistors shrinks. At nanometer levels, transistor gates actually become "leaky".
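To put numbers on the trend, here is a quick sketch that works out the ratios from the three figures listed above. I am assuming the wattages are the rated (TDP) figures, and the chips differ in more than clock speed, so treat this as a rough comparison only.

```python
# (model, process, clock in GHz, power in W), copied from the figures above
chips = [
    ("Core i7-860", "45 nm", 2.8, 95),
    ("Core i7-965", "45 nm", 3.2, 130),
    ("Core i7-3970X", "32 nm", 3.5, 150),
]

base_clock, base_power = chips[0][2], chips[0][3]
watts_per_ghz = [watts / ghz for _, _, ghz, watts in chips]

for (name, process, ghz, watts), wpg in zip(chips, watts_per_ghz):
    print(f"{name:14s} {process}  clock x{ghz / base_clock:.2f}  "
          f"power x{watts / base_power:.2f}  {wpg:.1f} W/GHz")
```

The watts-per-GHz column rises from one chip to the next, which is what you would expect if each extra step in clock speed costs more power than the last.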
http://computer.howstuffworks.com/small-cpu2.htm
I won't get into how this technology works here, but I'm sure you can use Google to look up these topics.
Okay, now, for the transmission delays.
Each "wire" inside the CPU acts as a small capacitor. Also, the base of a bipolar transistor or the gate of a MOSFET acts as a small capacitor. In order to change the voltage on a connection, you must either charge the wire or remove the charge. As transistors shrink, they can supply less current, so it becomes more difficult to do that. This is why SRAM needs amplification transistors (sense amplifiers): the actual memory array transistors are so small and weak.
In typical IC designs, where density is very important, the bit-cells have very small transistors. Additionally, they are typically built into large arrays, which have very large bit-line capacitances. This results in a very slow (relatively) discharge of the bit-line by the bit-cell.
From: How to implement SRAM sense amplifier?
Basically, the point is that it is harder for small transistors to drive the interconnects.
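A simple way to see this is to treat the driving transistor as a resistor charging the wire capacitance, which gives the familiar RC charging curve. This is only a first-order model, and the resistance and capacitance values below are hypothetical.

```python
import math


def time_to_reach(fraction, resistance_ohm, capacitance_f):
    """Time for an RC node charging from 0 V to reach `fraction` of the supply."""
    return -resistance_ohm * capacitance_f * math.log(1.0 - fraction)


# Hypothetical numbers: the same 100 fF line driven by a strong and a weak transistor.
wire_c = 100e-15
strong = time_to_reach(0.5, 1e3, wire_c)   # 1 kOhm effective drive resistance
weak = time_to_reach(0.5, 20e3, wire_c)    # 20 kOhm, e.g. a tiny bit-cell transistor

print(f"strong driver: {strong * 1e12:.0f} ps, weak driver: {weak * 1e12:.0f} ps")
```

In this model the delay scales directly with the drive resistance, so a transistor that is 20 times weaker takes 20 times longer to move the same wire.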
Also, there are gate delays. Modern CPUs have more than ten pipeline stages, perhaps up to twenty.
Performance Issues in Pipelining
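Here is a rough sketch of why pipelining helps with gate delays but only up to a point. It assumes the logic can be split evenly across stages and that each stage pays a fixed register overhead. Both are simplifications, and the delays are hypothetical.

```python
def max_clock_hz(total_logic_delay_s, stages, latch_overhead_s):
    """Upper bound on clock rate when logic is split evenly across pipeline stages."""
    period = total_logic_delay_s / stages + latch_overhead_s
    return 1.0 / period


# Hypothetical: 4 ns of total logic delay, 50 ps of register overhead per stage.
for stages in (1, 5, 10, 20, 40):
    ghz = max_clock_hz(4e-9, stages, 50e-12) / 1e9
    print(f"{stages:2d} stages: {ghz:.2f} GHz")
```

Adding stages shortens the logic between registers, but the fixed overhead per stage puts a ceiling on the clock no matter how deep the pipeline gets.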
There are also inductive effects. At microwave frequencies, they become quite significant. You can look up crosstalk and that kind of stuff.
Now, even if you do manage to get a 3265810 THz processor working, another practical limit is how fast the rest of the system can support it. You either must have RAM, storage, glue logic, and other interconnects that perform just as fast, or you need an immense cache.
Best Answer
Of course they can be soldered. That's how they are mounted to the board during manufacturing. The BGA (ball grid array) style of package you are referring to needs special equipment to solder, but absolutely it can be soldered. That's how it's intended to be used.