The main reason ARM processors are not clocked at 4 GHz is power consumption. Architecture, fabrication process, and so on all play a big role, but the reality is that a tablet or mobile phone needs to last as long as possible on a battery, so all of those factors are chosen to minimize power consumption. In going for lower power consumption you sacrifice performance, because of design choices in the process node, the architecture, and so on. Higher frequency is a battery killer because:
P = C·V²·f
where C is the switched capacitance, V is the supply voltage, and f is the clock frequency. Power varies linearly with frequency (and quadratically with voltage), which is why frequency scaling is so prevalent, even in laptops.
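To make the equation concrete, here is a quick sketch in Python. The capacitance and voltage figures are invented round numbers, not taken from any real chip; the point is only how the terms interact:

```python
# Dynamic (switching) power of a CMOS circuit: P = C * V^2 * f

def dynamic_power(c_farads, v_volts, f_hz):
    """Approximate dynamic power in watts."""
    return c_farads * v_volts ** 2 * f_hz

# Hypothetical chip with 1 nF of effective switched capacitance.
C = 1e-9

# Doubling the clock alone doubles power...
p_2ghz = dynamic_power(C, 1.0, 2e9)      # 2.0 W
p_4ghz = dynamic_power(C, 1.0, 4e9)      # 4.0 W

# ...but reaching higher clocks usually also requires a higher
# voltage, and the V^2 term makes that much more expensive.
p_4ghz_hot = dynamic_power(C, 1.3, 4e9)  # ~6.76 W

print(p_2ghz, p_4ghz, p_4ghz_hot)
```

This is also why lowering both frequency and voltage together (as DVFS does) saves more than the linear f term alone would suggest.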
All microprocessors, and indeed all synchronous digital circuits, work at what is called the "register-transfer level". Basically, all any microprocessor does is load values into registers from different sources. Those sources can be memory, other registers, or the ALU (Arithmetic Logic Unit, a calculator inside the processor). Some of the registers are plain registers inside the processor; some are special-function registers located around the CPU in 'peripherals' such as I/O ports, the memory management unit, the interrupt unit, and so on.
In this model, 'instructions' are basic sequences of register transfers. Normally it doesn't make sense to give the programmer control over each register transfer individually, because not all possible register-transfer combinations are meaningful, so allowing the programmer to express them all would be wasteful in terms of memory consumption. So each processor declares a fixed set of register-transfer sequences that the programmer is allowed to request, and these are called 'instructions'.
For example, ADD A, B, C might be an operation where the sum of registers A and B is placed into register C. Internally, that would be three register transfers: load the adder's left input from A, load the adder's right input from B, then load C from the adder's output. Additionally, the processor makes the transfers needed to fetch the instruction itself: load the memory address register from the program counter, load the instruction register from the memory data bus, and finally load the program counter from the program-counter incrementer.
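The ADD example above can be sketched as a toy Python model. The register names and the split into micro-steps are invented for illustration; real hardware does this with latches and control signals, not dictionary updates:

```python
# Toy model of one instruction as a sequence of register transfers.
# Register names (ADD_L, ADD_R, ADD_OUT) are invented for this sketch.

regs = {"A": 5, "B": 7, "C": 0, "PC": 0,
        "ADD_L": 0, "ADD_R": 0, "ADD_OUT": 0}

def transfer(dst, src):
    """One register transfer: copy the value of src into dst."""
    regs[dst] = regs[src]

def adder():
    """Combinational logic: the adder output follows its inputs."""
    regs["ADD_OUT"] = regs["ADD_L"] + regs["ADD_R"]

# "ADD A, B, C" as its internal register transfers:
transfer("ADD_L", "A")    # load adder left input from A
transfer("ADD_R", "B")    # load adder right input from B
adder()                   # adder produces the sum
transfer("C", "ADD_OUT")  # load C from the adder output
regs["PC"] += 1           # fetch side: program-counter incrementer

print(regs["C"])  # 12
```

Each line of the sequence corresponds to one clock-driven transfer in the hardware; the instruction is simply the name given to the whole sequence.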
The 8086 used an internal ROM look-up table to determine which register transfers make up each instruction. The contents of that ROM were quite freely programmable by the designers of the 8086, so they chose instruction sequences that seemed useful to the programmer, instead of sequences that would be simple and fast for the machine to execute. Remember that in those days most software was written in assembly language, so it made sense to make that as easy as possible for the programmer. Later on, Intel designed the 80286, in which they made what now seems a critical error. They had some unused microcode memory left, figured they might as well fill it with something, and came up with a bunch of instructions just to fill the microcode. This bit them in the end, because all those extra instructions had to be supported by the 386, 486, Pentium and later processors, which didn't use microcode any more.
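The microcode idea can be sketched in a few lines of Python: a decode step is just a table lookup that yields the register-transfer sequence for each instruction. The opcode names and micro-operation encoding here are invented, not the 8086's actual microcode format:

```python
# Sketch of a microcoded control unit: a ROM maps each opcode to a
# list of micro-operations (register transfers). Entirely hypothetical
# encoding, for illustration only.

MICROCODE_ROM = {
    "ADD": ["ADD_L<-A", "ADD_R<-B", "C<-ADD_OUT"],
    "MOV": ["B<-A"],
    # Unused ROM space invites filler entries -- and every entry added
    # becomes an instruction that later designs must keep supporting.
}

def decode(opcode):
    """Return the register-transfer sequence for an instruction."""
    return MICROCODE_ROM[opcode]

print(decode("ADD"))  # ['ADD_L<-A', 'ADD_R<-B', 'C<-ADD_OUT']
```

A hardwired control unit replaces this table lookup with fixed logic gates, which is faster but makes the instruction set much harder to change after the fact.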
ARM is a much newer processor design than the 8086, and the ARM designers took a different route. By then computers were common and plenty of compilers were available, so instead of designing an instruction set that is nice for the programmer, they chose an instruction set that is fast for the machine to execute and efficient for a compiler to generate code for. And for a while, x86 and ARM differed in the way they execute instructions.
Time went by and CPUs became more and more complex; microprocessors are now designed using computers, not pencil and paper. Nobody uses microcode any more: processors have a hardwired (pure logic) execution control unit. All have multiple integer execution units and multiple data buses. All translate their incoming instructions, reschedule them, and distribute them among the processing units. Even old RISC instruction sets are translated into new internal operation sets. So the old question of RISC versus CISC doesn't really exist anymore. We're back at the register-transfer level: programmers ask CPUs to do operations, and CPUs translate them into register transfers. Whether that translation is done by a microcode ROM or by hardwired digital logic really isn't that interesting any more.
Best Answer
Today ARM processors have a big advantage in mobile devices: they need less energy to do the same work. This is very important in smartphones and tablets because battery technology improves only slowly, so if you want to increase the battery life of these devices you need components that use less power. For now Intel is some steps behind in power usage, so manufacturers prefer ARM CPUs in mobile devices. This is mainly due to the backward compatibility that the x86 architecture forces Intel to maintain: it requires a higher number of transistors, and the more transistors, the more power needed. Intel is investing a lot in this sector, and today some devices are starting to use its processors (Motorola RAZR i, Samsung Galaxy Tab 3 10.1).
For now Intel processors have better performance and so are preferred over ARM in laptops and desktops. ARM is growing fast, though, and I think that in the future its processors will also be used in laptops (which benefit more than desktops from reduced power consumption) and eventually in desktops.
For now Intel wins on performance and ARM wins on power consumption, but both are working hard to close the gap. Intel also has the best manufacturing process in the world, a great advantage that allows it to narrow ARM's lead in power consumption.