Electronic – ARM8/ARMv4 properties for qualitative comparison between RISC, CISC, and MISC processor designs

arm, low-power, microprocessor, specifications

Question:

I'm wondering if anyone knows roughly what kind of power consumption one can expect from an ARM processor having the ARM8 architecture (ARMv4 instruction set)? (Edited)

Note: I'm not looking for technical specs, but rather a ballpark power consumption in 'typical' circumstances (typical is left open.)

As for the size of the ARM8/ARMv4 instruction set, I haven't found a source that simply lists all the instructions for the ARM8 (apart from the datasheet, which would require counting manually!). Surprisingly, I wasn't able to find any marketing material or technical summary that gave the size of the instruction set, unlike for Intel, where it was easy to find that the 80486 has about 190 instructions.

Edit: Loosely counting the ARMv4 instruction set, I get roughly 75-100 instructions, depending on whether one includes minor variations or not.

Context: (Edited)

I'm hoping to use the information to make a rough but qualitatively correct comparison between similar chips of similar capability but having different designs, i.e. RISC, CISC, or MISC (minimal instruction set).

For CISC, I have the Intel 80486 at about 50 MIPS, 190 instructions, but using a whopping 1.2 million transistors, and consuming ~3W power.

For RISC, I have the ARM8 at 84 MIPS, 75-100 instructions, using fewer than 50k transistors, and consuming ??W power.

But most interesting in my opinion is MISC, specifically Chuck Moore's MuP21 Forth-in-hardware chip which is listed at about 80 MIPS performance, has only 25 instructions, only 7k transistors, and 50 mW power consumption.

Filling in the ARM chip information will hopefully support factor / order of magnitude comparisons between the three designs.
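
As a rough sketch of the arithmetic such a comparison would involve, the figures quoted above can be turned into MIPS per watt and MIPS per thousand transistors. The names `chips` and `ratios` are just illustrative, the ARM8 power is left as unknown, and its transistor count is only an upper bound ("fewer than 50k"), so its per-transistor figure is a lower bound.

```python
# Figures exactly as quoted in the question; nothing here is measured.
chips = {
    "80486 (CISC)": {"mips": 50, "transistors": 1_200_000, "watts": 3.0},
    # "fewer than 50k" transistors: treated as an upper bound; power unknown.
    "ARM8 (RISC)": {"mips": 84, "transistors": 50_000, "watts": None},
    "MuP21 (MISC)": {"mips": 80, "transistors": 7_000, "watts": 0.050},
}


def ratios(chip):
    """Return (MIPS per thousand transistors, MIPS per watt or None)."""
    mips_per_kt = chip["mips"] / (chip["transistors"] / 1000)
    if chip["watts"] is None:
        mips_per_watt = None  # cannot be computed until the power is known
    else:
        mips_per_watt = chip["mips"] / chip["watts"]
    return mips_per_kt, mips_per_watt


for name, chip in chips.items():
    per_kt, per_watt = ratios(chip)
    watt_text = "unknown" if per_watt is None else f"{per_watt:.0f}"
    print(f"{name}: {per_kt:.2f} MIPS/kT, {watt_text} MIPS/W")
```

With the quoted numbers, the MuP21 works out to 1600 MIPS/W against roughly 17 MIPS/W for the 80486, a factor of about 100. This is the kind of order-of-magnitude gap I'd like to be able to place the ARM8 in.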

Best Answer

You are not going to be able to come up with a conclusive result; it is not possible. Benchmarking is always subjective, and it can be, and often is, used to give the desired result (A is better, B is better, C is better, etc.).

The number of instructions is not relevant, any more than the number of registers would be. The number of transistors is interesting, but you may be comparing a single SoC against a processor that requires external chips to provide the same functionality. Or one chip may have large chunks turned off at any one time relative to the other, or may have a large chunk turned off to complete the benchmark. Or one may have more transistors but switch them less frequently than the other, which has fewer, possibly leading to different power consumption.

Intel makes and sells chips which happen to have (much of) their own stuff inside. ARM does not make chips; they sell IP. Just as how fast a program in source code form runs varies widely depending on compiler options, processor, etc., that same IP can consume widely different amounts of power depending on the foundry, cell library, and process used to implement it. Same architecture, same clock rate, different power consumption. So right off you are comparing apples to oranges in yet another way. I can't think of a real case where an ARM core is all that is in the chip; generally you wrap the ARM core in the chip with a lot of stuff, stuff that with the other processor would be off chip. The proper comparison would be the whole system, not just the power of the processor.

This takes you into clocking differences. One processor may be far more efficient than another and can perform the same benchmark at a different clock rate, or otherwise using less hardware or power or whatever. It is very easy to write a benchmark that runs on a small battery-powered microcontroller board and uses less power than an x86 computer, even if the x86 is or could be grossly underclocked. It is just as easy to write a benchmark that runs lightning fast on an x86 but takes that microcontroller an eternity to finish, even if you clocked them the same or could.

Compilers alone make the same computer run the same source code at vastly different speeds. It is simply not possible to compare two systems in this way except by clearly stating exactly what the benchmark was: this specific code, compiled for speed using this compiler, which was hand checked to produce this quality of optimized code, ran on this specific system with the system consuming this much power. This other system, using this compiler hand checked to provide similar optimization, required this clock rate and this much power to execute in about the same time. Repeat for each of the infinite number of possible benchmark applications users might be interested in.

The MIPS/MHz comparison relies heavily on the compiler and the application; there are big variations in MIPS on the same system with no hardware changes. In no way can you really compare two processors with this method. Published MIPS/MHz is just marketing fluff; ignore it. Likewise, you can put as much faith in the published power consumption as you can in the MIPS/MHz numbers. It was based on some benchmark; if your application is not the same benchmark, what good is it?

You will need to build a number of systems (lay out each board design specific to the benchmark) and attempt to reduce the number of variables, or ideally take the approach of making the bare minimum system, with maximum optimization, capable of running the benchmark in exactly X amount of time. Repeat for the other system, then compare the power consumption of the whole system for the duration of the execution of that benchmark. Repeat for the millions of different benchmarks needed to get a fair and general comparison. Even then, it may not be possible to reduce the results in any conclusive way.
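
To make "power consumption of the whole system for the duration of the benchmark" concrete, here is a minimal sketch that integrates logged power samples into energy. The function name `energy_joules` and the sample values are made up for illustration; they are not measurements of any real board.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples, sorted by time, with the
    trapezoidal rule. Returns energy in joules (watt-seconds)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical whole-system power logs for the same benchmark, both tuned
# to finish in the same 10 seconds. The numbers are invented.
system_a = [(0.0, 3.0), (5.0, 3.2), (10.0, 3.0)]
system_b = [(0.0, 0.4), (5.0, 0.6), (10.0, 0.4)]

print("system A:", energy_joules(system_a), "J")
print("system B:", energy_joules(system_b), "J")
```

The point is that the number being compared is energy for one specific benchmark on one specific system build, not a datasheet power figure.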

For an architecture difference you ideally want to have the processors built at the same foundry using the same cell library, same process, etc. If you are willing and able to license competing cores, populate the chips to the same level, use similar bus rates and as much similar external hardware as possible (the system buses are no doubt different, and making a common bus from them might give one an unfair advantage), and give them the same amount of cache with the same advantages, you might have a better chance at a comparison that actually looks real. This would be the only way to come up with something plausible: the same benchmark run on different architectures made at the same foundry, with the same cell library, same process, same cache size, same DRAM, etc. You can still manipulate the benchmarks to make either one the fastest or the lowest power consumer.

What would be more interesting is an empirical comparison. Take or create benchmarks one at a time and look at the various ways to generate code from the compiler. Examine the buses that you can examine and get a feel for fetch sizes. If possible, with fixed vs variable length instructions, can you tell from the buses where the variable-length decisions are made? The first byte tells you that you might need to examine the second byte, the second byte may make you realize you need 4 more bytes for the immediate, and only now can you execute. How much has to be jammed in near the decoder to make this efficient? How much do you have to discard and fetch if there is a branch, and how fast does this happen?

You have to look at the quantity of code needed to perform similar tasks. Due to the different number of registers (real or virtual; x86 is microcoded, or many are, and ARM is not), how often does the code have to swap registers out to the stack? (It is very easy to write benchmarks that punish one architecture relative to another for this.) x86 can store more program in the same size cache than an ARM, but the ARM is more deterministic in decoding that code. x86 incurs more alignment penalties than ARM, as it lends itself to not being aligned, where ARM either encourages or forces alignment.

Can you construct benchmarks that show an advantage for each instruction set? It should be very easy to make a loop that fits in a cache of some size as x86 instructions but does not fit in a cache of the same size as ARM instructions. It might be easy to make a benchmark that branches heavily and shows ARM's advantages, or at least shows one branch predictor vs another. Clocks and power are still out of the picture, but you can at least see performance at some layer and then project that into the cache and DRAM responses to finish the understanding.
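
The variable-length decoding chain described above (first byte, then second byte, then immediate) can be sketched with a toy encoding. This encoding is invented purely for illustration and is not real x86: the high bit of the first byte means a second byte follows, and the low bit of the second byte means a 4-byte immediate follows. The function names are hypothetical too.

```python
def variable_length(code, pc):
    """Return (length_in_bytes, decode_steps) for the toy instruction at pc."""
    length, steps = 1, 1
    if code[pc] & 0x80:            # first byte says a second byte follows
        length, steps = 2, 2
        if code[pc + 1] & 0x01:    # second byte says a 4-byte immediate follows
            length, steps = 6, 3
    return length, steps


def split_variable(code):
    """Find instruction lengths; each start depends on decoding the previous one."""
    pc, lengths = 0, []
    while pc < len(code):
        n, _ = variable_length(code, pc)
        lengths.append(n)
        pc += n
    return lengths


def split_fixed(code, width=4):
    """Fixed-width instructions: every boundary is known without decoding."""
    return [width] * (len(code) // width)


toy = bytes([0x01,                                  # 1-byte instruction
             0x80, 0x00,                            # 2-byte instruction
             0x81, 0x01, 0x11, 0x22, 0x33, 0x44])   # 2 bytes + 4-byte immediate
print(split_variable(toy))
print(split_fixed(bytes(8)))
```

In the variable-length case the decoder cannot know where instruction N+1 starts until it has at least partly decoded instruction N, which is the logic that has to be "jammed in near the decoder". In the fixed-width case all boundaries are known up front, at the cost of more bytes for short instructions.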

Anyway, that was a tangent. You cannot compare two processors in this way and have those in the know accept the results as anything meaningful. The masses may be fooled, but not those who know what is going on. Empirically demonstrating advantages and disadvantages might be more doable and more interesting to all. Comparing OpenCores cores in the same FPGA might be interesting as well. But one commercial processor chip on a commercial board, compared to IP that can be implemented many, many different ways on many different boards, just won't be plausible.