Compiler – Do Compilers Need to Be Written for Each CPU Model?

compiler, cpu, high-level, machine-code

Do you need to take account of the different processors and their instructions when writing a compiler? Have instructions been standardised? Or what tools and techniques are available to assist with this, e.g. ignoring machine instructions that are specific to a certain processor model?

Best Answer

No, instruction sets aren't "standardized" in a way that you could produce assembly that's fit for – or is simply mappable to – ARM, x86, PPC, MIPS, Itanium, Sparc, ... (and their variants).

Native code compilers are pretty complex beasts. Not all the work they do is processor-specific. All the lexing/parsing is language-dependent but not chip-related. Some optimization passes are also hardware-independent, but possibly not all: e.g. the right code size vs. raw speed tradeoff might depend on the target.
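To make that split concrete, here is a minimal sketch of a toy compiler with one shared front end and two back ends. Everything in it is made up for illustration: the IR tuples, the `lower`, `emit_two_operand` and `emit_three_operand` names, and the pseudo-assembly, which uses virtual registers (`v0`, `v1`, ...) rather than any real chip's registers. It leans on Python's own `ast` module instead of a hand-written lexer/parser.

```python
import ast
import itertools

def lower(expr):
    """Front end: turn an integer expression with + and * into target-independent IR."""
    code = []
    fresh = itertools.count().__next__  # hands out virtual register numbers

    def visit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            dst = fresh()
            code.append(("const", dst, node.value))
            return dst
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            lhs = visit(node.left)
            rhs = visit(node.right)
            dst = fresh()
            op = "add" if isinstance(node.op, ast.Add) else "mul"
            code.append((op, dst, lhs, rhs))
            return dst
        raise ValueError("unsupported expression")

    visit(ast.parse(expr, mode="eval").body)
    return code

def emit_two_operand(ir):
    """Hypothetical back end where the destination doubles as the first source."""
    out = []
    for ins in ir:
        if ins[0] == "const":
            out.append(f"mov v{ins[1]}, {ins[2]}")
        else:
            op, dst, a, b = ins
            out.append(f"mov v{dst}, v{a}")
            out.append(f"{'add' if op == 'add' else 'imul'} v{dst}, v{b}")
    return out

def emit_three_operand(ir):
    """Hypothetical back end with separate destination and source operands."""
    out = []
    for ins in ir:
        if ins[0] == "const":
            out.append(f"mov v{ins[1]}, #{ins[2]}")
        else:
            op, dst, a, b = ins
            out.append(f"{op} v{dst}, v{a}, v{b}")
    return out

ir = lower("2 + 3 * 4")
two_op = emit_two_operand(ir)
three_op = emit_three_operand(ir)
```

The same `ir` feeds both emitters. Only the last step knows anything about the target, and even in this toy the two outputs differ in instruction count.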

At some point, if you're producing native code, you'll need to know the details of the chip you're targeting. You need to be aware of its "quirks" (memory coherency properties, for instance) and its complete instruction set to produce an instruction stream that is both correct and reasonably efficient.

Even if you restrict yourself to one instruction set (say x86_64), different brands of chips have different extensions that need to be considered. Different models of the same brand also have instruction set differences (new features added, sometimes old features removed). Sticking with the "lowest common denominator" could work, but you'll be missing out on a lot of stuff.
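As a rough illustration of the "lowest common denominator" tradeoff, the sketch below picks an instruction sequence for counting set bits depending on which extensions the target is assumed to have. `POPCNT` is a real x86 extension that is not part of the baseline x86_64 instruction set. The `lower_popcount` function, the feature-set representation and the exact fallback sequence are my own assumptions for illustration, not how any particular compiler does it.

```python
def popcount_fallback(x):
    """Reference for the fallback below: clear the lowest set bit until none remain."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

def lower_popcount(features):
    """Return illustrative x86_64 assembly that counts the set bits of rdi into rax."""
    if "popcnt" in features:
        # One instruction, but only valid on chips that implement the extension.
        return ["popcnt rax, rdi"]
    # Baseline-only sequence: the same loop as popcount_fallback.
    return [
        "xor eax, eax",
        "test rdi, rdi",
        "jz .done",
        ".loop:",
        "inc eax",
        "lea rdx, [rdi - 1]",   # lea does not touch the flags
        "and rdi, rdx",         # clears the lowest set bit, sets ZF when rdi hits 0
        "jnz .loop",
        ".done:",
    ]

baseline = lower_popcount(frozenset())
extended = lower_popcount(frozenset({"popcnt"}))
```

The baseline version runs everywhere but is longer and slower. This is roughly the kind of choice that target options such as GCC's and Clang's `-march=` feed into.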

Does that mean that you do a complete rewrite of the compiler for every new instruction set or extension that hits the market? Of course not. Those are incremental changes, sometimes only to "machine description files" or whatever the compiler uses to model the target instruction set.
But introducing a new ISA altogether is not a trivial task and requires detailed knowledge of the target.
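A heavily simplified sketch of the "machine description" idea: the target is modelled as a table of instruction templates, and supporting a newer model of the same chip means adding a table entry rather than touching the selector. The targets `BASE` and `V2`, the `madd` operation and the `select` function are all hypothetical. Real machine descriptions are far richer (costs, constraints, scheduling information).

```python
# Description of a hypothetical chip: IR operation -> assembly template.
BASE = {
    "add": "add {d}, {a}, {b}",
    "mul": "mul {d}, {a}, {b}",
}

# A newer model of the same hypothetical chip adds a multiply-add instruction.
# Supporting it is a one-line change to the description.
V2 = {**BASE, "madd": "madd {d}, {a}, {b}, {c}"}

def select(desc, ir):
    """Target-independent selector driven entirely by the description table."""
    out = []
    for op, regs in ir:
        if op in desc:
            out.append(desc[op].format(**regs))
        elif op == "madd":
            # Generic expansion when the target lacks madd: d = a * b; d = d + c.
            # Assumes d and c are different registers.
            out.append(desc["mul"].format(d=regs["d"], a=regs["a"], b=regs["b"]))
            out.append(desc["add"].format(d=regs["d"], a=regs["d"], b=regs["c"]))
        else:
            raise KeyError(f"no rule for {op}")
    return out

ir = [("madd", {"d": "r0", "a": "r1", "b": "r2", "c": "r3"})]
old_model = select(BASE, ir)
new_model = select(V2, ir)
```

Here `select` is identical for both models and only the table differs. A brand-new ISA would instead need its own table plus the knowledge of calling conventions, memory model and so on that no table entry captures.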

If you're setting out to build a compiler yourself, do have a look at LLVM. Chances are you'll use it for the "emitting native code" part at least, whatever language you're trying to compile.

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.