Low-Level Programming – Why Do Executables Depend on OS but Not CPU?

cpu, low-level, machine-code

If I write a C program and compile it to an .exe file, the .exe file contains raw machine instructions for the CPU (I think).

If so, how is it possible for me to run the compiled file on any computer that runs a modern version of Windows? Each family of CPUs has a different instruction set. So how come any computer that runs the appropriate OS can understand the instructions in my .exe file, regardless of its physical CPU?

Also, the download page for an application often offers one download for Windows, one for Linux, and one for Mac (often two downloads for each OS, for x86 and 64-bit computers). Why aren't there many more downloads, one for each family of CPUs?

Best Answer

Executables do depend on both the OS and the CPU:

Instruction Set: The binary instructions in the executable are decoded by the CPU according to some instruction set. Most consumer CPUs support the x86 (“32bit”) and/or AMD64 (“64bit”) instruction sets. A program can be compiled for either of these instruction sets, but not both. There are extensions to these instruction sets; support for these can be queried at runtime. Such extensions offer SIMD support, for example. Optimizing compilers might try to take advantage of these extensions if they are present, but usually also offer a code path that works without any extensions.
Binary Format: The executable has to conform to a certain binary format, which allows the operating system to correctly load, initialize, and start the program. Windows mainly uses the Portable Executable format, while Linux uses ELF.
System APIs: The program may be using libraries, which have to be present on the executing system. If a program uses functions from Windows APIs, it can't be run on Linux. In the Unix world, the central operating system APIs have been standardized to POSIX: a program using only the POSIX functions will be able to run on any conformant Unix system, such as Mac OS X and Solaris.

So if two systems offer the same system APIs and libraries, run on the same instruction set, and use the same binary format, then a program compiled for one system will also run on the other.
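The binary format and the instruction set are both recorded near the start of an executable file, so they can be inspected directly. The following is a minimal sketch, not a full parser: the function name `describe_executable` and the two lookup tables are made up for this illustration, and the tables list only a handful of machine codes. The header offsets are the ones I understand the ELF and PE formats to use.

```python
import struct

# Deliberately incomplete lookup tables of machine codes.
ELF_MACHINES = {0x03: "x86", 0x3E: "AMD64", 0x28: "ARM", 0xB7: "AArch64"}
PE_MACHINES = {0x014C: "x86", 0x8664: "AMD64", 0xAA64: "ARM64"}


def describe_executable(data: bytes) -> str:
    """Return 'FORMAT / MACHINE' given the leading bytes of an executable."""
    if data[:4] == b"\x7fELF":
        # Byte 5 of the ELF identification gives the byte order:
        # 1 = little-endian, anything else is treated as big-endian here.
        order = "<" if data[5] == 1 else ">"
        # e_machine is a 16-bit field at offset 18 of the ELF header.
        (machine,) = struct.unpack_from(order + "H", data, 18)
        return "ELF / " + ELF_MACHINES.get(machine, hex(machine))
    if data[:2] == b"MZ":
        # Offset 0x3C of the DOS header holds the file offset of the
        # "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            # The COFF header follows the signature; its first field
            # is the 16-bit machine code.
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            return "PE / " + PE_MACHINES.get(machine, hex(machine))
        return "MZ (DOS executable)"
    return "unknown"
```

To try it, read the first few kilobytes of a file in binary mode and pass them to `describe_executable`. A 64-bit Windows program would be expected to report `PE / AMD64`, and a typical 64-bit Linux program `ELF / AMD64`. That is the same instruction set in two different binary formats, which is why one file cannot serve both systems.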

However, there are ways to achieve more compatibility:

Systems running on the AMD64 instruction set will commonly also run x86 executables. The binary format indicates which mode to run. Handling both 32bit and 64bit programs requires additional effort by the operating system.
Some binary formats allow a file to contain multiple versions of a program, compiled for different instruction sets. Such “fat binaries” were encouraged by Apple while they were transitioning from the PowerPC architecture to x86.
Some programs are not compiled to machine code, but to some intermediate representation. This is then translated on-the-fly to actual instructions, or might be interpreted. This makes a program independent from the specific architecture. Such a strategy was used on the UCSD p-System.
One operating system can support multiple binary formats. Windows is quite backwards compatible and still supports formats from the DOS era. On Linux, Wine allows the Windows formats to be loaded.
The APIs of one operating system can be reimplemented for another host OS. On Windows, Cygwin and the POSIX subsystem can be used to get a (mostly) POSIX-compliant environment. On Linux, Wine reimplements many of the Windows APIs.
Cross-platform libraries allow a program to be independent of the OS APIs. Many programming languages have standard libraries that try to achieve this, e.g. Java and C.
An emulator simulates a different system by parsing the foreign binary format, interpreting the instructions, and offering a reimplementation of all required APIs. Emulators are commonly used to run old Nintendo games on a modern PC.
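The intermediate-representation approach in the list above can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the opcodes (`push`, `add`, `mul`) and the function name `run` are invented here and are not the instruction set of the UCSD p-System or of any real virtual machine. The point is that the "program" is plain data, so it runs unchanged on any CPU for which the interpreter itself has been built.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []
    for opcode, operand in program:
        if opcode == "push":
            # Place a constant on top of the stack.
            stack.append(operand)
        elif opcode == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    # The result is whatever is left on top of the stack.
    return stack.pop()


# Hypothetical portable program computing (2 + 3) * 4.
PROGRAM = [
    ("push", 2),
    ("push", 3),
    ("add", None),
    ("push", 4),
    ("mul", None),
]

print(run(PROGRAM))  # prints 20
```

Only the interpreter has to be ported to each instruction set and operating system; the program above is distributed once. A system that translates on the fly would replace the `if`/`elif` dispatch with code that emits native instructions.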

Related Solutions

Computer Science – How Do Lines of Code Get Executed by the CPU?

The lines of source code have no direct bearing on how the CPU executes a program. I'd recommend reading up on assembler, because that will teach you a lot about how the hardware actually does things. You can also get assembler output from many compilers.

Two statements such as x = x + 5; y = x - 3; might compile into something like the following (in a made-up assembly language):

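load R1, [x] ; load the data stored at memory location x into register 1
add R1, 5 ; add 5 to the value in register 1
store [x], R1 ; store the modified value into memory location x
sub R1, 3 ; subtract 3 from the value in register 1
store [y], R1 ; store the result into memory location y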

However, if the compiler knows that a variable isn't used again, the store operation may not be emitted.

For the debugger to know which machine code corresponds to a line of program source, the compiler adds annotations that record which source line each range of machine code came from.
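One way to picture those annotations is as a sorted table of (start address, source line) pairs that the debugger searches when execution stops at some address. The sketch below is purely illustrative: the addresses, the table contents, and the name `source_line_for` are invented, and real debug-information formats are considerably more elaborate.

```python
import bisect

# Hypothetical line table, sorted by address: the machine code for a source
# line starts at the given address and runs until the next entry begins.
LINE_TABLE = [
    (0x1000, 1),
    (0x1008, 2),
    (0x1014, 3),
]


def source_line_for(address, table=LINE_TABLE):
    """Return the source line whose machine code contains `address`."""
    starts = [start for start, _line in table]
    # Find the last entry whose start address is <= address.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # the address lies before any known code
    return table[index][1]


print(source_line_for(0x100C))  # prints 2
```

Because several instructions usually belong to one source line, the lookup maps a whole range of addresses to that line rather than a single address.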

Assembly – Were the First Assemblers Written in Machine Code?

for the very first assembler ever written (i.e. in history), wouldn't it need to be written in machine code

Not necessarily. Of course the very first version v0.00 of the assembler must have been written in machine code, but it would not be sufficiently powerful to be called an assembler. It would not support even half the features of a "real" assembler, but it would be sufficient to write the next version of itself. Then you could rewrite v0.00 in the subset of the assembly language that it supports, call it v0.01, use it to build the next feature set of your assembler v0.02, then use v0.02 to build v0.03, and so on, until you get to v1.00. As a result, only the first version will be in machine code; the first released version will be in assembly language.

I have bootstrapped development of a template language compiler using this trick. My initial version was using printf statements, but the first version that I put to use in my company was using the very template processor that it was processing. The bootstrapping phase lasted less than four hours: as soon as my processor could produce barely useful output, I re-wrote it in its own language, compiled, and threw away the non-templated version.
