Low-Level Programming – Why Do Executables Depend on OS but Not CPU?

cpu, low-level, machine-code

If I write a C program and compile it to an .exe file, the .exe file contains raw machine instructions for the CPU (I think).

If so, how is it possible for me to run the compiled file on any computer that runs a modern version of Windows? Each family of CPUs has a different instruction set. So how come any computer that runs the appropriate OS can understand the instructions in my .exe file, regardless of its physical CPU?

Also, the "download" page of an application's website often offers a download for Windows, for Linux, and for Mac (often two downloads for each OS, for x86 and 64-bit computers). Why aren't there many more downloads, one for each family of CPUs?

Best Answer

Executables do depend on both the OS and the CPU:

  • Instruction Set: The binary instructions in the executable are decoded by the CPU according to some instruction set. Most consumer CPUs support the x86 (“32bit”) and/or AMD64 (“64bit”) instruction sets. A program can be compiled for either of these instruction sets, but not both. There are extensions to these instruction sets; support for these can be queried at runtime. Such extensions offer SIMD support, for example. Optimizing compilers might try to take advantage of these extensions if they are present, but usually also offer a code path that works without any extensions.

  • Binary Format: The executable has to conform to a certain binary format, which allows the operating system to correctly load, initialize, and start the program. Windows mainly uses the Portable Executable format, while Linux uses ELF.

  • System APIs: The program may be using libraries, which have to be present on the executing system. If a program uses functions from Windows APIs, it can't be run on Linux. In the Unix world, the central operating system APIs have been standardized to POSIX: a program using only the POSIX functions can be compiled for and run on any conformant Unix system, such as Mac OS X and Solaris.

So if two systems offer the same system APIs and libraries, run on the same instruction set, and use the same binary format, then a program compiled for one system will also run on the other.
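Both the binary format and the instruction set are recorded in the first bytes of an executable, which is how a loader can refuse a file it cannot run. As a rough sketch only (the function name `identify_executable` and the two lookup tables are made up for this illustration, and the tables list just a few of the machine codes defined by the ELF and PE specifications), the check could look like this in Python:

```python
import struct

# Small subsets of the machine codes defined by the ELF and PE specifications.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "AMD64", 183: "AArch64"}
PE_MACHINES = {0x014C: "x86", 0x8664: "AMD64", 0x01C4: "ARM", 0xAA64: "AArch64"}


def identify_executable(data: bytes) -> tuple[str, str]:
    """Return (binary format, instruction set) for the leading bytes of a file.

    Hypothetical helper for illustration; it assumes `data` is long enough
    to contain the headers it reads.
    """
    if data[:4] == b"\x7fELF":
        # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
        order = "<" if data[5] == 1 else ">"
        # e_machine is a 16-bit field at offset 18 of the ELF header.
        (machine,) = struct.unpack_from(order + "H", data, 18)
        return "ELF", ELF_MACHINES.get(machine, f"unknown ({machine})")
    if data[:2] == b"MZ":
        # The 32-bit value at offset 0x3C points to the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            # The COFF header follows the signature; its first field is Machine.
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            return "PE", PE_MACHINES.get(machine, f"unknown ({machine:#06x})")
        return "MZ (no PE header)", "unknown"
    return "unknown", "unknown"
```

Calling it on the first few kilobytes of a file, for example `identify_executable(open(path, "rb").read(4096))` with `path` being any executable on your machine, reports which format and instruction set the file was built for. It says nothing about the third requirement, the system APIs, which only shows up when the program tries to load its libraries.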

However, there are ways to achieve more compatibility:

  • Systems running on the AMD64 instruction set will commonly also run x86 executables. The binary format indicates which mode the program has to run in. Handling both 32bit and 64bit programs requires additional effort by the operating system.

  • Some binary formats allow a file to contain multiple versions of a program, compiled for different instruction sets. Such “fat binaries” were encouraged by Apple while they were transitioning from the PowerPC architecture to x86.

  • Some programs are not compiled to machine code, but to some intermediate representation. This is then translated on-the-fly to actual instructions, or might be interpreted. This makes a program independent from the specific architecture. Such a strategy was used on the UCSD p-System.

  • One operating system can support multiple binary formats. Windows is quite backwards compatible and still supports formats from the DOS era. On Linux, Wine allows the Windows formats to be loaded.

  • The APIs of one operating system can be reimplemented for another host OS. On Windows, Cygwin and the POSIX subsystem can be used to get a (mostly) POSIX-compliant environment. On Linux, Wine reimplements many of the Windows APIs.

  • Cross-platform libraries allow a program to be independent of the OS APIs. Many programming languages have standard libraries that try to achieve this, e.g. Java and C.

  • An emulator simulates a different system by parsing the foreign binary format, interpreting the instructions, and offering a reimplementation of all required APIs. Emulators are commonly used to run old Nintendo games on a modern PC.
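The intermediate-representation and emulator approaches share one idea: the program's instructions are data that another program reads and acts on, so the physical CPU only ever executes the interpreter. A minimal sketch in Python, using an invented four-instruction stack machine (the opcodes and the `run` function are hypothetical and are not those of the p-System or of any real virtual machine):

```python
# Opcodes of a made-up stack machine, for illustration only.
PUSH, ADD, MUL, HALT = range(4)


def run(program):
    """Interpret a list of (opcode, operand) pairs; return the top of the stack."""
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack[-1]
        else:
            raise ValueError(f"unknown opcode {op}")


# Encodes (2 + 3) * 4 in the made-up instruction set.
PROGRAM = [
    (PUSH, 2),
    (PUSH, 3),
    (ADD, None),
    (PUSH, 4),
    (MUL, None),
    (HALT, None),
]
```

`run(PROGRAM)` evaluates to 20 on any machine that can run the interpreter, whatever its CPU. Only the interpreter itself has to be built for each instruction set and operating system; a real emulator follows the same pattern but decodes an existing instruction set and also has to reimplement the APIs the program expects.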