Why Compilers Generate Executables for Installed Platforms Only

compiler, cpu, cross-platform, low-level

I'm a C++ developer, and in an attempt to better understand cross-platform development, I'm trying to learn some implementation details of compilers and how exactly they create OS-specific binaries. In the course of my study I realized that, at least for a while, most compilers you downloaded for a specific platform only compiled binaries for that platform. So if you downloaded an IDE that came with a compiler executable for Windows, that compiler could only compile your program into an x86/x64 Windows application, not a Linux or Mac application.

Now I understand that different platforms require different binary formats, but what makes it difficult for, say, the Visual C++ compiler on Windows to generate a Linux binary executable? As long as you have the assembly instructions for the CPU you're running on, as well as the OS-specific libraries, shouldn't you be able to compile executables for any platform on any machine?

Best Answer

What makes it difficult for, say, the Visual C++ compiler on Windows to generate a Linux binary executable?

Other than an unwillingness to do that on Microsoft's part, absolutely nothing. The obstacles aren't technical.

Development toolchains are just programs that take input and produce output. Visual C++ produces x86 assembly and then uses an assembler to convert that into a COFF object file. If Microsoft wanted to make it generate ELF instead, it's just code: assembly comes in, ELF goes out. There's nothing magic about object files or libraries; they're just blobs of data in a well-understood format.
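To make "blobs of data in a well-understood format" concrete, here's a toy sketch (not how any real toolchain works) that identifies a binary purely by its leading magic bytes; recognizing each format takes nothing more than knowing its first few bytes:

    // sniff.cpp -- identify a binary by its magic bytes.
    // Toy sketch only; real format detection validates far more than this.
    #include <fstream>
    #include <iostream>

    int main(int argc, char **argv) {
        if (argc < 2) { std::cerr << "usage: sniff <file>\n"; return 1; }

        std::ifstream in(argv[1], std::ios::binary);
        unsigned char m[4] = {0, 0, 0, 0};
        in.read(reinterpret_cast<char *>(m), sizeof m);

        if (m[0] == 0x7F && m[1] == 'E' && m[2] == 'L' && m[3] == 'F')
            std::cout << "ELF (Linux and most Unix-likes)\n";
        else if (m[0] == 'M' && m[1] == 'Z')   // DOS stub at the front of a PE image
            std::cout << "PE executable (Windows)\n";
        else if (m[0] == 0xCF && m[1] == 0xFA && m[2] == 0xED && m[3] == 0xFE)
            std::cout << "64-bit Mach-O (macOS)\n";
        else
            std::cout << "unrecognized format\n";
    }

Nothing stops this program, compiled and running on Windows, from recognizing (or, with more code, writing) an ELF file. The bytes are just bytes.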

Way back in the stone age, cross-compilation was a lot more difficult because, more often than not, you would have been writing the toolchain for your target platform in the assembly language of the platform where it would run. This meant that if the only architectures in the world were the VAX, M68K, and Alpha, a full suite of cross-compilers would require writing nine of them, mostly from scratch (VAX-to-VAX, VAX-to-M68K, VAX-to-Alpha, M68K-to-VAX, M68K-to-M68K, and so on). That's a bit of an exaggeration, since parts of the VAX compiler could be reused and attached to code generators for each target (e.g., VAX, M68K, and Alpha back ends, each written for the VAX).

That problem went away when we started writing compilers in a language that wasn't tied to a specific processor, such as C. Going that route means you write the entire toolchain once in C and use a C compiler already built for the local platform to compile it. (You'd often use the new compiler to recompile itself after it had been bootstrapped with the local platform's compiler, but that's another discussion.) The upshot is that building a cross-compiler became essentially the same effort as building a native compiler for the local platform. The only significant difference is that somewhere in the build process, you told it to compile in the code generator for your target platform instead of the one for the local platform, which would otherwise be the default choice. GCC did (and still does) this by building one binary per local/target platform pair, as in the sketch below.
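As a rough illustration of that build-time selection (all target names and instruction spellings here are invented for the sketch), imagine each code generator sitting behind the same function, with the build compiling exactly one of them in:

    // codegen.cpp -- toy "one binary per target pair" model, a la classic GCC.
    // Hypothetical emitters for illustration only.
    #include <iostream>
    #include <string>

    // The build selects exactly one emitter, e.g.: g++ -DTARGET_VAX codegen.cpp
    #if defined(TARGET_VAX)
    std::string emit_load(int v) { return "movl $" + std::to_string(v) + ", r0"; }
    #elif defined(TARGET_M68K)
    std::string emit_load(int v) { return "move.l #" + std::to_string(v) + ", d0"; }
    #else // default to the local platform, x86-64 here
    std::string emit_load(int v) { return "mov eax, " + std::to_string(v); }
    #endif

    int main() {
        // "Compile" the constant 42 for whichever target was baked in.
        std::cout << emit_load(42) << '\n';
    }

Building this three times with three different -DTARGET_... flags yields three distinct binaries, which is exactly the one-binary-per-pair shape described above.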

As the architecture of compilers evolved, it became convenient to simply include and build all of the code generators with the product and select which one gets used at runtime. Clang/LLVM does this, and I'm sure there are others.
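With LLVM, that runtime selection is visible even from outside the compiler: every back end the toolchain was built with is linked in, and a target triple chosen at runtime picks one. Here's a minimal sketch using LLVM's C API (it assumes an LLVM installation with all back ends enabled; the default triple and output filename are placeholders):

    // retarget.cpp -- emit an (empty) object file for a triple chosen at runtime.
    // Build: clang++ retarget.cpp $(llvm-config --cxxflags --ldflags --libs) -o retarget
    #include <llvm-c/Core.h>
    #include <llvm-c/Target.h>
    #include <llvm-c/TargetMachine.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        const char *triple = argc > 1 ? argv[1] : "x86_64-pc-linux-gnu";

        // Register every code generator this LLVM build contains.
        LLVMInitializeAllTargetInfos();
        LLVMInitializeAllTargets();
        LLVMInitializeAllTargetMCs();
        LLVMInitializeAllAsmPrinters();

        // Look up the back end for the requested triple -- at runtime.
        char *error = nullptr;
        LLVMTargetRef target;
        if (LLVMGetTargetFromTriple(triple, &target, &error)) {
            std::fprintf(stderr, "unknown triple %s: %s\n", triple, error);
            return 1;
        }

        LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
            target, triple, "", "",
            LLVMCodeGenLevelDefault, LLVMRelocDefault, LLVMCodeModelDefault);

        // An empty module is enough to get a structurally valid object file.
        LLVMModuleRef mod = LLVMModuleCreateWithName("demo");
        LLVMSetTarget(mod, triple);

        char outfile[] = "demo.o";  // older LLVM versions want a mutable char*
        if (LLVMTargetMachineEmitToFile(tm, mod, outfile, LLVMObjectFile, &error)) {
            std::fprintf(stderr, "emit failed: %s\n", error);
            return 1;
        }
        std::printf("wrote %s for %s\n", outfile, triple);
        return 0;
    }

Running it as ./retarget aarch64-linux-gnu on an x86 machine writes an AArch64 ELF object that the machine could never execute itself; clang's --target flag is the same mechanism one level up.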

Once you have a working toolchain (compiler, assembler, linker), libraries get built from source, and eventually you end up with everything you need to produce an executable file for some other platform.