Theory – How to Automatically Convert Code from Low-Level to High-Level Language

I have seen several applications that claim to convert Java code to valid C or even C++. Converting from a high-level language to a low-level language is possible, no doubt about it. Can the reverse be done, at least in theory, without any manual steps?

For instance:

  • Converting Assembly to C or Machine Code to Assembly?

  • Hardware Description Languages (HDL) to Assembly? (whichever is lowest?)

  • C to C#?

Best Answer

Although it's possible, the "lifting" compiler will most likely generate code whose structure emulates the programming model of the lower-level language. You end up with "COBOL in Haskell" or "ASM in Java" or what-have-you, and the result will be more complex and less efficient than the original lower-level program.

For instance, if the lower-level language has explicit memory management and your higher-level language does not, you cannot just throw away the frees: the behavior of the original program may depend on memory being released at deterministic points. So you would have to model, in your high-level language, the memory model of the low-level language (yuck). Similarly, if the lower-level language has arbitrary gotos à la JMP, you'd have to generate high-level code in which such gotos can be executed, including jumps that cross arbitrary function boundaries.
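To make the "ASM in Java" effect concrete, here is a minimal sketch in Python of what mechanically lifted code might look like. Everything in it is made up for illustration: the assembly routine in the comments (which sums 1..n and keeps its accumulator in a heap cell), the `Heap` class, and the `lifted_sum` function are assumptions, not the output of any real tool. The jumps become a program-counter dispatch loop, and malloc/free are emulated explicitly because the translator cannot know whether dropping them is safe.

```python
class Heap:
    """Toy model of a low-level heap: explicit malloc/free over integer addresses."""

    def __init__(self):
        self.cells = {}          # address -> stored value
        self.live = {}           # base address -> size of the live block
        self.next_addr = 0x1000  # arbitrary starting address for this sketch

    def malloc(self, size):
        base = self.next_addr
        self.next_addr += size
        self.live[base] = size
        for offset in range(size):
            self.cells[base + offset] = 0
        return base

    def free(self, base):
        size = self.live.pop(base)  # raises KeyError on a double free
        for offset in range(size):
            del self.cells[base + offset]

    def load(self, addr):
        return self.cells[addr]

    def store(self, addr, value):
        self.cells[addr] = value


def lifted_sum(n, heap):
    """Line-by-line translation of a hypothetical assembly routine summing 1..n."""
    acc = heap.malloc(1)  # CALL malloc ; the accumulator lives in a heap cell
    r1 = n                # MOV r1, n
    pc = 0                # emulated program counter
    while True:
        if pc == 0:       # loop: JZ r1, done
            pc = 3 if r1 == 0 else 1
        elif pc == 1:     # ADD [acc], r1
            heap.store(acc, heap.load(acc) + r1)
            pc = 2
        elif pc == 2:     # DEC r1 ; JMP loop
            r1 -= 1
            pc = 0
        elif pc == 3:     # done: MOV r0, [acc] ; CALL free ; RET
            result = heap.load(acc)
            heap.free(acc)
            return result
```

Calling `lifted_sum(4, Heap())` gives 10, the same as the idiomatic `sum(range(1, 5))`. The lifted version is longer and slower, and it still thinks in registers, addresses, and jumps rather than in the constructs of the target language.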

Decompilers largely avoid this problem because they are not really dealing with the complete capabilities of the underlying machine code. The exception is a decompiler for a VM that happens to be very tightly coupled to a language's programming model, where even the full instruction set maps closely onto the language.