Assembly language translates almost directly to machine code. `mov` becomes a `mov` instruction. `call` becomes a `call` instruction. The arguments on the same line become the argument fields for those instructions. There's a bit of assistance in computing addresses, but not a lot beyond that.
The operating system can be treated much like a subroutine library. The "magic numbers" you're asking about are operating system entry points; the `call` instruction, like a function call in higher-level languages, invokes them; they run until they `return`, at which point your program picks up where it left off. Your OS's user manual will tell you which entry point to invoke to do what, how to set up any arguments required (such as putting the address of the string to be printed in the `si` register before calling `os_print_string`, though some may involve pushing values onto the stack rather than putting them in registers), and how to read their returned results, if any (again, which registers will hold the result or what to pop off the stack).
As far as question (b) goes -- that's all stuff the OS and its device drivers, or a function library linked with your assembly code, will normally handle for you. If you really need to know it (e.g. because you're writing an OS or device drivers), you'll need to study the documentation for your specific hardware to understand how to communicate with it... but what you'll wind up doing is writing a library of functions that do the necessary work, and packaging it so main programs just invoke those functions. In other words, for most programs I/O is much like working in a higher-level language: the runtime library does all the work, and all you need to know is how to use it. (Sane assembler code depends critically on writing good functions so you don't spend time endlessly reinventing wheels!)
At every step of compilation you lose information that is irrecoverable. The more information you lose from the original source, the harder it is to decompile.
You can create a useful decompiler for byte-code because far more information from the original source is preserved than when producing the final target machine code.
The first step of a compiler is to turn the source into some form of intermediate representation, often a tree. Traditionally this tree does not contain non-semantic information such as comments, white-space, etc. Once that is thrown away, you cannot recover the original source from the tree.
The next step is to render the tree into some form of intermediate language that makes optimizations easier. There are quite a few choices here, and each compiler infrastructure has its own. Typically, however, information such as local variable names and larger control-flow structure (such as whether you used a `for` or `while` loop) is lost. Some important optimizations typically happen here: constant propagation, invariant code motion, function inlining, etc. Each of these transforms the representation into one that has equivalent functionality but looks substantially different.
A step after that is to generate the actual machine instructions, which might involve what are called "peephole" optimizations that produce optimized versions of common instruction patterns.
At each step you lose more and more information until, at the end, you have lost so much that it becomes impossible to recover anything resembling the original code.
Byte-code, on the other hand, typically saves the interesting and transformative optimizations for the JIT phase (the just-in-time compiler), when the target machine code is produced. Byte-code contains a lot of metadata, such as local variable types and class structure, to allow the same byte-code to be compiled for multiple target machines. None of this information is necessary in a compiled C++ program, so it is discarded during compilation.
There are decompilers for various target machine codes, but they often do not produce useful results (something you can modify and then recompile) because too much of the original source is lost. If you have debug information for the executable, you can do a better job; but if you have debug information, you probably have the original source too.
G-code was created to be extremely easy to parse by devices with extremely limited computing resources. It's almost more of a data file format than a programming language. There is no "compilation" step. It's interpreted as it is read, line by line, with a small buffer to avoid mechanical issues from timing latency. There's also no "standard library." Firmware typically has to be recompiled for each different combination of microcontroller and motor hardware used, and it takes quite a bit of work to support even seemingly minor variations.
In the case of the Marlin firmware, inside `Marlin_main.cpp` you have a `get_command()` function that keeps a queue filled with commands, and a `process_next_command()` function that contains a massive `switch` statement to pull the next command from the queue and call the appropriate function.

As far as what those individual functions do, that depends a lot on what kind of hardware you have connected, but if you know you have a certain clock rate, a certain type of stepper motor connected to each axis, with a certain resolution, connected to certain pins, you can work out the right pins to toggle at the right time to, say, move the head in a straight line from a to b along the x axis at a certain speed. From there it's really just a big grind to implement all the required commands with the correct timing.