C++ – How does assembly relate to machine/binary code

assembly, c, compiler, machine-code

How does assembly relate to machine/binary code?

For example, here is how to print to the screen in MikeOS (a small OS written purely in assembly), which uses NASM as its assembler:

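    BITS 16
    ORG 32768                   ; the program expects to be loaded at 32768 (8000h)
    %INCLUDE 'mikedev.inc'      ; defines the os_* entry-point addresses

start:
    mov si, mystring            ; SI = address of the zero-terminated string
    call os_print_string

    call os_wait_for_key        ; returns AL = key pressed

    ret                         ; return to the OS

    mystring db 'My first MikeOS program!', 0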

Where os_print_string and os_wait_for_key are defined as

os_print_string     equ 0003h   ; SI = zero-terminated string 

and

os_wait_for_key     equ 0012h   ; Returns AL = key pressed

in mikedev.inc respectively, and the matching jump-table entry is defined as

os_call_vectors:
    jmp os_print_string     ; 0003h

in kernel.asm.

Now NASM must be doing a lot more work behind the scenes when assembling, but I have no idea what.

In other words, assembly language is to some degree a wrapper around machine code, just as C is a wrapper around assembly. If I write cout << "Hello World" in C++, for example, it is compiled into its assembly equivalent and then assembled into machine code.

So I am trying to understand how 0003h and 0012h seem to dictate everything that happens when printing to the screen. How do these two values:

a) Tell the CPU/PC system which bus to use, so that the bytes representing the string go to the monitor and not, say, to the sound card?

b) In this case the string is obviously sent to the monitor. My understanding is that there is a frame buffer (FB) that can store a fixed maximum number of bytes. Say the screen resolution is 1024 x 768, which is 786432 pixels, with a refresh rate of 60 Hz. The FB will then contain that many byte values (assuming one byte per pixel) and will send that many bytes to the monitor every 1/60 sec, with the first byte in the FB corresponding to the first pixel on the screen and the last byte to the last pixel.

So how does the CPU/GPU know which byte to put in which position in the FB? It's like saying to the GPU, 'OK, I need the pixel at coord (245,232) to be green, so I will leave it to you to put this pixel's value in the correct position in the FB'.

How does this work?

Best Answer

Assembly language translates almost directly to machine code. mov becomes a mov instruction. call becomes a call instruction. The arguments on the same line become the argument fields for those instructions. There's a bit of assistance in computing addresses, but not a lot beyond that.
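To make "translates almost directly" concrete, here is a minimal Python sketch that hand-encodes the four instructions from the question into 16-bit x86 machine code. The opcodes BE (mov si, imm16), E8 (call rel16) and C3 (ret) are standard x86 encodings. The helper names (le16, mov_si_imm16, call_near, assemble) are made up for illustration, and the sketch assumes the assembler emits near relative calls, so treat the bytes as something to check against a real NASM listing rather than as NASM's guaranteed output.

```python
# Hand-assemble the question's program for 16-bit x86 (a sketch, not NASM itself).
ORG = 0x8000               # ORG 32768: address the program expects to be loaded at
OS_PRINT_STRING = 0x0003   # equ values from mikedev.inc
OS_WAIT_FOR_KEY = 0x0012


def le16(value):
    """Encode a 16-bit value as two little-endian bytes (wraps negatives)."""
    return (value & 0xFFFF).to_bytes(2, "little")


def mov_si_imm16(imm):
    # Opcode BE = "mov si, imm16"; the operand is the immediate value itself.
    return b"\xBE" + le16(imm)


def call_near(target, address_of_call):
    # Opcode E8 = "call rel16"; the operand is the distance from the
    # instruction after the call (3 bytes further on) to the target.
    return b"\xE8" + le16(target - (address_of_call + 3))


RET = b"\xC3"              # Opcode C3 = "ret"


def assemble():
    text = b"My first MikeOS program!\x00"
    code_size = 3 + 3 + 3 + 1            # mov, call, call, ret
    mystring = ORG + code_size           # the label resolves to this address
    code = mov_si_imm16(mystring)
    code += call_near(OS_PRINT_STRING, ORG + 3)
    code += call_near(OS_WAIT_FOR_KEY, ORG + 6)
    code += RET
    return code + text


machine_code = assemble()
print(machine_code[:10].hex(" "))
```

Running it prints be 0a 80 e8 fd 7f e8 09 80 c3: one opcode byte per instruction plus its operand bytes. The only real work is the address arithmetic: mystring resolves to 800Ah, and each call operand is the distance from the following instruction to 0003h or 0012h.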

The operating system can be treated much like a subroutine library. The "magic numbers" you're asking about are operating system entry points; the call instruction, like a function call in higher level languages, invokes them; they run until they return, at which point your program picks up where it left off. Your OS's user manual will tell you which entry point to invoke to do what, how to set up any arguments required (such as putting the address of the string to be printed in the si register before calling os_print_string, though some may involve pushing values onto the stack rather than putting them in registers), and how to read their returned results if any (again, which registers will have the result or what to pop off the stack).
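The "subroutine library" idea can be modelled in a few lines of Python. This is a toy simulation, not MikeOS code: memory is a byte array, the registers are a dictionary, and the table ENTRY_POINTS stands in for the kernel's jump table at 0003h and 0012h. The screen list, the call helper, the address 800Ah for the string, and the key value 'x' are all assumptions made for the example.

```python
# Toy model of calling OS entry points; none of this is real MikeOS code.
memory = bytearray(0x10000)       # 64 KiB of "RAM"
registers = {"si": 0, "al": 0}
screen = []                       # stands in for whatever the kernel writes to


def os_print_string():
    """Print the zero-terminated string whose address is in SI."""
    address = registers["si"]
    while memory[address] != 0:
        screen.append(chr(memory[address]))
        address += 1


def os_wait_for_key():
    """Pretend a key was pressed and return it in AL."""
    registers["al"] = ord("x")    # hypothetical key press


# The "magic numbers": fixed addresses that lead to kernel routines.
ENTRY_POINTS = {0x0003: os_print_string, 0x0012: os_wait_for_key}


def call(address):
    """Model of the call instruction: run the routine, then come back."""
    ENTRY_POINTS[address]()


# The program from the question, step by step.
mystring = 0x800A                 # assumed address of the string
text = b"My first MikeOS program!\x00"
memory[mystring:mystring + len(text)] = text

registers["si"] = mystring        # mov si, mystring
call(0x0003)                      # call os_print_string
call(0x0012)                      # call os_wait_for_key

print("".join(screen))
print(chr(registers["al"]))
```

The numbers 0003h and 0012h carry no meaning by themselves. They matter only because the kernel has placed a jump to the right routine at each of those addresses, and the caller has followed the documented convention (string address in SI, result in AL).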

As far as question (b) goes: that's all stuff the OS and its device drivers, or a function library linked with your assembly code, will normally handle for you. If you really need to know it (e.g. because you're writing an OS or device drivers), you'll need to study the documentation for your specific hardware to understand how to communicate with it... but what you'll wind up doing is writing a library of functions which do the necessary work, and packaging it so main programs just invoke those functions. In other words, for most programs I/O is much like working in a higher-level language; the runtime library does all the work and all you need to know is how to use it. (Sane assembler code is critically dependent upon writing good functions so you don't spend time endlessly reinventing wheels!)
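As an illustration of the kind of function such a library would contain, here is a minimal Python sketch of the pixel-to-byte mapping the question asks about. It assumes a simple linear frame buffer with one byte per pixel and rows stored back to back, as the question describes. Real hardware may use several bytes per pixel, padded rows, or a text mode instead, and the names pixel_offset, put_pixel and the GREEN palette value are hypothetical.

```python
# Sketch of a linear frame buffer: one byte per pixel, rows stored back to back.
WIDTH, HEIGHT = 1024, 768
BYTES_PER_PIXEL = 1               # assumption taken from the question

framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def pixel_offset(x, y):
    """Byte offset of pixel (x, y): skip y full rows, then x pixels."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL


def put_pixel(x, y, value):
    """Library routine a program would call instead of doing the maths itself."""
    framebuffer[pixel_offset(x, y)] = value


GREEN = 2                         # hypothetical palette index for green
put_pixel(245, 232, GREEN)

print(len(framebuffer))           # 786432 bytes for 1024 x 768
print(pixel_offset(245, 232))     # 237813
```

So "put this pixel in the correct position" is plain arithmetic that a driver or library performs: 232 full rows of 1024 bytes, plus 245 more bytes, gives offset 237813.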