How to link a gas assembly program that uses the C standard library with ld without using gcc

Tags: assembly, binutils, gnu-assembler, ld

As an exercise to learn more precisely how C programs work and what a program minimally needs in order to use libc, I've decided to program primarily in x86 assembly using gas and ld.

As a fun little challenge, I've successfully assembled and linked several programs against different self-made dynamic libraries, but I haven't managed to write a program from scratch that calls libc functions without invoking gcc directly.

I understand the calling conventions of individual C library functions and have thoroughly inspected programs compiled by gcc using objdump and readelf, but I haven't worked out what to include in a gas assembly file or which options to pass to ld to link against libc successfully. Does anyone have any insight into this?

I'm running Linux, on an x86 machine.

Best Answer

There are at least three things that you need to do to successfully use libc with dynamic linking:

Link /usr/lib/crt1.o, which contains _start, which will be the entry point for the ELF binary;
Link /usr/lib/crti.o (before libc) and /usr/lib/crtn.o (after), which provide some initialisation and finalisation code;
Tell the linker that the binary will use the dynamic linker, /lib/ld-linux.so.2.

For example:

$ cat hello.s
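 .text
 .globl main            # the startup code in crt1.o ends up calling main
main:
 push %ebp              # set up a stack frame
 mov %esp, %ebp
 pushl $hw_str          # 32-bit cdecl: the argument goes on the stack
 call puts
 add $4, %esp           # the caller removes the argument
 xor %eax, %eax         # return 0 from main
 leave                  # tear down the stack frame
 ret

 .data
hw_str:
 .asciz "Hello world!"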

$ as -o hello.o hello.s
$ ld -o hello -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o -lc hello.o /usr/lib/crtn.o
$ ./hello
Hello world!
$
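For comparison, the body of main in hello.s corresponds roughly to the C below. This is only a sketch: the function name say_hello is made up for illustration (the assembly puts this body directly in main), and a compiler would not necessarily emit the same instructions.

```c
#include <stdio.h>

/* Same text as the hw_str label in hello.s. */
static char hw_str[] = "Hello world!";

/* Hypothetical helper mirroring the body of main in hello.s. */
int say_hello(void)
{
    puts(hw_str);   /* pushl $hw_str ; call puts ; add $4, %esp */
    return 0;       /* xor %eax, %eax */
}
```

Calling say_hello() from main and returning its result should behave like the hand-written program.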

Related Solutions

Do programming language compilers first translate to assembly or directly to machine code

gcc actually produces assembler and assembles it using the as assembler. Not all compilers do this - the MS compilers produce object code directly, though you can make them generate assembler output. Translating assembler to object code is a pretty simple process, at least compared with compilation.

Some compilers produce other high-level language code as their output - for example, cfront, the first C++ compiler produced C as its output which was then compiled by a C compiler.

Note that neither direct compilation nor assembly actually produces an executable. That is done by the linker, which takes the various object code files produced by compilation/assembly, resolves all the names they contain and produces the final executable binary.
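A small C sketch of what "resolving names" means in practice. Here puts is declared by hand rather than through a header, so the object file only carries an undefined reference to it; the helper name print_line is an assumption made for this example.

```c
/* Declare puts by hand instead of including <stdio.h>. The compiler only
 * records an undefined symbol named "puts" in the object file; the
 * definition is supplied from libc when the program is linked and loaded. */
extern int puts(const char *s);

/* Hypothetical helper: returns 1 if the call succeeded, 0 otherwise.
 * puts returns a non-negative value on success. */
int print_line(const char *text)
{
    return puts(text) >= 0;
}
```

Compiling this with gcc -c and running nm on the resulting object file should list puts as undefined (marked U), which is the name the linker still has to resolve.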

Using scanf with x86-64 GAS assembly

As you feared, movq %rcx, %rsi is not correct. You need to pass a pointer to memory. Registers are not part of the memory address space, so you can't have pointers to them. You need to allocate storage either globally or locally. Incidentally, you should not put your data (especially writable data) into the default .text section, as that is intended for code and is typically read-only. Also, the calling convention usually mandates 16-byte stack pointer alignment, so you should take care of that too.

.globl main

main:
    push %rbp           # keep stack aligned
    mov  $0, %eax       # clear AL (zero FP args in XMM registers)
    leaq f(%rip), %rdi  # load format string
    leaq x(%rip), %rsi  # set storage to address of x
    call scanf
    pop %rbp
    ret

.data

f:  .string "%d"         # could be in .rodata instead
x:  .long 0

(If your environment expects a leading underscore on symbols, then use _main, and probably _scanf.)

There are actually 3 choices for putting addresses of symbols / labels into registers. RIP-relative LEA is the standard way on x86-64. See "How to load address of function or label into register in GNU Assembler".

As an optimization if your variables are in the lower 4GiB of the address space, e.g. in a Linux non-PIE (position-dependent) executable, you can use 32-bit absolute immediates:

    mov  $f, %edi       # load format string
    mov  $x, %esi       # set storage to address of x

movq $f, %rdi would use a 32-bit sign-extended immediate (instead of implicit zero-extension into RDI from writing EDI), but has the same code-size as a RIP-relative LEA.
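The difference between the two extensions can be modelled in C. This is an illustrative sketch, not something from the original answer: the function names are made up, and the 32-bit values stand in for whatever address the linker assigns to f or x.

```c
#include <stdint.h>

/* Models a write to a 32-bit register such as EDI:
 * the upper 32 bits of the full register become zero. */
uint64_t zero_extend32(uint32_t imm)
{
    return (uint64_t)imm;
}

/* Models movq $imm32, %rdi: the immediate is sign-extended to 64 bits. */
uint64_t sign_extend32(uint32_t imm)
{
    /* Copy bit 31 into bits 32..63 without relying on signed casts. */
    uint64_t wide = imm;
    if (imm & 0x80000000u)
        wide |= 0xFFFFFFFF00000000ull;
    return wide;
}
```

For an address such as 0x400000 both functions return the same value. For 0x80000000 they differ, so the sign-extended form only reproduces addresses below 2 GiB, while the zero-extended form covers the whole low 4 GiB.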

You can also load the full 64-bit absolute address using the mnemonic movabsq. But don't do that: a 10-byte instruction is bad for code size, and it still needs a runtime fixup because it's not position-independent.

    movabsq $f, %rdi # load format string
    movabsq $x, %rsi # set storage to address of x

Upon request: using a local variable for the output could look like:

    subq  $8, %rsp       # allocate 8 bytes from stack
    xor   %eax, %eax     # clear AL (and RAX)
    leaq  f(%rip), %rdi  # load format string
    movq  %rsp, %rsi     # set storage to local variable
    call  scanf
    addq  $8, %rsp       # restore stack
    ret
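In C terms, both assembly variants pass the address of an int to the scanf family. The sketch below uses sscanf instead of scanf so that it can run without input on stdin; that substitution and the helper names are assumptions made for this example.

```c
#include <stdio.h>

/* Global storage, like the label x in .data. */
static int x;

/* Hypothetical helper: store the parsed value in the global x.
 * Returns the number of items sscanf converted (1 on success). */
int read_into_global(const char *input)
{
    return sscanf(input, "%d", &x);      /* like leaq x(%rip), %rsi */
}

/* Hypothetical helper: store the parsed value in a stack variable.
 * Returns the value, or fallback if nothing could be converted. */
int read_into_local(const char *input, int fallback)
{
    int value;                           /* like subq $8, %rsp */
    if (sscanf(input, "%d", &value) != 1)
        return fallback;
    return value;
}
```

In both helpers the second argument to sscanf is a pointer to memory, never the value itself, which is the point the assembly makes with leaq and movq %rsp, %rsi.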
