Linux – NASM x86_64 assembly in 32-bit mode: Why does this instruction produce RIP-relative addressing code?


    [bits 32]
    global _start

    section .data
    str_hello       db  "HelloWorld", 0xa
    str_hello_length    db      $-str_hello

    section .text

    _start:

        mov ebx, 1              ; stdout file descriptor
        mov ecx, str_hello      ; pointer to the string of characters that will be displayed
        mov edx, [str_hello_length] ; count (gdb shows this load as RIP-relative addressing)
        mov eax, 4              ; sys_write
        int 0x80                ; linux kernel system call

        mov ebx, 0  ; exit status zero
        mov eax, 1  ; sys_exit
        int 0x80    ; linux kernel system call

The fundamental thing here is that I need the length of the hello string to pass to Linux's sys_write system call. Now, I'm well aware that I can just use EQU and it'll work fine, but I'm really trying to understand what's going on here.

So, basically when I use EQU it loads the value and that's fine.

str_hello_length equ $-str_hello
...
...
mov edx, str_hello_length

However, if I use this line with DB

str_hello_length db $-str_hello
...
...
mov edx, [str_hello_length]     ; of course, without the brackets it'll load the address, which I don't want. I want the value stored at that address

instead of loading the value at that address like I expect it to, the assembler outputs RIP-relative addressing, as shown in the gdb debugger, and I'm simply wondering why.

mov    0x6000e5(%rip),%edx        # 0xa001a5

Now, I've tried using the eax register instead (and then moving eax to edx), but then I get a different problem. I end up with a segmentation fault, and gdb shows:

movabs 0x4b8c289006000e5,%eax

So apparently, different registers produce different code. I guess I need to truncate the upper 32 bits somehow, but I don't know how to do that.

I did kind of find a 'solution', though, and it goes like this:
load eax with str_hello_length's address, then load the contents of the address that eax points to, and everything is hunky-dory.

mov eax, str_hello_length       
mov edx, [eax]  ; count


; gdb disassembly
mov    $0x6000e5,%eax
mov    (%rax),%edx

Apparently, trying to load a value indirectly from a memory address produces different code? I don't really know.

I just need help understanding the syntax and operation of these instructions, so I can better understand how to load effective addresses. Yeah, I guess I could've just switched to EQU and been on my merry way, but I really feel I can't go on until I understand what's going on with the DB declaration and loading from its address.

Best Answer

The answer is: it isn't. x86-64 has no RIP-relative addressing in 32-bit code (this should be obvious, because RIP doesn't exist in 32-bit mode). What's happening is that nasm is assembling some lovely 32-bit opcodes for you, and you're then running them as 64-bit code. GDB disassembles those 32-bit opcodes as 64-bit and tells you that, in 64-bit mode, those bytes mean a RIP-relative mov. The 32-bit and 64-bit encodings on x86-64 overlap a lot so that they can share decoding logic in the silicon, and you're getting confused because what GDB shows looks similar to the 32-bit code you wrote. In reality the processor is decoding your bytes under the wrong rules, so you're just throwing garbage bytes at it.
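
To make that concrete, here is a rough annotated sketch of the three loads from the question. The byte encodings in the comments are what I'd expect NASM to emit under [bits 32]. The address 0x6000e5 and the 64-bit decodings are taken from the gdb output in the question, and the instruction address 0x4000c0 is inferred from that output (0xa001a5 - 0x6000e5), not measured.

```nasm
[bits 32]

; 1) Load into edx. The ModRM byte 15h means "disp32, no base register".
;    32-bit mode: mov edx, [0x6000e5]
;    64-bit mode: the same ModRM byte means [rip + disp32], so gdb shows
;                 mov 0x6000e5(%rip),%edx   # 0x4000c0 + 0x6000e5 = 0xa001a5
        mov edx, [0x6000e5]     ; 8B 15 E5 00 60 00

; 2) Load into eax. NASM picks the short accumulator form, opcode A1.
;    32-bit mode: A1 is followed by a 4-byte address.
;    64-bit mode: A1 is followed by an 8-byte address, so the decoder also
;                 swallows the next four bytes (89 C2 B8 04) and gdb shows
;                 movabs 0x4b8c289006000e5,%eax
        mov eax, [0x6000e5]     ; A1 E5 00 60 00
        mov edx, eax            ; 89 C2
        mov eax, 4              ; B8 04 00 00 00

; 3) The workaround from the question. Both encodings mean the same thing
;    in either mode ([eax] is simply read as [rax]), so it appeared to work.
        mov eax, 0x6000e5       ; B8 E5 00 60 00
        mov edx, [eax]          ; 8B 10
```

That is also why the result seemed to depend on the register: eax has its own short load encoding, and that encoding takes a different address size in 64-bit mode.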

This has nothing to do with nasm itself. You're using the wrong architecture for the process you're in. Either assemble and link the code as a 32-bit executable so that it runs in a 32-bit process, or write and assemble it as 64-bit code with [BITS 64].
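
As a minimal sketch of the first option, the program from the question can stay almost unchanged if it is built as a 32-bit ELF. The file name hello.asm and the build commands in the header comment are assumptions (the question doesn't show how it was built), and the kernel has to have 32-bit binary support enabled. One side change that is mine and not part of the question: the length is declared with dd instead of db, because mov edx, [str_hello_length] reads four bytes.

```nasm
; hello.asm (hypothetical file name)
; Assumed build commands for a 32-bit executable on x86-64 Linux:
;   nasm -f elf32 hello.asm -o hello.o
;   ld -m elf_i386 hello.o -o hello
[bits 32]
        global _start

        section .data
str_hello           db  "HelloWorld", 0xa
str_hello_length    dd  $-str_hello     ; dd, so the 4-byte load below reads a full dword

        section .text
_start:
        mov ebx, 1                      ; stdout file descriptor
        mov ecx, str_hello              ; pointer to the string
        mov edx, [str_hello_length]     ; byte count, loaded from memory
        mov eax, 4                      ; sys_write
        int 0x80                        ; 32-bit Linux system call

        mov ebx, 0                      ; exit status zero
        mov eax, 1                      ; sys_exit
        int 0x80
```

Built this way, the process runs in 32-bit mode, so gdb should disassemble the load as a plain absolute mov and not as a RIP-relative one.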
