Following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux:
But what are the x86-64 system call conventions on both UNIX & Linux?
abiassemblycalling-conventionlinuxx86-64
Following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux:
But what are the x86-64 system call conventions on both UNIX & Linux?
One easy way to understand "ABI" is to compare it to "API".
You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you will program against an API. The API consists of data types/structures, constants, functions, etc that you can use in your code to access the functionality of that external component.
An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.
ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.
Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.
For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.
An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.
Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.
IIRC, Windows currently uses the Portable Executable (or, PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.
Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "c"
in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.
One of the things to keep in mind about x86 is that the register name to "reg number" encoding is not obvious; in terms of instruction encoding (the MOD R/M byte, see http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm), register numbers 0...7 are - in that order - ?AX
, ?CX
, ?DX
, ?BX
, ?SP
, ?BP
, ?SI
, ?DI
.
Hence choosing A/C/D (regs 0..2) for return value and the first two arguments (which is the "classical" 32bit __fastcall
convention) is a logical choice. As far as going to 64bit is concerned, the "higher" regs are ordered, and both Microsoft and UN*X/Linux went for R8
/ R9
as the first ones.
Keeping that in mind, Microsoft's choice of RAX
(return value) and RCX
, RDX
, R8
, R9
(arg[0..3]) are an understandable selection if you choose four registers for arguments.
I don't know why the AMD64 UN*X ABI chose RDX
before RCX
.
UN*X, on RISC architectures, has traditionally done argument passing in registers - specifically, for the first six arguments (that's so on PPC, SPARC, MIPS at least). Which might be one of the major reasons why the AMD64 (UN*X) ABI designers chose to use six registers on that architecture as well.
So if you want six registers to pass arguments in, and it's logical to choose RCX
, RDX
, R8
and R9
for four of them, which other two should you pick ?
The "higher" regs require an additional instruction prefix byte to select them and therefore have a bigger instruction size footprint, so you wouldn't want to choose any of those if you have options. Of the classical registers, due to the implicit meaning of RBP
and RSP
these aren't available, and RBX
traditionally has a special use on UN*X (global offset table) which seemingly the AMD64 ABI designers didn't want to needlessly become incompatible with.
Ergo, the only choice were RSI
/ RDI
.
So if you have to take RSI
/ RDI
as argument registers, which arguments should they be ?
Making them arg[0]
and arg[1]
has some advantages. See cHao's comment.
?SI
and ?DI
are string instruction source / destination operands, and as cHao mentioned, their use as argument registers means that with the AMD64 UN*X calling conventions, the simplest possible strcpy()
function, for example, only consists of the two CPU instructions repz movsb; ret
because the source/target addresses have been put into the correct registers by the caller. There is, particularly in low-level and compiler-generated "glue" code (think, for example, some C++ heap allocators zero-filling objects on construction, or the kernel zero-filling heap pages on sbrk()
, or copy-on-write pagefaults) an enormous amount of block copy/fill, hence it'll be useful for code so frequently used to save the two or three CPU instructions that'd otherwise load such source/target address arguments into the "correct" registers.
So in a way, UN*X and Win64 are only different in that UN*X "prepends" two additional arguments, in purposefully chosen RSI
/RDI
registers, to the natural choice of four arguments in RCX
, RDX
, R8
and R9
.
There are more differences between the UN*X and Windows x64 ABIs than just the mapping of arguments to specific registers. For the overview on Win64, check:
http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
Win64 and AMD64 UN*X also strikingly differ in the way stackspace is used; on Win64, for example, the caller must allocate stackspace for function arguments even though args 0...3 are passed in registers. On UN*X on the other hand, a leaf function (i.e. one that doesn't call other functions) is not even required to allocate stackspace at all if it needs no more than 128 Bytes of it (yes, you own and can use a certain amount of stack without allocating it ... well, unless you're kernel code, a source of nifty bugs). All these are particular optimization choices, most of the rationale for those is explained in the full ABI references that the original poster's wikipedia reference points to.
Best Answer
Further reading for any of the topics here: The Definitive Guide to Linux System Calls
I verified these using GNU Assembler (gas) on Linux.
Kernel Interface
x86-32 aka i386 Linux System Call convention:
In x86-32 parameters for Linux system call are passed using registers.
%eax
for syscall_number. %ebx, %ecx, %edx, %esi, %edi, %ebp are used for passing 6 parameters to system calls.The return value is in
%eax
. All other registers (including EFLAGS) are preserved across theint $0x80
.I took following snippet from the Linux Assembly Tutorial but I'm doubtful about this. If any one can show an example, it would be great.
For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. Another example of a Hello World for i386 Linux using
int 0x80
: Hello, world in assembly language with Linux system calls?There is a faster way to make 32-bit system calls: using
sysenter
. The kernel maps a page of memory into every process (the vDSO), with the user-space side of thesysenter
dance, which has to cooperate with the kernel for it to be able to find the return address. Arg to register mapping is the same as forint $0x80
. You should normally call into the vDSO instead of usingsysenter
directly. (See The Definitive Guide to Linux System Calls for info on linking and calling into the vDSO, and for more info onsysenter
, and everything else to do with system calls.)x86-32 [Free|Open|Net|DragonFly]BSD UNIX System Call convention:
Parameters are passed on the stack. Push the parameters (last parameter pushed first) on to the stack. Then push an additional 32-bit of dummy data (Its not actually dummy data. refer to following link for more info) and then give a system call instruction
int $0x80
http://www.int80h.org/bsdasm/#default-calling-convention
x86-64 Linux System Call convention:
(Note: x86-64 Mac OS X is similar but different from Linux. TODO: check what *BSD does)
Refer to section: "A.2 AMD64 Linux Kernel Conventions" of System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer's repo. (See also the x86 tag wiki for up-to-date ABI links and lots of other good stuff about x86 asm.)
Here is the snippet from this section:
Remember this is from the Linux-specific appendix to the ABI, and even for Linux it's informative not normative. (But it is in fact accurate.)
This 32-bit
int $0x80
ABI is usable in 64-bit code (but highly not recommended). What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? It still truncates its inputs to 32-bit, so it's unsuitable for pointers, and it zeros r8-r11.User Interface: function calling
x86-32 Function Calling convention:
In x86-32 parameters were passed on stack. Last parameter was pushed first on to the stack until all parameters are done and then
call
instruction was executed. This is used for calling C library (libc) functions on Linux from assembly.Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of
%esp
before acall
, like the x86-64 System V ABI has always required. Callees are allowed to assume that and use SSE 16-byte loads/stores that fault on unaligned. But historically, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally-aligned space even for an 8-bytedouble
or something.Some other modern 32-bit systems still don't require more than 4 byte stack alignment.
x86-64 System V user-space Function Calling convention:
x86-64 System V passes args in registers, which is more efficient than i386 System V's stack args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old).
In this new mechanism: First the parameters are divided into classes. The class of each parameter determines the manner in which it is passed to the called function.
For complete information refer to : "3.2 Function Calling Sequence" of System V Application Binary Interface AMD64 Architecture Processor Supplement which reads, in part:
So
%rdi, %rsi, %rdx, %rcx, %r8 and %r9
are the registers in order used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter. %rsi for 2nd, %rdx for 3rd and so on. Thencall
instruction should be given. The stack (%rsp
) must be 16B-aligned whencall
executes.If there are more than 6 INTEGER parameters, the 7th INTEGER parameter and later are passed on the stack. (Caller pops, same as x86-32.)
The first 8 floating point args are passed in %xmm0-7, later on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 total register arguments.)
Variadic functions (like
printf
) always need%al
= the number of FP register args.There are rules for when to pack structs into registers (
rdx:rax
on return) vs. in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed/returned.Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, like shadow space that must be reserved by the caller (instead of a red-zone), and call-preserved xmm6-xmm15. And very different rules for which arg goes in which register.