Programming Languages – Why Are System Calls Limited to C Language?

operating-systems programming-languages

According to my operating systems textbook, applications and libraries interact with the kernel through system calls.

But as far as I can see, on Windows, OS X and Linux we can only use the C language to make system calls.

That annoyed me when I wanted to optimize I/O in our Java application. Since I can't make system calls directly, I have to guess which system calls a given Java API issues (maybe I should read the JVM source code, but I was worried that would take too much time to meet the deadline), and then optimize I/O based on that guess.

Since then I have wondered: why are system calls limited to the C language, and why can't we make them from Python, Java and many other programming languages?

EDIT:

I know OS X, Windows and Linux are all implemented in C, but there is still a question:

If an OS is implemented in a certain programming language, can we only request system calls from that programming language? What's the reason for that?

Best Answer

On Windows, OS X and Linux, we can only use C Language to post system calls.

Actually, this is wrong, at least for Linux.

The real system call does not use the C calling convention; it has its own convention, which the ABI specifies. Details are of course processor-specific (so let's focus on x86-64).

(I am not exactly sure of all the details here, so you need to check; I read about them several years ago...)

The actual system call uses a machine instruction like SYSENTER or SYSCALL (SYSCALL on x86-64 Linux), and passes the number of the system call in %rax and its arguments in various (well specified) processor registers. But it does not use the stack pointer at all. So you could (in your machine code) in principle make a system call without any stack, or with an invalid %rsp (e.g. set to 0). In contrast, calling a C function requires a valid stack pointer (even if most arguments are passed in registers).

The actual system call also uses a different return convention. On Linux x86-64 the result comes back in %rax: a value between -4095 and -1 means the system call has failed and is the negated errno code; any other value means it has succeeded and is its result. (Some other systems, such as the BSDs, signal failure with the carry bit instead.)
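To make the register convention concrete, here is a minimal sketch in C with GCC-style inline assembly, assuming Linux on x86-64 only. The names raw_syscall3, raw_write and RAW_SYS_WRITE are made up for this illustration; the only facts relied on are the SYSCALL instruction, the %rax/%rdi/%rsi/%rdx register assignment, and that write is system call number 1 on this platform.

```c
/* Sketch: a raw write(2) on Linux x86-64 without going through libc.
   raw_syscall3, raw_write and RAW_SYS_WRITE are hypothetical names. */
#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86-64."
#endif

/* Issue a three-argument system call with the SYSCALL instruction.
   The number goes in %rax, the arguments in %rdi, %rsi and %rdx.
   The kernel clobbers %rcx and %r11 and returns its result in %rax. */
static long raw_syscall3(long number, long a1, long a2, long a3)
{
    long result;
    __asm__ volatile (
        "syscall"
        : "=a"(result)
        : "a"(number), "D"(a1), "S"(a2), "d"(a3)
        : "rcx", "r11", "memory"
    );
    return result;
}

/* System call number of write on x86-64 Linux. */
#define RAW_SYS_WRITE 1

/* Returns the byte count on success, or a negated errno value
   (between -4095 and -1) on failure; errno itself is not touched. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(RAW_SYS_WRITE, fd, (long)buf, (long)len);
}
```

Calling raw_write(1, "hi\n", 3) from main should print the text and return 3, while a call on an invalid descriptor returns the negated error code directly instead of setting errno.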

Therefore, some programming language implementations can avoid any C library entirely. One example is bones, a Scheme implementation. And you can code a program in assembler for Linux without using any libc and without using C calling conventions.

Hence, C standard library implementations need a tiny wrapping function for every system call. When you call the read(2) "system call" function in C, you are actually calling a tiny wrapper.

The Linux Assembler HowTo gives some details. Read also Advanced Linux Programming. But you should also look into the source code of your Linux kernel and your C standard library (e.g. musl-libc has very readable code).
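What such a wrapper adds on top of the raw kernel call can be sketched in a few lines of C. This assumes the Linux convention in which the kernel returns a negated errno code between -4095 and -1 on failure; translate_raw_result is a hypothetical name, not a function of any real libc, and real wrappers differ in detail.

```c
#include <errno.h>

/* Hypothetical helper: turn a raw Linux kernel return value into the
   C-level convention (-1 plus errno on failure, the value otherwise). */
static long translate_raw_result(long raw)
{
    /* Assumption: raw values from -4095 to -1 are negated errno codes. */
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

A wrapper for read would load the registers, execute the system call instruction, and pass the raw result through a step like this before returning to the C caller.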

If an OS is implemented by a certain programming language, we can only request system call by that programming language?

This is wrong. The calling convention used to enter the kernel (SYSENTER, SYSCALL, etc.) is not the same as the one for C functions. You can code system calls in other ways.

Notice that on Linux, some low-level utility functions are not system calls (the real ones are exhaustively listed in syscalls(2)); in particular dlopen(3), pthread_create(3) and DNS functions like getaddrinfo(3) are each implemented using several system calls. See also nsswitch.conf(5). And some system calls (e.g. clock_gettime(2)) avoid the kernel overhead with vdso(7) tricks.

See also the OSDev wiki.
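The difference between a libc function and a forced kernel entry can be sketched in C as follows, assuming Linux with glibc or musl. The function names clock_via_libc and clock_via_kernel are made up for this illustration; clock_gettime(2), syscall(2) and SYS_clock_gettime are the real interfaces. Whether the first call is actually served from the vDSO depends on the kernel, the clock source and the architecture, so treat that as an assumption to check (e.g. with strace).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read the monotonic clock through the libc function, which on Linux
   may be served from the vDSO without entering the kernel. */
static int clock_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Read the same clock through syscall(2), which always enters the kernel. */
static int clock_via_kernel(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Both functions fill in the same struct timespec and return 0 on success; only the path taken to obtain the value differs.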

Why system calls are limited to C language, and why we can't do that in Python, Java and many other programming languages?

As I explained, system calls are not limited to C. However, C is very convenient (as a lingua franca or portable assembler-like language). So most programming language implementors use it, and often provide libraries that call the system call wrappers of the C standard library. Also, standards like POSIX define "system" functions (e.g. mmap) in terms of C code. Hence programming language implementors have an interest in coding in C: their implementation is then likely to be easily portable to various Unix-like or POSIX-like systems. In other words, nearly the same C source code (e.g. of the Lua, Guile or OCaml implementation) is likely to work on various systems like MacOSX, Linux, FreeBSD, TrueOS and possibly even GNU Hurd. So bones is much harder to port from Linux to MacOSX or TrueOS than Guile, even though they implement nearly the same programming language (some Scheme dialect).

why we can't do that in Python, Java and many other programming languages?

In practice, all these language implementations (remember that a programming language is a specification written in some document; it is not a piece of software) use the libc to avoid portability hassles.

(IIRC on some *BSD systems, system calls pass their arguments on the machine stack, unlike Linux; but I leave you to check that.)

Read also Operating Systems: Three Easy Pieces.
