Linux – What does opening a file actually do

c, linux

In every programming language (that I use, at least), you must open a file before you can read from or write to it.

But what does this open operation actually do?

Manual pages for typical functions don't actually tell you anything beyond saying that the function 'opens a file for reading/writing':

http://www.cplusplus.com/reference/cstdio/fopen/

https://docs.python.org/3/library/functions.html#open

Obviously, from using the function you can tell that it creates some kind of object that facilitates access to a file.

Another way of putting this would be, if I were to implement an open function, what would it need to do on Linux?

Best Answer

In almost every high-level language, the function that opens a file is a wrapper around the corresponding kernel system call. It may do other fancy stuff as well, but in contemporary operating systems, opening a file must always go through the kernel.

This is why the arguments of the fopen library function, or of Python's open, closely resemble the arguments of the open(2) system call.
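To make the resemblance concrete, here is a rough sketch of how the fopen mode strings line up with open(2) flags, following the table in the fopen(3) manual page. The helper name mode_to_flags is made up for illustration; a real C library does this internally and also handles variants such as "rb" or "wx", which are ignored here.

```c
#include <fcntl.h>
#include <string.h>

/* Hypothetical helper: translate a basic fopen-style mode string into
 * the open(2) flags it roughly corresponds to.
 * Returns -1 for any mode string it does not recognise. */
int mode_to_flags(const char *mode)
{
    if (strcmp(mode, "r") == 0)  return O_RDONLY;
    if (strcmp(mode, "r+") == 0) return O_RDWR;
    if (strcmp(mode, "w") == 0)  return O_WRONLY | O_CREAT | O_TRUNC;
    if (strcmp(mode, "w+") == 0) return O_RDWR | O_CREAT | O_TRUNC;
    if (strcmp(mode, "a") == 0)  return O_WRONLY | O_CREAT | O_APPEND;
    if (strcmp(mode, "a+") == 0) return O_RDWR | O_CREAT | O_APPEND;
    return -1;
}
```

With this, open(path, mode_to_flags("w"), 0666) asks the kernel for approximately what fopen(path, "w") asks for, minus the buffering that the library adds on top.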

In addition to opening the file, these functions usually set up a buffer that will subsequently be used by the read/write operations. One purpose of this buffer is to ensure that whenever you ask to read N bytes, the corresponding library call returns N bytes (unless end-of-file or an error gets in the way), regardless of whether the underlying system calls return fewer. It also reduces the number of system calls needed for many small reads or writes.
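The "return N bytes even if the system call returns fewer" part can be sketched as a loop around read(2). This is a simplified illustration, not how stdio is actually implemented (a real library reads large blocks into its buffer and serves requests from there); the name read_full is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read(2) until `count` bytes have
 * arrived, end-of-file is reached, or a real error occurs.
 * Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n == 0)
            break;              /* end-of-file: nothing more will come */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;          /* genuine error, errno says which */
        }
        done += (size_t)n;      /* short read: loop for the remainder */
    }
    return (ssize_t)done;
}
```

A caller that gets back less than count from read_full knows it hit end-of-file, which is the same contract fread offers.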

I am not actually interested in implementing my own function; just in understanding what the hell is going on...'beyond the language' if you like.

In Unix-like operating systems, a successful call to open returns a "file descriptor", which is merely an integer in the context of the user process. This descriptor is subsequently passed to any call that interacts with the opened file, and after calling close on it, the descriptor becomes invalid.
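The descriptor's life cycle looks roughly like this in C. The function name read_start_of_file is invented for the example; open, read and close are the actual system call wrappers.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes from the start of `path`.
 * Returns the number of bytes read, or -1 if opening or reading fails. */
ssize_t read_start_of_file(const char *path, char *buf, size_t count)
{
    int fd = open(path, O_RDONLY);    /* fd is just a small integer */
    if (fd == -1)
        return -1;

    ssize_t n = read(fd, buf, count); /* the integer identifies the open file */

    close(fd);                        /* from here on, fd is no longer valid */
    return n;
}
```

On a typical Linux system, calling it on /dev/zero with count 8 fills buf with eight zero bytes, and calling it on a path that does not exist returns -1.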

It is important to note that the call to open acts like a validation point at which various checks are made. If not all of the conditions are met, the call fails by returning -1 instead of the descriptor, and the kind of error is indicated in errno. The essential checks are:

  • Whether the file exists (unless the O_CREAT flag asks for it to be created);
  • Whether the calling process is permitted to open this file in the specified mode. This is determined by matching the file's permission bits, owner ID and group ID against the respective IDs of the calling process.
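A caller can tell these two failures apart by looking at errno after open returns -1: ENOENT means the file does not exist, and EACCES means the permission check failed. The sketch below shows the idea; why_open_fails is a hypothetical name, and the messages are only illustrative.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only and report why it
 * failed. Returns 0 on success, otherwise the errno value set by open(2). */
int why_open_fails(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        int err = errno;    /* save it: later calls may overwrite errno */
        if (err == ENOENT)
            fprintf(stderr, "%s: does not exist\n", path);
        else if (err == EACCES)
            fprintf(stderr, "%s: permission denied\n", path);
        else
            fprintf(stderr, "%s: %s\n", path, strerror(err));
        return err;
    }

    close(fd);
    return 0;
}
```

Note that open can fail for other reasons too (too many open files, a path component that is not a directory, and so on), which is why the final branch falls back to strerror.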

In the context of the kernel, there has to be some kind of mapping between the process's file descriptors and the physically opened files. The internal data structure that the descriptor maps to may contain yet another buffer that deals with block-based devices, or an internal pointer that holds the current read/write position.
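One way to observe from user space that the read/write position lives in the kernel-side structure rather than in the integer itself is to duplicate a descriptor with dup: both integers then refer to the same open file (POSIX calls it an "open file description"), so a read through one moves the position seen through the other. The function name below is made up for this sketch.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: read `nbytes` through one descriptor, then ask a
 * duplicate descriptor for the current file position.
 * Returns that position, or -1 on failure. */
off_t offset_seen_by_duplicate(const char *path, size_t nbytes)
{
    char buf[64];
    off_t pos = -1;

    if (nbytes > sizeof buf)
        nbytes = sizeof buf;          /* keep the demo within its buffer */

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    int copy = dup(fd);   /* second integer, same kernel-side open file */
    if (copy != -1) {
        if (read(fd, buf, nbytes) >= 0)
            pos = lseek(copy, 0, SEEK_CUR);  /* query position via the copy */
        close(copy);
    }
    close(fd);
    return pos;
}
```

For a regular file of at least 4 bytes, offset_seen_by_duplicate(path, 4) returns 4 even though nothing was ever read through the duplicate. Opening the same path twice with open, by contrast, gives two independent positions.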