High-Level Overview of How printf() Works in Windows OS

cstlvisual studio 2010windows

I asked this question on an IRC channel, sadly I am going around in circles.

I am aiming for a high level overview (but with some technical details if necessary) on how a function such as printf() from stdio.h "talks" to the Windows operating system.

I know a bit about MSVCRT.dll, bits of the Windows API such as how it communicates with kernel32.dll which goes to the Native API in ntdll.dll. I think VisualStudio uses msvcrt100.dll for C Standard Library…

Somebody recommended using an open source C standard library to understand what happens under the hood. Aside from not knowing how to use one in my Visual Studio project, I would still not know HOW this communicates with the OS.

My terrible understanding which misses several steps is as follows:

1) Syntax of printf() is checked against header stdio.h

2) When program is run it uses msvcrt100 for printf

3) msvcrt100 then loads the necessary Windows library such as kernel32

4) Kernel32 then passes it onto ntdll.dll

This question focuses on C, but if C++ is the same feel free to post.

Windows only.

Best Answer

While the specific details vary between operating systems, you probably want to start somewhat with an understand of

an object file format (.obj), created by the compiler/assembler, and
an executable file format (.exe), formed by by the linker
a dll file format, also formed (and consumed) by the linker

These (disc-based) file formats contain sections for machine code, initial program data values, relocations and a symbol table as well.

Before the code can be executed by the hardware, all symbolically described values (i.e. main, printf) have to be fully resolved to memory locations.

The compiler (and assembler) have incomplete information at compile time about the final memory locations of various pieces of code and data — so, they share this information in the object file symbolically rather than as final memory addresses — this is where the relocations and symbol table come into play.

The symbol table has both imports and exports — the exports gives string names to offsets within sections (code/data) the object file; whereas the imports are associated with just string names (externals for later resolution).

Relocations tell the consumer of the object file how & where to fix up the machine code and data, once the memory addresses of the symbols is known; thus, relocations typically have references to entries in the symbol table (they also reference sections in the object module...).

A .exe file for a program has one special location main that is marked as the entry point in the headers for the executable. A .exe for a program also has all external references resolved, one way or another — however, the linker that produces the executable file still does not have complete information about the memory addresses of all of the symbols as that usually doesn't happen until load time. So, the .exe file still has relocations and a symbol table.

The .exe combines all the .obj file's into a single file: the code sections are concatenated, as are the data sections. As compared with the .obj file, some of the obj file's relocations can be resolved and thus removed from the .exe file; other relocations can have their form simplified (referring to the code or data section rather than referring particular symbols — this makes for a shorter relocation entry).

Typically the operating system loader ultimately determines the locations of code and data sections, and this completes the assignment of memory address to symbols, meaning that any outstanding relocations can now be performed.

A DLL (.dll) file is like a .exe file except that it has specific exports. The .exe file created by the linker may have references to .dll files. These reference are read by a dynamic linking loader, so that the operation of loading an .exe file require also loading a .dll file, which may further require additional .dll files to be loaded. All of them are cross linked to each other as per their relocation entries.

The hardware then executes machine code instructions like call printf, where the reference to printf is specified by its memory location. There are many approaches, some involving jump tables or code sequences of several instructions; however, suffice it to say that during execution of the machine code, the hardware only sees/knows about memory addresses and not symbol names. All symbolic references are eventually resolved to memory addresses by the system of relocations and symbol table entries.

printf, when appropriate (e.g. local buffer full), will ultimately invoke a system call of some sort to perform i/o such as flushing the local buffer to disc or device. A system call is a way for a user process to request an operation of the operating system. System calls similar to regular calls except that the caller is in the user process and the callee is an operating system entry point. System calls provide a controlled method of raising the privilege (from user to kernel) as needed to provide access shared devices. System calls don't use memory address, but instead each system call is associated with a simple integer index — this allows the .exe's and .dll's that call into the operating system to remain unaware of kernel memory addresses.

Related Solutions

C Memory Usage – How Variable Assignment Works

How could it be possible to use a single n box as my tutor taught? Won't a new n box be created whenever factorial(n - 1) is called?

Yes, exactly. Each time you call a function, additional memory is set aside for the function arguments and local variables.

Let's drop a couple of levels below C and talk about how a compiled program is laid out in memory¹. We're going to assume a 32-bit, x86-like system for this discussion.

While the details vary from platform to platform, the general concepts are the same for most systems you're likely to work on. When a compiled program is loaded into memory, it's usually divided up into several logical segments; one segment contains the machine code, another contains constant (read-only) data items, another contains space for non-constant, global data items, another contains space for the stack, another contains the space for the heap, etc. The general layout looks something like the following (taken from this page):

              +------------------------+ 
high address  | Command line arguments |   
              | and environment vars   |  
              +------------------------+
              |         stack          |
              | - - - - - - - - - - -  |
              |           |            |
              |           V            |
              |                        |
              |           ^            |
              |           |            |
              | - - - - - - - - - - -  |
              |          heap          |
              +------------------------+
              |    global and read-    |
              |       only data        |
              +------------------------+
 low address  |     program text       |
              |    (machine code)      |
              +------------------------+

As your program runs, it uses the stack to store function arguments and local variables, along with some program state (previous frame pointer, address to the next instruction). On x86-like systems, the stack grows "downwards", towards decreasing addresses. Among the registers² provided, there are usually two to keep track of what's on the stack. There's the stack pointer, which points to the most recent item on the stack, and the frame pointer³, which points to the current stack frame. A stack frame is a region of the stack that contains the arguments and local variables for the current function, along with any saved program state:

             + - - - - - - - - -  +
high address |  previous stack    |
             |       frame        |
             + - - - - - - - - -  +
             |  previous stack    |
             |       frame        |
             +--------------------+  ----------------------+
             | function arguments |                        |
             |        ...         |                        |
             +--------------------+                        |
             |   return address   |                        |
             +--------------------+                        +-- current stack frame
             | old frame pointer  | <-- frame pointer      |
             +--------------------+                        |
             |  local variables   |                        |
             |        ...         | <-- stack pointer      |
             +--------------------+ -----------------------+  
 low address |  available memory  |       
             + - - - - - - - - -  +

On the systems I'm familiar with, the function arguments and program state are stored "above" the location stored in the frame pointer, and local variables are stored "below" it. Within the generated assembly code, arguments and variables are referred to by their offset from the address stored in the frame pointer. On x86. it would look something like

cmpl  $1, 8(%ebp)   ;; compare the literal value 1 to the first function argument
movl  $1, -4(%ebp)  ;; write the literal value 1 to the first local variable

The first function argument is stored 8 bytes "above" the address value stored in the frame pointer %ebp, while the first local variable would be 4 bytes "below" it (note that this assumes a 32-bit system).

The exact size of the frame depends on how many arguments and local variables need to be stored, although the platform may also require it to be aligned to certain boundary (IOW, the size of the frame must be a multiple of some number of bytes, usually 4 or 8, sometimes 16; for the purposes of this discussion, we're not going to worry about alignment).

Suppose your factorial function is called with an argument value of 3:

x = factorial( 3 );

A new stack frame will be set up that looks something like this:

 +--------------------+
 |         3          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | <-- frame pointer, stack pointer
 +--------------------+

where the "return address" is the address of the instruction immediately following the function call in the caller (in this case, it would be the instruction that assigns the result of factorial(3) to x).

Since there are no local variables, no space is allocated for them in the stack frame. Now, since n is not less than 2, you execute the line

n = n * factorial( n - 1 );

This means factorial is called again, this time with an argument value of 2. A new stack frame is created

 +--------------------+
 |         3          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+
 |         2          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | <-- frame pointer, stack pointer
 +--------------------+

n is still not less than 2, so we call factorial again with an argument of 1:

 +--------------------+
 |         3          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+
 |         2          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+
 |         1          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | <-- frame pointer, stack pointer
 +--------------------+

1 is less than 2, so we execute the line

return n;

The common convention for returning values from functions is to write the value to a particular register. On x86, integer values are returned through register %eax, so the compiled code will do something like

movl  8(%ebp), %eax

Now register %eax will contain the value 1.

When the function returns, the stack and frame pointers are restored to their previous values (the stack contents are left as-is):

 +--------------------+
 |         3          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+
 |         2          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | <-- frame pointer, stack pointer
 +--------------------+
 |         1          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+

So now we need to execute the remainder of

n = n * factorial( n - 1 );

which will look something like⁴

movl %eax, %edx     ;; save the value returned from factorial(1) to register %edx
movl 8(%ebp), %eax  ;; move the value of n (2) to %eax
imull %eax, %edx    ;; multiply the values in %eax and %edx, save the result in
                    ;; %eax

Once again, we return n and exit the function, bringing us back to the previous call:

 +--------------------+
 |         3          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | <-- frame pointer, stack pointer
 +--------------------+
 |         2          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+
 |         1          | <-- n
 +--------------------+
 |   return address   |
 +--------------------+
 |  old frame pointer | 
 +--------------------+

Lather, rinse, repeat. The value returned in %eax (2) is multiplied by the value in the argument 8(%ebp) (3), and we write 6 back to %eax.

Now, the C language definition says nothing about stacks, or stack frames, or registers, or anything like that; it only specifies the behavior of variables within a program, not how to implement that behavior. However, much of C's design is based on what most computer systems already provide (this is what people mean when they say that C is "close to the hardware"), so this is a very natural way of implementing that behavior. If you were working on a machine that had no stack, C programs would be somewhat more difficult to implement.

^{1. By which I mean a virtual address space, which is an idealized model of memory that your program sees; the OS and memory controllers will map this idealized model to physical RAM.

2. A register is a memory cell on the CPU itself. There are usually several special-purpose registers (program counter, stack pointer, frame pointer) and some number of general-purpose registers. Reading and writing to registers is faster than reading and writing to RAM.

3. On 32-bit x86, the stack pointer register is esp and the frame pointer register is ebp (a.k.a. the base pointer). On 64-bit x86, they're named rsp and rbp.

4. This is how the compiled code on my system does it, anyway.}

How to deal with Windows .NET configuration files

I think part of the issues you're facing is in the premise:

Upgrading an application with a modified app.config is hard. Either it overwrites the existing configuration file or the upgraded app.config isn't installed at all.

Don't do that, basically. Configuration is a software artefact, and it should be versioned and go through the same lifecycle as any other artefact, subject to change control and whatever other safeguards your team implements for software artefacts.

I suppose realistically, in a production environment you are likely to have hacks made to the application config to fix problems in an emergency. Ideally the immediate consequence of having to do that would be a properly-managed change that would be deployed to production formally as soon as humanly possible. Therefore a scheduled upgrade would already have this change, and regression shouldn't be a problem.

Criticising the premise probably doesn't provide an adequate answer, so could you expand your question to provide an example of the kind of settings you find to be problematic?

Best Answer

Related Solutions

C Memory Usage – How Variable Assignment Works

How to deal with Windows .NET configuration files

Related Topic