Let's consider this C code:
#include <stdio.h>
int main(void)
{
    int x = 5;
    printf("x is ");
    printf("%d", x);
    return 0;
}
In this, when we wrote int x=5; we told the computer that x is an integer. The computer must remember that x is an integer. But when we output the value of x in printf(), we have to tell the computer again that x is an integer. Why is that? Why does the computer forget that x was an integer?
Best Answer
There are two issues at play here:
Issue #1: C is a statically typed language; all type information is determined at compile time. No type information is stored with any object in memory such that its type and size can be determined at run time [1]. If you examine the memory at any particular address while the program is running, all you'll see is a sludge of bytes; there's nothing to tell you whether that particular address actually contains an object, what the type or size of that object is, or how to interpret those bytes (as an integer, or floating-point type, or sequence of characters in a string, etc.). All that information is baked into the machine code when the code is compiled, based on type information specified in the source code; for example, a function definition that declares three parameters x, y, and z tells the compiler to generate the appropriate machine code to handle x as an integer, y as a floating-point value, and z as a pointer to char. Note that any mismatches in the number or type of arguments between a function call and a function definition are only detected when the code is being compiled [2]; it's only during the compilation phase that any type information is associated with an object.

Issue #2: printf is a variadic function; it takes one fixed parameter of type const char * restrict (the format string), along with zero or more additional parameters, the number and type of which are not known at compile time. Its C99 declaration is int printf(const char * restrict format, ...); where the ... stands for the additional arguments.

The printf function has no way of knowing what the number and types of additional arguments are from the passed arguments themselves; it has to rely on the format string to tell it how to interpret the sludge of bytes on the stack (or in the registers). Even better, because it's a variadic function, arguments with certain types are promoted to a limited set of default types (e.g., short is promoted to int, float is promoted to double, etc.).

Again, there's no information associated with the additional arguments themselves to give printf any clues on how to interpret or format them. Hence the need for the conversion specifiers in the format string.

Note that in addition to telling printf the number and type of additional arguments, conversion specifiers also tell printf how to format the output (field widths, precision, padding, justification, base (decimal, octal, or hex for integer types), etc.).

Edit
To avoid extensive discussion in the comments (and because the chat page is blocked from my work system - yes I'm being a bad boy), I'm going to address the last two questions here.
During translation, the compiler maintains a table (often called a symbol table) that stores information about an object's name, type, storage duration, scope, etc. You declared b and c as float, so any time the compiler sees an expression with b or c in it, it will generate the machine code to handle a floating-point value.

I took your code above and wrapped a full program around it:

I used the -g and -Wa,-aldh options with gcc to create a listing of the generated machine code interleaved with the C source code [3]:

Here's how to read the assembly listing:
One thing to note here. In the generated assembly code, there are no symbols for b or c; they only exist in the source code listing. When main executes at runtime, space for b and c (along with some other stuff) is allocated from the stack by adjusting the stack pointer:

The code refers to those objects by their offset from the frame pointer [4], with b being -8 bytes from the address stored in the frame pointer and c being -4 bytes from it, as follows:

Since you declared b and c as floats, the compiler generated machine code to specifically handle floating-point values; the movsd, mulsd, and cvtss2sd instructions are all specific to floating-point operations, and the registers %xmm0 and %xmm1 are used to store double-precision floating-point values.

If I change the source code so that b and c are integers instead of floats, the compiler generates different machine code:

Compiling with
gcc -o c2 -g -std=c99 -pedantic -Wall -Werror -Wa,-aldh=c2.lst c2.c
gives:

Here's the same operation, but with b and c declared as integers:

This is what I meant earlier when I said that type information was "baked in" to the machine code. When the program runs, it doesn't examine b or c to determine their type; it already knows what their type should be based on the generated machine code.

It doesn't work because you're lying to the compiler. You tell it that b is a float, so it will generate machine code to handle floating-point values. When you initialize it, the bit pattern corresponding to the constant 'H' will be interpreted as a floating-point value, not a character value.

You lie to the compiler again when you use the %c conversion specifier, which expects a value of type char, for the argument b. Because of this, printf won't interpret the contents of b correctly, and you'll wind up with garbage output [5]. Again, printf can't know the number or types of any additional arguments based on the arguments themselves; all it sees is an address on the stack (or a bunch of registers). It needs the format string to tell it what additional arguments have been passed, and what their types are.

1. The one exception being variable-length arrays; since their size isn't established until runtime, there's no way to evaluate sizeof on a VLA at compile time.
2. As of C89, anyway. Prior to that, the compiler could only catch mismatches in the function return type; it couldn't detect mismatches in the function parameter lists.
3. This code is generated on a 64-bit SuSE Linux Enterprise 10 system using gcc 4.1.2. If you're on a different implementation (compiler/OS/chip architecture), then the exact machine instructions will be different, but the general point will still hold; the compiler will generate different instructions to handle floats vs. ints vs. strings, etc.
4. When you call a function in a running program, a stack frame is created to store the function arguments, local variables, and the address of the instruction following the function call. A special register called the frame pointer is used to keep track of the current frame.
5. For example, assume a big-endian system where the high-order byte is the addressed byte. The bit pattern for H will be stored to b as 0x00000048. However, because the %c conversion specifier indicates that the argument should be a char, only the first byte will be read, so printf will try to write the character corresponding to the encoding 0x00.