There are two issues at play here:
Issue #1: C is a statically typed language; all type information is determined at compile time. No type information is stored with any object in memory such that its type and size can be determined at run time [1]. If you examine the memory at any particular address while the program is running, all you'll see is a sludge of bytes; there's nothing to tell you whether that particular address actually contains an object, what the type or size of that object is, or how to interpret those bytes (as an integer, or floating-point type, or sequence of characters in a string, etc.). All that information is baked into the machine code when the code is compiled, based on type information specified in the source code; for example, the function definition
void foo( int x, double y, char *z )
{
...
}
tells the compiler to generate the appropriate machine code to handle x as an integer, y as a floating-point value, and z as a pointer to char. Note that any mismatches in the number or type of arguments between a function call and a function definition are only detected when the code is being compiled [2]; it's only during the compilation phase that any type information is associated with an object.
Issue #2: printf is a variadic function; it takes one fixed parameter of type const char * restrict (the format string), along with zero or more additional parameters, the number and type of which are not fixed by the function's declaration:
int printf( const char * restrict fmt, ... );
The printf function has no way of knowing the number and types of the additional arguments from the passed arguments themselves; it has to rely on the format string to tell it how to interpret the sludge of bytes on the stack (or in the registers). Even better, because it's a variadic function, arguments with certain types are promoted to a limited set of default types (e.g., short is promoted to int, float is promoted to double, etc.).
Again, there's no information associated with the additional arguments themselves to give printf any clues on how to interpret or format them. Hence the need for the conversion specifiers in the format string.
Note that in addition to telling printf the number and type of additional arguments, conversion specifiers also tell printf how to format the output (field widths, precision, padding, justification, base (decimal, octal, or hex for integer types), etc.).
Edit
To avoid extensive discussion in the comments (and because the chat page is blocked from my work system - yes I'm being a bad boy), I'm going to address the last two questions here.
IF I do this: float b;
float c;
b=3.1;
c=(5.0/9.0)*(b);
In the last statement how does the compiler know that b is of type float?
During translation, the compiler maintains a table (often called a symbol table) that stores information about an object's name, type, storage duration, scope, etc. You declared b and c as float, so any time the compiler sees an expression with b or c in it, it will generate the machine code to handle a floating-point value.
I took your code above and wrapped a full program around it:
/**
* c1.c
*/
#include <stdio.h>
int main( void )
{
float b;
float c;
b = 3.1;
c = (5.0 / 9.0) * b;
printf( "c = %f\n", c );
return 0;
}
I used the -g and -Wa,-aldh options with gcc to create a listing of the generated machine code interleaved with the C source code [3]:
GAS LISTING /tmp/ccmGgGG2.s page 1
1 .file "c1.c"
9 .Ltext0:
10 .section .rodata
11 .LC2:
12 0000 63203D20 .string "c = %f\n"
12 25660A00
13 .align 8
14 .LC1:
15 0008 721CC771 .long 1908874354
16 000c 1CC7E13F .long 1071761180
17 .text
18 .globl main
20 main:
21 .LFB2:
22 .file 1 "c1.c"
1:c1.c **** #include <stdio.h>
2:c1.c **** int main( void )
3:c1.c **** {
23 .loc 1 3 0
24 0000 55 pushq %rbp
25 .LCFI0:
26 0001 4889E5 movq %rsp, %rbp
27 .LCFI1:
28 0004 4883EC10 subq $16, %rsp
29 .LCFI2:
4:c1.c **** float b;
5:c1.c **** float c;
6:c1.c **** b = 3.1;
30 .loc 1 6 0
31 0008 B8666646 movl $0x40466666, %eax
31 40
32 000d 8945F8 movl %eax, -8(%rbp)
7:c1.c **** c = (5.0 / 9.0) * b;
33 .loc 1 7 0
34 0010 F30F5A4D cvtss2sd -8(%rbp), %xmm1
34 F8
35 0015 F20F1005 movsd .LC1(%rip), %xmm0
35 00000000
36 001d F20F59C1 mulsd %xmm1, %xmm0
37 0021 F20F5AC0 cvtsd2ss %xmm0, %xmm0
38 0025 F30F1145 movss %xmm0, -4(%rbp)
38 FC
8:c1.c ****
9:c1.c **** printf( "c = %f\n", c );
39 .loc 1 9 0
40 002a F30F5A45 cvtss2sd -4(%rbp), %xmm0
40 FC
41 002f BF000000 movl $.LC2, %edi
41 00
42 0034 B8010000 movl $1, %eax
42 00
43 0039 E8000000 call printf
43 00
10:c1.c **** return 0;
44 .loc 1 10 0
45 003e B8000000 movl $0, %eax
GAS LISTING /tmp/ccmGgGG2.s page 2
11:c1.c **** }
46 .loc 1 11 0
47 0043 C9 leave
48 0044 C3 ret
Here's how to read the assembly listing:
40 002a F30F5A45      cvtss2sd    -4(%rbp), %xmm0
40      FC
^  ^    ^             ^           ^
|  |    |             |           |
|  |    |             |           +-- Instruction operands
|  |    |             +-------------- Instruction mnemonic
|  |    +---------------------------- Actual machine code (instruction and operands)
|  +--------------------------------- Byte offset of instruction from subroutine entry point
+------------------------------------ Line number of assembly listing
One thing to note here. In the generated assembly code, there are no symbols for b or c; they only exist in the source code listing. When main executes at runtime, space for b and c (along with some other stuff) is allocated from the stack by adjusting the stack pointer:
subq $16, %rsp
The code refers to those objects by their offset from the frame pointer [4], with b being -8 bytes from the address stored in the frame pointer and c being -4 bytes from it, as follows:
7:c1.c **** c = (5.0 / 9.0) * b;
.loc 1 7 0
cvtss2sd -8(%rbp), %xmm1 ;; converts contents of b from single- to double-
;; precision float, stores result to floating-
;; point register xmm1
movsd .LC1(%rip), %xmm0 ;; writes the pre-computed value of 5.0/9.0
;; to floating point register xmm0
mulsd %xmm1, %xmm0 ;; multiply contents of xmm1 by xmm0, store result
;; in xmm0
cvtsd2ss %xmm0, %xmm0 ;; convert result in xmm0 from double- to single-
;; precision float
movss %xmm0, -4(%rbp) ;; save result to c
Since you declared b and c as floats, the compiler generated machine code to specifically handle floating-point values; the movsd, mulsd, and cvtss2sd instructions are all specific to floating-point operations, and the registers %xmm0 and %xmm1 are SSE registers that hold the single- and double-precision floating-point values being operated on.
If I change the source code so that b and c are integers instead of floats, the compiler generates different machine code:
/**
* c2.c
*/
#include <stdio.h>
int main( void )
{
int b;
int c;
b = 3;
c = (9 / 4) * b; // changed these values since integer 5/9 == 0, making for
// some really boring machine code.
printf( "c = %d\n", c );
return 0;
}
Compiling with gcc -o c2 -g -std=c99 -pedantic -Wall -Werror -Wa,-aldh=c2.lst c2.c gives:
GAS LISTING /tmp/ccyxHwid.s page 1
1 .file "c2.c"
9 .Ltext0:
10 .section .rodata
11 .LC0:
12 0000 63203D20 .string "c = %d\n"
12 25640A00
13 .text
14 .globl main
16 main:
17 .LFB2:
18 .file 1 "c2.c"
1:c2.c **** #include <stdio.h>
2:c2.c **** int main( void )
3:c2.c **** {
19 .loc 1 3 0
20 0000 55 pushq %rbp
21 .LCFI0:
22 0001 4889E5 movq %rsp, %rbp
23 .LCFI1:
24 0004 4883EC10 subq $16, %rsp
25 .LCFI2:
4:c2.c **** int b;
5:c2.c **** int c;
6:c2.c **** b = 3;
26 .loc 1 6 0
27 0008 C745F803 movl $3, -8(%rbp)
27 000000
7:c2.c **** c = (9 / 4) * b;
28 .loc 1 7 0
29 000f 8B45F8 movl -8(%rbp), %eax
30 0012 01C0 addl %eax, %eax
31 0014 8945FC movl %eax, -4(%rbp)
8:c2.c ****
9:c2.c **** printf( "c = %d\n", c );
32 .loc 1 9 0
33 0017 8B75FC movl -4(%rbp), %esi
34 001a BF000000 movl $.LC0, %edi
34 00
35 001f B8000000 movl $0, %eax
35 00
36 0024 E8000000 call printf
36 00
10:c2.c **** return 0;
37 .loc 1 10 0
38 0029 B8000000 movl $0, %eax
38 00
11:c2.c **** }
39 .loc 1 11 0
40 002e C9 leave
41 002f C3 ret
Here's the same operation, but with b and c declared as integers:
7:c2.c **** c = (9 / 4) * b;
.loc 1 7 0
movl -8(%rbp), %eax ;; copy value of b to register eax
addl %eax, %eax ;; since 9/4 == 2 (integer arithmetic), double the
;; value in eax
movl %eax, -4(%rbp) ;; write result to c
This is what I meant earlier when I said that type information was "baked in" to the machine code. When the program runs, it doesn't examine b or c to determine their type; it already knows what their type should be based on the generated machine code.
If the compiler determines the type and size at run time then why doesn't the following program works:
float b='H';
printf(" value of b is %c \n",b);
It doesn't work because you're lying to the compiler. You tell it that b is a float, so it will generate machine code to handle floating-point values. When you initialize it, the value of the constant 'H' (which has type int in C) is converted to a floating-point value, 72.0 on an ASCII system; b never holds a character.
You lie to the compiler again when you use the %c conversion specifier, which expects an argument of type int holding a character code, for the argument b, which is passed as a double after the default promotion. Because of this, printf won't interpret the contents of b correctly; the behavior is undefined, and you'll typically wind up with garbage output [5]. Again, printf can't know the number or types of any additional arguments based on the arguments themselves; all it sees is an address on the stack (or a bunch of registers). It needs the format string to tell it what additional arguments have been passed, and what their types are.
1. The one exception being variable-length arrays; since their size isn't established until runtime, there's no way to evaluate sizeof on a VLA at compile time.
2. As of C89, anyway. Prior to that, the compiler could only catch mismatches in the function return type; it couldn't detect mismatches in the function parameter lists.
3. This code is generated on a 64-bit SuSE Linux Enterprise 10 system using gcc 4.1.2. If you're on a different implementation (compiler/OS/chip architecture), then the exact machine instructions will be different, but the general point will still hold; the compiler will generate different instructions to handle floats vs. ints vs. strings, etc.
4. When you call a function in a running program, a stack frame is created to store the function arguments, local variables, and the address of the instruction following the function call. A special register called the frame pointer is used to keep track of the current frame.
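5. For example, assume ASCII and IEEE-754 floating point. The value of 'H' is 72, so b holds 72.0, whose single-precision bit pattern is 0x42900000; promoted to double for the call, it becomes 0x4052000000000000. Because the %c conversion specifier indicates that the argument should be an int, printf reads an int from wherever the calling convention places integer arguments. On x86-64 Linux, as in the listings above, that is an integer register (%esi), while the double was passed in %xmm0, so printf will try to write the character corresponding to whatever value happens to be in that register.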
Best Answer
You are comparing variable declarations to #defines, which is incorrect. With a #define, you create a mapping between an identifier and a snippet of source code. The C preprocessor will then literally substitute any occurrences of that identifier with the provided snippet. For example, if FOO is defined as 40 + 2, writing FOO in an expression ends up being the same thing to the compiler as writing 40 + 2 in its place.
Think of it as automated copy&paste.
Also, normal variables can be reassigned, while a macro created with #define cannot (although you can re-#define it). The expression FOO = 7 would be a compiler error, since we can't assign to “rvalues”: 40 + 2 = 7 is illegal.
So, why do we need types at all? Some languages apparently get rid of types; this is especially common in scripting languages. However, they usually have something called “dynamic typing”, where variables don't have fixed types but values do. While this is far more flexible, it's also less performant. C likes performance, so it has a very simple and efficient concept of variables:
There's a stretch of memory called the “stack”. Each local variable corresponds to an area on the stack. Now the question is: how many bytes long does this area have to be? In C, each type has a well-defined size, which you can query via sizeof(type). The compiler needs to know the type of each variable so that it can reserve the correct amount of space on the stack.
Why don't constants created with #define need a type annotation? They are not stored on the stack. Instead, #define creates reusable snippets of source code in a slightly more maintainable manner than copy&paste. Literals in the source code such as "foo" or 42.87 are stored by the compiler either inline as special instructions, or in a separate data section of the resulting binary.
However, literals do have types. A string literal is an array of char, which decays to a char * in most expressions. 42 is an int but can also be used for shorter types (narrowing conversion). 42.8 would be a double. If you have a literal and want it to have a different type (e.g. to make 42.8 a float, or 42 an unsigned long int), then you can use suffixes: a letter after the literal that changes how the compiler treats that literal. In our case, we might say 42.8f or 42ul.
Some languages have static typing as in C, but the type annotations are optional. Examples are ML, Haskell, Scala, C#, C++11, and Go. How does that work? Magic? No, this is called “type inference”. In C# and Go, the compiler looks at the right-hand side of an assignment and deduces the type of that. This is fairly straightforward if the right-hand side is a literal such as 42ul. Then it's obvious what the type of the variable should be. Other languages also have more complex algorithms that take into account how a variable is used. E.g. if you do x/2, then x can't be a string but must have some numeric type.