C Programming – Why Specify Data Type in printf() in C?

c io type-safety

Let's consider this C code:

#include <stdio.h>

int main(void)
{
  int x=5;
  printf("x is ");
  printf("%d",x);
}

In this, when we wrote int x=5; we told the computer that x is an integer. The computer must remember that x is an integer. But when we output the value of x in printf() we have to again tell the computer that x is an integer. Why is that?

Why does the computer forget that x was an integer?

Best Answer

There are two issues at play here:

Issue #1: C is a statically typed language; all type information is determined at compile time. No type information is stored with any object in memory such that its type and size can be determined at run time¹. If you examine the memory at any particular address while the program is running, all you'll see is a sludge of bytes; there's nothing to tell you whether that particular address actually contains an object, what the type or size of that object is, or how to interpret those bytes (as an integer, or floating point type, or sequence of characters in a string, etc.). All that information is baked into the machine code when the code is compiled, based on type information specified in the source code; for example, the function definition

void foo( int x, double y, char *z )
{
  ...
}

tells the compiler to generate the appropriate machine code to handle x as an integer, y as a floating-point value, and z as a pointer to char. Note that any mismatches in the number or type of arguments between a function call and a function definition are only detected when the code is being compiled²; it's only during the compilation phase that any type information is associated with an object.

Issue #2: printf is a variadic function; it takes one fixed parameter of type const char * restrict (the format string), along with zero or more additional parameters, whose number and types are not fixed by the declaration and so are unknown when printf itself is compiled:

int printf( const char * restrict fmt, ... );

The printf function has no way of knowing what the number and types of additional arguments are from the passed arguments themselves; it has to rely on the format string to tell it how to interpret the sludge of bytes on the stack (or in the registers). Even better, because it's a variadic function, arguments with certain types are promoted to a limited set of default types (e.g., short is promoted to int, float is promoted to double, etc.).

Again, there's no information associated with the additional arguments themselves to give printf any clues on how to interpret or format them. Hence the need for the conversion specifiers in the format string.

Note that in addition to telling printf the number and type of additional arguments, conversion specifiers also tell printf how to format the output (field widths, precision, padding, justification, base (decimal, octal, or hex for integer types), etc.).

Edit

To avoid extensive discussion in the comments (and because the chat page is blocked from my work system - yes I'm being a bad boy), I'm going to address the last two questions here.

If I do this:
float b;          
float c;           
b=3.1;    
c=(5.0/9.0)*(b);
In the last statement how does the compiler know that b is of type float?

During translation, the compiler maintains a table (often called a symbol table) that stores information about an object's name, type, storage duration, scope, etc. You declared b and c as float, so any time the compiler sees an expression with b or c in it, it will generate the machine code to handle a floating-point value.

I took your code above and wrapped a full program around it:

/**
 * c1.c
 */
#include <stdio.h>
int main( void )
{
  float b;
  float c;
  b = 3.1;
  c = (5.0 / 9.0) * b;

  printf( "c = %f\n", c );
  return 0;
}

I used the -g and -Wa,-aldh options with gcc to create a listing of the generated machine code interleaved with the C source code³:

GAS LISTING /tmp/ccmGgGG2.s                     page 1

   1                            .file   "c1.c"
   9                    .Ltext0:
  10                            .section        .rodata
  11                    .LC2:
  12 0000 63203D20              .string "c = %f\n"
  12      25660A00
  13                            .align 8
  14                    .LC1:
  15 0008 721CC771              .long   1908874354
  16 000c 1CC7E13F              .long   1071761180
  17                            .text
  18                    .globl main
  20                    main:
  21                    .LFB2:
  22                            .file 1 "c1.c"
   1:c1.c          **** #include <stdio.h>
   2:c1.c          **** int main( void )
   3:c1.c          **** {
  23                            .loc 1 3 0
  24 0000 55                    pushq   %rbp
  25                    .LCFI0:
  26 0001 4889E5                movq    %rsp, %rbp
  27                    .LCFI1:
  28 0004 4883EC10              subq    $16, %rsp
  29                    .LCFI2:
   4:c1.c          ****   float b;
   5:c1.c          ****   float c;
   6:c1.c          ****   b = 3.1;
  30                            .loc 1 6 0
  31 0008 B8666646              movl    $0x40466666, %eax
  31      40
  32 000d 8945F8                movl    %eax, -8(%rbp)
   7:c1.c          ****   c = (5.0 / 9.0) * b;
  33                            .loc 1 7 0
  34 0010 F30F5A4D              cvtss2sd        -8(%rbp), %xmm1
  34      F8
  35 0015 F20F1005              movsd   .LC1(%rip), %xmm0
  35      00000000
  36 001d F20F59C1              mulsd   %xmm1, %xmm0
  37 0021 F20F5AC0              cvtsd2ss        %xmm0, %xmm0
  38 0025 F30F1145              movss   %xmm0, -4(%rbp)
  38      FC
   8:c1.c          ****
   9:c1.c          ****   printf( "c = %f\n", c );
  39                            .loc 1 9 0
  40 002a F30F5A45              cvtss2sd        -4(%rbp), %xmm0
  40      FC
  41 002f BF000000              movl    $.LC2, %edi
  41      00
  42 0034 B8010000              movl    $1, %eax
  42      00
  43 0039 E8000000              call    printf
  43      00
  10:c1.c          ****   return 0;
  44                            .loc 1 10 0
  45 003e B8000000              movl    $0, %eax

GAS LISTING /tmp/ccmGgGG2.s                     page 2

  11:c1.c          **** }
  46                            .loc 1 11 0
  47 0043 C9                    leave
  48 0044 C3                    ret

Here's how to read the assembly listing:

  40 002a F30F5A45              cvtss2sd        -4(%rbp), %xmm0
  40      FC
  ^  ^    ^                     ^               ^
  |  |    |                     |               |
  |  |    |                     |               +-- Instruction operands
  |  |    |                     +------------------ Instruction mnemonic
  |  |    +---------------------------------------- Actual machine code (instruction and operands)
  |  +--------------------------------------------- Byte offset of instruction from subroutine entry point
  +------------------------------------------------ Line number of assembly listing

One thing to note here. In the generated assembly code, there are no symbols for b or c; they only exist in the source code listing. When main executes at runtime, space for b and c (along with some other stuff) is allocated from the stack by adjusting the stack pointer:

subq    $16, %rsp

The code refers to those objects by their offset from the frame pointer⁴, with b being -8 bytes from the address stored in the frame pointer and c being -4 bytes from it, as follows:

   7:c1.c          ****   c = (5.0 / 9.0) * b;
  .loc 1 7 0
  cvtss2sd        -8(%rbp), %xmm1  ;; converts contents of b from single- to double-
                                   ;; precision float, stores result to floating-
                                   ;; point register xmm1
  movsd   .LC1(%rip), %xmm0        ;; writes the pre-computed value of 5.0/9.0  
                                   ;; to floating point register xmm0
  mulsd   %xmm1, %xmm0             ;; multiply contents of xmm1 by xmm0, store result
                                   ;; in xmm0
  cvtsd2ss        %xmm0, %xmm0     ;; convert result in xmm0 from double- to single-
                                   ;; precision float
  movss   %xmm0, -4(%rbp)          ;; save result to c

Since you declared b and c as floats, the compiler generated machine code to specifically handle floating-point values; the movsd, mulsd, cvtss2sd instructions are all specific to floating-point operations, and the registers %xmm0 and %xmm1 are used to store double-precision floating point values.

If I change the source code so that b and c are integers instead of floats, the compiler generates different machine code:

/**
 * c2.c
 */
#include <stdio.h>
int main( void )
{
  int b;
  int c;
  b = 3;
  c = (9 / 4) * b; // changed these values since integer 5/9 == 0, making for
                   // some really boring machine code.

  printf( "c = %d\n", c );
  return 0;
}

Compiling with gcc -o c2 -g -std=c99 -pedantic -Wall -Werror -Wa,-aldh=c2.lst c2.c gives:

GAS LISTING /tmp/ccyxHwid.s                     page 1

   1                            .file   "c2.c"
   9                    .Ltext0:
  10                            .section        .rodata
  11                    .LC0:
  12 0000 63203D20              .string "c = %d\n"
  12      25640A00
  13                            .text
  14                    .globl main
  16                    main:
  17                    .LFB2:
  18                            .file 1 "c2.c"
   1:c2.c          **** #include <stdio.h>
   2:c2.c          **** int main( void )
   3:c2.c          **** {
  19                            .loc 1 3 0
  20 0000 55                    pushq   %rbp
  21                    .LCFI0:
  22 0001 4889E5                movq    %rsp, %rbp
  23                    .LCFI1:
  24 0004 4883EC10              subq    $16, %rsp
  25                    .LCFI2:
   4:c2.c          ****   int b;
   5:c2.c          ****   int c;
   6:c2.c          ****   b = 3;
  26                            .loc 1 6 0
  27 0008 C745F803              movl    $3, -8(%rbp)
  27      000000
   7:c2.c          ****   c = (9 / 4) * b;
  28                            .loc 1 7 0
  29 000f 8B45F8                movl    -8(%rbp), %eax
  30 0012 01C0                  addl    %eax, %eax
  31 0014 8945FC                movl    %eax, -4(%rbp)
   8:c2.c          ****
   9:c2.c          ****   printf( "c = %d\n", c );
  32                            .loc 1 9 0
  33 0017 8B75FC                movl    -4(%rbp), %esi
  34 001a BF000000              movl    $.LC0, %edi
  34      00
  35 001f B8000000              movl    $0, %eax
  35      00
  36 0024 E8000000              call    printf
  36      00
  10:c2.c          ****   return 0;
  37                            .loc 1 10 0
  38 0029 B8000000              movl    $0, %eax
  38      00
  11:c2.c          **** }
  39                            .loc 1 11 0
  40 002e C9                    leave
  41 002f C3                    ret

Here's the same operation, but with b and c declared as integers:

   7:c2.c          ****   c = (9 / 4) * b;
  .loc 1 7 0
  movl    -8(%rbp), %eax  ;; copy value of b to register eax
  addl    %eax, %eax      ;; since 9/4 == 2 (integer arithmetic), double the
                          ;; value in eax
  movl    %eax, -4(%rbp)  ;; write result to c

This is what I meant earlier when I said that type information was "baked in" to the machine code. When the program runs, it doesn't examine b or c to determine their type; it already knows what their type should be based on the generated machine code.

If the compiler determines the type and size at run time then why doesn't the following program work:
float b='H';         
printf(" value of b is %c \n",b);

It doesn't work because you're lying to the compiler. You tell it that b is a float, so it will generate machine code to handle floating-point values. When you initialize it, the integer value of the constant 'H' (72 in ASCII) is converted to the floating-point value 72.0; b never holds a character.

You lie to the compiler again when you use the %c conversion specifier for the argument b. %c expects an argument of type int (a char argument is promoted to int), but b is a float, which is promoted to double when passed. Because of this mismatch the behavior is undefined; printf won't interpret the contents of b correctly, and you'll typically wind up with garbage output⁵. Again, printf can't know the number or types of any additional arguments based on the arguments themselves; all it sees is an address on the stack (or a bunch of registers). It needs the format string to tell it what additional arguments have been passed, and what their types are.

1. The one exception is variable-length arrays; since their size isn't established until runtime, there's no way to evaluate sizeof on a VLA at compile time.

2. As of C89, anyway. Prior to that, the compiler could only catch mismatches in the function return type; it couldn't detect mismatches in the function parameter lists.

3. This code was generated on a 64-bit SuSE Linux Enterprise 10 system using gcc 4.1.2. If you're on a different implementation (compiler/OS/chip architecture), then the exact machine instructions will be different, but the general point will still hold; the compiler will generate different instructions to handle floats vs. ints vs. strings, etc.

4. When you call a function in a running program, a stack frame is created to store the function arguments, local variables, and the address of the instruction following the function call. A special register called the frame pointer is used to keep track of the current frame.

5. For example, on the x86-64 system used for the listings above, a float argument is promoted to double and passed in a floating-point register (%xmm0, as in the c1 listing), while for %c printf reads an int from a general-purpose register (%esi, as in the c2 listing). That register holds whatever was left in it, so printf will try to write the character corresponding to some unrelated value.
