C Programming – Why Mention Data Type of Variable in C?


In C, we usually have to tell the compiler the type of the data in a variable declaration. E.g. in the following program, I want to print the sum of two floating-point numbers X and Y.

#include <stdio.h>
int main(void)
{
  float X=5.2;
  float Y=5.1;
  float Z;
  Z=Y+X;
  printf("%f",Z);
  return 0;
}

I had to tell the compiler the type of variable X.

Can't the compiler determine the type of X on its own?

Yes, it can if I do this:

#define X 5.2

I can now write my program without telling the compiler the type of X as:

#include <stdio.h>
#define X 5.2
int main(void)
{
  float Y=5.1;
  float Z;
  Z=Y+X;
  printf("%f",Z);
  return 0;
}

So we see that the C language has some kind of feature with which it can determine the type of data on its own. In my case it determined that X is of type float.

Why do we have to mention the type of data when we declare something in main()? Why can't the compiler determine the data type of a variable on its own in main(), as it does with #define?

Best Answer

You are comparing variable declarations to #defines, which is incorrect. With a #define, you create a mapping between an identifier and a snippet of source code. The C preprocessor will then literally substitute any occurrences of that identifier with the provided snippet. Writing

#define FOO 40 + 2
int foos = FOO + FOO * FOO;

ends up being the same thing to the compiler as writing

int foos = 40 + 2 + 40 + 2 * 40 + 2;

Think of it as automated copy&paste.

Also, normal variables can be reassigned, while a macro created with #define cannot (although you can re-#define it). The expression FOO = 7 would be a compiler error, because it expands to 40 + 2 = 7, and we can't assign to “rvalues”.

So, why do we need types at all? Some languages apparently get rid of types; this is especially common in scripting languages. However, they usually have something called “dynamic typing”, where variables don't have fixed types but values do. While this is far more flexible, it usually also performs worse. C likes performance, so it has a very simple and efficient concept of variables:

There's a stretch of memory called the “stack”. Each local variable corresponds to an area on the stack. Now the question is how many bytes long this area has to be. In C, each type has a well-defined size which you can query via sizeof(type). The compiler needs to know the type of each variable so that it can reserve the correct amount of space on the stack.

Why don't constants created with #define need a type annotation? They are not stored on the stack. Instead, #define creates reusable snippets of source code in a slightly more maintainable manner than copy&paste. Literals in the source code such as "foo" or 42.87 are stored by the compiler either inline as special instructions, or in a separate data section of the resulting binary.

However, literals do have types. A string literal is an array of char (which converts to a char * in most expressions). 42 is an int but can also be assigned to shorter types (narrowing conversion). 42.8 would be a double. If you have a literal and want it to have a different type (e.g. to make 42.8 a float, or 42 an unsigned long int), then you can use suffixes: a letter after the literal that changes how the compiler treats that literal. In our case, we might say 42.8f or 42ul.

Some languages have static typing as in C, but the type annotations are optional. Examples are ML, Haskell, Scala, C#, C++11, and Go. How does that work? Magic? No, this is called “type inference”. In C# and Go, the compiler looks at the right-hand side of an assignment and deduces its type. This is fairly straightforward if the right-hand side is a literal such as 42ul. Then it's obvious what the type of the variable should be. Other languages also have more complex algorithms that take into account how a variable is used. E.g. if you do x/2, then x can't be a string but must have some numeric type.

Related Solutions

C# – Why do variables need a type

But why does a variable need a type at all?

This may catch bugs where an invalid, wrongly typed expression is assigned to a variable. Some languages have dynamic typing, which sacrifices the correctness guarantees of a type per variable for the kind of flexibility that you seem to desire.
Types may allow the compiler to generate more efficient code. Dynamic typing means type checks have to be performed at runtime.

C Programming – Why Specify Data Type in printf() in C?

There are two issues at play here:

Issue #1: C is a statically typed language; all type information is determined at compile time. No type information is stored with any object in memory such that its type and size can be determined at run time¹. If you examine the memory at any particular address while the program is running, all you'll see is a sludge of bytes; there's nothing to tell you whether that particular address actually contains an object, what the type or size of that object is, or how to interpret those bytes (as an integer, or floating point type, or sequence of characters in a string, etc.). All that information is baked into the machine code when the code is compiled, based on type information specified in the source code; for example, the function definition

void foo( int x, double y, char *z )
{
  ...
}

tells the compiler to generate the appropriate machine code to handle x as an integer, y as a floating-point value, and z as a pointer to char. Note that any mismatches in the number or type of arguments between a function call and a function definition are only detected when the code is being compiled²; it's only during the compilation phase that any type information is associated with an object.

Issue #2: printf is a variadic function; it takes one fixed parameter of type const char * restrict (the format string), along with zero or more additional parameters whose number and types are not part of the declaration, so printf itself cannot know them in advance:

int printf( const char * restrict fmt, ... );

The printf function has no way of knowing what the number and types of additional arguments are from the passed arguments themselves; it has to rely on the format string to tell it how to interpret the sludge of bytes on the stack (or in the registers). Even better, because it's a variadic function, arguments with certain types are promoted to a limited set of default types (e.g., short is promoted to int, float is promoted to double, etc.).

Again, there's no information associated with the additional arguments themselves to give printf any clues on how to interpret or format them. Hence the need for the conversion specifiers in the format string.

Note that in addition to telling printf the number and type of additional arguments, conversion specifiers also tell printf how to format the output (field widths, precision, padding, justification, base (decimal, octal, or hex for integer types), etc.).

Edit

To avoid extensive discussion in the comments (and because the chat page is blocked from my work system - yes I'm being a bad boy), I'm going to address the last two questions here.

If I do this:
float b;
float c;
b=3.1;
c=(5.0/9.0)*(b);
In the last statement how does the compiler know that b is of type float?

During translation, the compiler maintains a table (often called a symbol table) that stores information about an object's name, type, storage duration, scope, etc. You declared b and c as float, so any time the compiler sees an expression with b or c in it, it will generate the machine code to handle a floating-point value.

I took your code above and wrapped a full program around it:

/**
 * c1.c
 */
#include <stdio.h>
int main( void )
{
  float b;
  float c;
  b = 3.1;
  c = (5.0 / 9.0) * b;

  printf( "c = %f\n", c );
  return 0;
}

I used the -g and -Wa,-aldh options with gcc to create a listing of the generated machine code interleaved with the C source code³:

GAS LISTING /tmp/ccmGgGG2.s                     page 1

   1                            .file   "c1.c"
   9                    .Ltext0:
  10                            .section        .rodata
  11                    .LC2:
  12 0000 63203D20              .string "c = %f\n"
  12      25660A00
  13                            .align 8
  14                    .LC1:
  15 0008 721CC771              .long   1908874354
  16 000c 1CC7E13F              .long   1071761180
  17                            .text
  18                    .globl main
  20                    main:
  21                    .LFB2:
  22                            .file 1 "c1.c"
   1:c1.c          **** #include <stdio.h>
   2:c1.c          **** int main( void )
   3:c1.c          **** {
  23                            .loc 1 3 0
  24 0000 55                    pushq   %rbp
  25                    .LCFI0:
  26 0001 4889E5                movq    %rsp, %rbp
  27                    .LCFI1:
  28 0004 4883EC10              subq    $16, %rsp
  29                    .LCFI2:
   4:c1.c          ****   float b;
   5:c1.c          ****   float c;
   6:c1.c          ****   b = 3.1;
  30                            .loc 1 6 0
  31 0008 B8666646              movl    $0x40466666, %eax
  31      40
  32 000d 8945F8                movl    %eax, -8(%rbp)
   7:c1.c          ****   c = (5.0 / 9.0) * b;
  33                            .loc 1 7 0
  34 0010 F30F5A4D              cvtss2sd        -8(%rbp), %xmm1
  34      F8
  35 0015 F20F1005              movsd   .LC1(%rip), %xmm0
  35      00000000
  36 001d F20F59C1              mulsd   %xmm1, %xmm0
  37 0021 F20F5AC0              cvtsd2ss        %xmm0, %xmm0
  38 0025 F30F1145              movss   %xmm0, -4(%rbp)
  38      FC
   8:c1.c          ****
   9:c1.c          ****   printf( "c = %f\n", c );
  39                            .loc 1 9 0
  40 002a F30F5A45              cvtss2sd        -4(%rbp), %xmm0
  40      FC
  41 002f BF000000              movl    $.LC2, %edi
  41      00
  42 0034 B8010000              movl    $1, %eax
  42      00
  43 0039 E8000000              call    printf
  43      00
  10:c1.c          ****   return 0;
  44                            .loc 1 10 0
  45 003e B8000000              movl    $0, %eax

GAS LISTING /tmp/ccmGgGG2.s                     page 2

  11:c1.c          **** }
  46                            .loc 1 11 0
  47 0043 C9                    leave
  48 0044 C3                    ret

Here's how to read the assembly listing:

  40 002a F30F5A45              cvtss2sd        -4(%rbp), %xmm0
  40      FC
  ^  ^    ^                     ^               ^
  |  |    |                     |               |
  |  |    |                     |               +-- Instruction operands
  |  |    |                     +------------------ Instruction mnemonic
  |  |    +---------------------------------------- Actual machine code (instruction and operands)
  |  +--------------------------------------------- Byte offset of instruction from subroutine entry point
  +------------------------------------------------ Line number of assembly listing

One thing to note here. In the generated assembly code, there are no symbols for b or c; they only exist in the source code listing. When main executes at runtime, space for b and c (along with some other stuff) is allocated from the stack by adjusting the stack pointer:

subq    $16, %rsp

The code refers to those objects by their offset from the frame pointer⁴, with b being -8 bytes from the address stored in the frame pointer and c being -4 bytes from it, as follows:

   7:c1.c          ****   c = (5.0 / 9.0) * b;
  .loc 1 7 0
  cvtss2sd        -8(%rbp), %xmm1  ;; converts contents of b from single- to double-
                                   ;; precision float, stores result to floating-
                                   ;; point register xmm1
  movsd   .LC1(%rip), %xmm0        ;; writes the pre-computed value of 5.0/9.0  
                                   ;; to floating point register xmm0
  mulsd   %xmm1, %xmm0             ;; multiply contents of xmm1 by xmm0, store result
                                   ;; in xmm0
  cvtsd2ss        %xmm0, %xmm0     ;; convert result in xmm0 from double- to single-
                                   ;; precision float
  movss   %xmm0, -4(%rbp)          ;; save result to c

Since you declared b and c as floats, the compiler generated machine code to specifically handle floating-point values; the movsd, mulsd, cvtss2sd instructions are all specific to floating-point operations, and the registers %xmm0 and %xmm1 are used to store double-precision floating point values.

If I change the source code so that b and c are integers instead of floats, the compiler generates different machine code:

/**
 * c2.c
 */
#include <stdio.h>
int main( void )
{
  int b;
  int c;
  b = 3;
  c = (9 / 4) * b; // changed these values since integer 5/9 == 0, making for
                   // some really boring machine code.

  printf( "c = %d\n", c );
  return 0;
}

Compiling with gcc -o c2 -g -std=c99 -pedantic -Wall -Werror -Wa,-aldh=c2.lst c2.c gives:

GAS LISTING /tmp/ccyxHwid.s                     page 1

   1                            .file   "c2.c"
   9                    .Ltext0:
  10                            .section        .rodata
  11                    .LC0:
  12 0000 63203D20              .string "c = %d\n"
  12      25640A00
  13                            .text
  14                    .globl main
  16                    main:
  17                    .LFB2:
  18                            .file 1 "c2.c"
   1:c2.c          **** #include <stdio.h>
   2:c2.c          **** int main( void )
   3:c2.c          **** {
  19                            .loc 1 3 0
  20 0000 55                    pushq   %rbp
  21                    .LCFI0:
  22 0001 4889E5                movq    %rsp, %rbp
  23                    .LCFI1:
  24 0004 4883EC10              subq    $16, %rsp
  25                    .LCFI2:
   4:c2.c          ****   int b;
   5:c2.c          ****   int c;
   6:c2.c          ****   b = 3;
  26                            .loc 1 6 0
  27 0008 C745F803              movl    $3, -8(%rbp)
  27      000000
   7:c2.c          ****   c = (9 / 4) * b;
  28                            .loc 1 7 0
  29 000f 8B45F8                movl    -8(%rbp), %eax
  30 0012 01C0                  addl    %eax, %eax
  31 0014 8945FC                movl    %eax, -4(%rbp)
   8:c2.c          ****
   9:c2.c          ****   printf( "c = %d\n", c );
  32                            .loc 1 9 0
  33 0017 8B75FC                movl    -4(%rbp), %esi
  34 001a BF000000              movl    $.LC0, %edi
  34      00
  35 001f B8000000              movl    $0, %eax
  35      00
  36 0024 E8000000              call    printf
  36      00
  10:c2.c          ****   return 0;
  37                            .loc 1 10 0
  38 0029 B8000000              movl    $0, %eax
  38      00
  11:c2.c          **** }
  39                            .loc 1 11 0
  40 002e C9                    leave
  41 002f C3                    ret

Here's the same operation, but with b and c declared as integers:

   7:c2.c          ****   c = (9 / 4) * b;
  .loc 1 7 0
  movl    -8(%rbp), %eax  ;; copy value of b to register eax
  addl    %eax, %eax      ;; since 9/4 == 2 (integer arithmetic), double the
                          ;; value in eax
  movl    %eax, -4(%rbp)  ;; write result to c

This is what I meant earlier when I said that type information was "baked in" to the machine code. When the program runs, it doesn't examine b or c to determine their type; it already knows what their type should be based on the generated machine code.

If the compiler determines the type and size at run time then why doesn't the following program work:
float b='H';
printf(" value of b is %c \n",b);

It doesn't work because you're lying to the compiler. You tell it that b is a float, so it will generate machine code to handle floating-point values. When you initialize it, the value of the constant 'H' (72 in ASCII) is converted to the floating-point value 72.0; b holds a number, not a character.

You lie to the compiler again when you use the %c conversion specifier, which expects an int holding a character code, for the argument b, which is passed as a double. Because of this mismatch the behavior is undefined: printf won't interpret the contents of b correctly, and you'll wind up with garbage output⁵. Again, printf can't know the number or types of any additional arguments based on the arguments themselves; all it sees is an address on the stack (or a bunch of registers). It needs the format string to tell it what additional arguments have been passed, and what their types are.

1. The one exception being variable-length arrays; since their size isn't established until runtime, there's no way to evaluate sizeof on a VLA at compile time.

2. As of C89, anyway. Prior to that, the compiler could only catch mismatches in the function return type; it couldn't detect mismatches in the function parameter lists.

3. This code is generated on a 64-bit SuSE Linux Enterprise 10 system using gcc 4.1.2. If you're on a different implementation (compiler/OS/chip architecture), then the exact machine instructions will be different, but the general point will still hold; the compiler will generate different instructions to handle floats vs. ints vs. strings, etc.

4. When you call a function in a running program, a stack frame is created to store the function arguments, local variables, and the address of the instruction following the function call. A special register called the frame pointer is used to keep track of the current frame.

5. For example, assuming ASCII and IEEE 754 floating point, b holds 72.0, which is promoted to a double with the bit pattern 0x4052000000000000 when it is passed to printf. The %c conversion specifier then tries to read an int, so printf picks up bytes that have nothing to do with the character H (on x86-64 Linux it reads an integer register that the call never set up for this argument), and what it prints is unpredictable.
