Why Declaration of Data and Functions is Necessary in C

c declarations functions language-design variables

Consider the following "C" code:

#include<stdio.h>
main()
{   
  printf("func:%d",Func_i());   
}

Func_i()
{
  int i=3;
  return i;
}

Func_i() is defined at the end of the source code, and no declaration is provided before its use in main(). When the compiler sees Func_i() in main(), it leaves main() and locates Func_i(). The compiler somehow finds the value returned by Func_i() and passes it to printf(). I also know that the compiler cannot find the return type of Func_i(). By default it takes (guesses?) the return type of Func_i() to be int. That is, if the code had float Func_i() instead, the compiler would give the error: Conflicting types for Func_i().

From the above discussion we see that:

  1. The compiler can find the value returned by Func_i().

    • If the compiler can find the value returned by Func_i() by leaving main() and searching down the source code, then why can't it find the type of Func_i(), which is explicitly mentioned?

  2. The compiler must know that Func_i() is of type float; that's why it gives the error of conflicting types.

    • If the compiler knows that Func_i() is of type float, then why does it still assume Func_i() to be of type int and give the error of conflicting types? Why doesn't it simply make Func_i() of type float?

I have the same doubt about variable declarations. Consider the following "C" code:

#include<stdio.h>
main()
{
  /* [extern int Data_i;]--omitted the declaration */
  printf("func:%d and Var:%d",Func_i(),Data_i);
}

Func_i()
{
  int i=3;
  return i;
}
int Data_i=4;

The compiler gives the error: 'Data_i' undeclared (first use in this function).

  • When the compiler sees Func_i(), it goes down the source code to find the value returned by Func_i(). Why can't the compiler do the same for the variable Data_i?

Edit:

I don't know the details of the inner workings of the compiler, assembler, processor, etc. The basic idea of my question is this: if I write the return value of a function at the end of the source code, after the use of that function, the "C" language still allows the computer to find that value without giving any error. Why can't the computer find the type in the same way? Why can't the type of Data_i be found as Func_i()'s return value was found? Even if I use the extern data-type identifier; statement, I am not telling it the value to be returned by that identifier (function/variable). If the computer can find that value, then why can't it find the type? Why do we need the forward declaration at all?

Thank you.

Best Answer

Because C is a single-pass, statically-typed, weakly-typed, compiled language.

  1. Single-pass means the compiler does not look ahead to see the definition of a function or variable. Since the compiler does not look ahead, the declaration of a function must come before the use of the function, otherwise the compiler does not know what its type signature is. However, the definition of the function can be later on in the same file, or even in a different file altogether. See point #4.

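    The only exception is a historical artifact: a call to an undeclared function makes the compiler presume that the function returns int, and in pre-C99 code a declaration or definition that omits the type, such as Func_i() above, also defaults to int. An undeclared variable gets no such treatment, which is why using Data_i before declaring it is an error. C99 removed both implicit rules from the standard, although some compilers still accept such code with a warning. Modern practice is to avoid implicit typing by always declaring functions and variables explicitly.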

  2. Statically-typed means that all type information is computed at compile time. That information is then used to generate machine code that executes at run time. There is no concept in C of run-time typing. Once an int, always an int, once a float, always a float. However, that fact is somewhat obscured by the next point.

  3. Weakly-typed means that the C compiler automatically generates code to convert between numeric types without requiring the programmer to explicitly specify the conversion operations. Because of static typing, the same conversion will always be carried out in the same way each time through the program. If a float value is converted to an int value at a given spot in the code, a float value will always be converted to an int value at that spot in the code. This cannot be changed at run-time. The value itself may change from one execution of the program to the next, of course, and conditional statements may change which sections of code are run in what order, but a given single section of code without function calls or conditionals will always perform the exact same operations whenever it is run.

  4. Compiled means that the process of analyzing the human-readable source code and transforming it into machine-readable instructions is fully carried out before the program runs. When the compiler is compiling a function, it has no knowledge of what it will encounter further down in a given source file. However, once compilation (and assembly, linking, etc.) has completed, each function in the finished executable contains the numeric addresses of the functions that it will call when it is run. That is why main() can call a function further down in the source file. By the time main() is actually run, it will contain the address of Func_i().

    Machine code is very, very specific. The code for adding two integers (3 + 2) is different from the one for adding two floats (3.0 + 2.0). Those are both different from adding an int to a float (3 + 2.0), and so on. The compiler determines for every point in a function what exact operation needs to be carried out at that point, and generates code that carries out that exact operation. Once that has been done, it cannot be changed without recompiling the function.

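Putting all these concepts together, the reason that main() cannot "see" further down to determine the type of Func_i() is that type analysis happens at compile time, as the compiler reads the source file from top to bottom. When it reaches the call in main(), only the part of the file above that point has been read and analyzed, and the definition of Func_i() is not yet known to the compiler. By the time it later reaches a definition such as float Func_i(), it has already compiled the call in main() on the assumption that the function returns int, so all it can do is report the conflict.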

The reason that main() can "see" where Func_i() is to call it is that calling happens at run time, after compilation has already resolved all of the names and types of all of the identifiers, assembly has already converted all of the functions to machine code, and linking has already inserted the correct address of each function in each place it is called.

I have, of course, left out most of the gory details. The actual process is much, much more complicated. I hope that I have provided enough of a high-level overview to answer your questions.

Additionally, please remember that what I have written above applies specifically to C.

In other languages, the compiler may make multiple passes through the source code, and so the compiler could pick up the definition of Func_i() without it being predeclared.

In other languages, functions and / or variables may be dynamically typed, so a single variable could hold, or a single function could be passed or return, an integer, a float, a string, an array, or an object at different times.

In other languages, typing may be stronger, requiring conversion from floating-point to integer to be explicitly specified. In yet other languages, typing may be weaker, allowing conversion from the string "3.0" to the float 3.0 to the integer 3 to be carried out automatically.

And in other languages, code may be interpreted one line at a time, or compiled to byte-code and then interpreted, or just-in-time compiled, or put through a wide variety of other execution schemes.