Java – Why Java does not allow function definitions to be present outside of the class

Tags: class, functions, java

Unlike in C++, in Java we cannot put just the function declarations in the class and the definitions outside of the class. Why is that?

Is it to emphasize that a single file in Java should contain only one class and nothing else?

Best Answer

The difference between C++ and Java is in what the languages consider their smallest unit of linkage.

Because C was designed to coexist with assembly, its unit of linkage is the subroutine, called by address. (This is also true of other languages that compile to native object files, such as FORTRAN.) In other words, an object file containing a function foo() will have a symbol called _foo that will be resolved to nothing but an address such as 0xdeadbeef during linking. That's all there is. If the function is to take arguments, it's up to the caller to make sure everything the function expects is in order before calling its address. Normally, this is done by piling things onto the stack, and the compiler takes care of the grunt work and makes sure the prototypes match up. There is no checking of this between object files; if you goof up the call linkage, the call isn't going to go off as planned and you're not going to get a warning about it. Despite the danger, this makes it possible for object files compiled from multiple languages (including assembly) to be linked together into a functioning program without a lot of fuss.

C++, despite all of its additional fanciness, works the same way. The compiler shoehorns namespaces, classes and methods/members/etc. into this convention by flattening the contents of classes into single names that are mangled in a way that makes them unique. For example, a method like Foo::bar(int baz) might get mangled into _ZN4Foo4barEi when put into an object file and resolved to an address like 0xBADCAFE at runtime. This is entirely dependent on the compiler, so if you try to link two objects that have different mangling schemes, you're going to be out of luck. Ugly as this is, it means you can use an extern "C" block to disable mangling, making it possible to make C++ code easily accessible to other languages. C++ inherited the notion of free-floating functions from C, largely because the native object format allows it.

Java is a different beast that lives in an insulated world with its own object file format, the .class file. Class files contain a wealth of information about their contents that allows the environment to do things with classes at runtime that the native linkage mechanism couldn't even dream about. That information has to start somewhere, and that starting point is the class. The available information allows the compiled code to describe itself without the need for separate files containing a description in source code as you'd have in C, C++ or other languages. That gives you all of the type safety benefits languages using the native linkage lack, even at runtime, and is what enables you to fish an arbitrary class out of a file using reflection and use it with a guaranteed failure if something doesn't match up.
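As a rough illustration of that self-description, the sketch below fishes a class out by name at runtime, asks it for a method with a given signature, and shows that a mismatch is rejected rather than silently miscalled. The class name SelfDescribingDemo and the helpers find and tryInvoke are made up for this example; only Class.forName, Class.getMethod and Method.invoke are standard library calls.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; the name and helpers are illustrative only.
final class SelfDescribingDemo {

    // Looks up a public method by class name, method name and parameter types.
    // Returns null when the class file describes no such method.
    static Method find(String className, String methodName, Class<?>... parameterTypes) {
        try {
            Class<?> type = Class.forName(className);
            return type.getMethod(methodName, parameterTypes);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return null;
        }
    }

    // Invokes a static one-argument method and reports either the result
    // or the kind of exception the runtime raised.
    static String tryInvoke(Method method, Object argument) {
        try {
            return String.valueOf(method.invoke(null, argument));
        } catch (IllegalArgumentException | ReflectiveOperationException e) {
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Method sqrt = find("java.lang.Math", "sqrt", double.class);
        System.out.println(sqrt.getReturnType());                         // double
        System.out.println(find("java.lang.Math", "sqrt", String.class)); // null
        System.out.println(tryInvoke(sqrt, 4.0));                         // 2.0
        System.out.println(tryInvoke(sqrt, "four"));                      // rejected: IllegalArgumentException
    }
}
```

Nothing here needs a header or any source-level description of java.lang.Math; the lookup and the argument check both work from what the class file carries.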

If you haven't figured it out already, all of this safety comes with a tradeoff: anything you link to a Java program has to be Java. (By "link," I mean anytime something in one class file refers to something in another.) You can link (in the native sense) to native code using JNI, but there's an implicit contract that says that if you break the native side, you own both pieces.

Java was big and not particularly fast on the available hardware when it was first introduced, much like Ada had been in the prior decade. Only James Gosling can say for sure what his motivations were in making the class Java's smallest unit of linkage, but I'd have to guess that the extra complexity that free-floating functions would have added to the runtime might have been a deal-killer.

Related Solutions

Java – Does reflection in Java make its functions “first class”

A first class function is one where the function is available on its own. C, C++, and Ruby allow this approach. Java requires the function to be tied to a class and only provides a metadata representation of it, even if that class is merely a static collection of functions. C# supports first class functions with lambdas (which are based on lambda calculus) and delegates.

Ruby is one of the languages that truly supports first class functions. The difference is that not only can you define functions on their own, but you can pass them as arguments and invoke methods on them. Check out Ruby's Proc object which is used to represent an arbitrary block of code.

The end result is that first class functions lend themselves to some very powerful and flexible coding constructs. This is distinctly different from hacking around the lack of first class functions using the reflection API.

Java doesn't have full support for first class functions. The reflection API can give you some semblance of first class functions if the Method object references a static method. In essence you can invoke a static method like this:

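// Requires: import java.lang.reflect.Method;
// Math.sqrt takes a double, so the parameter type must be part of the lookup
Method reference = Math.class.getMethod("sqrt", double.class);

// NOTE: the first parameter is for the object instance,
// but for static methods it is ignored
double answer = (double) reference.invoke(null, 4.0);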

As soon as you are working with an instance method, you lose that first class function ability. You might be able to hack together some reflection based delegate support similar to C#, but the resulting code will be much slower. The "delegate" would take care of keeping the object reference for future invocations.
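A minimal sketch of such a reflection-based "delegate" might look like the following. The class ReflectiveDelegate and its constructor shape are assumptions made for illustration, not an existing API; it simply stores the target object next to the Method so that later calls do not have to supply it.

```java
import java.lang.reflect.Method;

// Hypothetical reflection-based "delegate": remembers the target object
// and the method to call on it. Not part of any standard library.
final class ReflectiveDelegate {
    private final Object target;  // null when the method is static
    private final Method method;

    ReflectiveDelegate(Object target, Class<?> owner, String name, Class<?>... parameterTypes) {
        Method found;
        try {
            found = owner.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
        this.target = target;
        this.method = found;
    }

    // Calls the stored method on the stored target; every call goes
    // through reflection, which is where the extra cost comes from.
    Object invoke(Object... args) {
        try {
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveDelegate upper = new ReflectiveDelegate("hello", String.class, "toUpperCase");
        ReflectiveDelegate sqrt = new ReflectiveDelegate(null, Math.class, "sqrt", double.class);
        System.out.println(upper.invoke());    // HELLO
        System.out.println(sqrt.invoke(4.0));  // 2.0
    }
}
```

The caller gets back a plain Object and loses compile-time checking of argument and return types, which is part of why this remains a workaround rather than a real first class function.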

Java vs C++ – Separation of Class Definitions and Implementations

How many lines of code are in the following program?

#include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
   return 0;
}

You probably answered 7 (or 6 if you didn't count the blank line, or 4 if you didn't count the braces).

Your compiler, however, sees something very different:

~$ cpp hello.cpp | wc
  18736   40822  437015

Yes, that's 18.7 KLOC just for a "Hello, world!" program. The C++ compiler has to parse all that. This is a major reason why C++ compilation takes so long compared to other languages, and why modern languages eschew header files.

A better question would be

Why does C++ have header files?

C++ was designed to be a superset of C, so it had to keep header files for backwards compatibility.

OK, so why does C have header files?

Because of its primitive separate compilation model. The object files generated by C compilers don't include any type information, so in order to prevent type errors you need to include this information in your source code.

~$ cat sqrtdemo.c 
int main(void)
{
    /* implicit declaration int sqrt(int) */
    double sqrt2 = sqrt(2);
    printf("%f\n", sqrt2);
    return 0;
}

~$ gcc -Wall -ansi -lm -Dsqrt= sqrtdemo.c
sqrtdemo.c: In function ‘main’:
sqrtdemo.c:5:5: warning: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration]
sqrtdemo.c:5:5: warning: incompatible implicit declaration of built-in function ‘printf’ [enabled by default]
~$ ./a.out 
2.000000

Adding the proper type declarations fixes the bug:

~$ cat sqrtdemo.c 
#undef printf
#undef sqrt

int printf(const char*, ...);
double sqrt(double);

int main(void)
{
    double sqrt2 = sqrt(2);
    printf("%f\n", sqrt2);
    return 0;
}

~$ gcc -Wall -ansi -lm sqrtdemo.c
~$ ./a.out 
1.414214

Notice that there are no #includes. But when you use a large number of external functions (which most programs will), manually declaring them gets tedious and error-prone. It's much easier to use header files.

How are modern languages able to avoid header files?

By using a different object file format that includes type information. For example, the Java *.class file format includes "descriptors" that specify the types of fields and method parameters.
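To get a feel for what these descriptors look like, the sketch below builds a few of them with the standard java.lang.invoke.MethodType class. The wrapper class DescriptorDemo and its descriptor helper are names invented for this example.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class; only MethodType is a real library type here.
final class DescriptorDemo {

    // Builds the JVM method descriptor string for the given return type
    // and parameter types, in the form stored in *.class files.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // double sqrt(double)
        System.out.println(descriptor(double.class, double.class));  // (D)D
        // void main(String[])
        System.out.println(descriptor(void.class, String[].class));  // ([Ljava/lang/String;)V
        // int size()
        System.out.println(descriptor(int.class));                   // ()I
    }
}
```

Because a string like (D)D travels with the compiled method, a compiler reading the class file can check a call to it without any separately maintained declaration.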

This was not a new invention. Earlier (1987), when Borland added separately-compiled "units" to Turbo Pascal 4.0, it chose to use a new *.TPU format rather than Turbo C's *.OBJ in order to remove the need for header files.