Why C++ Has Undefined Behavior and C# or Java Don’t

Tags: c, java, programming-languages, undefined-behavior

This Stack Overflow post gives a fairly comprehensive list of situations that the C/C++ language specification declares to be 'undefined behaviour'. However, I want to understand why other modern languages, like C# or Java, don't have the concept of 'undefined behaviour'. Does it mean the compiler designers can control all possible scenarios (C# and Java) or not (C and C++)?

Best Answer

Undefined behaviour is one of those things that were recognized as a very bad idea only in retrospect.

The first compilers were great achievements, jubilantly welcomed as improvements over the alternative - machine language or assembly language programming. The problems with those were well-known, and high-level languages were invented specifically to solve those known problems. (The enthusiasm at the time was so great that HLLs were sometimes hailed as "the end of programming" - as if from now on we would only have to trivially write down what we wanted and the compiler would do all the real work.)

It wasn't until later that we realized the newer problems that came with the newer approach. Being remote from the actual machine that code runs on means there is more possibility of things silently not doing what we expected them to do. For instance, allocating a variable would typically leave the initial value undefined; this wasn't considered a problem, because you wouldn't allocate a variable if you didn't want to hold a value in it, right? Surely it wasn't too much to expect that professional programmers wouldn't forget to assign the initial value, was it?

It turned out that with the larger code bases and more complicated structures that became possible with more powerful programming systems, yes, many programmers would indeed commit such oversights from time to time, and the resulting undefined behaviour became a major problem. Even today, the majority of security vulnerabilities, from tiny to horrible, are the result of undefined behaviour in one form or another. (The reason is that usually, undefined behaviour is in fact very much defined by the next lower level of computing, and attackers who understand that level can use that wiggle room to make a program do not merely unintended things, but exactly the things they intend.)

Since we recognised this, there has been a general drive to banish undefined behaviour from high-level languages, and Java was particularly thorough about this (which was comparatively easy since it was designed to run on its own specifically designed virtual machine anyway). Older languages like C can't easily be retrofitted like that without losing compatibility with the huge amount of existing code.

Edit: As pointed out, efficiency is another reason. Undefined behaviour means that compiler writers have a lot of leeway for exploiting the target architecture, so that each implementation gets away with the fastest possible implementation of each feature. This mattered more on yesterday's underpowered machines than it does today, when programmer salary is often the bottleneck for software development.