In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a complex maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.
The 1989 ANSI C standard introduced rules restricting what symbols an implementation could legally predefine. A macro predefined by the compiler could only have a name starting with two underscores, or with an underscore followed by an uppercase letter, leaving programmers free to use identifiers not matching that pattern and not used in the standard library.
As a result, any compiler that predefines unix or linux is non-conforming, since it will fail to compile perfectly legal code that uses something like int linux = 5;.
As it happens, gcc is non-conforming by default -- but it can be made to conform (reasonably well) with the right command-line options:
gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
gcc -std=c99 -pedantic
gcc -std=c11 -pedantic
See the gcc manual for more details.
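To make the conformance issue concrete, here is a minimal sketch of a file that uses linux as an ordinary identifier. The #undef guard and the helper name linux_value are assumptions added for illustration: the guard lets the file compile even in gcc's default mode, while under the conforming options above the macro should not be predefined and the guard does nothing.

```c
/* In gcc's default (GNU) mode on a Linux target, `linux` may be
   predefined as a macro, which would turn the declaration below into
   something like `int 1 = 5;`. Removing the macro first avoids that. */
#ifdef linux
#undef linux
#endif

/* Perfectly legal C: `linux` is not a reserved identifier. */
int linux = 5;

/* Hypothetical helper that just reads the variable back. */
int linux_value(void)
{
    return linux;
}
```

Compiling the same file without the guard is one way to see whether a given set of options predefines the macro.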
gcc will be phasing out these definitions in future releases, so you shouldn't write code that depends on them. If your program needs to know whether it's being compiled for a Linux target or not, it can check whether __linux__ is defined (assuming you're using gcc or a compiler that's compatible with it). See the GNU C preprocessor manual for more information.
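A minimal sketch of that check follows. The function names target_os and legacy_linux_macro_defined are hypothetical, chosen for this example; the first tests only the reserved-namespace macro __linux__, and the second reports whether the old unreserved macro linux happens to be predefined by the compiler in use.

```c
/* Returns "linux" when compiled for a Linux target by gcc or a
   compatible compiler, and "other" otherwise. */
const char *target_os(void)
{
#if defined(__linux__)
    return "linux";
#else
    return "other";
#endif
}

/* Returns 1 if the legacy, non-conforming macro `linux` is predefined
   in this compilation, and 0 if it is not. */
int legacy_linux_macro_defined(void)
{
#ifdef linux
    return 1;
#else
    return 0;
#endif
}
```

Only target_os reflects something portable code should rely on; the second function is there purely to observe the compiler's mode.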
A largely irrelevant aside: the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell) took advantage of the predefined unix macro:
main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
It prints "unix", but for reasons that have absolutely nothing to do with the spelling of the macro name.
Short answer:
In both C and C++, (int *)0 is a constant expression whose value is a null pointer. It is not, however, a null pointer constant. The only observable difference between a constant-expression-whose-value-is-a-null-pointer and a null-pointer-constant, that I know of, is that a null-pointer-constant can be assigned to an lvalue of any pointer type, but a constant-expression-whose-value-is-a-null-pointer has a specific pointer type and can only be assigned to an lvalue with a compatible type. In C, but not C++, (void *)0 is also a null pointer constant; this is a special case for void * consistent with the general C-but-not-C++ rule that void * is assignment compatible with any other pointer-to-object type.
For example:
long *a = 0; // ok, 0 is a null pointer constant
long *b = (long *)0; // ok, (long *)0 is a null pointer with appropriate type
long *c = (void *)0; // ok in C, invalid conversion in C++
long *d = (int *)0; // invalid conversion in both C and C++
And here's a case where the difference between the null pointer constant (void *)0 and a constant-expression-whose-value-is-a-null-pointer with type void * is visible, even in C:
typedef void (*fp)(void); // any pointer-to-function type will show this effect
fp a = 0; // ok, null pointer constant
fp b = (void *)0; // ok in C, invalid conversion in C++
fp c = (void *)(void *)0; // invalid conversion in both C and C++
Also, it's moot nowadays, but since you brought it up: No matter what the bit representation of long *'s null pointer is, all of these assertions behave as indicated by the comments:
// 'x' is initialized to a null pointer
long *x = 0;
// 'y' is initialized to all-bits-zero, which may or may not be the
// representation of a null pointer; moreover, it might be a "trap
// representation", UB even to access
long *y;
memset(&y, 0, sizeof y);
assert (x == 0); // must succeed
assert (x == (long *)0); // must succeed
assert (x == (void *)0); // must succeed in C, unspecified behavior in C++
assert (x == (int *)0); // invalid comparison in both C and C++
assert (memcmp(&x, &y, sizeof y) == 0); // unspecified
assert (y == 0); // UNDEFINED BEHAVIOR: y may be a trap representation
assert (y == x); // UNDEFINED BEHAVIOR: y may be a trap representation
"Unspecified" comparisons do not provoke undefined behavior, but the standard doesn't say whether they evaluate true or false, and the implementation is not required to document which of the two it is, or even to pick one and stick to it. It would be perfectly valid for the above memcmp to alternate between returning 0 and 1 if you called it many times.
Long answer with standard quotes:
To understand what a null pointer constant is, you first have to understand what an integer constant expression is, and that's pretty hairy -- a complete understanding requires you to read sections 6.5 and 6.6 of C99 in detail. This is my summary:
A constant expression is any C expression which the compiler can evaluate to a constant without knowing the value of any object (const or otherwise; however, enum values are fair game), and which has no side effects. (This is a drastic simplification of roughly 25 pages of standardese and may not be exact.)
Integer constant expressions are a restricted subset of constant expressions, conveniently defined in a single paragraph, C99 6.6p6 and its footnote:
An integer constant expression[96] shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
[96] An integer constant expression is used to specify the size of a bit-field member of a structure, the value of an enumeration constant, the size of an array, or the value of a case constant. Further constraints that apply to the integer constant expressions used in [#if] are discussed in 6.10.1.
For purpose of this discussion, the important bit is
Cast operators ... shall only convert arithmetic types to integer types
which means that (int *)0 is not an integer constant expression, although it is a constant expression.
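As a rough illustration of 6.6p6, the sketch below uses each permitted kind of operand in contexts that require an integer constant expression. All of the names (FROM_LITERAL, struct flags, table, table_length, and so on) are invented for this example.

```c
#include <stddef.h>

/* Enumeration constants must be given by integer constant expressions. */
enum {
    FROM_LITERAL = 2 + 3,           /* integer constants */
    FROM_CHAR    = 'A' - 'A',       /* character constants */
    FROM_SIZEOF  = sizeof(int) > 0, /* sizeof expression with a constant result */
    FROM_FLOAT   = (int)2.5         /* floating constant as the immediate operand of a cast */
};

/* A bit-field width also requires one; here the width is 5 - 4 = 1. */
struct flags {
    unsigned ready : FROM_LITERAL - 4;
};

/* So does the size of an array declared at file scope. */
int table[FROM_LITERAL];

/* (int *)0 is a constant expression, but its cast converts to a pointer
   type rather than an integer type, so it is not an integer constant
   expression. A conforming compiler should reject this line if it is
   uncommented:
       enum { BAD = (int *)0 };
*/

/* Number of elements in table, computed from the declared size. */
size_t table_length(void)
{
    return sizeof table / sizeof table[0];
}
```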
The C++98 definition appears to be more or less equivalent, modulo C++ features and deviations from C. For instance, the stronger separation of character and boolean types from integer types in C++ means that the C++ standard speaks of "integral constant expressions" rather than "integer constant expressions", and then sometimes requires not just an integral constant expression, but an integral constant expression of integer type, excluding char, wchar_t, and bool (and maybe also signed char and unsigned char? it's not clear to me from the text).
Now, the C99 definition of null pointer constant is what this question is all about, so I'll repeat it: 6.3.2.3p3 says
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Standardese is very, very literal. Those two sentences mean exactly the same thing as:
An integer constant expression with the value 0 is called a null pointer constant.
An integer constant expression with the value 0, cast to type void *, is also a null pointer constant.
When any null pointer constant is converted to a pointer type, the resulting pointer is called a null pointer and is guaranteed to compare unequal ...
(Italics - definition of term. Boldface - my emphasis.) So what that means is, in C, (long *)0 and (long *)(void *)0 are two ways of writing exactly the same thing, namely the null pointer with type long *.
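A small sketch of that equivalence, assuming a C compiler: both spellings produce a null pointer of type long *, so they compare equal to each other and unequal to the address of any object. The function name null_pointers_agree is made up for this example.

```c
/* Returns 1 if the two spellings of the null pointer of type long *
   behave identically, and 0 otherwise. */
int null_pointers_agree(void)
{
    long object = 42;

    long *direct   = (long *)0;          /* null pointer constant 0, converted */
    long *via_void = (long *)(void *)0;  /* null pointer constant (void *)0, converted */

    /* Null pointers of the same type compare equal to each other and
       unequal to a pointer to any object. */
    return direct == via_void
        && direct != &object
        && via_void != &object;
}
```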
C++ is different. The equivalent text is C++98 4.10 [conv.ptr]:
A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero.
That's all. "Integral constant expression rvalue of integer type" is very nearly the same thing as C99's "integer constant expression", but there are a few things that qualify in C but not C++: for instance, in C the character literal '\x00' is an integer constant expression, and therefore a null pointer constant, but in C++ it is not an integral constant expression of integer type, so it is not a null pointer constant either.
More to the point, though, C++ doesn't have the "or such an expression cast to void *" clause. That means that ((void *)0) is not a null pointer constant in C++. It is still a null pointer, but it is not assignment compatible with any other pointer type. This is consistent with C++'s generally pickier type system.
C++11 (but not, AFAIK, C11) revised the concept of "null pointer", adding a special type for them (nullptr_t) and a new keyword which evaluates to a null pointer constant (nullptr). I do not fully understand the changes and am not going to try to explain them, but I am pretty sure that a bare 0 is still a valid null pointer constant in C++11.
Best Answer
No, it doesn't. (I confess to being a bit biased, since the referenced blog is mine.)
The bolded sentence says that its type and value are identical to those of the unparenthesized expression. That's not enough to imply that it's a null pointer constant.
Consider:
(void*)0 is a null pointer constant. ((void*)0) has the same type and value as (void*)0. A void* object var that holds a null pointer also has the same type and value as (void*)0, but var clearly is not a null pointer constant.
Having said that, I'm 99+% sure that the intent is that ((void*)0) is a null pointer constant, and more generally that any parenthesized null pointer constant is a null pointer constant. The authors of the standard merely neglected to say so. And since the description of parenthesized expressions in 6.5.1p5 specifically enumerates several other characteristics that are inherited by parenthesized expressions (being an lvalue, a function designator, or a void expression), the omission is troubling (but only mildly so).
But let's assume, for the sake of argument, that ((void*)0) is not a null pointer constant. What difference does it make?
(void*)0 is a null pointer constant, whose value is a null pointer of type void*, so by the semantics of parenthesized expressions ((void*)0) also has a value that is a null pointer of type void*. Both (void*)0 and ((void*)0) are address constants. (Well, I think they are.) So what contexts require a null pointer constant and do not accept an address constant? There are only a few.
6.5.9 Equality operators
An expression of function pointer type may be compared for equality to a null pointer constant. (An object pointer may be compared to an expression of type void*, but a function pointer may not, unless it's a null pointer constant.) So, under that assumption, comparing a function pointer for equality with ((void*)0) would be a constraint violation.
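The equality rule can be sketched as follows. The type name callback and the functions noop and callback_is_null are hypothetical, introduced only for this example; the comparison uses a bare 0, which is a null pointer constant under any reading of the standard.

```c
/* Hypothetical pointer-to-function type used for the illustration. */
typedef void (*callback)(void);

/* A do-nothing function whose address is a non-null function pointer. */
void noop(void)
{
}

/* Returns 1 if cb is a null pointer, 0 otherwise. Comparing a function
   pointer with 0 is allowed because 0 is a null pointer constant. */
int callback_is_null(callback cb)
{
    return cb == 0;
}

/* By contrast, comparing cb with a void * expression that is not a
   null pointer constant violates the constraints of 6.5.9:
       void *p = 0;
       cb == p;      (a conforming compiler must diagnose this)
*/
```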
6.5.16.1 Simple assignment
In an assignment, a null pointer constant may be assigned to an object of pointer-to-function type, and will be implicitly converted. An expression of type void* that's not a null pointer constant may not be assigned to a function pointer. The same constraints apply to argument passing and initialization. So initializing or assigning a function pointer with ((void*)0) would be a constraint violation if ((void*)0) were not a null pointer constant. Thanks to commenter hvd for finding this.
7.19 Common definitions <stddef.h>
The macro NULL expands to "an implementation-defined null pointer constant". If ((void*)0) is not a null pointer constant, then defining NULL as ((void*)0) would be invalid. This would be a restriction imposed on the implementation, not on programmers. Note that defining NULL as (void*)0, without the outer parentheses, is definitely invalid, since macro definitions in standard headers must be fully protected by parentheses where necessary (7.1.2p5). Without the parentheses, the valid expression sizeof NULL would be a syntax error, expanding to sizeof (void*) followed by an extraneous constant 0.
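The parenthesization point can be seen with a stand-in macro. MY_NULL and my_null_size are hypothetical names used here so that the sketch does not redefine the real NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for an implementation's NULL, fully protected
   by parentheses. */
#define MY_NULL ((void*)0)

/* With the outer parentheses, `sizeof MY_NULL` applies sizeof to the
   parenthesized expression ((void*)0), whose type is void *. */
size_t my_null_size(void)
{
    return sizeof MY_NULL;
}

/* Without them, as in
       #define BAD_NULL (void*)0
   the expression `sizeof BAD_NULL` would expand to `sizeof (void*)0`:
   sizeof applied to the type name void *, followed by a stray 0,
   which is a syntax error. */
```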