C – Storing EOF Character in a Char Type

c

I read in Dennis Ritchie's book The C Programming Language that an int, not a char, must be used for a variable that holds EOF, so that the variable is large enough to hold the EOF value. But the following code works fine:

#include <stdio.h>

int main(void) {
  char c;
  c = getchar();
  while (c != EOF) {
    putchar(c);
    c = getchar();
  }
  return 0;
}

When there is no more input, getchar returns EOF. And in the above program, the variable c, with char type, is able to hold it successfully.

Why does this work? According to the explanation in the book, this code should not work.

Best Answer

Your code seems to work because the implicit type conversions happen to do the right thing.

getchar() returns an int whose value either fits in the range of unsigned char or is EOF (which must be negative and is usually -1). Note that EOF itself is not a character, but a signal that there are no more characters available.
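Because the return type is int, the conventional way to write such a loop is to keep the result in an int until it has been compared with EOF. The following is a minimal sketch of that idiom; the helper name copy_stream and its FILE * parameters are assumptions made for illustration and do not appear in the question.

```c
#include <stdio.h>

/* copy_stream is a hypothetical helper, not part of the original post.
   It copies every byte from `in` to `out` and returns how many were copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;          /* int: can hold every unsigned char value and also EOF */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```

Calling copy_stream(stdin, stdout) from main should behave like the program in the question, except that the loop stops only when getc really reports end of input or a read error.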

When storing the result from getchar() in c, there are two possibilities. Either the type char can represent the value, in which case that is the value of c, or it cannot. In the second case the outcome depends on the signedness of char: if char is unsigned, the value is reduced modulo UCHAR_MAX + 1 (256 for an 8-bit char), which is well defined; if char is signed, the result is implementation-defined (or an implementation-defined signal is raised). Most compilers for common platforms simply drop the high bits that don't fit in the new type, but you should not rely on that.

The next step is to compare c with EOF. As EOF is an int, c will be converted to an int as well, preserving the value stored in c. If c could store the value of EOF, then the comparison will succeed, but if c could not store the value, then the comparison will fail, because there has been an irrecoverable loss of information while converting EOF to type char.

It seems your compiler chose to make the char type signed and the value of EOF small enough to fit in char. If char were unsigned (or if you had used unsigned char), your test would have failed, because unsigned char can't hold the value of EOF.
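The difference between the two signedness cases can be sketched with a pair of small functions. Both names are hypothetical, and the sketch assumes that int is wider than char, which holds on mainstream hosted platforms.

```c
#include <stdio.h>

/* Both helpers are hypothetical names used only for illustration. */

/* Stores v in an unsigned char, then compares it with EOF as the loop would. */
int equals_eof_via_unsigned_char(int v)
{
    unsigned char c = (unsigned char)v; /* well defined: reduced modulo UCHAR_MAX + 1 */
    int promoted = c;                   /* promotion yields a non-negative int */
    return promoted == EOF;             /* EOF is negative, so this cannot match */
}

/* Same comparison, but going through a signed char. */
int equals_eof_via_signed_char(int v)
{
    signed char c = (signed char)v;     /* value preserved only if it fits */
    int promoted = c;                   /* promotion preserves the stored value */
    return promoted == EOF;
}
```

Under those assumptions, equals_eof_via_unsigned_char(EOF) is 0, so a loop written with unsigned char would never see the end of input, while equals_eof_via_signed_char(EOF) is 1 whenever EOF fits in a signed char (as it does when EOF is -1).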


Also note that there is a second problem with your code. Because EOF is not a character but you force it into a char, some real character is very likely to be misinterpreted as EOF: with a signed 8-bit char and EOF equal to -1, that is typically the byte 0xFF. In addition, for the half of the possible byte values that do not fit in a signed char (128 to 255), it is implementation-defined whether they are processed correctly.
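This second problem can be made visible with a small experiment. The sketch below mimics the loop from the question using an explicit signed char and reports how many bytes were consumed before the loop stopped. The helper name bytes_before_loop_stops is hypothetical, and the behaviour described afterwards assumes an 8-bit char, EOF equal to -1, and a compiler that wraps out-of-range values on conversion (GCC documents that it does); other implementations may behave differently.

```c
#include <stdio.h>

/* bytes_before_loop_stops is a hypothetical helper. It mimics the question's
   loop with a signed char and returns how many bytes were consumed before
   the loop decided that it had reached EOF. */
long bytes_before_loop_stops(FILE *in)
{
    signed char c;
    long count = 0;

    /* Values above SCHAR_MAX are converted in an implementation-defined way. */
    c = (signed char)getc(in);
    while (c != EOF) {
        count++;
        c = (signed char)getc(in);
    }
    return count;
}
```

Under those assumptions, a stream containing the three bytes 'a', 0xFF, 'b' makes the function return 1 rather than 3, because the byte 0xFF is stored as -1 and then compares equal to EOF.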