C Data Types – Are Declarators Stored in RAM?


When a C program is running, the data is stored on the heap or the stack. The values are stored in RAM addresses. But what about the type indicators (e.g., int or char)? Are they also stored?

Consider the following code:

char a = 'A';
int x = 4;

I read that A and 4 are stored in RAM addresses here. But what about a and x? Most confusingly, how does the execution know that a is a char and x is an int? I mean, is the int and char mentioned somewhere in RAM?

Let's say a value is stored somewhere in RAM as 10011001; if I am the program which executes the code, how will I know whether this 10011001 is a char or an int?

What I don't understand is how the computer knows, when it reads a variable's value from an address such as 10001, whether it is an int or char. Imagine I click on a program called anyprog.exe. Immediately the code starts executing. Does this executable file include information on whether the variables stored are of the type int or char?

Best Answer

To address the question you've posted in several comments (which I think you should edit into your post):

What I don't understand is how the computer knows, when it reads a variable's value from an address such as 10001, whether it is an int or char. Imagine I click on a program called anyprog.exe. Immediately the code starts executing. Does this exe file include information about whether the variables are stored as int or char?

So let's put some code to it. Let's say you write:

int x = 4;

And let's assume that it gets stored in RAM:

0x00010004: 0x00000004

The first part is the address, the second part is the value. When your program (which executes as machine code) runs, all it sees at 0x00010004 is the value 0x00000004. It doesn't 'know' the type of this data, and it doesn't know how it is 'supposed' to be used.

So, how does your program figure out the right thing to do? Consider this code:

int x = 4;
x = x + 5;

We have a read and a write here. When your program reads x from memory, it finds 0x00000004 there, and your program knows to add 0x00000005 to it. The reason your program 'knows' this is a valid operation is that the compiler ensured it was valid through type checking. Your compiler has already verified that you can add 4 and 5 together, so when your binary code (the exe) runs, it doesn't have to make that verification. It just executes each step blindly, assuming everything is OK (bad things happen when things are, in fact, not OK).

Another way to think of it is like this. I give you this information:

0x00000004: 0x12345678

Same format as before - address on the left, value on the right. What type is the value? At this point, you know just as much about that value as your computer does when it's executing code. If I told you to add 12743 to that value, you could do it. You have no idea what the repercussions of that operation will be on the whole system, but adding two numbers is something you're really good at, so you could do it. Does that make the value an int? Not necessarily - all you see is two numbers and the addition operator.

Perhaps some of the confusion is then getting the data back out. If we have:

char A = 'a';

How does the computer know to display a in the console? Well, there are a lot of steps to that. The first is to go to A's location in memory and read it:

0x00000004: 0x00000061

The hex value for a in ASCII is 0x61, so the above might be something you'd see in memory. So now our machine code knows the integer value. How does it know to turn the integer value into a character to display it? Simply put, the compiler made sure to put in all of the necessary steps to make that transition. But your computer itself (or the program/exe) has no idea what the type of that data is. That 32-bit value could be anything - int, char, half of a double, a pointer, part of an array, part of a string, part of an instruction, etc.

Here's a brief interaction your program (exe) might have with the computer/operating system.

Program: I want to start up. I need 20 MB of memory.

Operating System: finds 20 free MB of memory that aren't in use and hands them over

(The important point is that this could return any 20 free MB of memory; they don't even have to be contiguous. At this point, the program can operate within the memory it has without talking to the OS.)

Program: I'm going to assume that the first spot in memory is a 32-bit integer variable x.

(The compiler makes sure that accesses to other variables will never touch this spot in memory. There's nothing on the system that says the first spot is variable x, or that variable x is an integer. An analogy: you have a bag. You tell people that you will only put yellow balls in this bag. If someone later pulled something blue, or a cube, out of the bag, it would be shocking - something has gone horribly wrong. The same goes for computers: your program is now assuming the first memory spot is variable x and that it is an integer. If something else is ever written over this spot in memory, or it's assumed to be something else, something horrible has happened. The compiler ensures these kinds of things don't happen.)

Program: I will now write 2 to the first four bytes, where I'm assuming x is.

Program: I want to add 5 to x.

Reads the value of x into a temporary register
Adds 5 to the temporary register
Stores the value of the temporary register back into the first four bytes, which are still assumed to be x.

Program: I'm going to assume the next available byte is the char variable y.

Program: I will write 'a' to variable y.

The byte value for 'a' (0x61 in ASCII) was already worked out by the compiler
The byte is written to the address the program is assuming is y.

Program: I want to display the contents of y

Reads the value in the second memory spot
Uses a library to convert from the byte to a character
Uses graphics libraries to alter the console screen (setting pixels from black to white, scrolling one line, etc.)

(And it goes on from here)

What you're probably getting hung up on is this: what happens when the first spot in memory is no longer x, or the second is no longer y? What happens when someone reads x as a char or y as a pointer? In short, bad things happen. Some of these things have well-defined behavior, and some have undefined behavior. Undefined behavior is exactly that - anything can happen, from nothing at all to crashing the program or the operating system. Even well-defined behavior can be exploited maliciously. If I can change x to a pointer to my program, and get your program to use it as a pointer, then I can get your program to start executing my program - which is exactly what hackers do. The compiler is there to help make sure we don't use int x as a string, and things of that nature. The machine code itself is not aware of types, and it will only do what the instructions tell it to do. There is also a large amount of information that's discovered at run time: which bytes of memory is the program allowed to use? Does x start at the first byte or the 12th?

But you can imagine how horrible it would be to actually write programs like this (and you can, in assembly language). You start off by 'declaring' your variables - you tell yourself that byte 1 is x, byte 2 is y, and as you write each line of code, loading and storing registers, you (as a human) have to remember which one is x and which one is y, because the system has no idea. And you (as a human) have to remember what types x and y are, because again - the system has no idea.

Related Solutions

C Programming – Assigning Strings to Pointers

This is just the way string literals work in C. String literals like "name" are arrays of characters; "name" is equivalent to the five-element array {'n', 'a', 'm', 'e', '\0'}. For the code

char *c;
c="name";

the environment reserves memory for the above array at initialization time, when the program is loaded from disk into memory. At run time, the address of the beginning of that array is assigned to c.

Note the first piece of code of yours is not equivalent to the second, because in the first piece you assign a string literal (and not a character like 'n') to a char* variable. In the second, you try to assign an int (and not an int array) to an int*.


Architecture – If a number is too big does it spill over to the next memory location

No, it does not. In C, each variable has a fixed set of memory addresses to work with. If you are working on a system with 4-byte ints, and you set an int variable to 2,147,483,647 and then add 1, the variable will usually contain -2,147,483,648. (That is what happens on most systems, but signed overflow is actually undefined behavior in C.) No other memory locations will be modified.

In general, the compiler will warn you when you assign a constant that is too big for the type. If you force it with a cast, the value will be truncated. (For unsigned types this is guaranteed; for signed types the result is implementation-defined, though truncation is typical.)

Looked at in a bitwise way, if the type can only store 8 bits, and you try to force the value 1010101010101 into it with a cast, you will end up with the bottom 8 bits, or 01010101.

In your example, regardless of what you do to myArray[2], myArray[3] will contain '4'. There is no "spill over". If you try to store something that is more than 4 bytes, the high end is simply lopped off, leaving the bottom 4 bytes. On most systems, this will result in -2147483648.

From a practical standpoint, you want to just make sure this never, ever happens. These sorts of overflows often result in hard-to-solve defects. In other words, if you think there is any chance at all your values will be in the billions, don't use int.