Compiler Design – How Are Variables Stored in a Language Compiler or Interpreter?

compiler interpreters languages storage variables

Say we set a variable in Python.

five = 5

Boom. What I'm wondering is, how is this stored? Does the compiler or interpreter just put it in a variable like so?

varname = ["five"]
varval  = [5]

If this is how it is done, where is that stored? It seems like this could go on forever.

Best Answer

Interpreter

An interpreter works roughly the way you guessed. In a simple model, it maintains one dictionary with the variable names as keys and the variable values as values. If the language has variables that are visible only in specific contexts (scopes), the interpreter maintains multiple dictionaries, one per context. The interpreter itself is typically a compiled program, so for its own storage, see below.
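To make the dictionary idea concrete, here is a minimal sketch of how a toy interpreter might keep its variables. The Environment class and its assign/lookup methods are hypothetical names invented for this illustration; real interpreters (CPython included) are considerably more elaborate.

```python
# Toy model of interpreter variable storage: one dictionary per scope.
# Environment, assign and lookup are made-up names for illustration only.
class Environment:
    def __init__(self, parent=None):
        self.vars = {}          # variable name -> current value
        self.parent = parent    # enclosing scope, or None for the global scope

    def assign(self, name, value):
        # Setting a variable is just a dictionary store in the current scope.
        self.vars[name] = value

    def lookup(self, name):
        # Search the current scope first, then walk outwards.
        env = self
        while env is not None:
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(f"name {name!r} is not defined")


global_env = Environment()
global_env.assign("five", 5)                 # five = 5

local_env = Environment(parent=global_env)   # e.g. the body of a function
local_env.assign("six", 6)                   # six = 6, visible only here

print(local_env.lookup("five"))   # found in the enclosing (global) scope
print(local_env.lookup("six"))    # found in the local scope
```

The dictionaries grow only as far as the program defines variables, and a local dictionary can be thrown away when its context ends, so this does not "go on forever".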

Compiler

(This depends very much on the language and compiler and is extremely simplified, so it's just meant to give some idea.)

Let's say we have a global variable int five = 5. A global variable exists only once in the program, so the compiler reserves one memory area of 4 bytes (assuming a 4-byte int) in a data area. It can use a fixed address, let's say 1234. Into the executable file, the compiler places the information that the four bytes starting at 1234 are needed as static data memory and are to be filled with the number 5 at program start, plus, optionally (for debugger support), the information that the location 1234 is called five and contains an integer. Wherever some other line of code refers to the variable named five, the compiler remembers that it is placed at 1234 and inserts a memory read or write instruction for address 1234.
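The following is a rough simulation of that idea in Python, not what a real compiler emits. A bytearray stands in for the program's data memory, and the names memory, FIVE_ADDR, debug_symbols, store_int and load_int are all invented for this sketch. It also assumes a 4-byte little-endian int.

```python
import struct

memory = bytearray(4096)      # stand-in for the program's data memory
FIVE_ADDR = 1234              # fixed address the "compiler" picked for five

# Optional debugger information: address -> (name, type).
# The running program itself never needs the name "five".
debug_symbols = {FIVE_ADDR: ("five", "int")}


def store_int(addr, value):
    # Write a 4-byte little-endian signed int at the given address.
    struct.pack_into("<i", memory, addr, value)


def load_int(addr):
    # Read a 4-byte little-endian signed int from the given address.
    return struct.unpack_from("<i", memory, addr)[0]


# Program start: the static data is filled with its initial value.
store_int(FIVE_ADDR, 5)

# A source line such as "five = five + 1" has become
# a read and a write of address 1234.
store_int(FIVE_ADDR, load_int(FIVE_ADDR) + 1)

print(load_int(FIVE_ADDR))
```

Note that the name only survives in debug_symbols; the "compiled" code works purely with the address.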

If int six = 6 is a local variable within a function, it should exist once for every currently active call of this function (there can be several because of recursion or multi-threading). So every function call reserves enough space on the stack to hold its variables (including four bytes for our six variable). The compiler decides where to place the six variable within this stack frame, maybe 8 bytes from the frame start, and remembers that relative position. So the instructions that the compiler produces for the function are:

  • advance the stack pointer by enough bytes for all the local variables of the function.

  • store the number 6 (initial value of six) into the memory location 8 bytes above the stack pointer.

  • wherever the function refers to six, the compiler inserts a read or write instruction for the memory location 8 bytes above the stack pointer.

  • when finished with the function, rewind the stack pointer to its old value.
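The four steps above can be imitated in Python as a rough sketch. Everything here is an assumption made for illustration: the stack is a bytearray that grows downward, the frame is 16 bytes, six sits 8 bytes above the stack pointer, and func, seen, FRAME_SIZE and SIX_OFFSET are made-up names. The recursion shows that each active call gets its own copy of six.

```python
import struct

STACK_SIZE = 1024
stack = bytearray(STACK_SIZE)   # stand-in for the machine stack
sp = STACK_SIZE                 # stack pointer; this toy stack grows downward

FRAME_SIZE = 16                 # assumed bytes needed for the function's locals
SIX_OFFSET = 8                  # position of six relative to the stack pointer


def func(depth, seen):
    global sp
    sp -= FRAME_SIZE                                    # step 1: reserve the frame
    struct.pack_into("<i", stack, sp + SIX_OFFSET, 6)   # step 2: six = 6

    # Step 3: every use of six is a read or write at sp + SIX_OFFSET.
    value = struct.unpack_from("<i", stack, sp + SIX_OFFSET)[0]
    struct.pack_into("<i", stack, sp + SIX_OFFSET, value + depth)  # six += depth

    if depth > 0:
        func(depth - 1, seen)   # the nested call gets its own frame and its own six

    # Our own six is untouched by the nested call.
    six = struct.unpack_from("<i", stack, sp + SIX_OFFSET)[0]
    seen.append((sp + SIX_OFFSET, six))

    sp += FRAME_SIZE                                    # step 4: release the frame


seen = []                       # (address of six, value of six) per call
func(2, seen)
print(seen)
```

Each call finds six at a different absolute address, yet the code inside func only ever uses the same relative offset, which is all the compiler has to remember.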

Once again, that's just a very simplified model, not covering all variable types, but maybe it helps to get an understanding...