Implementing non-fixed length array support in a compiler

array, asm, assembly, c, compiler

I'm thinking of building a language for PIC microcontrollers. I want to be able to use non-fixed size arrays, like this:

  1. Declare the variable as int[]
  2. Wait for input from serial connection
  3. Give the array the length that was received as input

I think such a feature would be useful, but don't know how I should go about implementing this in a compiler which compiles to assembly. I want the arrays to be stored on consecutive register addresses, of course. Since I'm working with PICs, this has to be very memory efficient.

When someone writes int[], I don't think it's a good idea to reserve memory space for the variable yet, is it? Then the array would have a fixed maximum size. For example, when the array gets reserved memory addresses 5-100, and other variables get 4 and 101, the array has fixed borders and can't grow any bigger than 96 registers. Also, if I reserve memory from the start, let's say x bytes, and in the end I only need y bytes, I'm wasting x − y bytes. I don't want that.

That means the only option I see is initializing the array and reserving space in the microcontroller, on the fly. This will take up some memory and execution time, of course. I thought of a system like this:

  • Initialize a table of x entries, each an {int, int} pair holding pointers to the start and end of an array whose size is not known from the beginning. Here x is the maximum number of arrays (this is a concession, but it is better than a maximum length for all arrays)
  • Store a variable c = 0 to indicate the number of arrays used
  • Store the borders of the initialized (reserved) memory in a variable somewhere
  • When an array gets a length:

    • Put pointers to the start (the current border) and end (the current border + the length) in the table from the first point, at index c
    • Increment c
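
To make the bookkeeping concrete, the scheme above could be sketched in C roughly as follows. This is only an illustration of what the generated code would have to do, not actual PIC assembly. The names `MAX_ARRAYS`, `POOL_SIZE`, `array_reserve` and `array_at` are made up for the example, the sizes are arbitrary, and advancing the border after each reservation is a step the list implies but does not spell out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARRAYS 4     /* "x" in the list above: maximum number of arrays (assumed value) */
#define POOL_SIZE  64    /* bytes of RAM set aside for array data (assumed value) */

typedef struct {
    uint8_t start;       /* index of the array's first byte in the pool */
    uint8_t end;         /* index one past the array's last byte */
} ArrayBounds;

static uint8_t     pool[POOL_SIZE];
static ArrayBounds table[MAX_ARRAYS];
static uint8_t     c = 0;        /* number of arrays reserved so far */
static uint8_t     border = 0;   /* current border: first free byte in the pool */

/* Reserve `length` bytes. Returns the array's index in the table, or -1 on failure. */
int array_reserve(uint8_t length)
{
    if (c >= MAX_ARRAYS || length > POOL_SIZE - border)
        return -1;
    table[c].start = border;
    table[c].end   = (uint8_t)(border + length);
    border = (uint8_t)(border + length);   /* implied step: move the border forward */
    return c++;
}

/* Address of element `i` of array `id`, with bounds checks for the sketch. */
uint8_t *array_at(int id, uint8_t i)
{
    assert(id >= 0 && id < c);
    assert(i < table[id].end - table[id].start);
    return &pool[table[id].start + i];
}
```

The overhead is visible here: two bytes per table entry plus `c` and `border`, whether or not all `MAX_ARRAYS` slots are ever used.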

I think this would work (wouldn't it?), but there are a few cons, mainly concerning memory: I need to store the pointer table, c and the current memory borders as overhead.

Would there be a better way to implement non-fixed size arrays in a language for PIC microcontrollers? My requirements are:

  • Low memory overhead
  • The array length does not have to be changed on the fly
  • With the system I thought up, values cannot be stored in an array before it has been given a length. A system that can store values in an array of as-yet-undefined length would be an advantage
  • Faster systems (at runtime, compile time doesn't matter) are preferable

Best Answer

You will probably make it a lot easier on yourself and avoid a lot of user errors if you allow the declaration of variable-length arrays to be delayed until a point where the size is known. Then you don't have to invent a syntax for assigning the array size and you don't have to deal with differences in the order in which arrays are declared and in which they get their size.

If your compiler uses a stack-based allocation scheme for local variables (they are given addresses relative to the stack-frame of their function, like it is done for the 'big' platforms), the variable-length arrays can just be carved out of the stack frame as needed. Only if there are multiple variable-length arrays in a function do you need an additional pointer to indicate where the data of each starts.
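
For comparison, this is how the same idea looks from the user's side in C99, where a variable-length array is declared at the point its size is known and compilers on the 'big' platforms typically take the storage from the current stack frame. It is a sketch of the semantics your language could copy, not something a small PIC compiler necessarily supports. The function `sum_of_squares` is a made-up example.

```c
#include <assert.h>

/* Hypothetical example: sums the squares of 0..n-1 using a scratch array
   whose length is only known at run time. */
int sum_of_squares(int n)
{
    assert(n > 0);               /* a VLA must have a positive length */
    int squares[n];              /* C99 VLA: declared where the size is known */
    for (int i = 0; i < n; i++)
        squares[i] = i * i;

    int total = 0;
    for (int i = 0; i < n; i++)
        total += squares[i];
    return total;                /* the array's storage goes away with the frame */
}
```

Note that there is no separate 'set the length' statement and no bookkeeping table: the array's lifetime is tied to the function call.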

If your compiler uses a fixed allocation scheme (all variables, including locals, are given a fixed (absolute) address by the compiler/linker), I would use an 'array stack' for carving the variable-length arrays from. As overhead, you would need a global pointer to indicate where the free space currently starts, a pointer for each variable-length array to indicate where its data is located and, in each function using variable-length arrays, a way to restore the global pointer to the value it had when entering the function.
The user would also need some way to tell the compiler/linker how large an 'array stack' they expect to need. A reasonable default could be to use whatever is left after allocating all fixed-size variables and variable-length array overhead.
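
A minimal sketch of such an 'array stack' in C, to show what the compiled code would roughly have to do. The names `array_stack`, `array_top`, `array_alloc` and `example_function` are invented for the illustration, and the size of 32 bytes is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_STACK_SIZE 32   /* assumed size; by default this could be whatever RAM is left */

static uint8_t array_stack[ARRAY_STACK_SIZE];
static uint8_t array_top = 0;           /* global pointer: where the free space starts */

/* Carve `length` bytes off the array stack. Returns NULL if it does not fit. */
uint8_t *array_alloc(uint8_t length)
{
    if (length > ARRAY_STACK_SIZE - array_top)
        return NULL;
    uint8_t *data = &array_stack[array_top];
    array_top = (uint8_t)(array_top + length);
    return data;
}

/* Roughly what a function with two variable-length arrays would compile to. */
int example_function(uint8_t n)
{
    uint8_t saved_top = array_top;      /* remember the global pointer on entry */
    int result = -1;                    /* -1 signals that the arrays did not fit */

    uint8_t *a = array_alloc(n);        /* one pointer per variable-length array */
    uint8_t *b = array_alloc(n);
    if (a != NULL && b != NULL) {
        for (uint8_t i = 0; i < n; i++) {
            a[i] = i;
            b[i] = (uint8_t)(2 * i);
        }
        result = 0;
        for (uint8_t i = 0; i < n; i++)
            result += a[i] + b[i];
    }

    array_top = saved_top;              /* restore on exit: releases both arrays at once */
    return result;
}

/* Bytes currently in use, for inspection. */
uint8_t array_stack_used(void)
{
    return array_top;
}
```

Allocation is a bounds check and an addition, and releasing is a single store, which is where the low time overhead comes from. The per-function cost is the saved copy of the global pointer plus one data pointer per array.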

I think this scheme has the least overhead, in both space and time. The scheme you came up with is essentially a malloc implementation.