Implementing non-fixed length array support in a compiler

Tags: array, asm, assembly, c, compiler

I'm thinking of building a language for PIC microcontrollers. I want to be able to use non-fixed size arrays, like this:

- Declare the variable as int[]
- Wait for input from the serial connection
- Make the variable as long as the input

I think such a feature would be useful, but I don't know how to go about implementing it in a compiler that compiles to assembly. I want the arrays to be stored at consecutive register addresses, of course. Since I'm working with PICs, this has to be very memory-efficient.

When someone writes int[], I don't think it's a good idea to reserve memory for the variable yet, is it? Then the array would have a fixed maximum size. For example, if the array is reserved addresses 5-100 and other variables get addresses 4 and 101, the array has fixed borders and can't grow beyond 96 registers. Also, if I reserved x bytes from the start and in the end only needed y bytes, I would be wasting x - y bytes. I don't want that.

That means the only option I see is to initialize the array and reserve space in the microcontroller's memory on the fly, at runtime. This will cost some memory and execution time, of course. I thought of a system like this:

Initialize an array int[x] = {int, int} that holds pointers to the start and end of every array that is not sized from the beginning, where x is the maximum number of such arrays (this is a concession, but it is better than a maximum length for every array)
Store a variable c = 0 to track the number of arrays in use
Store the border of the initialized (reserved) memory in a variable somewhere
When an array gets a length:
- Put pointers to the start (the current border) and end (the current border + the length) into the pointer table at index c
- Move the current border to the new end
- Increment c

I think this would work (wouldn't it?), but there are a few cons, mainly concerning memory: I need to store the pointer table, c and the current memory border as overhead.

Would there be a better way to implement non-fixed size arrays in a language for PIC microcontrollers? My requirements are:

Low memory overhead
The array length does not have to be changed on the fly
With the system I thought up, one cannot store values in arrays that haven't been initialized yet. A system that could store values in an array of undefined length would be an advantage
Faster systems (at runtime, compile time doesn't matter) are preferable

Best Answer

You will probably make it a lot easier on yourself and avoid a lot of user errors if you allow the declaration of variable-length arrays to be delayed until a point where the size is known. Then you don't have to invent a syntax for assigning the array size and you don't have to deal with differences in the order in which arrays are declared and in which they get their size.

If your compiler uses a stack-based allocation scheme for local variables (they are given addresses relative to the stack frame of their function, as is done on the 'big' platforms), the variable-length arrays can simply be carved out of the stack frame as needed. Only if there are multiple variable-length arrays in a function do you need an additional pointer to indicate where the data of each one starts.

If your compiler uses a fixed allocation scheme (all variables, including locals, are given a fixed (absolute) address by the compiler/linker), I would use an 'array stack' for carving the variable-length arrays from. As overhead, you would need a global pointer to indicate where the free space currently starts, a pointer for each variable-length array to indicate where its data is located and, in each function using variable-length arrays, a way to restore the global pointer to the value it had when entering the function.
The user would also need some way to tell the compiler/linker how large an 'array stack' they think will be needed. A reasonable default would be to use whatever is left after allocating all fixed-size variables and the variable-length array overhead.

I think this scheme uses the least amount of overhead, both space and time-wise. The scheme you came up with is essentially a malloc implementation.

Related Solutions

The purpose of arrays in C, when pointers could have done the job

Arrays are contiguous blocks of memory, created on the stack when declared as locals. You can't guarantee contiguous stack memory without this syntactic sugar, and even if you could, you'd have to allocate a separate pointer to do the pointer arithmetic (unless you wanted to write *(&foo + x), which is awkward, would scream out for some kind of syntactic sugar, and is undefined behavior for any x other than 0 when foo is a single variable). Design-wise, an array is also a form of encapsulation, since you can refer to the whole collection with a single identifier (which would otherwise require a separate pointer). And even if you could allocate the elements contiguously and keep a separate pointer to reference them, you'd still need names like int fooForSomething, fooForSomethingElse..., which demands a fair amount of creativity as the collection grows. Simplifying that to int foo1, foo2, ... looks just like an array but is harder to maintain.

Correct For Loop Design

You forgot to mention whether strings in Felix are indexed from 0 or 1. Looking that up on the web is additional work for readers, and it affects how your example is evaluated.

Anyway. Are you sure that:

for(i=0; predicate(i); increment(i))

In C: "The predicate is tested after the increment, but the terminating increment is not universally valid!"

Translates to this:

i=0
continue:
  body
  increment(i)
  if not predicate(i) goto break
  goto continue
break:

Instead of this:

i=0
continue:
  if not predicate(i) goto break
  body
  increment(i)
  goto continue
break:

Since your for loop is more specific, like Pascal's, you may want to consider how it should be translated and evaluated when the final value is equal to or less than the initial value.

Usually, if the initial and final values are the same, the loop is executed once; if the final value is less than the initial value, the loop is not executed.
