Developing a Compiler for Custom CPU Architecture

Tags: architecture, assembly, compiler, cpu

Recently I've been consumed by creating my own simple CPU architecture that could at some point be easily implemented in hardware (not on an FPGA, but with actual logic gate circuits). Naturally, to fulfill this requirement I went with a simple 4-bit CPU with 4 kB of program space and 256 bytes of RAM.

It supports all the fundamental operations such as ADD, SUBTRACT, AND, LOAD, STORE, etc. Before I start committing this to hardware, I want to develop a moderately powerful software stack that can compile C or a C-like language for the architecture, so the CPU can be programmed in a high-level language. So far I have written a working assembler in VB.NET, but now I'm stuck on how to approach the final goal of a working compiler.

Specifically, I have the following questions:

What should be my next step, and how should I approach writing a compiler?

Even though a 4-bit CPU is simple, it is not very useful on its own because it cannot handle large calculations in one step. My final goal is therefore to abstract this limitation away with a software stack that, to the user, feels like programming a 16-bit (or larger) CPU. Currently I manually write assembly that spreads larger numbers over multiple registers and performs calculations between them, but ultimately, what part of the software stack handles numbers and calculations that are larger than the physical registers?

What part of the software stack deals with subroutine calls and the like?

Please let me know if I need to clarify anything.

Best Answer

Writing an LLVM backend is the primary sane way of doing this. Once you can lower LLVM IR to your assembly or microcode, you can build on that and use the numerous existing LLVM frontends to convert higher-level languages like C++ into LLVM IR.

In other words, LLVM was explicitly designed to support this scenario.

The full stack goes like this:

  • Frontend (e.g. Clang for C and C++) - source code -> LLVM IR
  • Optimizer (LLVM) - LLVM IR -> LLVM IR
  • Backend (you) - LLVM IR -> assembly/microcode/whatever

The first part is provided for you on a per-language basis: for C and C++ you can use Clang, for D you can use LDC, and so on. The second part is provided by LLVM, which ships a large number of target-independent optimization routines and some target-aware ones. Finally, you provide the translation from LLVM IR to your architecture-specific code.
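To give a feel for what that last step involves, here is a minimal sketch in plain C of lowering one instruction of a toy three-address IR into assembly text. It is not the LLVM API (a real LLVM backend is written in C++ and TableGen), and everything in it is an assumption made up for illustration: the `IrInstr` layout, the helper names, the accumulator-style instruction sequence, and the `[n]` operand syntax. Only the operation names come from the question.

```c
#include <stdio.h>
#include <string.h>

/* Toy three-address IR: dst = lhs OP rhs, where operands are memory slots.
   This is a made-up format, not LLVM IR. */
typedef enum { IR_ADD, IR_SUB, IR_AND } IrOp;

typedef struct {
    IrOp op;
    int dst, lhs, rhs;
} IrInstr;

/* Hypothetical mnemonics for an accumulator-style machine. */
static const char *mnemonic(IrOp op)
{
    switch (op) {
    case IR_ADD: return "ADD";
    case IR_SUB: return "SUB";
    case IR_AND: return "AND";
    }
    return "???";
}

/* Lower one IR instruction to a LOAD / <op> / STORE sequence and write the
   assembly text into out. Returns the number of characters the full text
   needs, excluding the terminator (the snprintf convention). */
static int lower_instr(const IrInstr *ins, char *out, size_t out_size)
{
    return snprintf(out, out_size,
                    "LOAD  [%d]\n%-5s [%d]\nSTORE [%d]\n",
                    ins->lhs, mnemonic(ins->op), ins->rhs, ins->dst);
}
```

Calling `lower_instr` on an `IR_ADD` with `dst = 2`, `lhs = 0`, `rhs = 1` yields a three-line LOAD, ADD, STORE sequence. A real backend does the same kind of pattern matching, just over LLVM IR and with register allocation and scheduling on top.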

Note that LLVM IR makes a few assumptions, because it is targeted at real platforms. For example, it assumes IEEE 754 floating-point support and 8-bit bytes, as well as various kinds of pointer support. You will need to support all of these anyway if you want to compile languages like C for your architecture in general. If you are willing to restrict the source language a bit beyond normal, you can get away without implementing all of these features. For example, if the C code doesn't use floats, there is in principle no reason for the frontend to emit LLVM IR that uses floats.

LLVM IR is a common middle ground: any language can be compiled to it, and from there it can be lowered for any CPU. Basically, all you need to do is support the primitives and provide an LLVM backend that converts LLVM IR to your assembly. LLVM and the language frontends will do the rest.
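As for the question about numbers wider than the registers: in an LLVM-based stack that splitting is normally done inside the backend, during type legalization, rather than by hand. The sketch below only illustrates the idea in C, performing a 16-bit addition as four chained 4-bit additions with carry. The names `add4` and `add16_via_nibbles` are hypothetical, and `add4` assumes the ALU has an add-with-carry operation, which the question does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 4-bit ALU (assumed add-with-carry): adds two
   nibbles plus a carry-in, returns the low 4 bits, stores the carry-out. */
static uint8_t add4(uint8_t a, uint8_t b, uint8_t carry_in, uint8_t *carry_out)
{
    assert(a <= 0xF && b <= 0xF && carry_in <= 1);
    uint8_t sum = (uint8_t)(a + b + carry_in);   /* at most 15 + 15 + 1 = 31 */
    *carry_out = (uint8_t)(sum >> 4);            /* bit 4 is the carry */
    return (uint8_t)(sum & 0xF);
}

/* 16-bit addition expressed as four chained 4-bit additions,
   least significant nibble first. */
static uint16_t add16_via_nibbles(uint16_t x, uint16_t y)
{
    uint16_t result = 0;
    uint8_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t a = (uint8_t)((x >> (4 * i)) & 0xF);
        uint8_t b = (uint8_t)((y >> (4 * i)) & 0xF);
        uint8_t s = add4(a, b, carry, &carry);
        result = (uint16_t)(result | ((uint16_t)s << (4 * i)));
    }
    return result;   /* the final carry is dropped, so this wraps modulo 2^16 */
}
```

For example, `add16_via_nibbles(0x12FF, 0x0001)` returns `0x1300`: the carry ripples out of the two low nibbles into the third. This is the same thing the hand-written multi-register assembly does today; the compiler just generates that sequence for you.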
