Electronic – Approaches to storing and addressing microcode for homebrew CPU

computer-architecture, homebrew-cpu, microprocessor

I've been teaching myself about CPU architecture for a while now and have successfully designed a couple myself. They were always based around microcode to drive the CPU's control lines.

The microcode is stored on one ROM chip and addressed by composing an address out of (1 = LSB):

Bank
CPU flag states
Instruction
T-state

Let's take my latest CPU as an example of how the address is constructed:

Bank (3 bits) – There are 48 control lines in my CPU, so I need 6 bytes to store all control line states. The bank part of the address allows me to address 6 bytes in the microcode ROM.
CPU flag states (3 bits) – I narrowed down my flags to zero, carry and compare, so I need 3 bits to address all combinations.
Instruction (8 bits) – The instruction set requires over 127 opcodes, so I chose to go with 8 bits.
T-state (5 bits) – Some instructions take over 16 T-states to complete (complex ones like conditional CALL / RETURN), so with 5 bits I have enough room for up to 32 T-states.

The final example microcode ROM address would look something like:

[flags][instruction][t-state][bank]
[000][00000000][00000][000]
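
As a minimal sketch, the bracketed layout above can be packed like this in Python. The function and constant names are illustrative assumptions; only the field widths and the field order come from the description above.

```python
# Field widths taken from the description above; the names are illustrative.
BANK_BITS = 3
TSTATE_BITS = 5
INSTR_BITS = 8
FLAG_BITS = 3


def microcode_address(flags: int, instruction: int, t_state: int, bank: int) -> int:
    """Pack the fields as [flags][instruction][t-state][bank], bank in the low bits."""
    fields = (
        ("flags", flags, FLAG_BITS),
        ("instruction", instruction, INSTR_BITS),
        ("t_state", t_state, TSTATE_BITS),
        ("bank", bank, BANK_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")

    address = flags
    address = (address << INSTR_BITS) | instruction
    address = (address << TSTATE_BITS) | t_state
    address = (address << BANK_BITS) | bank
    return address
```

With every field at its maximum the result is 2^19 - 1, which is where the 19-bit address width comes from.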

Now for my question:
In the above example I seem to have reached the limit of what I can do with this approach. My microcode ROM address is 19 bits long and I have found one chip that supports it (29C040), but it seems that I don't have a lot of options if I want a larger address.

I'm thinking about my next CPU which will need more flags (negative, overflow, parity) and who knows, maybe even a few more control lines or an extra bit for T-states.
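
A rough sketch of how quickly this scheme grows is below. The "next CPU" widths are an assumption read from the paragraph above (six flags instead of three, and one extra T-state bit); the helper name is hypothetical.

```python
def rom_locations(bank_bits: int, flag_bits: int, instr_bits: int, tstate_bits: int) -> int:
    """Number of byte locations addressed when the fields are simply concatenated."""
    return 1 << (bank_bits + flag_bits + instr_bits + tstate_bits)


current = rom_locations(bank_bits=3, flag_bits=3, instr_bits=8, tstate_bits=5)
# Assumed next design: three more flags and one more T-state bit.
future = rom_locations(bank_bits=3, flag_bits=6, instr_bits=8, tstate_bits=6)

print(current)            # locations needed today (19 address bits)
print(future)             # locations needed by the assumed next design (23 bits)
print(future // current)  # growth factor
```

Every bit added to any field doubles the ROM, so four extra bits mean a ROM sixteen times larger.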

What would be a better approach to storing and addressing my microcode in that case?

The only things I can think of right now are to:

Add more ROM chips (would this only free up 1 extra bit per chip? That doesn't seem like a good approach)
Limit the instruction set (this would also only give me 1, maybe 2, extra bits)
Limit the number of control lines, but that would allow for less control over the CPU…

I wonder how this is solved in professional, microcode-based CPUs and what I'm missing here.

Best Answer

Your addressing scheme uses completely disjoint areas of microcode memory for every possible instruction. Most CPUs have a lot of microcode that can be shared among instructions. For example, instruction fetch, operand read and result writeback are typically identical across large groups of instructions, with the only difference being the specific ALU operation performed in the middle.
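
To make the saving concrete, here is a small sketch that counts microcode words for four ALU instructions, once with shared routines stored a single time and once with every instruction fully expanded into its own area. The control-signal names and routine lengths are invented for illustration, and the sketch only counts storage; a real design would also need a next-address or dispatch mechanism to chain the routines.

```python
# Hypothetical control-signal names, chosen only for illustration.
ROUTINES = {
    "fetch": [("PC_OUT", "MAR_IN"), ("MEM_OUT", "IR_IN", "PC_INC")],
    "operand_read": [("REG_OUT", "ALU_A_IN")],
    "writeback": [("ALU_OUT", "REG_IN")],
    "alu_add": [("ALU_ADD",)],
    "alu_sub": [("ALU_SUB",)],
    "alu_and": [("ALU_AND",)],
    "alu_xor": [("ALU_XOR",)],
}

# Each instruction is a sequence of routines; only the ALU step differs.
PROGRAMS = {
    op: ["fetch", "operand_read", f"alu_{op}", "writeback"]
    for op in ("add", "sub", "and", "xor")
}


def expand(op: str) -> list:
    """Return the full list of microcode words one instruction executes."""
    return [word for name in PROGRAMS[op] for word in ROUTINES[name]]


# Shared layout: every routine is stored exactly once.
shared_words = sum(len(words) for words in ROUTINES.values())
# Disjoint layout: every instruction carries its own copy of every step.
disjoint_words = sum(len(expand(op)) for op in PROGRAMS)

print(shared_words, disjoint_words)
```

In this toy case the shared layout needs 8 words against 20 for the disjoint one, and the gap widens as more instructions reuse the same fetch and writeback steps.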

It's also very odd to store the bytes of the microcode word serially, because it means that your ROM must cycle 6 times for every "T state". It is much more common to make the microcode memory wide enough that it cycles at the same rate as the rest of the logic.
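
One way to picture a wide microcode memory is six byte-wide ROMs that share the same address lines, each supplying 8 of the 48 control lines. The sketch below models that arrangement; the lane assignment (lane 0 drives control lines 0 to 7) and all names are assumptions made for illustration.

```python
CONTROL_LINES = 48
LANES = CONTROL_LINES // 8  # six byte-wide ROMs side by side


def split_word(word: int) -> list:
    """Split one 48-bit control word into six bytes, lane 0 holding bits 0-7."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]


def build_roms(words: dict, depth: int) -> list:
    """Create one ROM image per lane from an {address: control_word} mapping."""
    roms = [bytearray(depth) for _ in range(LANES)]
    for address, word in words.items():
        for lane, byte in enumerate(split_word(word)):
            roms[lane][address] = byte
    return roms


def read_parallel(roms: list, address: int) -> int:
    """Read all lanes at the same address, as the hardware would in one cycle."""
    word = 0
    for lane, rom in enumerate(roms):
        word |= rom[address] << (8 * lane)
    return word
```

Because all lanes are read at once, the bank field disappears from the address, so the question's 19-bit address would shrink to 16 bits per ROM.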

Finally, it sounds like you're experimenting with CPU design. Maybe you should consider using wide SRAM for your microcode memory, and loading it up from some external source (an Arduino or equivalent) each time you power up your system. This would make it a lot easier to make changes.

Related Solutions

Electronic – How to implement an 8-bit CPU

For a real computer, you definitely would want more than 4 bits of program address, since 4 bits only allows 16 instructions. So I came up with a scheme using a two-byte instruction for jumps, calls, loads and stores, which would give you a 12-bit address, or 4096 locations.

However, if you leave off this extra byte, then my instruction format allows for 5 bits (not just 4) of program address, and up to 4 bits of RAM addressing.

So the following is an instruction set based on the specification of two registers. All instructions are one byte except for the four requiring full addresses (optional, as described earlier, leave off this 2nd byte for 4 bit addressing).

I left in the long formats, because if one includes them, I think this would make a reasonable 8-bit computer (even though it can only address 4K bytes).

Although I favor memory-mapped I/O over input/output instructions, I provided two of each to satisfy the spec.

    register-register instructions:

    0 0 x x x x d s

    where x x x x is the opcode,
          d is the destination register 0 or 1,
          and s is the source register 0 or 1

    opcodes field:

    0000  add   d = d + s
    0001  adc   d = d + s + c
    0010  sub   d = d - s
    0011  subb  d = d - s - c
    0100  and   d = d and s
    0101  or    d = d or s
    0110  xor   d = d xor s
    0111  not   d = not s
    1000  asr  s = 0  arithmetic shift right d
               (s = 0 means the s field is 0, not that the register is 0)
    1000  asl  s = 1  arithmetic shift left d
    1001  ror  s = 0 rotate right d
    1001  rol  s = 1 rotate left d
    1010  inc  s = 0  increment d
    1010  dec  s = 1  decrement d
    1011  cmp  d - s (no store)
    1100  inp1  s = 0  input to reg d from input port 1
    1100  inp2  s = 1  input to reg d from input port 2
    1101  out1  s = 0  output from reg d to output port 1
    1101  out2  s = 1  output from reg d to output port 2
    1110  mul   d/s = s * d  (high byte of result into d, low byte into 1-d)
    1111  sec ds = 00  set carry
    1111  clc ds = 01  clear carry
    1111  ret ds = 10  return from subroutine
    1111  hlt ds = 11  halt

    0 1 0 0 n n n n

    brn - unconditional branch negative -n bytes (up to -16),
    used for branching back at end of a short loop after a skip
    instruction

    0 1 0 1 b b i i

    skip instructions, where
        b b is type of branch
        i i = # of bytes to skip (typically 1 or 2, the latter for
        skipping over a jump/call)

    b b field:

    00  scs skip i bytes if carry set
    01  scc skip i bytes if carry clear
    10  szs skip i bytes if zero bit set
    11  szc skip i bytes if zero bit clear

    0 1 1 r n n n n

    load immediate to register r (0 or 1) signed value nnnn
    +7 to -8

    1 0 x p a a a a
    a a a a a a a a  (2nd byte only for extended format)

    jump or call instruction (x = 0 is jump, 1 is call)
    p is reserved for a page bit (or could just be the high
    bit of address).  12 bits of address provide a direct call
    or jump to 4K of program memory (or 5 bits provide
    access to 32 bytes of memory).

    1 1 x r i a a a
    a a a a a a a a  (2nd byte only for extended format)

    load store from/to RAM (x = 0 is load, 1 is store)
    11 bits of address provide direct access to 2K of RAM
    (or 3 bits provides access to 8 bytes of RAM)
    r is the destination or source register (0 or 1)
    i field specifies indexed addressing using the register
    not specified by the r field.  if indexing feature left
    off, then either 4K bytes or 16 bytes can be addressed.
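
As a sketch of how the register-register format `0 0 x x x x d s` could be decoded, the following Python pulls out the three fields and looks up the mnemonic from the table above. The function names are hypothetical; the opcode assignments are copied from the listing.

```python
# Opcodes whose meaning does not depend on the s or d fields.
SIMPLE = {
    0x0: "add", 0x1: "adc", 0x2: "sub", 0x3: "subb",
    0x4: "and", 0x5: "or", 0x6: "xor", 0x7: "not",
    0xB: "cmp", 0xE: "mul",
}
# Opcodes that use the s field to choose between two operations.
BY_S = {
    0x8: ("asr", "asl"), 0x9: ("ror", "rol"), 0xA: ("inc", "dec"),
    0xC: ("inp1", "inp2"), 0xD: ("out1", "out2"),
}
# Opcode 1111 uses both d and s to choose the operation.
BY_DS = ("sec", "clc", "ret", "hlt")


def decode_reg_reg(byte: int) -> tuple:
    """Return (opcode, d, s) for an instruction of the form 0 0 x x x x d s."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction must be one byte")
    if byte >> 6 != 0b00:
        raise ValueError("not a register-register instruction")
    return (byte >> 2) & 0xF, (byte >> 1) & 1, byte & 1


def mnemonic(byte: int) -> str:
    """Look up the mnemonic for a register-register instruction byte."""
    opcode, d, s = decode_reg_reg(byte)
    if opcode in SIMPLE:
        return SIMPLE[opcode]
    if opcode in BY_S:
        return BY_S[opcode][s]
    return BY_DS[(d << 1) | s]
```

For example, `mnemonic(0b00100001)` has opcode 1000 with s = 1, so it decodes as `asl`.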

There are three kinds of branches: jump and call instructions, which take a full address; an unconditional branch instruction that can branch backwards up to 16 bytes; and conditional skip instructions that can skip up to 4 bytes ahead. Using skips instead of branches allowed for a shorter address field. It could be redone as branches instead by getting rid of the load immediate instructions:

    0 1 b b a a a a

    conditional branch instructions, where
        b b is type of branch
        a a a a is signed relative branch, -8 to +7 bytes

    b b field:

    00  scs branch by a a a a if carry set
    01  scc branch by a a a a if carry clear
    10  szs branch by a a a a if zero bit set
    11  szc branch by a a a a if zero bit clear
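
A 4-bit two's-complement field covers -8 to +7, so the relative offset has to be sign-extended before it is added to the program counter. The sketch below shows one way to do that. The source does not say whether the offset is relative to the branch itself or to the following instruction, so `branch_target` simply adds it to whatever `pc` value it is given; both function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(pc: int, instruction_byte: int) -> int:
    """Add the signed 4-bit offset in the low nibble to the given pc."""
    offset = sign_extend(instruction_byte & 0x0F, 4)
    return pc + offset
```

For instance, a low nibble of `1110` sign-extends to -2, so a branch taken with `pc = 0x20` lands on `0x1E`.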

The way the multiply works is as follows: an 8x8 multiply gives a 16 bit result. The multiply instruction always multiplies register 0 by register 1. The high byte of the result goes into register d, and the low byte goes into register 1-d. s is ignored.
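
The multiply behaviour described above can be sketched as follows, modelling the two registers as a two-element list. The function name and the list representation are assumptions for illustration only.

```python
def mul(regs: list, d: int) -> list:
    """8x8 multiply of register 0 by register 1.

    The high byte of the 16-bit product goes to register d and the
    low byte to register 1 - d. The s field is ignored.
    """
    product = (regs[0] & 0xFF) * (regs[1] & 0xFF)
    result = list(regs)
    result[d] = (product >> 8) & 0xFF
    result[1 - d] = product & 0xFF
    return result
```

For example, `mul([0xFF, 0xFF], 0)` produces the product 0xFE01, leaving 0xFE in register 0 and 0x01 in register 1.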

I didn't implement the concept of multiplying the "input buffer" by the "data cache" since the OP didn't specify any details about the cache -- and I currently have the input buffer being read into either of the two registers. Loading an input port into one of the registers, then multiplying it by the other to produce a 16-bit product spread across both, makes a lot more sense.

Except for the multiply, this could be implemented fairly easily; all of the arithmetic operations (add, subtract, compare) and logical operations (and, or, xor, not) can be performed by an ALU (arithmetic logic unit), which Logisim supports. In real life, this might be implemented using two 4-bit 74LS181 ALUs cascaded together.
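
The idea of cascading two 4-bit slices into an 8-bit unit can be sketched as below. This models only addition with the carry rippling from the low slice to the high slice; it ignores the 74LS181's function-select inputs and signal polarities, and the function names are hypothetical.

```python
def add4(a: int, b: int, carry_in: int) -> tuple:
    """One 4-bit slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def add8(a: int, b: int, carry_in: int = 0) -> tuple:
    """Two 4-bit slices cascaded: the low slice's carry feeds the high slice."""
    low, carry = add4(a & 0xF, b & 0xF, carry_in)
    high, carry_out = add4((a >> 4) & 0xF, (b >> 4) & 0xF, carry)
    return (high << 4) | low, carry_out
```

The `carry_in` argument is what lets an `adc`-style instruction reuse the same path as a plain `add`.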

Electronic – How to efficiently design the opcode for a CPU

I think it is a good approach to study some other instruction sets.

A small one would be the MSP430 from TI. It is a 16-bit processor with about 22 instructions.

http://www.physics.mcmaster.ca/phys3b06/MSP430/MSP430_Instruction_Set_Summary.pdf

You could also look into the Atmel AVRs; they also have quite a small instruction set.

In a little project of mine I tried to develop a simple 32-bit processor in VHDL with a small instruction set (14 instructions):

http://www.blog-tm.de/?p=80

Due to my limited free time it is not fully finished. The instructions are implemented, but two are not tested and some status flags may be missing.
