For a real computer, you definitively would want more than 4 bits of program address since 4 bits only allows 16 instructions. So I came up with a scheme using a two-byte instruction for jumps, calls, load and stores which would give you a 12 bit address or 4096 location.
However, if you leave off this extra byte, then my instruction format allows for 5 bits (not just 4) of program address, and up to 4 bits of RAM addressing.
So the following is an instruction set based on the specification of two registers. All instructions are one byte except for the four requiring full addresses (optional, as described earlier, leave off this 2nd byte for 4 bit addressing).
I left in the long formats, because if one includes them, I think this would make a reasonable 8-bit computer (even though it can only address 4K bytes).
Although I favor memory-mapped I/O over input/output instructions, I provided two of each to satisfy the spec.
register-register instructions:
0 0 x x x x d s
where x x x x is the opcode,
d is the destination register 0 or 1,
and s is the source register 0 or 1
opcodes field:
0000 add d = d + s
0001 adc d = d + s + c
0010 sub d = d - s
0011 subb d = d - s - c
0100 and d = d and s
0101 or d = d or s
0110 xor d = d xor s
0111 not d = not s
1000 asr s = 0 arithmetic shift right d
(s=0 means s field is 0, not that the register is 0)
1000 asl s = 1 arithmetic shift left d
1001 ror s = 0 rotate right d
1001 rol s = 1 rotate left d
1010 inc s = 0 increment d
1010 dec s = 1 decrement d
1011 cmp d - s (no store)
1100 inp1 s = 0 input to reg d from input port 1
1100 inp2 s = 1 input to reg d from input port 2
1101 out1 s = 0 output from reg d to output port 1
1101 out2 s = 1 output from reg d to output port 2
1110 mul d/s = s * d (high byte of result into d, low byte into 1-d)
1111 sec ds = 00 set carry
1111 clc ds = 01 clear carry
1111 ret ds = 10 return from subroutine
1111 hlt ds = 11 halt
0 1 0 0 n n n n
brn - unconditional branch negative -n bytes (up to -16),
used for branching back at end of a short loop after a skip
instruction
0 1 0 1 b b i i
skip instructions, where
b b is type of branch
i i = # of bytes to skip typically 1 or 2, latter for
skipping over jump/call)
b b field:
00 scs skip i bytes if carry set
01 scc skip i bytes if carry clear
10 szs skip i bytes if zero bit set
11 szc skip i bytes if zero bit clear
0 1 1 r n n n n
load immediate to register r (0 or 1) signed value nnnn
+15 to -16
1 0 x p a a a a
a a a a a a a a (2nd byte only for extended format)
jump or call instruction (x = 0 is jump, 1 is call)
p is reserved for a page bit (or could just be the high
bit of address). 12 bits of address provide a direct call
or jump to 4K of program memory (or 5 bits provide
access to 32 bytes of memory).
1 1 x r i a a a
a a a a a a a a (2nd byte only for extended format)
load store from/to RAM (x = 0 is load, 1 is store)
11 bits of address provide direct access to 2K of RAM
(or 3 bits provides access to 8 bytes of RAM)
r is the destination or source register (0 or 1)
i field specifies indexed addressing using the register
not specified by the r field. if indexing feature left
off, then either 4K bytes or 16 bytes can be addressed.
There are three kinds of branches: jump and call instructions, which take a full address; an unconditional branch instruction that can branch backwards up to 16 bytes; and conditional skip instructions that can skip up to 4 bytes ahead. Using skips instead of branches allowed for a shorter address field. It could be redone as branches instead by getting rid of the load immediate instructions:
0 1 b b a a a a
conditional branch instructions, where
b b is type of branch
a a a a is signed relative branch +- 8 bytes
b b field:
00 scs branch i bytes if carry set
01 scc branch i bytes if carry clear
10 szs branch i bytes if zero bit set
11 szc branch i bytes if zero bit clear
The way the multiply works is as follows: an 8x8 multiply gives a 16 bit result. The multiply instruction always multiplies register 0 by register 1. The high byte of the result goes into register d, and the low byte goes into register 1-d. s is ignored.
I didn't implement the concept of multiplying the "input buffer" by the "data cache" since the OP didn't specify any details about the cache -- and I currently have the input buffer being read into either of the two registers. Loading the an input port into one of the registers, multiplying by the other producing a 16-bit product in both makes a lot more sense.
Except for the multiply, this could be implemented fairly easily; all of the arithmetic operations (add, subtract, compare) and logical operations (and, or, xor, not) can be performed by an ALU (Arithmetic/Logic Units), supported in Logisim. In real life, this might be implemented using two 4-bit 74LS181 ALUs cascaded together.
Best Answer
Why do you feed the carry out back around to the carry in when you're subtracting? That doesn't seem right, and that's the feedback path that's causing the "oscillation" (which basically means that the simulator can't find a self-consistent output state).
To negate a number, you need to invert all of its bits — which is what I assume the upper-left block is doing — and then add one to it, which is normally done by setting the carry-in of the adder to 1 — unconditionally. There's no reason to have the AND gate or the connection from carry-out.