Aesthetically, my favorite architecture in many ways is the 14-bit series. The 16-bit PIC18Fxx architecture improves some things, but somehow I find the design less aesthetically pleasing. Which architecture you'll like better probably depends upon your design aesthetic, the extent to which you find yourself wishing things were designed differently, and the extent to which such wishing detracts from your enjoyment of working with them.
From a design perspective, there's no particular reason why code addresses and data addresses need to be measured in the same units. One thing I like about the 14-bit PICs is that adding a number to an instruction address advances by that many instructions. By contrast, on the PIC18x, each instruction occupies two addresses. Consequently, computed jumps using an 8-bit selector are confined to a range of 128 instructions rather than 256. It's a small detail, but having a program counter whose lowest bit is non-functional seems unaesthetic.
Also, the PIC18xx parts add a single-cycle hardware multiply, but unfortunately, since it requires one operand to be in W and puts the results in a fixed pair of other registers, it can't be used very effectively for multi-precision operations. If I had my druthers, there would be two types of multiply instructions:
- Simple multiply -- Store W into the multiplier register, and store op*W into PRODH:W
- Multiply-add -- Store PRODH + op*multiplier register into PRODH:W
With such a pattern, a 16x16 operation would be rendered as:
movf  OP1L,W          ; W = OP1L
mul   OP2L            ; multiplier register = OP1L; PRODH:W = OP1L*OP2L
movwf RESULT0         ; low byte of the product
mula  OP2H            ; PRODH:W = PRODH + OP1L*OP2H
movff OP1H,MULTR      ; switch the multiplier register to OP1H
mula  OP2L            ; PRODH:W = PRODH + OP1H*OP2L
movwf RESULT1
mula  OP2H            ; PRODH:W = PRODH + OP1H*OP2H
movwf RESULT2
movff PRODH,RESULT3   ; top byte of the 32-bit result
Further, arbitrary-length multiplies could be done with an average cost of a little over two cycles per 8x8 partial product, using the repeated pattern:
mula   POSTINC0,c     ; PRODH:W = PRODH + (*FSR0++) * multiplier register
addwfc POSTINC1,f,c   ; *FSR1++ += W + carry
That pattern would multiply one multi-byte number times an 8-bit value and add the result to another multi-byte number.
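For anyone who wants to experiment with the idea, here is a minimal C model of that two-instruction inner loop. It is only a behavioral sketch of the proposed mula/addwfc pair, not real PIC18 code; the function name mac_8xN, the prodh and carry variables, and the test values are all just illustrative.

#include <stdint.h>
#include <stdio.h>

/* Behavioral sketch of the proposed pattern: dest += src * k, where k plays
 * the role of the multiplier register and prodh is the high byte that the
 * hypothetical mula would carry forward.  Illustrative names throughout. */
static void mac_8xN(uint8_t *dest, size_t dest_len,
                    const uint8_t *src, size_t src_len, uint8_t k)
{
    unsigned prodh = 0;                              /* carried high byte   */
    unsigned carry = 0;                              /* add-with-carry flag */
    size_t i;

    for (i = 0; i < src_len; i++) {
        unsigned t = prodh + (unsigned)src[i] * k;   /* "mula POSTINC0"     */
        prodh = t >> 8;
        unsigned sum = dest[i] + (t & 0xFF) + carry; /* "addwfc POSTINC1,f" */
        dest[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    for (; i < dest_len; i++) {                      /* propagate the tail  */
        unsigned sum = dest[i] + prodh + carry;
        dest[i] = (uint8_t)sum;
        carry = sum >> 8;
        prodh = 0;
    }
}

int main(void)
{
    uint8_t src[]  = { 0x34, 0x12 };                 /* 0x1234, little-endian */
    uint8_t dest[] = { 0x89, 0x07, 0x00, 0x00 };     /* running total 0x789   */
    mac_8xN(dest, sizeof dest, src, sizeof src, 0x56);
    printf("%02X%02X%02X%02X\n", dest[3], dest[2], dest[1], dest[0]);
    return 0;                                        /* prints 00062501       */
}

Run, it prints 00062501, i.e. 0x1234*0x56 folded into a running total of 0x789, with one multiply-accumulate and one add-with-carry per source byte -- which is where the figure of a little over two cycles per partial product comes from.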
As it is, I think the best one can do for an extended multiply is to do the multiply into a destination buffer without a built-in add, at a cost of six cycles per 8x8 partial product, and then spend another two cycles per partial product adding that result to the previous 8xN partial result.
movf   multiplier,w   ; W = the 8-bit multiplier
mulwf  POSTINC0,c     ; PRODH:PRODL = W * (*FSR0++)
movf   PRODL,w,c      ; W = low byte of the partial product
addwfc POSTINC1,f     ; *FSR1++ += W + carry
movff  PRODH,INDF1    ; seed the next destination byte with PRODH
That's four times as long as what could be achieved with a slightly different instruction set. I don't know that I've seen any processor which included a function to compute PRODH+Op1*Op2, but it would be a very simple feature to include in shifter-based multiplies, and it facilitates computing arbitrary product widths at fixed hardware cost. Actually, since the PIC takes four hardware clocks per instruction, the hardware required to allow a 16xN or 32xN multiply would be pretty modest; when computing big products, a 16xN or 32xN multiply with suitable register usage would offer a 2x or 4x speedup.
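For comparison, here is the same operation sketched the way the existing instruction set forces it to be done: the partial product is built into a scratch buffer, seeding each following byte with PRODH, and only afterwards is the scratch added into the running total. Again, this is only a C model of the pattern above; mul_8xN_pic18 and the buffer names are made up for illustration.

#include <stdint.h>
#include <stddef.h>

/* Sketch of the five-instruction PIC18 pattern above: scratch = k * src,
 * using a MULWF-style multiply (PRODH:PRODL fixed) and seeding each next
 * scratch byte with PRODH.  scratch needs n+1 bytes; like the assembly
 * version, it assumes the first scratch byte and the carry start out clear. */
static void mul_8xN_pic18(uint8_t *scratch, const uint8_t *src,
                          size_t n, uint8_t k)
{
    unsigned carry = 0;
    scratch[0] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned prod = (unsigned)k * src[i];               /* mulwf POSTINC0    */
        unsigned sum  = scratch[i] + (prod & 0xFF) + carry; /* addwfc POSTINC1,f */
        scratch[i] = (uint8_t)sum;
        carry = sum >> 8;
        scratch[i + 1] = (uint8_t)(prod >> 8);              /* movff PRODH,INDF1 */
    }
    scratch[n] = (uint8_t)(scratch[n] + carry);             /* fold final carry  */
}

The scratch buffer then still has to be added into the running result byte by byte, which is where the extra two cycles per partial product mentioned above come from.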
Best Answer
An illustrative example or two may help here. Take a look at the following hypothetical circuit:
[Schematic: inputs A and B drive an AND gate; the AND output and B drive an XOR gate, whose output is the circuit output.]
Suppose that, to start, both A and B are high (1). The output of the AND is therefore 1, and since both inputs to the XOR are 1, the output is 0.
Logic elements don't change their state instantly - there's a small but significant propagation delay as the change in input is handled. Suppose B goes low (0). The XOR sees the new state on its second input instantly, but the first input still sees the 'stale' 1 from the AND gate. As a result, the output briefly goes high - but only until the signal propagates through the AND gate, making both inputs to the XOR low, and causing the output to go low again.
The glitch is not a desired part of the operation of the circuit, but glitches like that will happen any time there's a difference in propagation speed through different parts of the circuit, due to the amount of logic, or even just the length of the wires.
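If it helps to see the race in slow motion, here is a toy C model of that hypothetical circuit, with the AND given one unit of propagation delay while the XOR sees its direct B input immediately. The time steps and delay values are made up purely for illustration.

#include <stdio.h>

int main(void)
{
    int A = 1, B = 1;
    int and_out = A & B;                 /* settled state before the change    */

    for (int t = 0; t < 4; t++) {
        if (t == 1)
            B = 0;                       /* B falls at t = 1                   */
        int out = and_out ^ B;           /* XOR sees the new B immediately...  */
        printf("t=%d  B=%d  and_out=%d  out=%d\n", t, B, and_out, out);
        and_out = A & B;                 /* ...the AND catches up a step later */
    }
    return 0;
}

The printout shows the brief 1 at t=1: the XOR has already seen the new B while the AND is still presenting the stale value.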
One really easy way to handle that is to put an edge-triggered flipflop on the output of your combinatorial logic, like this:
[Schematic: the same combinatorial logic, with its output driving the D input of an edge-triggered flip-flop clocked by the system clock.]
Now, any glitches that happen are hidden from the rest of the circuit by the flipflop, which only updates its state when the clock goes from 0 to 1. As long as the interval between rising clock edges is long enough for signals to propagate all the way through the combinatorial logic chains, the results will be reliably deterministic, and glitch-free.
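A one-line change to the same toy model shows why the flip-flop fixes it: if the output is sampled only on (made-up) clock edges spaced several gate delays apart, the transient between edges never reaches the registered value.

#include <stdio.h>

int main(void)
{
    int A = 1, B = 1;
    int and_out = A & B;
    int q = and_out ^ B;                 /* registered (flip-flop) output     */

    for (int t = 0; t < 9; t++) {
        if (t == 1)
            B = 0;                       /* B falls just after a clock edge   */
        int out = and_out ^ B;           /* combinatorial output, may glitch  */
        if (t % 4 == 0)
            q = out;                     /* sample only on the rising edge    */
        printf("t=%d  out=%d  q=%d\n", t, out, q);
        and_out = A & B;                 /* AND still lags by one step        */
    }
    return 0;
}

Here q never shows the glitch, provided the spacing between edges is longer than the slowest combinatorial path, which is exactly the timing constraint described above.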