Electronic – Digital operation on a microcontroller or FPGA

digital-logic, hardware, microcontroller, timing

I have a simple question about logic propagation in an MCU or FPGA within a single clock cycle.

Let's say that we want to multiply two numbers. Multiplication completes within a single clock cycle on most MCUs, but it takes more than one logic stage to perform the multiplication. By "logic stage" I mean operations that cannot be performed in parallel, but must happen one after another, each depending on the previous result.

The question: is there another clock within the "multiplication circuitry", derived from the main CPU clock via a PLL or something similar, that triggers these logic stages, or does everything work continuously? In other words, does the "multiplication circuitry" behave like one system that needs some time to settle, with that settling happening within a single clock cycle?

Best Answer

It sounds as if you're imagining that the intermediate results of an operation need to be stored (registered) before the next stage of the operation begins. In fact, a function can be implemented by a purely combinatorial circuit if its inputs and outputs are "continuous". The latter means that (a) all input data is simultaneously present on its inputs, (b) inputs remain unchanged until the function has produced an output and (c) all output data is simultaneously presented and taken on its outputs.

Multiplication of binary numbers is just an array of multibit adders. This circuit is well documented on the internet. For example, to do F[7:0] = A[3:0] x B[3:0], each adder adds a shifted form of A to the total if a bit in B is 1, otherwise that adder adds in zero. Essentially, it does the following:

Add      A  to F if B[0] is 1 otherwise add zero
Add  2 x A  to F if B[1] is 1 otherwise add zero
Add  4 x A  to F if B[2] is 1 otherwise add zero
Add  8 x A  to F if B[3] is 1 otherwise add zero

All of these additions can be done in parallel so a purely combinatorial circuit is used. The unstated 'F starts with 0' is not a separate operation, it's produced by the adder circuit. The 'N x A' forms are produced by wiring, routing the bits of A into the right positions in a binary number with zeroes around them.
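
If it helps to see the same structure in software, here is a minimal C sketch of the 4-bit x 4-bit shift-and-add array listed above (the function name is mine, purely for illustration; in hardware the four additions are parallel adder rows, the loop here only makes the structure explicit):

#include <stdint.h>
#include <stdio.h>

/* Software analogue of the 4-bit x 4-bit shift-and-add array:
 * each partial product is (A << i) when bit i of B is 1, otherwise zero,
 * and all four are summed into the 8-bit result F. */
static uint8_t mul4x4(uint8_t a, uint8_t b)
{
    uint8_t f = 0;                              /* "F starts with 0" */
    for (int i = 0; i < 4; i++) {
        uint8_t partial = ((b >> i) & 1u) ? (uint8_t)(a << i) : 0u;
        f = (uint8_t)(f + partial);             /* one adder row per bit of B */
    }
    return f;
}

int main(void)
{
    printf("13 x 11 = %u\n", mul4x4(13, 11));   /* prints 143 */
    return 0;
}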

There is a downside to all-combinatorial operations like this. The more gates 'deep' the combinatorial logic circuit is, the longer the propagation time for signals passing through. This lowers its own maximum clock frequency and therefore the max. clock frequency of circuits it is part of, like the CPU. Adding intermediate registers can produce circuit segments each with a lower propagation time than the overall circuit without them. Results then take more clock cycles to produce from the inputs but it allows the data flow to be pipelined. So the chosen implementation is a trade-off, dependant on what you're trying to do and which goals are more important than others.