Electronic – Implementation of Distributed Arithmetic Architecture

digital-logic fpga

I'm looking at implementing a distributed arithmetic architecture, but I'm running into some trouble.

From the textbook I'm looking at, we can rewrite the unsigned convolution of two vectors c and x as

where the function f is a LUT that gives the partial sum. It is used as follows in this example:
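For reference, here is the usual form of the distributed-arithmetic identity. It assumes N unsigned B-bit inputs x[n] whose bit b is written x_b[n], and the textbook's notation may differ. f is then a 2^N-entry LUT addressed by one bit slice:

```latex
y = \sum_{n=0}^{N-1} c[n]\,x[n], \qquad
x[n] = \sum_{b=0}^{B-1} x_b[n]\,2^{b}, \quad x_b[n] \in \{0, 1\}

y = \sum_{b=0}^{B-1} 2^{b} \sum_{n=0}^{N-1} c[n]\,x_b[n]
  = \sum_{b=0}^{B-1} 2^{b}\, f\big(x_b[0], x_b[1], \dots, x_b[N-1]\big)
```

With the values used in the question (c = [2, 3, 1], x = [1, 3, 7], B = 3), the slices for b = 0, 1, 2 are (1, 1, 1), (0, 1, 1) and (0, 0, 1). f returns 6, 4 and 1 for these slices, so y = 6 + 2*4 + 4*1 = 18.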

However, when it comes to implementation, we don't want a barrel shifter that shifts by b positions on every iteration, so the textbook suggests the following:
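One barrel-shifter-free form is the Horner-style recurrence below. It assumes that the least significant bit slice is applied first and that the accumulator is halved on every step, which matches the y/2 + f update used in the calculation further down. Here x_t stands for the bit slice (x_t[0], ..., x_t[N-1]) and f for the LUT:

```latex
\mathrm{acc}_{-1} = 0, \qquad
\mathrm{acc}_{t} = \tfrac{1}{2}\,\mathrm{acc}_{t-1} + f(x_t), \quad t = 0, 1, \dots, B-1

y = 2^{B-1}\,\mathrm{acc}_{B-1} = \sum_{b=0}^{B-1} 2^{b} f(x_b)
```

The factor 1/2 has to be exact for this to equal the weighted sum. For c = [2, 3, 1] and x = [1, 3, 7], acc takes the values 6, 7 and 4.5, and 2^2 * 4.5 = 18.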

I believe this is implemented by the following circuit:

But I fail to understand how this circuit and the above example could yield the same result. Let us take the same scenario as the example, i.e.:

c[0] = 2
c[1] = 3
c[2] = 1

and

x[0] = 1
x[1] = 3
x[2] = 7

Then, from the circuit, the following calculation would be made when computing the convolution of x and c (where t is the iteration of the shift-adder):

t = 0 : y = (0/2) + 6 = 6
t = 1 : y = (6/2) + 4 = 7
t = 2 : y = (7/2) + 1 = 4

This answer is obviously wrong, since x * c = (2)(1) + (3)(3) + (1)(7) = 18.
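The calculation above can be reproduced with a small Python model. The model makes two assumptions about the circuit: x[n]'s bit drives LUT address bit n, and the least significant bit slice is applied first with the halving done as a truncating right shift. The function names (build_lut, bit_slice, da_truncating) are hypothetical.

```python
# Hypothetical model of the shift-accumulator as described in the question:
# LSB bit slice first, accumulator halved with a truncating right shift.

def build_lut(c):
    """f(addr): sum of the c[n] whose bit n is set in the N-bit address."""
    return [sum(cn for n, cn in enumerate(c) if (addr >> n) & 1)
            for addr in range(1 << len(c))]


def bit_slice(x, b):
    """LUT address built from bit b of every x[n] (x[n] drives address bit n)."""
    return sum(((xn >> b) & 1) << n for n, xn in enumerate(x))


def da_truncating(c, x, B):
    """Run y = (y >> 1) + f for B iterations and return y after each one."""
    lut = build_lut(c)
    y, trace = 0, []
    for t in range(B):
        y = (y >> 1) + lut[bit_slice(x, t)]   # '>> 1' drops the LSB of y
        trace.append(y)
    return trace


c, x, B = [2, 3, 1], [1, 3, 7], 3
print(da_truncating(c, x, B))                 # [6, 7, 4]
print(sum(cn * xn for cn, xn in zip(c, x)))   # 18
```

The LUT and bit slices are correct (6, 4, 1). The result diverges only at t = 2, where 7 >> 1 throws away the 1 that should have survived as 3.5.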

So where have I misinterpreted the circuit? What is the problem? Thanks for any help.

Best Answer

(Via Reddit)

bunky_bunk

You do not treat bit 0 of the accumulator as the least significant bit. Instead, you keep a certain number of extra bits below it: the bits that would otherwise be shifted out and discarded. Here the 4 bits before the underscore correspond to your equation, but all 6 bits together give the result 18 in the end.

t=0: 0110_00 (6) [24]
t=1: 0011_00 + 0100_00 = 0111_00 (6/2 + 4 = 7) [12 + 16 = 28]
t=2: 0011_10 + 0001_00 = 0100_10 (7/2 + 1 = 4) [14 + 4 = 18]
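The trace above can be checked with a Python sketch of the same idea. It assumes the same hypothetical LUT model as the question: LSB slice first, and x[n] driving address bit n. The accumulator carries G = B - 1 extra bits below the 4-bit window, and da_guarded is a hypothetical name.

```python
# Accumulator with G = B - 1 guard bits below the original LSB,
# so the right shift never discards a 1.

def build_lut(c):
    """f(addr): sum of the c[n] whose bit n is set in the N-bit address."""
    return [sum(cn for n, cn in enumerate(c) if (addr >> n) & 1)
            for addr in range(1 << len(c))]


def bit_slice(x, b):
    """LUT address built from bit b of every x[n] (x[n] drives address bit n)."""
    return sum(((xn >> b) & 1) << n for n, xn in enumerate(x))


def da_guarded(c, x, B):
    """Return (y, trace); trace holds the full register after each step."""
    G = B - 1                    # guard bits below the accumulator's bit 0
    lut = build_lut(c)
    acc, trace = 0, []
    for t in range(B):
        # Before each shift every earlier term sits at weight 2^1 or higher,
        # so '>> 1' only drops zeros; f enters at weight 2^G.
        acc = (acc >> 1) + (lut[bit_slice(x, t)] << G)
        trace.append(acc)
    return acc, trace


y, trace = da_guarded([2, 3, 1], [1, 3, 7], 3)
print(trace)                     # [24, 28, 18]
print([v >> 2 for v in trace])   # [6, 7, 4]  (the 4 bits before '_')
print(y)                         # 18
```

With exactly B - 1 guard bits, the final register value is y itself, because 2^(B-1) * acc lands on the integer grid. This matches the bracketed values 24, 28 and 18 in the answer.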

source: https://old.reddit.com/r/FPGA/comments/jz5v2h/implementation_of_distributed_arithmetic/gda0had/

Related Solutions

Electronic – Need help with RTL (Register Transfer Level) circuit implementation – Euclidian GCD algorithm

Because this algorithm involves a loop, it will be much easier to implement in a microcontroller or microprocessor than in digital logic.

However, if you have a good reason to do it in digital logic (for example, you need absolute control over the speed of the implementation, or you want dozens or hundreds of processors executing the algorithm in parallel), it can certainly be done in digital logic.
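The loop maps onto a small datapath: two registers, a comparator and a subtractor, plus a done flag. Below is a cycle-level Python model of those register transfers. It assumes the subtraction-based form of Euclid's algorithm, which needs no divider, and each loop iteration stands for one clock edge. gcd_rtl is a hypothetical name.

```python
def gcd_rtl(a, b):
    """Cycle-level model of a subtraction-based Euclid GCD datapath.

    Registers x and y are loaded with a and b; each loop iteration models one
    clock edge on which the larger register is replaced by the difference.
    Returns (gcd, clock_cycles).
    """
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive, or the loop never ends")
    x, y = a, b          # load state
    cycles = 0
    while x != y:        # 'done' is the equality comparator output
        if x > y:
            x = x - y    # x <= x - y on this clock edge
        else:
            y = y - x    # y <= y - x on this clock edge
        cycles += 1
    return x, cycles


print(gcd_rtl(48, 18))   # (6, 4)
```

The cycle count depends on the data (gcd_rtl(1000, 1) needs 999 clocks). The hardware therefore needs a done flag rather than a fixed latency.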

Rather than using individually packaged "registers and arithmetic chips", it is much preferable to use a programmable logic device. Most likely you'd use an FPGA, but a CPLD would also be possible, particularly the newer CPLD families that are really small FPGAs in disguise.

If you use an FPGA, you'll simply use a synthesis tool to convert your RTL code into a configuration file for the device. You'll load the configuration file into the device and it will begin operation.

There's no need to translate the RTL to the gate level manually; the synthesis tool does it for you.

Online arithmetic with radix 2 addition

In carry-save/borrow-save arithmetic, each binary position is represented by two bits, so a number can have several possible representations.

+25 = P=11001 / N=00000
-10 = P=00000 / N=01010

The result is simply: +25-10: P=11001 / N=01010

A position with P=1, N=1 is worth 1 - 1 = 0, so it can be simplified to P=0, N=0.

So: +25-10: P=10001 / N=00010

You can convert to a traditional two's complement representation by doing the actual subtraction: 10001 - 00010 = 1111 = 15
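The bookkeeping above fits in a few lines of Python. In this sketch, P and N are plain non-negative integers used as bit vectors, and the helper names are hypothetical. The value is P - N, negation swaps the two vectors, and simplification clears every position where both bits are 1.

```python
def bs_value(p, n):
    """Value of a borrow-save number: positive vector minus negative vector."""
    return p - n


def bs_negate(p, n):
    """Negation is free: swap the positive and negative vectors."""
    return n, p


def bs_simplify(p, n):
    """Clear positions where P=1 and N=1 (those digits are 1 - 1 = 0)."""
    both = p & n
    return p & ~both, n & ~both


p, n = 0b11001, 0b01010          # +25 - 10, as above
print(bs_value(p, n))            # 15
p, n = bs_simplify(p, n)
print(bin(p), bin(n))            # 0b10001 0b10
print(bs_value(p, n))            # 15, unchanged by the simplification
print(bs_value(*bs_negate(p, n)))  # -15
```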

For a single subtraction, this representation gains nothing. Its whole purpose is that you can do iterative additions or subtractions and only have to perform one level of binary operations on each step (a few XORs or muxes per bit), without propagating the carry.

For example, divisions (particularly SRT divisions) can be done with many successive additions or subtractions. Using carry-save/borrow-save representations reduces the cycle time or lets you calculate more bits per cycle. Because there is no carry propagation, the step delay does not depend on the operand size (for example, the same clock frequency works for both single- and double-precision floating point).

Edit

Look for "borrow-save adder" for actual implementations. For example: http://users-tima.imag.fr/cis/guyot/Cours/Oparithm/english/Additi.htm

Edit

FA blocks are "full adders".

A full adder takes 3 inputs, A, B and C, each worth 0 or 1. The result is between 00 and 11 (0 to 3 in decimal). A full adder is:

Carry = (A and B) or (B and C) or (A and C)
Sum = A xor B xor C
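The two equations can be checked exhaustively. The sketch below confirms that the 2-bit result (Carry, Sum) is exactly the number of ones among A, B and C. The function name full_adder is chosen here for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder on bits a, b, c (each 0 or 1); returns (carry, sum)."""
    carry = (a & b) | (b & c) | (a & c)   # majority of the three inputs
    s = a ^ b ^ c                          # parity of the three inputs
    return carry, s


# Exhaustive check of the defining property: a + b + c == 2*carry + sum
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, s = full_adder(a, b, c)
            assert a + b + c == 2 * carry + s
            print(a, b, c, "->", carry, s)
```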

The bubbles around the adders are inverters (NOT gates).

The 'online' version in the right-hand diagram processes bits serially instead of in parallel. I have the impression that the least significant bit should be provided first. The squares are memory elements (flip-flops), so the propagation time is 3 clocks.

The adder

This diagram describes an adder taking two BS vectors (X+/X-) and (Y+/Y-) and generating one BS result (Z+/Z-). It can also be used to calculate the sum of two differences.
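Below is a bit-level Python sketch of one possible two-row borrow-save adder built from full adders, with inverters on the X- and Y- inputs. It is not necessarily the exact wiring of the diagram, and bs_add is a hypothetical name. Row 1 adds x+, NOT x- and y+. Row 2 adds the row-1 sum, the row-1 carry from the digit below, and NOT y-. Each output digit depends only on input digits i, i-1 and i-2, so there is no carry chain.

```python
def full_adder(a, b, c):
    """Returns (carry, sum) of three bits."""
    return (a & b) | (b & c) | (a & c), a ^ b ^ c


def bs_add(xp, xn, yp, yn, n):
    """Add X = xp - xn and Y = yp - yn (n-digit borrow-save) into Z = zp - zn.

    Z has n + 1 digits.  Loops are sequential here, but no iteration waits on
    a carry from far below, so in hardware all digits work in parallel.
    """
    def bit(v, i):
        return (v >> i) & 1

    # Row 1 at digit i: FA(x+_i, NOT x-_i, y+_i) -> carry h[i+1], sum g[i]
    h = [0] * (n + 1)
    g = [0] * n
    for i in range(n):
        h[i + 1], g[i] = full_adder(bit(xp, i), 1 - bit(xn, i), bit(yp, i))

    # Row 2 at digit i: FA(h[i], g[i], NOT y-_i) -> carry k, sum m
    zp = h[n] << n                 # top digit of Z+ is the last row-1 carry
    zn = 0
    for i in range(n):
        k, m = full_adder(h[i], g[i], 1 - bit(yn, i))
        zp |= m << i               # sum goes to Z+ at weight 2^i
        zn |= (1 - k) << (i + 1)   # inverted carry goes to Z- at weight 2^(i+1)
    return zp, zn


# +25 + (-10) with 5-digit borrow-save operands
zp, zn = bs_add(0b11001, 0b00000, 0b00000, 0b01010, 5)
print(zp - zn)   # 15

# Sum of two plain differences (a - b) + (c - d): feed a, b, c, d directly
zp, zn = bs_add(9, 4, 3, 7, 4)
print(zp - zn)   # 1
```

The inversions work because each NOT adds a constant offset of 1 per digit. Taking the row-2 carry inverted into Z- cancels those offsets exactly, so the adder needs no final correction step.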

The problem I see is that you need the absolute value to calculate the 'sum of absolute differences'. I don't know how to convert to the absolute value without doing a (carry-propagating) comparison.
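The Python sketch below illustrates that concern using hypothetical helper names. Taking the absolute value of a borrow-save number is itself just a conditional swap of P and N. The swap condition, however, is the sign, and the sign is the sign of the most significant nonzero digit. Finding that digit means scanning from the top of the word, which in hardware is a priority chain across all positions, much like a carry chain.

```python
def bs_sign(p, n, width):
    """Sign of a borrow-save number: the sign of its most significant nonzero digit.

    Digits below position j contribute at most 2^j - 1 in magnitude, so they
    can never flip the sign set by digit j.
    """
    for i in reversed(range(width)):          # scan from the MSB down
        d = ((p >> i) & 1) - ((n >> i) & 1)
        if d != 0:
            return d
    return 0


def bs_abs(p, n, width):
    """|X|: swap the vectors (a free negation) when X is negative."""
    return (n, p) if bs_sign(p, n, width) < 0 else (p, n)


print(bs_sign(0b00100, 0b01010, 5))   # -1  (value 4 - 10 = -6)
print(bs_abs(0b00100, 0b01010, 5))    # (10, 4), i.e. +6
```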
