Working with borrow save signed digit arithmetic output mismatch

adderboolean-algebradigital-logicsub-1

I am working with OLA adders the borrow save module for signed digit addition
I implemented the added in vhdl modeling the following picture
adder

I have run the following example as given in the table and got the same result but with a glitch on the first Zp since it has 2 registers and stays unsigned output for the first 2 cycles

I have did the subtraction manually using basic sub and got different results I do not understand why

manually I did: 0010000110100001 – 1000010010010100
the result i obtained is : 01001110100001101
which is different of the result obtained in the table

EDIT: I tried the simulation of fig2 diagram with the following

X = 01 00 10 10 and Y = 00 10 11 10

as 74-46
the obtained result is Z : 01 10 00 00

which is not equal to the hand made calculation of 00 01 11 00

the following is the simulation and I added 2 cycles of 00 00
simulation
OLA adder

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_textio.all;
use IEEE.std_logic_arith.all;
use IEEE.numeric_bit.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_signed.all;
use IEEE.std_logic_unsigned.all;
use IEEE.math_real.all;
use IEEE.math_complex.all;

entity sD_adder is 
port ( clk,rst : in std_logic;
         x_p,x_m,y_p,y_m : in std_logic;
         z_p,z_m : out std_logic
        );
end sD_adder;

architecture cSadd of sD_adder is
signal sig_xm : std_logic;
signal sig_zp, sig_zm : std_logic;
signal n_sig_xm, n_sig_ym, n_sig_w2m,n_sig_g2p,n_sig_g2m,n_sig_g3 : std_logic;
signal sig_g3, sig_h2, sig_w2m : std_logic;
signal sig_g2p, sig_g2m, sig_w2p : std_logic;
signal sig_z2p,sig_z2m : std_logic;

begin

    n_sig_xm <= not(x_m);

    FA_add1: entity work.add_sig 
    port map( a => x_p,
                 b => n_sig_xm,
                 cin => y_p,
                 sum => sig_g3,
                 cout => sig_h2);

    reg_ff1: entity work.d_ff 
    port map( clk=>clk,
                 rst=>rst,
                 d=>y_m,
                 q=>sig_g2p
                );



    reg_ff2: entity work.d_ff 
    port map( clk=>clk,
                 rst=>rst,
                 d=>sig_g3,
                 q=>sig_g2m
                );  


    n_sig_g2p <= not(sig_g2p);  

    FA_add2: entity work.add_sig 
    port map( a => sig_g2m,
                 b => n_sig_g2p,
                 cin => sig_h2,
                 sum => sig_w2p,
                 cout => sig_w2m);

    n_sig_w2m <= not(sig_w2m);

    reg_ff3: entity work.d_ff 
    port map( clk=>clk,
                 rst=>rst,
                 d=>sig_w2p,
                 q=>sig_z2p
                );

    reg_ff4: entity work.d_ff 
    port map( clk=>clk,
                 rst=>rst,
                 d=>sig_z2p,
                 q=>z_p
                );

    reg_ff5: entity work.d_ff 
    port map( clk=>clk,
                 rst=>rst,
                 d=>n_sig_w2m,
                 q=>z_m
                );
end cSadd;

EDIT 2: Added a question regarding the overflow

How can i detect the overflow of such adder when adder two unsigned numbers in the range of 0-255 if I add:
01 01 01 01 10 10 00 11 + 01 00 01 00 10 00 10 10 = 228 + 149 = 377…with the output being 01 00 11 11 10 11 11 01 = 121 I should add an extra digit which is the 256 therefore I will have 9 digits of 8bit pairs but if the result is in the range i do not need to set the 256th digit what is the indicator of overflow in SD MSB adder?

Best Answer

why use a most-significant-digit-first adder

I've been told that 90% of the effort in video compression is in performing motion estimation and motion compensation, also called block matching.

That requires scanning through the previous (and in some systems the next) keyframe, looking for a block of pixels that looks "the most similar" to the current block of pixels in the current frame.

As Olivares says on page 4, computation of the sum of absolute differences can be divided into 3 steps:

Step 1. Computation of pixel differences d[i][j] = c[i][j] - r[i][j] Step 2. Determine the absolute value of each difference abs( d[i][j] ) Step 3. sum all the absolute values.

The final sum is the "sum of absolute differences" (SAD), used to estimate "how similar" a candidate block from the keyframe is to the current block.

Step 4. Compare that SAD with the best SAD calculated so far. After comparing all the candidate blocks, the one with the smallest SAD value is considered the one that looks the "most similar" to the reference block, and so its coordinates are used as the motion compensation vector in the compressed data stream.

Joaquin Olivares, et. al. "Minimum Sum of Absolute Differences implementation in a single FPGA device" suggests implementing SAD using a most-significant-digit first (MSD first) serial adder using a signed-digit representation. Olivares lists some of the advantages of MSD first serial computation on p. 3.

Signed-digit representation allows the FPGA to compute step 1 with zero delay: given two pixels C and R, where C is represented by 8 bits (bits c7, c6, c5, c4, c3, c2, c1, and c0) in standard unsigned integer format:

C = c7 * 2^7 + c6 * 2^6 + c5 * 2^5 + c4 * 2^4 + c3 * 2^3 + c2 * 2^2 + c1 * 2 + c0

the (signed) difference D = C - R between them is represented in signed digit format as the 8 bit pairs ( c7 r7, c6 r6, c5 r5, c4 r4, c3 r3, c2 r2, c1 r1, c0 r0 ). Shuffling the bits around like this simultaneously converts the pixel value to SAD and performs the difference between a pixel in the current block and a pixel in the candidate reference block.

For example, the FPGA can simultaneously convert to SAD and subtract the candidate pixel value 46 and the reference pixel value 74 by simple bit rearrangement: When calculating 74 - 46, we expect to get a difference of 28.

46 is represented as 46 = 0x2E = 0b0010_11110.

So the FPGA rearranges those bits to form the difference value in SAD format:

    0b 0  0  1  0   1  1  1  0  == 0x2E == 46
 -  0b  0  1  0  0   1  0  1  0 == 0x4A == 74
    -----------------------------------------
       00 01 10 00  11 10 11 00, a stream of digits which represents
       +0 -1 +1 +0  +0 +1 +0 +0 which represents -28.

When this particular stream of digits is fed into the absolute-value circuit, it quickly -- with zero delay, see page 5 of Olivares -- recognizes that it is a negative number, and the absolute-value circuit swaps the 2 bits of each digit to produce the absolute value of that negative number, producing

       00 10 01 00  11 01 11 00, a stream of digits which represents
       +0 +1 -1 +0  +0 -1 +0 +0 == 0b 0 0 0 1  1 1 0 0 = 0x1C = 28.

That bit-pair stream 00 10 01 00 11 01 11 00 is fed into either the "X" input of the MSD first adder circuit -- i.e., the left bit of each digit is fed into the "X+" input, and the right bit of each digit into the "X-" input.

In other words, all the bits representing the pixel value 74 are streamed most-significant bit first into the "X+" input, and all the bits representing the pixel value 46 are streamed into the "X-" input.

(The absolute value of the difference between 2 other pixels is fed into the "Y+" and "Y-" inputs of the serial adder).

Then, as Olivares describes on page 6, an adder tree, using many copies of the on-line adder circuit you posted, finds the total SAD of the current block compared to the candidate block.

Finally, the total SAD of this current block can begin to be compared to the best (lowest) SAD of all the candidates found so far (step 4). By using MSD first computation, that comparison can be done in parallel with the computation, and as soon as the FPGA detects that the current candidate is worse (has a larger SAD) than the best-so-far SAD, the FPGA can stop the computation of this candidate and skip ahead to the next candidate.

what is the advantage of such adder over the normal serial adder in calculating the sad value ... ?

Compared to a using a tree of normal least-significant-bit first (LSB first) serial adders, this MSD first system is faster -- it computes steps 1 and 2 and 4 in fewer clock cycles, and step 3 in the same amount of time.

details

As you know, signed-digit representation uses two bits to represent each digit. Olivares says "the first bit is weighted negative and the second bit is positive weighted".

The example posted -- Ercegovac and Lang, "Digit-Serial Arithmetic", p. 27; also Ercegovac and Lang, "Digital Arithmetic", p. 507 -- uses a slightly different signed-digit representation -- the left bit is weighted positive and the right bit is weighted negative.

   00 10 00 01  10 10 00 01 (x) represents
   +0 +1 +0 -1  +1 +1 +0 -1 =
         +0011  1011 = 0x3b = 59

   10 00 01 00  10 01 01 00 (y) represents
   +1 +0 -1 +0  +1 -1 -1 +0 =
         +0110  0010 = 0x62 = 98

Summing those numbers should produce 59 + 98 = 157.

The adder circuit shown has internal nodes h, g, t, w used to produce the final output value z.

The output of the adder circuit:

   10 01 11 10 00  11 01 11 10 (z) represents
   +1 -1 +0 +1 +0  +0 -1 +0 +1 =
           +01001  1101 = 0x9d = 157

which is one of several valid ways to represent 157 in signed-digit representation.

With the circuit you show, the interemediate values h, g, t, w and the output value z will not give exactly the same bits when adding 59 + 98 as they do when adding 98 + 59, but the two different output bit patterns are both valid representations of 157.

What is your actual question?

EDIT: As you've probably already figured out, in this schematic diagram, the output of each full adder has the more-significant bit on the left, and the less-significant bit on the right. (The extra delays on the right side allow the "carry bits" from less-significant digits to carry over and interact with "sum bits" from more-significant digits).

This example uses hardware where numbers are represented by one kind of signed-digit representation -- the digits "-1", "0", and "+1" represent themselves when located in the least-significant position, and (as you've already figured out) each step towards a more-significant position doubles its weight (much like traditional binary). This hardware uses the bit pair "10" represents the digit "+1", "01" represents "-1", "00" represents 0, and "11" also represents 0. Also, as the data streams through the system, the most significant digit (MSD) (represented by a pair of bits) of some value flows through first, and the least significant digit flows through last -- you feed in the input values MSD first, and the circuit generates the output values in the same order, MSD first (unlike some other serial adders that require input digits with the least significant digit LSD first, and generate output values LSD first).

This circuit can add values of any number of digits.

If your original numbers are always in the range 0 to 255, they need to be fed into this circuit as a sequence of 8 bit pairs (16 bits) (it can also handle some negative numbers).

As always, if you have two numbers in the range 0..255, when you add them together you will end up with a sum in the range 0..510, which requires more than 8 bits.

One way to use this circuit to add pairs of numbers in the range 0..255 is for the control circuit to clock 8 pairs of zero bits into each input of the summer, followed by 8 pairs of data bits, followed by 8 more pairs of zero bits, followed by 8 more pairs of data bits, etc. That would make it easy to collect the results in a 16-digit (32-bit) register, and later convert to a 16-bit number in standard binary notation. (You may find it convenient to send more or fewer than 8 pairs of zero bits between your data bits).

what is the advantage of such adder over ... making an adder tree?

On page 5, Olivares says Such an adder is used as part of an adder tree.

If you zeros into the system for 8 clocks, then 8-bit pixel values (most significant bit first) into the system for 8 clocks, then zeros, then pixel values, etc.; then initially pairs of zeros will stream out of the adder, followed by bit-pairs the sum-of-absolute-values.

Olivares uses a single comparator at the root of the tree of adders to decide if this candidate block of pixels looks "more similar" to the reference block than the most similar candidate block so far. That comparator circuit is very similar to the absolute-value circuits at the leaves of the tree -- one such absolute-value circuit for every pair of input pixels, one from the reference block and the corresponding pixel from the current candidate block.

As far as I can tell, Olivares doesn't have any hardware that "indicates overflow". He feeds in enough pairs of zero bits between each pixel value so there is more than enough room for all the carry digits that might possibly be generated in the worst case.