cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Highlighted
Adventurer
Adventurer
1,555 Views
Registered: ‎02-24-2017

force vivado to implement an adder tree

Jump to solution

Hi all,

I am currently writing a generic adder.
This module has a generic number of input vectors, which all have the same generic width, and it computes the sum of all these vectors.

Since the number of terms is generic, I coded this using a variable as an accumulator and a for loop.

I am worried that Vivado is going to unroll the loop in a 'naive' way and build an adder tree that always grows on the same side, resulting in a depth = (number of terms).
Instead, I would like Vivado to balance the adder tree, resulting in depth = log2(number of terms).
(I even discovered that Vivado can synthesize ternary adders, so I could possibly reach depth = log2(number of terms)

What VHDL coding style, or attribute, (or any other mean) can I use to force the synthesizer or the mapper to implement a balanced adder tree?

Thanks,

- Julien 

Tags (3)
0 Kudos
1 Solution

Accepted Solutions
Highlighted
Adventurer
Adventurer
1,320 Views
Registered: ‎02-24-2017

Hi have pushed further my investigations with adder trees.
Here are the results. (I will add the code as an attachement to the page.)

 

Theory

Unsigned adder

Just like when it's done manually, a binary addition can be done using a "full adder" circuit, which computes the sum s(j) and the output carry c(j) out of the operands a(j) and b(j) and the input carry c(j-1).

Addition between two vectors of n bits is done cascading n full adders.

The carry propagates through the full adders, hence the name "ripple carry adder".

ripple_carry_adder.png

The truth table of the full adder is as follows:

Screenshot from 2019-08-29 11-07-55.png

This truth table correspondes to the following circuit:

full_adder.jpg

 

Signed adder

When interpreted as an unsigned integer, the value of an n-bit vector v is:

- unsigned_value(v) = sum(2i, i in 0 to (n-1))
With n bits one can represent values 0 to 2n-1 - 1

One can notice that u = v + not(v) + 1 = 0 (thanks to the overflow), hence the 2's complement encoding for signed integers:

-v = (not(v) + 1)

and more generally:

signed_value(v) = -(2n-1).v(n-1) + sum(2i, i in 0 to (n-2))
With n bits one can represent values -2n-2 to 2n-2 - 1


The beauty of it is that the same full adder circuit works for 2's complement signed values as well.

 

Substractor

As a consequence, an adder can be transformed into a subtractor very easilly.

Instead of adding b, just add -b, i.e. (not(b) + 1), i.e. negate bits from b and use 1 for the input carry.

ripple_carry_adder_subtracter.png

 

Mapping on Xilinx UltraScale+ architecture

It is possible to insert a AND gate in the circuit without modifying the functionality:

  • If (a xor b) = 0, then the AND drives the signal (a and b) to the input of the OR and the functionality is not modified
  • If (a xor b) = 1, then the AND drives 0 to the input of the OR, but in that case (a and b) also is 0 so the functionality is not modified

full_adder_modified_1.jpg

Adding this AND gate lets a multiplexer appear.

The circuit can then be mapped onto the resources of the CLB:

  • The XOR between a and b is computed with one LUT5
  • The AND between a and b is computed with the other LUT5
  • The second XOR is mapped onto a XORCY
  • The multiplexer is mapped onto a MUXCY

full_adder_modified_2.jpg

When (a XOR b) is 0, then a is the same as b, hence (a AND b) = a = b, so the AND between a and b can be simplified (i.e. removed)

full_adder_modified_3.jpg

 

Ternary adders

Apparently, it is possible to build ternary adders with the same resources as for regular binary adders.

The principle is as follows:
- some logic is applied in a bitwise fashion to the 3 vectors and compresses the 3 input bits into only 2 bits
- these 2 intermediary vectors are added together using a regular binary adder
- The circuits needs more logic, but still fits into 2 LUT5, and 1 carry-chain

ternary_adder.jpg

Apparently Vivado can synthesize this very well, so we should try to use it.

Here are some pointers I found:
- https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/Is-the-claim-in-UG384-about-6LUT-based-ternary-adders-correct/td-p/185634
- http://myfpgablog.blogspot.com/2011/10/ternary-adder-in-lut62.html
- https://patentimages.storage.googleapis.com/07/35/23/21e173f32247b2/US7274211B1.pdf
- https://pdfs.semanticscholar.org/0798/96e0061bfde69c78573fb8c326441c00a159.pdf

 

Synthesis experiments

Coding styles

For loop

    gen_sum : process(term_vector)
        variable    accumulator : std_logic_vector((SUM_WIDTH - 1) downto 0);
    begin
        accumulator := std_logic_vector(to_signed(0, accumulator'length));
        for TERM_INDEX in 0 to (NB_TERMS - 1) loop
            accumulator := std_logic_vector(signed(accumulator) + signed(term_vector(TERM_INDEX)));
        end loop;
        sum <= accumulator;
    end process gen_sum;

 

Explicit binary adder tree using a recursive function

    function binary_adder_tree
    (
        input_term_vector   : t_term_vector
    )
    return std_logic_vector is
        constant    N                           : natural                       := input_term_vector'length;
        constant    term_vector                 : t_term_vector(0 to (N - 1))   := input_term_vector;
        constant    LEFT_TREE_N                 : natural                       := ((N + 1) / 2);
        constant    RIGHT_TREE_N                : natural                       := (N - LEFT_TREE_N);
        constant    LEFT_TREE_LOW_INDEX         : natural                       := 0;
        constant    LEFT_TREE_HIGH_INDEX        : natural                       := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1);
        constant    RIGHT_TREE_LOW_INDEX        : natural                       := (LEFT_TREE_HIGH_INDEX + 1);
        constant    RIGHT_TREE_HIGH_INDEX       : natural                       := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1);
        constant    SUM_WIDTH                   : natural                       := (TERM_WIDTH + syn_functions_pkg.ceil_log2(N));
    begin
        if (N = 1) then
            return term_vector(0);
        else
            return  std_logic_vector(
                            resize(signed(binary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX))), SUM_WIDTH)
                        +   resize(signed(binary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX))), SUM_WIDTH)
                    );
        end if;
    end function binary_adder_tree;

    sum <=          binary_adder_tree(term_vector);

 

Explicit binary adder tree using a for generate

    constant    BINARY_ADDER_TREE_NB_TERMS      : positive  := (2 ** syn_functions_pkg.ceil_log2(NB_TERMS));
    constant    BINARY_ADDER_TREE_NB_ADDERS     : positive  := (BINARY_ADDER_TREE_NB_TERMS - 1);
    constant    BINARY_ADDER_TREE_NB_NODES      : positive  := (BINARY_ADDER_TREE_NB_ADDERS + BINARY_ADDER_TREE_NB_TERMS);
    signal      binary_adder_tree               : t_resized_term_vector(1 to BINARY_ADDER_TREE_NB_NODES);

    gen_binary_adder_tree : for NODE_INDEX in BINARY_ADDER_TREE_NB_NODES downto 1 generate
    begin
    
        gen_leave : if (NODE_INDEX > BINARY_ADDER_TREE_NB_ADDERS) generate
        begin
    
            gen_useless_leave : if (NODE_INDEX > (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(to_signed(0, SUM_WIDTH));
            end generate gen_useless_leave;
    
            gen_usefull_leave : if (NODE_INDEX <= (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(resize(signed(term_vector(NODE_INDEX - BINARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH));
            end generate gen_usefull_leave;
    
        end generate gen_leave;
    
        gen_adder : if (NODE_INDEX <= BINARY_ADDER_TREE_NB_ADDERS) generate
        begin
            binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(signed(binary_adder_tree(2 * NODE_INDEX)) + signed(binary_adder_tree((2 * NODE_INDEX) + 1)));
        end generate gen_adder;
    
    end generate gen_binary_adder_tree;
    
    sum <=          binary_adder_tree(1);

 

Explicit ternary adder tree using a recursive function

    function ternary_adder_tree
    (
        input_term_vector   : t_term_vector
    )
    return std_logic_vector is
        constant    N                           : natural                       := input_term_vector'length;
        constant    term_vector                 : t_term_vector(0 to (N - 1))   := input_term_vector;
        constant    LEFT_TREE_N                 : natural                       := ((N + 2) / 3);
        constant    MIDDLE_TREE_N               : natural                       := (((N - LEFT_TREE_N) + 1) / 2);
        constant    RIGHT_TREE_N                : natural                       := (N - LEFT_TREE_N - MIDDLE_TREE_N);
        constant    LEFT_TREE_LOW_INDEX         : natural                       := 0;
        constant    LEFT_TREE_HIGH_INDEX        : natural                       := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1);
        constant    MIDDLE_TREE_LOW_INDEX       : natural                       := (LEFT_TREE_HIGH_INDEX + 1);
        constant    MIDDLE_TREE_HIGH_INDEX      : natural                       := (MIDDLE_TREE_LOW_INDEX + MIDDLE_TREE_N - 1);
        constant    RIGHT_TREE_LOW_INDEX        : natural                       := (MIDDLE_TREE_HIGH_INDEX + 1);
        constant    RIGHT_TREE_HIGH_INDEX       : natural                       := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1);
    begin
        if (N = 1) then
            return  std_logic_vector(
                            resize(signed(term_vector(0)), SUM_WIDTH)
                    );
        elsif (N = 2) then
            return  std_logic_vector(
                            resize(signed(term_vector(0)), SUM_WIDTH)
                        +   resize(signed(term_vector(1)), SUM_WIDTH)
                    );
        else
            return  std_logic_vector(
                            resize(signed(ternary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX   to LEFT_TREE_HIGH_INDEX  ))), SUM_WIDTH)
                        +   resize(signed(ternary_adder_tree(term_vector(MIDDLE_TREE_LOW_INDEX to MIDDLE_TREE_HIGH_INDEX))), SUM_WIDTH)
                        +   resize(signed(ternary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX  to RIGHT_TREE_HIGH_INDEX ))), SUM_WIDTH)
                    );
        end if;
    end function ternary_adder_tree;

    sum <=          ternary_adder_tree(term_vector);

 

Explicit ternary adder tree using a for generate

    constant    TERNARY_ADDER_TREE_NB_TERMS     : positive  := (3 ** syn_functions_pkg.ceil_log3(NB_TERMS));
    constant    TERNARY_ADDER_TREE_NB_ADDERS    : positive  := ((TERNARY_ADDER_TREE_NB_TERMS - 1) / 2);
    constant    TERNARY_ADDER_TREE_NB_NODES     : positive  := (TERNARY_ADDER_TREE_NB_ADDERS + TERNARY_ADDER_TREE_NB_TERMS);
    signal      ternary_adder_tree              : t_resized_term_vector(1 to TERNARY_ADDER_TREE_NB_NODES);

    gen_ternary_adder_tree : for NODE_INDEX in TERNARY_ADDER_TREE_NB_NODES downto 1 generate
    begin
    
        gen_leave : if (NODE_INDEX > TERNARY_ADDER_TREE_NB_ADDERS) generate
        begin
    
            gen_useless_leave : if (NODE_INDEX > (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(to_signed(0, SUM_WIDTH));
            end generate gen_useless_leave;
    
            gen_usefull_leave : if (NODE_INDEX <= (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(resize(signed(term_vector(NODE_INDEX - TERNARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH));
            end generate gen_usefull_leave;
    
        end generate gen_leave;
    
        gen_adder : if (NODE_INDEX <= TERNARY_ADDER_TREE_NB_ADDERS) generate
        begin
            ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(signed(ternary_adder_tree((3 * NODE_INDEX) - 1)) + signed(ternary_adder_tree(3 * NODE_INDEX)) + signed(ternary_adder_tree((3 * NODE_INDEX) + 1)));
        end generate gen_adder;
    
    end generate gen_ternary_adder_tree;
    
    sum <=          ternary_adder_tree(1);

 

Complete source code (5 coding styles, testbench, vivado project and constraints) can be found attached to is page.

 

Synthesis results

We tried each of the 5 coding styles.

We tried the following configurations:

  • add 16 8-bit vectors together
  • add 8 16-bit vectors together
  • add 12 16-bit vectors together
  • add 16 16-bit vectors together

The target was a VU9P (VCU1525), the target frequency was 500 MHz.

ternary_rec.png

ternary_for_gen.png

for_loop.png

binary_rec.png

binary_for_gen.png

 

Analysis

It is difficult to guess what has been synthesized.

Study of the schematics is too complicated.

The explicit binary adder tree should be somehow better, because the intermediate signals are on their very size instead of being resized to up to the final width.
However, this coding style actually leads to the worst results.
I do not have any explanation for this.

For some reason, with small input vectors, the explicit ternary adder tree using a recursive function implements the circuit in a different way.
I do not have any explanation for this.

The number of resources used and the timings are roughly the same, whatever the coding style used.

The timings are also roughly the same, whatever the coding style used.

The ranking of the different coding styles is not always the same, it differs for each configuration.

As a consequence, we will use the for loop coding style, for its simplicity and conciseness.

View solution in original post

0 Kudos
7 Replies
Highlighted
Scholar
Scholar
1,539 Views
Registered: ‎08-01-2012

You can calculate the depth by doing log2(X), where X is the number of inputs

then for each layer you need Y/2 adders, where Y is the number of inputs to the stage. 

The easiest way is just a to declare an array that is log2(X) x X values. Half of these will just be unconnected and unused. Make everything the output width, and the synth tool should throw away any parts that are constant zero.

Here is a log2 depth adder tree in VHDL, using VHDL 2008:

type us_array_t is array(natural range <>) of unsigned;

generic(
  NIPs    : natural;
  W       : natural;
  )
port (
  ip  : in us_array_t(0 to NIPs-1)(W-1 downto 0)
  ....

  op  : out unsigned;
)

type tree_t is array(natural range <>) of us_array_t;

constant N_LEVELS : natural := log2(NIPs);

-- Use array index to work out number of adders needed
signal stages : tree_t(N_LEVELS downto 0)(0 to NIps-1)(op'range);  -- make everything as big as output

....

stage_gen : for level in stages'range loop
  levels_gen : if level = N_LEVELS generate
    ip_gen : for i ip'range loop
      stages(level)(i) <= resize(ip(i), op'length);
    end generate ip_gen;

  else generate
    constant N_ADDS : natural := 2**level;
  begin
  
    adder_gen : for i in 0 to N_ADDS-1 generate
      process(clk)
      begin
        if rising_edge(clk) then
          stages(level)(i) <= resize(stages(level+1)(i*2), op'length) + resize(stages(level+1)(i*2+1), op'length);
        end if;
      end process;
    end adder_gen;

  end generate levels_gen;
end generate stage_gen;
  
op    <= stages(0)(0);

 

0 Kudos
Highlighted
Mentor
Mentor
1,503 Views
Registered: ‎02-24-2014

There is no known method to get Vivado synthesis to build a balanced tree except to present it with an elaborated structure which is already balanced.   @richardhead has presented a very nice parameterized VHDL2008 algorithm for a binary tree, albeit it depends on the tools to trim the adder lengths in the upper branches of the tree.    I've implemented a recursive ternary adder in VHDL93 which works under VIvado, and it turns out that balancing a ternary tree has different solutions, depending on whether you want minimum logic or minimum latency.    I ended up writing a python script to generate the optimal ternary branch sizes for solutions up to 4096 inputs, and above that size,  my code is only approximately optimal.  

My solution builds a bit compressor (also known as popcount) out of ternary adders, and this can be applied to arrays of unsigned values in a horizontal fashion, then summed again with a ternary tree at the base.    The result is a minimal depth tree when selecting minimal latency, or a slightly deeper tree when selecting minimal logic.  One part of the problem when building trees like this is that when balancing latency of the tree with other logic,  it's super handy to have a way to get the tree latency as a constant in the higher level code that instantiates the tree.   I solved this by writing a function in a package that calculates the latency given the tree input size.

I'm not able to post the ternary tree code right now, so bug me about it, and I'll see if I can post it or stick it on github. 

John

  

Don't forget to close a thread when possible by accepting a post as a solution.
0 Kudos
Highlighted
Adventurer
Adventurer
1,495 Views
Registered: ‎02-24-2017

Hi @richardhead  , thanks a lot for your answer, this is nice! 

I was about to ask follow-up questions, but @jmcclusk  answered it all :)

I will accept this post as a solution, thanks a lot to you both.

- Julien

0 Kudos
Highlighted
Adventurer
Adventurer
1,492 Views
Registered: ‎02-24-2017

@jmcclusk thanks for your post, it is very interesting.

I would definitely appreciate if you could share the code.

Also, could you go further into details about "balancing a ternary tree has different solutions, depending on whether you want minimum logic or minimum latency." and "The result is a minimal depth tree when selecting minimal latency, or a slightly deeper tree when selecting minimal logic."?

Many thanks!

0 Kudos
Highlighted
Adventurer
Adventurer
1,491 Views
Registered: ‎02-24-2017
"One part of the problem when building trees like this is that when balancing latency of the tree with other logic, it's super handy to have a way to get the tree latency as a constant in the higher level code that instantiates the tree. I solved this by writing a function in a package that calculates the latency given the tree input size."

Well played, that's what I do for every generic module I write.
Works as well when the size of some output ports is a complex expression that depends on many generic parameters for instance.
0 Kudos
Highlighted
Adventurer
Adventurer
1,321 Views
Registered: ‎02-24-2017

Hi have pushed further my investigations with adder trees.
Here are the results. (I will add the code as an attachement to the page.)

 

Theory

Unsigned adder

Just like when it's done manually, a binary addition can be done using a "full adder" circuit, which computes the sum s(j) and the output carry c(j) out of the operands a(j) and b(j) and the input carry c(j-1).

Addition between two vectors of n bits is done cascading n full adders.

The carry propagates through the full adders, hence the name "ripple carry adder".

ripple_carry_adder.png

The truth table of the full adder is as follows:

Screenshot from 2019-08-29 11-07-55.png

This truth table correspondes to the following circuit:

full_adder.jpg

 

Signed adder

When interpreted as an unsigned integer, the value of an n-bit vector v is:

- unsigned_value(v) = sum(2i, i in 0 to (n-1))
With n bits one can represent values 0 to 2n-1 - 1

One can notice that u = v + not(v) + 1 = 0 (thanks to the overflow), hence the 2's complement encoding for signed integers:

-v = (not(v) + 1)

and more generally:

signed_value(v) = -(2n-1).v(n-1) + sum(2i, i in 0 to (n-2))
With n bits one can represent values -2n-2 to 2n-2 - 1


The beauty of it is that the same full adder circuit works for 2's complement signed values as well.

 

Substractor

As a consequence, an adder can be transformed into a subtractor very easilly.

Instead of adding b, just add -b, i.e. (not(b) + 1), i.e. negate bits from b and use 1 for the input carry.

ripple_carry_adder_subtracter.png

 

Mapping on Xilinx UltraScale+ architecture

It is possible to insert a AND gate in the circuit without modifying the functionality:

  • If (a xor b) = 0, then the AND drives the signal (a and b) to the input of the OR and the functionality is not modified
  • If (a xor b) = 1, then the AND drives 0 to the input of the OR, but in that case (a and b) also is 0 so the functionality is not modified

full_adder_modified_1.jpg

Adding this AND gate lets a multiplexer appear.

The circuit can then be mapped onto the resources of the CLB:

  • The XOR between a and b is computed with one LUT5
  • The AND between a and b is computed with the other LUT5
  • The second XOR is mapped onto a XORCY
  • The multiplexer is mapped onto a MUXCY

full_adder_modified_2.jpg

When (a XOR b) is 0, then a is the same as b, hence (a AND b) = a = b, so the AND between a and b can be simplified (i.e. removed)

full_adder_modified_3.jpg

 

Ternary adders

Apparently, it is possible to build ternary adders with the same resources as for regular binary adders.

The principle is as follows:
- some logic is applied in a bitwise fashion to the 3 vectors and compresses the 3 input bits into only 2 bits
- these 2 intermediary vectors are added together using a regular binary adder
- The circuits needs more logic, but still fits into 2 LUT5, and 1 carry-chain

ternary_adder.jpg

Apparently Vivado can synthesize this very well, so we should try to use it.

Here are some pointers I found:
- https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/Is-the-claim-in-UG384-about-6LUT-based-ternary-adders-correct/td-p/185634
- http://myfpgablog.blogspot.com/2011/10/ternary-adder-in-lut62.html
- https://patentimages.storage.googleapis.com/07/35/23/21e173f32247b2/US7274211B1.pdf
- https://pdfs.semanticscholar.org/0798/96e0061bfde69c78573fb8c326441c00a159.pdf

 

Synthesis experiments

Coding styles

For loop

    gen_sum : process(term_vector)
        variable    accumulator : std_logic_vector((SUM_WIDTH - 1) downto 0);
    begin
        accumulator := std_logic_vector(to_signed(0, accumulator'length));
        for TERM_INDEX in 0 to (NB_TERMS - 1) loop
            accumulator := std_logic_vector(signed(accumulator) + signed(term_vector(TERM_INDEX)));
        end loop;
        sum <= accumulator;
    end process gen_sum;

 

Explicit binary adder tree using a recursive function

    function binary_adder_tree
    (
        input_term_vector   : t_term_vector
    )
    return std_logic_vector is
        constant    N                           : natural                       := input_term_vector'length;
        constant    term_vector                 : t_term_vector(0 to (N - 1))   := input_term_vector;
        constant    LEFT_TREE_N                 : natural                       := ((N + 1) / 2);
        constant    RIGHT_TREE_N                : natural                       := (N - LEFT_TREE_N);
        constant    LEFT_TREE_LOW_INDEX         : natural                       := 0;
        constant    LEFT_TREE_HIGH_INDEX        : natural                       := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1);
        constant    RIGHT_TREE_LOW_INDEX        : natural                       := (LEFT_TREE_HIGH_INDEX + 1);
        constant    RIGHT_TREE_HIGH_INDEX       : natural                       := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1);
        constant    SUM_WIDTH                   : natural                       := (TERM_WIDTH + syn_functions_pkg.ceil_log2(N));
    begin
        if (N = 1) then
            return term_vector(0);
        else
            return  std_logic_vector(
                            resize(signed(binary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX))), SUM_WIDTH)
                        +   resize(signed(binary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX))), SUM_WIDTH)
                    );
        end if;
    end function binary_adder_tree;

    sum <=          binary_adder_tree(term_vector);

 

Explicit binary adder tree using a for generate

    constant    BINARY_ADDER_TREE_NB_TERMS      : positive  := (2 ** syn_functions_pkg.ceil_log2(NB_TERMS));
    constant    BINARY_ADDER_TREE_NB_ADDERS     : positive  := (BINARY_ADDER_TREE_NB_TERMS - 1);
    constant    BINARY_ADDER_TREE_NB_NODES      : positive  := (BINARY_ADDER_TREE_NB_ADDERS + BINARY_ADDER_TREE_NB_TERMS);
    signal      binary_adder_tree               : t_resized_term_vector(1 to BINARY_ADDER_TREE_NB_NODES);

    gen_binary_adder_tree : for NODE_INDEX in BINARY_ADDER_TREE_NB_NODES downto 1 generate
    begin
    
        gen_leave : if (NODE_INDEX > BINARY_ADDER_TREE_NB_ADDERS) generate
        begin
    
            gen_useless_leave : if (NODE_INDEX > (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(to_signed(0, SUM_WIDTH));
            end generate gen_useless_leave;
    
            gen_usefull_leave : if (NODE_INDEX <= (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(resize(signed(term_vector(NODE_INDEX - BINARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH));
            end generate gen_usefull_leave;
    
        end generate gen_leave;
    
        gen_adder : if (NODE_INDEX <= BINARY_ADDER_TREE_NB_ADDERS) generate
        begin
            binary_adder_tree(NODE_INDEX)   <=          std_logic_vector(signed(binary_adder_tree(2 * NODE_INDEX)) + signed(binary_adder_tree((2 * NODE_INDEX) + 1)));
        end generate gen_adder;
    
    end generate gen_binary_adder_tree;
    
    sum <=          binary_adder_tree(1);

 

Explicit ternary adder tree using a recursive function

    function ternary_adder_tree
    (
        input_term_vector   : t_term_vector
    )
    return std_logic_vector is
        constant    N                           : natural                       := input_term_vector'length;
        constant    term_vector                 : t_term_vector(0 to (N - 1))   := input_term_vector;
        constant    LEFT_TREE_N                 : natural                       := ((N + 2) / 3);
        constant    MIDDLE_TREE_N               : natural                       := (((N - LEFT_TREE_N) + 1) / 2);
        constant    RIGHT_TREE_N                : natural                       := (N - LEFT_TREE_N - MIDDLE_TREE_N);
        constant    LEFT_TREE_LOW_INDEX         : natural                       := 0;
        constant    LEFT_TREE_HIGH_INDEX        : natural                       := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1);
        constant    MIDDLE_TREE_LOW_INDEX       : natural                       := (LEFT_TREE_HIGH_INDEX + 1);
        constant    MIDDLE_TREE_HIGH_INDEX      : natural                       := (MIDDLE_TREE_LOW_INDEX + MIDDLE_TREE_N - 1);
        constant    RIGHT_TREE_LOW_INDEX        : natural                       := (MIDDLE_TREE_HIGH_INDEX + 1);
        constant    RIGHT_TREE_HIGH_INDEX       : natural                       := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1);
    begin
        if (N = 1) then
            return  std_logic_vector(
                            resize(signed(term_vector(0)), SUM_WIDTH)
                    );
        elsif (N = 2) then
            return  std_logic_vector(
                            resize(signed(term_vector(0)), SUM_WIDTH)
                        +   resize(signed(term_vector(1)), SUM_WIDTH)
                    );
        else
            return  std_logic_vector(
                            resize(signed(ternary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX   to LEFT_TREE_HIGH_INDEX  ))), SUM_WIDTH)
                        +   resize(signed(ternary_adder_tree(term_vector(MIDDLE_TREE_LOW_INDEX to MIDDLE_TREE_HIGH_INDEX))), SUM_WIDTH)
                        +   resize(signed(ternary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX  to RIGHT_TREE_HIGH_INDEX ))), SUM_WIDTH)
                    );
        end if;
    end function ternary_adder_tree;

    sum <=          ternary_adder_tree(term_vector);

 

Explicit ternary adder tree using a for generate

    constant    TERNARY_ADDER_TREE_NB_TERMS     : positive  := (3 ** syn_functions_pkg.ceil_log3(NB_TERMS));
    constant    TERNARY_ADDER_TREE_NB_ADDERS    : positive  := ((TERNARY_ADDER_TREE_NB_TERMS - 1) / 2);
    constant    TERNARY_ADDER_TREE_NB_NODES     : positive  := (TERNARY_ADDER_TREE_NB_ADDERS + TERNARY_ADDER_TREE_NB_TERMS);
    signal      ternary_adder_tree              : t_resized_term_vector(1 to TERNARY_ADDER_TREE_NB_NODES);

    gen_ternary_adder_tree : for NODE_INDEX in TERNARY_ADDER_TREE_NB_NODES downto 1 generate
    begin
    
        gen_leave : if (NODE_INDEX > TERNARY_ADDER_TREE_NB_ADDERS) generate
        begin
    
            gen_useless_leave : if (NODE_INDEX > (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(to_signed(0, SUM_WIDTH));
            end generate gen_useless_leave;
    
            gen_usefull_leave : if (NODE_INDEX <= (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate
            begin
                ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(resize(signed(term_vector(NODE_INDEX - TERNARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH));
            end generate gen_usefull_leave;
    
        end generate gen_leave;
    
        gen_adder : if (NODE_INDEX <= TERNARY_ADDER_TREE_NB_ADDERS) generate
        begin
            ternary_adder_tree(NODE_INDEX)  <=          std_logic_vector(signed(ternary_adder_tree((3 * NODE_INDEX) - 1)) + signed(ternary_adder_tree(3 * NODE_INDEX)) + signed(ternary_adder_tree((3 * NODE_INDEX) + 1)));
        end generate gen_adder;
    
    end generate gen_ternary_adder_tree;
    
    sum <=          ternary_adder_tree(1);

 

Complete source code (5 coding styles, testbench, vivado project and constraints) can be found attached to is page.

 

Synthesis results

We tried each of the 5 coding styles.

We tried the following configurations:

  • add 16 8-bit vectors together
  • add 8 16-bit vectors together
  • add 12 16-bit vectors together
  • add 16 16-bit vectors together

The target was a VU9P (VCU1525), the target frequency was 500 MHz.

ternary_rec.png

ternary_for_gen.png

for_loop.png

binary_rec.png

binary_for_gen.png

 

Analysis

It is difficult to guess what has been synthesized.

Study of the schematics is too complicated.

The explicit binary adder tree should be somehow better, because the intermediate signals are on their very size instead of being resized to up to the final width.
However, this coding style actually leads to the worst results.
I do not have any explanation for this.

For some reason, with small input vectors, the explicit ternary adder tree using a recursive function implements the circuit in a different way.
I do not have any explanation for this.

The number of resources used and the timings are roughly the same, whatever the coding style used.

The timings are also roughly the same, whatever the coding style used.

The ranking of the different coding styles is not always the same, it differs for each configuration.

As a consequence, we will use the for loop coding style, for its simplicity and conciseness.

View solution in original post

0 Kudos
Highlighted
Teacher
Teacher
1,256 Views
Registered: ‎07-09-2009

Looks like none of these methods are using the fast carry option of the CLB / slice chain.

You imply you have not seen how the tools actualy impimented the design,

Did you have timming constraints on the design, registerd in and out ?

My suggestion is the tools are just routing the desing as you describe them, and then as they have meet your timming score , they stop.

Do do any meeinginfull comparison you have to have registerd inputs and outputs, and then change the constraint till you have the maximum speed. I'd suggest a DDS type adder , where the output is fed back to one side of the adder tree , is a possible solutoin, Multiple registers to output the result to a pin, else the Tout time will dominate,

Did you compaer the A <= B +C adder, let the tools free to impliment, that I bet is the fastest you will get.

Its like hand coding an adder in C, Hand routing pure adders is just not effective, the tools will beat you every time.

 

 

 

 

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
0 Kudos