Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

- Community Forums
- :
- Forums
- :
- Vivado RTL Development
- :
- Synthesis
- :
- force vivado to implement an adder tree

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 12:22 AM

1,555 Views

Registered:
02-24-2017

Hi all,

I am currently writing a generic adder.

This module has a generic number of input vectors, which all have the same generic width, and it computes the sum of all these vectors.

Since the number of terms is generic, I coded this using a variable as an accumulator and a for loop.

I am worried that Vivado is going to unroll the loop in a 'naive' way and build an adder tree that always grows on the same side, resulting in a depth = (number of terms).

Instead, I would like Vivado to balance the adder tree, resulting in depth = log2(number of terms).

(I even discovered that Vivado can synthesize ternary adders, so I could possibly reach depth = log2(number of terms)

What VHDL coding style, or attribute, (or any other mean) can I use to force the synthesizer or the mapper to implement a balanced adder tree?

Thanks,

- Julien

1 Solution

Accepted Solutions

Highlighted

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-29-2019 02:59 AM

1,320 Views

Registered:
02-24-2017

Hi have pushed further my investigations with adder trees.

Here are the results. (I will add the code as an attachement to the page.)

**Theory**

**Unsigned adder**

Just like when it's done manually, a binary addition can be done using a "full adder" circuit, which computes the sum s(j) and the output carry c(j) out of the operands a(j) and b(j) and the input carry c(j-1).

Addition between two vectors of n bits is done cascading n full adders.

The carry propagates through the full adders, hence the name "ripple carry adder".

￼

The truth table of the full adder is as follows:

This truth table correspondes to the following circuit:

**Signed adder**

When interpreted as an unsigned integer, the value of an n-bit vector v is:

- unsigned_value(v) = sum(2i, i in 0 to (n-1))

With n bits one can represent values 0 to 2n-1 - 1

One can notice that u = v + not(v) + 1 = 0 (thanks to the overflow), hence the 2's complement encoding for signed integers:

-v = (not(v) + 1)

and more generally:

signed_value(v) = -(2n-1).v(n-1) + sum(2i, i in 0 to (n-2))

With n bits one can represent values -2n-2 to 2n-2 - 1

The beauty of it is that the same full adder circuit works for 2's complement signed values as well.

**Substractor**

As a consequence, an adder can be transformed into a subtractor very easilly.

Instead of adding b, just add -b, i.e. (not(b) + 1), i.e. negate bits from b and use 1 for the input carry.

￼

**Mapping on Xilinx UltraScale+ architecture**

It is possible to insert a AND gate in the circuit without modifying the functionality:

- If (a xor b) = 0, then the AND drives the signal (a and b) to the input of the OR and the functionality is not modified
- If (a xor b) = 1, then the AND drives 0 to the input of the OR, but in that case (a and b) also is 0 so the functionality is not modified

Adding this AND gate lets a multiplexer appear.

The circuit can then be mapped onto the resources of the CLB:

- The XOR between a and b is computed with one LUT5
- The AND between a and b is computed with the other LUT5
- The second XOR is mapped onto a XORCY
- The multiplexer is mapped onto a MUXCY

When (a XOR b) is 0, then a is the same as b, hence (a AND b) = a = b, so the AND between a and b can be simplified (i.e. removed)

**Ternary adders**

Apparently, it is possible to build ternary adders with the same resources as for regular binary adders.

The principle is as follows:

- some logic is applied in a bitwise fashion to the 3 vectors and compresses the 3 input bits into only 2 bits

- these 2 intermediary vectors are added together using a regular binary adder

- The circuits needs more logic, but still fits into 2 LUT5, and 1 carry-chain

￼

Apparently Vivado can synthesize this very well, so we should try to use it.

Here are some pointers I found:

- https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/Is-the-claim-in-UG384-about-6LUT-based-ternary-adders-correct/td-p/185634

- http://myfpgablog.blogspot.com/2011/10/ternary-adder-in-lut62.html

- https://patentimages.storage.googleapis.com/07/35/23/21e173f32247b2/US7274211B1.pdf

- https://pdfs.semanticscholar.org/0798/96e0061bfde69c78573fb8c326441c00a159.pdf

**Synthesis experiments**

**Coding styles**

**For loop**

gen_sum : process(term_vector) variable accumulator : std_logic_vector((SUM_WIDTH - 1) downto 0); begin accumulator := std_logic_vector(to_signed(0, accumulator'length)); for TERM_INDEX in 0 to (NB_TERMS - 1) loop accumulator := std_logic_vector(signed(accumulator) + signed(term_vector(TERM_INDEX))); end loop; sum <= accumulator; end process gen_sum;

**Explicit binary adder tree using a recursive function**

function binary_adder_tree ( input_term_vector : t_term_vector ) return std_logic_vector is constant N : natural := input_term_vector'length; constant term_vector : t_term_vector(0 to (N - 1)) := input_term_vector; constant LEFT_TREE_N : natural := ((N + 1) / 2); constant RIGHT_TREE_N : natural := (N - LEFT_TREE_N); constant LEFT_TREE_LOW_INDEX : natural := 0; constant LEFT_TREE_HIGH_INDEX : natural := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1); constant RIGHT_TREE_LOW_INDEX : natural := (LEFT_TREE_HIGH_INDEX + 1); constant RIGHT_TREE_HIGH_INDEX : natural := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1); constant SUM_WIDTH : natural := (TERM_WIDTH + syn_functions_pkg.ceil_log2(N)); begin if (N = 1) then return term_vector(0); else return std_logic_vector( resize(signed(binary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX))), SUM_WIDTH) + resize(signed(binary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX))), SUM_WIDTH) ); end if; end function binary_adder_tree; sum <= binary_adder_tree(term_vector);

**Explicit binary adder tree using a for generate**

constant BINARY_ADDER_TREE_NB_TERMS : positive := (2 ** syn_functions_pkg.ceil_log2(NB_TERMS)); constant BINARY_ADDER_TREE_NB_ADDERS : positive := (BINARY_ADDER_TREE_NB_TERMS - 1); constant BINARY_ADDER_TREE_NB_NODES : positive := (BINARY_ADDER_TREE_NB_ADDERS + BINARY_ADDER_TREE_NB_TERMS); signal binary_adder_tree : t_resized_term_vector(1 to BINARY_ADDER_TREE_NB_NODES); gen_binary_adder_tree : for NODE_INDEX in BINARY_ADDER_TREE_NB_NODES downto 1 generate begin gen_leave : if (NODE_INDEX > BINARY_ADDER_TREE_NB_ADDERS) generate begin gen_useless_leave : if (NODE_INDEX > (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(to_signed(0, SUM_WIDTH)); end generate gen_useless_leave; gen_usefull_leave : if (NODE_INDEX <= (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(resize(signed(term_vector(NODE_INDEX - BINARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH)); end generate gen_usefull_leave; end generate gen_leave; gen_adder : if (NODE_INDEX <= BINARY_ADDER_TREE_NB_ADDERS) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(signed(binary_adder_tree(2 * NODE_INDEX)) + signed(binary_adder_tree((2 * NODE_INDEX) + 1))); end generate gen_adder; end generate gen_binary_adder_tree; sum <= binary_adder_tree(1);

**Explicit ternary adder tree using a recursive function**

function ternary_adder_tree ( input_term_vector : t_term_vector ) return std_logic_vector is constant N : natural := input_term_vector'length; constant term_vector : t_term_vector(0 to (N - 1)) := input_term_vector; constant LEFT_TREE_N : natural := ((N + 2) / 3); constant MIDDLE_TREE_N : natural := (((N - LEFT_TREE_N) + 1) / 2); constant RIGHT_TREE_N : natural := (N - LEFT_TREE_N - MIDDLE_TREE_N); constant LEFT_TREE_LOW_INDEX : natural := 0; constant LEFT_TREE_HIGH_INDEX : natural := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1); constant MIDDLE_TREE_LOW_INDEX : natural := (LEFT_TREE_HIGH_INDEX + 1); constant MIDDLE_TREE_HIGH_INDEX : natural := (MIDDLE_TREE_LOW_INDEX + MIDDLE_TREE_N - 1); constant RIGHT_TREE_LOW_INDEX : natural := (MIDDLE_TREE_HIGH_INDEX + 1); constant RIGHT_TREE_HIGH_INDEX : natural := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1); begin if (N = 1) then return std_logic_vector( resize(signed(term_vector(0)), SUM_WIDTH) ); elsif (N = 2) then return std_logic_vector( resize(signed(term_vector(0)), SUM_WIDTH) + resize(signed(term_vector(1)), SUM_WIDTH) ); else return std_logic_vector( resize(signed(ternary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX ))), SUM_WIDTH) + resize(signed(ternary_adder_tree(term_vector(MIDDLE_TREE_LOW_INDEX to MIDDLE_TREE_HIGH_INDEX))), SUM_WIDTH) + resize(signed(ternary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX ))), SUM_WIDTH) ); end if; end function ternary_adder_tree; sum <= ternary_adder_tree(term_vector);

**Explicit ternary adder tree using a for generate**

constant TERNARY_ADDER_TREE_NB_TERMS : positive := (3 ** syn_functions_pkg.ceil_log3(NB_TERMS)); constant TERNARY_ADDER_TREE_NB_ADDERS : positive := ((TERNARY_ADDER_TREE_NB_TERMS - 1) / 2); constant TERNARY_ADDER_TREE_NB_NODES : positive := (TERNARY_ADDER_TREE_NB_ADDERS + TERNARY_ADDER_TREE_NB_TERMS); signal ternary_adder_tree : t_resized_term_vector(1 to TERNARY_ADDER_TREE_NB_NODES); gen_ternary_adder_tree : for NODE_INDEX in TERNARY_ADDER_TREE_NB_NODES downto 1 generate begin gen_leave : if (NODE_INDEX > TERNARY_ADDER_TREE_NB_ADDERS) generate begin gen_useless_leave : if (NODE_INDEX > (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(to_signed(0, SUM_WIDTH)); end generate gen_useless_leave; gen_usefull_leave : if (NODE_INDEX <= (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(resize(signed(term_vector(NODE_INDEX - TERNARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH)); end generate gen_usefull_leave; end generate gen_leave; gen_adder : if (NODE_INDEX <= TERNARY_ADDER_TREE_NB_ADDERS) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(signed(ternary_adder_tree((3 * NODE_INDEX) - 1)) + signed(ternary_adder_tree(3 * NODE_INDEX)) + signed(ternary_adder_tree((3 * NODE_INDEX) + 1))); end generate gen_adder; end generate gen_ternary_adder_tree; sum <= ternary_adder_tree(1);

Complete source code (5 coding styles, testbench, vivado project and constraints) can be found attached to is page.

**Synthesis results**

We tried each of the 5 coding styles.

We tried the following configurations:

- add 16 8-bit vectors together
- add 8 16-bit vectors together
- add 12 16-bit vectors together
- add 16 16-bit vectors together

The target was a VU9P (VCU1525), the target frequency was 500 MHz.

**Analysis**

It is difficult to guess what has been synthesized.

Study of the schematics is too complicated.

The explicit binary adder tree should be somehow better, because the intermediate signals are on their very size instead of being resized to up to the final width.

However, this coding style actually leads to the worst results.

I do not have any explanation for this.

For some reason, with small input vectors, the explicit ternary adder tree using a recursive function implements the circuit in a different way.

I do not have any explanation for this.

The number of resources used and the timings are roughly the same, whatever the coding style used.

The timings are also roughly the same, whatever the coding style used.

The ranking of the different coding styles is not always the same, it differs for each configuration.

As a consequence, we will use the for loop coding style, for its simplicity and conciseness.

7 Replies

Highlighted

richardhead

Scholar

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 01:10 AM - edited 07-30-2019 01:13 AM

1,539 Views

Registered:
08-01-2012

You can calculate the depth by doing log2(X), where X is the number of inputs

then for each layer you need Y/2 adders, where Y is the number of inputs to the stage.

The easiest way is just a to declare an array that is log2(X) x X values. Half of these will just be unconnected and unused. Make everything the output width, and the synth tool should throw away any parts that are constant zero.

Here is a log2 depth adder tree in VHDL, using VHDL 2008:

type us_array_t is array(natural range <>) of unsigned; generic( NIPs : natural; W : natural; ) port ( ip : in us_array_t(0 to NIPs-1)(W-1 downto 0) .... op : out unsigned; ) type tree_t is array(natural range <>) of us_array_t; constant N_LEVELS : natural := log2(NIPs); -- Use array index to work out number of adders needed signal stages : tree_t(N_LEVELS downto 0)(0 to NIps-1)(op'range); -- make everything as big as output .... stage_gen : for level in stages'range loop levels_gen : if level = N_LEVELS generate ip_gen : for i ip'range loop stages(level)(i) <= resize(ip(i), op'length); end generate ip_gen; else generate constant N_ADDS : natural := 2**level; begin adder_gen : for i in 0 to N_ADDS-1 generate process(clk) begin if rising_edge(clk) then stages(level)(i) <= resize(stages(level+1)(i*2), op'length) + resize(stages(level+1)(i*2+1), op'length); end if; end process; end adder_gen; end generate levels_gen; end generate stage_gen; op <= stages(0)(0);

Highlighted

jmcclusk

Mentor

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 08:09 AM

1,503 Views

Registered:
02-24-2014

There is no known method to get Vivado synthesis to build a balanced tree except to present it with an elaborated structure which is already balanced. @richardhead has presented a very nice parameterized VHDL2008 algorithm for a binary tree, albeit it depends on the tools to trim the adder lengths in the upper branches of the tree. I've implemented a recursive ternary adder in VHDL93 which works under VIvado, and it turns out that balancing a ternary tree has different solutions, depending on whether you want minimum logic or minimum latency. I ended up writing a python script to generate the optimal ternary branch sizes for solutions up to 4096 inputs, and above that size, my code is only approximately optimal.

My solution builds a bit compressor (also known as popcount) out of ternary adders, and this can be applied to arrays of unsigned values in a horizontal fashion, then summed again with a ternary tree at the base. The result is a minimal depth tree when selecting minimal latency, or a slightly deeper tree when selecting minimal logic. One part of the problem when building trees like this is that when balancing latency of the tree with other logic, it's super handy to have a way to get the tree latency as a constant in the higher level code that instantiates the tree. I solved this by writing a function in a package that calculates the latency given the tree input size.

I'm not able to post the ternary tree code right now, so bug me about it, and I'll see if I can post it or stick it on github.

John

Don't forget to close a thread when possible by accepting a post as a solution.

Highlighted

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 09:15 AM

1,495 Views

Registered:
02-24-2017

Hi @richardhead , thanks a lot for your answer, this is nice!

I was about to ask follow-up questions, but @jmcclusk answered it all :)

I will accept this post as a solution, thanks a lot to you both.

- Julien

Highlighted

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 09:18 AM

1,492 Views

Registered:
02-24-2017

I would definitely appreciate if you could share the code.

Also, could you go further into details about "balancing a ternary tree has different solutions, depending on whether you want minimum logic or minimum latency." and "The result is a minimal depth tree when selecting minimal latency, or a slightly deeper tree when selecting minimal logic."?

Many thanks!

Highlighted
"One part of the problem when building trees like this is that when balancing latency of the tree with other logic, it's super handy to have a way to get the tree latency as a constant in the higher level code that instantiates the tree. I solved this by writing a function in a package that calculates the latency given the tree input size."

Well played, that's what I do for every generic module I write.

Works as well when the size of some output ports is a complex expression that depends on many generic parameters for instance.

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-30-2019 09:19 AM

1,491 Views

Registered:
02-24-2017

Well played, that's what I do for every generic module I write.

Works as well when the size of some output ports is a complex expression that depends on many generic parameters for instance.

Highlighted

jreinauld

Adventurer

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-29-2019 02:59 AM

1,321 Views

Registered:
02-24-2017

Hi have pushed further my investigations with adder trees.

Here are the results. (I will add the code as an attachement to the page.)

**Theory**

**Unsigned adder**

Just like when it's done manually, a binary addition can be done using a "full adder" circuit, which computes the sum s(j) and the output carry c(j) out of the operands a(j) and b(j) and the input carry c(j-1).

Addition between two vectors of n bits is done cascading n full adders.

The carry propagates through the full adders, hence the name "ripple carry adder".

￼

The truth table of the full adder is as follows:

This truth table correspondes to the following circuit:

**Signed adder**

When interpreted as an unsigned integer, the value of an n-bit vector v is:

- unsigned_value(v) = sum(2i, i in 0 to (n-1))

With n bits one can represent values 0 to 2n-1 - 1

One can notice that u = v + not(v) + 1 = 0 (thanks to the overflow), hence the 2's complement encoding for signed integers:

-v = (not(v) + 1)

and more generally:

signed_value(v) = -(2n-1).v(n-1) + sum(2i, i in 0 to (n-2))

With n bits one can represent values -2n-2 to 2n-2 - 1

The beauty of it is that the same full adder circuit works for 2's complement signed values as well.

**Substractor**

As a consequence, an adder can be transformed into a subtractor very easilly.

Instead of adding b, just add -b, i.e. (not(b) + 1), i.e. negate bits from b and use 1 for the input carry.

￼

**Mapping on Xilinx UltraScale+ architecture**

It is possible to insert a AND gate in the circuit without modifying the functionality:

- If (a xor b) = 0, then the AND drives the signal (a and b) to the input of the OR and the functionality is not modified
- If (a xor b) = 1, then the AND drives 0 to the input of the OR, but in that case (a and b) also is 0 so the functionality is not modified

Adding this AND gate lets a multiplexer appear.

The circuit can then be mapped onto the resources of the CLB:

- The XOR between a and b is computed with one LUT5
- The AND between a and b is computed with the other LUT5
- The second XOR is mapped onto a XORCY
- The multiplexer is mapped onto a MUXCY

When (a XOR b) is 0, then a is the same as b, hence (a AND b) = a = b, so the AND between a and b can be simplified (i.e. removed)

**Ternary adders**

Apparently, it is possible to build ternary adders with the same resources as for regular binary adders.

The principle is as follows:

- some logic is applied in a bitwise fashion to the 3 vectors and compresses the 3 input bits into only 2 bits

- these 2 intermediary vectors are added together using a regular binary adder

- The circuits needs more logic, but still fits into 2 LUT5, and 1 carry-chain

￼

Apparently Vivado can synthesize this very well, so we should try to use it.

Here are some pointers I found:

- https://forums.xilinx.com/t5/Spartan-Family-FPGAs-Archived/Is-the-claim-in-UG384-about-6LUT-based-ternary-adders-correct/td-p/185634

- http://myfpgablog.blogspot.com/2011/10/ternary-adder-in-lut62.html

- https://patentimages.storage.googleapis.com/07/35/23/21e173f32247b2/US7274211B1.pdf

- https://pdfs.semanticscholar.org/0798/96e0061bfde69c78573fb8c326441c00a159.pdf

**Synthesis experiments**

**Coding styles**

**For loop**

gen_sum : process(term_vector) variable accumulator : std_logic_vector((SUM_WIDTH - 1) downto 0); begin accumulator := std_logic_vector(to_signed(0, accumulator'length)); for TERM_INDEX in 0 to (NB_TERMS - 1) loop accumulator := std_logic_vector(signed(accumulator) + signed(term_vector(TERM_INDEX))); end loop; sum <= accumulator; end process gen_sum;

**Explicit binary adder tree using a recursive function**

function binary_adder_tree ( input_term_vector : t_term_vector ) return std_logic_vector is constant N : natural := input_term_vector'length; constant term_vector : t_term_vector(0 to (N - 1)) := input_term_vector; constant LEFT_TREE_N : natural := ((N + 1) / 2); constant RIGHT_TREE_N : natural := (N - LEFT_TREE_N); constant LEFT_TREE_LOW_INDEX : natural := 0; constant LEFT_TREE_HIGH_INDEX : natural := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1); constant RIGHT_TREE_LOW_INDEX : natural := (LEFT_TREE_HIGH_INDEX + 1); constant RIGHT_TREE_HIGH_INDEX : natural := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1); constant SUM_WIDTH : natural := (TERM_WIDTH + syn_functions_pkg.ceil_log2(N)); begin if (N = 1) then return term_vector(0); else return std_logic_vector( resize(signed(binary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX))), SUM_WIDTH) + resize(signed(binary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX))), SUM_WIDTH) ); end if; end function binary_adder_tree; sum <= binary_adder_tree(term_vector);

**Explicit binary adder tree using a for generate**

constant BINARY_ADDER_TREE_NB_TERMS : positive := (2 ** syn_functions_pkg.ceil_log2(NB_TERMS)); constant BINARY_ADDER_TREE_NB_ADDERS : positive := (BINARY_ADDER_TREE_NB_TERMS - 1); constant BINARY_ADDER_TREE_NB_NODES : positive := (BINARY_ADDER_TREE_NB_ADDERS + BINARY_ADDER_TREE_NB_TERMS); signal binary_adder_tree : t_resized_term_vector(1 to BINARY_ADDER_TREE_NB_NODES); gen_binary_adder_tree : for NODE_INDEX in BINARY_ADDER_TREE_NB_NODES downto 1 generate begin gen_leave : if (NODE_INDEX > BINARY_ADDER_TREE_NB_ADDERS) generate begin gen_useless_leave : if (NODE_INDEX > (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(to_signed(0, SUM_WIDTH)); end generate gen_useless_leave; gen_usefull_leave : if (NODE_INDEX <= (BINARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(resize(signed(term_vector(NODE_INDEX - BINARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH)); end generate gen_usefull_leave; end generate gen_leave; gen_adder : if (NODE_INDEX <= BINARY_ADDER_TREE_NB_ADDERS) generate begin binary_adder_tree(NODE_INDEX) <= std_logic_vector(signed(binary_adder_tree(2 * NODE_INDEX)) + signed(binary_adder_tree((2 * NODE_INDEX) + 1))); end generate gen_adder; end generate gen_binary_adder_tree; sum <= binary_adder_tree(1);

**Explicit ternary adder tree using a recursive function**

function ternary_adder_tree ( input_term_vector : t_term_vector ) return std_logic_vector is constant N : natural := input_term_vector'length; constant term_vector : t_term_vector(0 to (N - 1)) := input_term_vector; constant LEFT_TREE_N : natural := ((N + 2) / 3); constant MIDDLE_TREE_N : natural := (((N - LEFT_TREE_N) + 1) / 2); constant RIGHT_TREE_N : natural := (N - LEFT_TREE_N - MIDDLE_TREE_N); constant LEFT_TREE_LOW_INDEX : natural := 0; constant LEFT_TREE_HIGH_INDEX : natural := (LEFT_TREE_LOW_INDEX + LEFT_TREE_N - 1); constant MIDDLE_TREE_LOW_INDEX : natural := (LEFT_TREE_HIGH_INDEX + 1); constant MIDDLE_TREE_HIGH_INDEX : natural := (MIDDLE_TREE_LOW_INDEX + MIDDLE_TREE_N - 1); constant RIGHT_TREE_LOW_INDEX : natural := (MIDDLE_TREE_HIGH_INDEX + 1); constant RIGHT_TREE_HIGH_INDEX : natural := (RIGHT_TREE_LOW_INDEX + RIGHT_TREE_N - 1); begin if (N = 1) then return std_logic_vector( resize(signed(term_vector(0)), SUM_WIDTH) ); elsif (N = 2) then return std_logic_vector( resize(signed(term_vector(0)), SUM_WIDTH) + resize(signed(term_vector(1)), SUM_WIDTH) ); else return std_logic_vector( resize(signed(ternary_adder_tree(term_vector(LEFT_TREE_LOW_INDEX to LEFT_TREE_HIGH_INDEX ))), SUM_WIDTH) + resize(signed(ternary_adder_tree(term_vector(MIDDLE_TREE_LOW_INDEX to MIDDLE_TREE_HIGH_INDEX))), SUM_WIDTH) + resize(signed(ternary_adder_tree(term_vector(RIGHT_TREE_LOW_INDEX to RIGHT_TREE_HIGH_INDEX ))), SUM_WIDTH) ); end if; end function ternary_adder_tree; sum <= ternary_adder_tree(term_vector);

**Explicit ternary adder tree using a for generate**

constant TERNARY_ADDER_TREE_NB_TERMS : positive := (3 ** syn_functions_pkg.ceil_log3(NB_TERMS)); constant TERNARY_ADDER_TREE_NB_ADDERS : positive := ((TERNARY_ADDER_TREE_NB_TERMS - 1) / 2); constant TERNARY_ADDER_TREE_NB_NODES : positive := (TERNARY_ADDER_TREE_NB_ADDERS + TERNARY_ADDER_TREE_NB_TERMS); signal ternary_adder_tree : t_resized_term_vector(1 to TERNARY_ADDER_TREE_NB_NODES); gen_ternary_adder_tree : for NODE_INDEX in TERNARY_ADDER_TREE_NB_NODES downto 1 generate begin gen_leave : if (NODE_INDEX > TERNARY_ADDER_TREE_NB_ADDERS) generate begin gen_useless_leave : if (NODE_INDEX > (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(to_signed(0, SUM_WIDTH)); end generate gen_useless_leave; gen_usefull_leave : if (NODE_INDEX <= (TERNARY_ADDER_TREE_NB_ADDERS + NB_TERMS)) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(resize(signed(term_vector(NODE_INDEX - TERNARY_ADDER_TREE_NB_ADDERS - 1)), SUM_WIDTH)); end generate gen_usefull_leave; end generate gen_leave; gen_adder : if (NODE_INDEX <= TERNARY_ADDER_TREE_NB_ADDERS) generate begin ternary_adder_tree(NODE_INDEX) <= std_logic_vector(signed(ternary_adder_tree((3 * NODE_INDEX) - 1)) + signed(ternary_adder_tree(3 * NODE_INDEX)) + signed(ternary_adder_tree((3 * NODE_INDEX) + 1))); end generate gen_adder; end generate gen_ternary_adder_tree; sum <= ternary_adder_tree(1);

Complete source code (5 coding styles, testbench, vivado project and constraints) can be found attached to is page.

**Synthesis results**

We tried each of the 5 coding styles.

We tried the following configurations:

- add 16 8-bit vectors together
- add 8 16-bit vectors together
- add 12 16-bit vectors together
- add 16 16-bit vectors together

The target was a VU9P (VCU1525), the target frequency was 500 MHz.

**Analysis**

It is difficult to guess what has been synthesized.

Study of the schematics is too complicated.

The explicit binary adder tree should be somehow better, because the intermediate signals are on their very size instead of being resized to up to the final width.

However, this coding style actually leads to the worst results.

I do not have any explanation for this.

For some reason, with small input vectors, the explicit ternary adder tree using a recursive function implements the circuit in a different way.

I do not have any explanation for this.

The number of resources used and the timings are roughly the same, whatever the coding style used.

The timings are also roughly the same, whatever the coding style used.

The ranking of the different coding styles is not always the same, it differs for each configuration.

As a consequence, we will use the for loop coding style, for its simplicity and conciseness.

Highlighted

drjohnsmith

Teacher

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-06-2019 05:37 AM

1,256 Views

Registered:
07-09-2009

Looks like none of these methods are using the fast carry option of the CLB / slice chain.

You imply you have not seen how the tools actualy impimented the design,

Did you have timming constraints on the design, registerd in and out ?

My suggestion is the tools are just routing the desing as you describe them, and then as they have meet your timming score , they stop.

Do do any meeinginfull comparison you have to have registerd inputs and outputs, and then change the constraint till you have the maximum speed. I'd suggest a DDS type adder , where the output is fed back to one side of the adder tree , is a possible solutoin, Multiple registers to output the result to a pin, else the Tout time will dominate,

Did you compaer the A <= B +C adder, let the tools free to impliment, that I bet is the fastest you will get.

Its like hand coding an adder in C, Hand routing pure adders is just not effective, the tools will beat you every time.

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>