Jojojojojo
Newbie
Registered: ‎05-25-2021

Question on fully pipelined balanced adder tree with HLS

Dear Sir/Madam,

Sorry if the question is duplicated.

I am trying to implement a pipelined version of a balanced adder tree with HLS. To simplify the problem, I use an integer type for the accumulation. So far, I have managed to get a balanced tree as a combinational circuit.

After inserting the PIPELINE directive, I get a design that takes N cycles for the loop, where N is the number of inputs. This suggests that the tree grows on only one side; otherwise, it would take log2(N) cycles. Below is the code I used:

void acc(int_12t data_in[N], out_t* output){
#pragma HLS INTERFACE ap_none port=data_in
#pragma HLS ARRAY_PARTITION variable=data_in complete
    ap_int<16> temp = 0;  // 16-bit signed accumulator
    // N is the number of inputs
    loop_add: for (int i = 0; i < N; i++){
#pragma HLS EXPRESSION_BALANCE
#pragma HLS PIPELINE
        temp += data_in[i];
    }
    *output = ~temp[15];  // inverted sign bit: 1 when the sum is non-negative
}

I know that the EXPRESSION_BALANCE directive is unnecessary here. It doesn't affect the result, so I just left it there.

Could you please help me with this? I feel like the solution could be simple, yet I cannot find it.

Best,


Accepted Solutions
frederic
Xilinx Employee
Registered: ‎04-14-2013

Since you are pipelining the loop, you are instructing the tool to process one iteration of the loop per clock cycle. Because each iteration reads only one element of the array, a single adder is enough.

You could pipeline the function itself instead. As a consequence, HLS would unroll the loop and create an adder tree to give you the best II (initiation interval).
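
As a rough sketch of that suggestion, the PIPELINE pragma moves out of the loop body and up to function scope. The types here are assumptions made so that the snippet compiles with an ordinary C++ compiler: `in_t` and `out_t` are stand-ins for the `ap_int`-based types in the question, `N = 8` is an arbitrary size, and the sign test replaces `~temp[15]`. A standard compiler ignores the HLS pragmas, so this only checks functional behaviour, not the generated hardware.

```cpp
#include <cassert>
#include <cstdint>

// Assumed size and stand-in types (the original post uses ap_int types).
constexpr int N = 8;
typedef int16_t in_t;   // stand-in for the 12-bit input type
typedef bool out_t;     // stand-in for the 1-bit output type

void acc(const in_t data_in[N], out_t* output) {
#pragma HLS INTERFACE ap_none port=data_in
#pragma HLS ARRAY_PARTITION variable=data_in complete
#pragma HLS PIPELINE
    // The pragma above applies to the whole function, not to the loop.
    int16_t temp = 0;
    loop_add: for (int i = 0; i < N; i++) {
        temp = static_cast<int16_t>(temp + data_in[i]);
    }
    // Same meaning as ~temp[15]: true when the sign bit of the sum is clear.
    *output = (temp >= 0);
}
```

Compared with the code in the question, the only structural change is where the PIPELINE pragma sits; the loop body itself is unchanged.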

Then, depending on the timing constraint, the tool would add levels or registers within the adder tree.
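
To make the levels of such a tree concrete, here is a plain C++ sketch of a pairwise reduction. It is not what the tool generates, only an illustration of the structure. The names `tree_sum` and `tree_levels` are hypothetical helpers, and `N = 8` (a power of two) is an assumption chosen to keep the indexing simple. Each pass of the outer loop corresponds to one level of adders, which is where registers could be placed.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 8;  // assumed input count, a power of two for simplicity

// Hypothetical helper: number of adder levels in a balanced tree over n inputs,
// i.e. the smallest l such that 2^l >= n.
int tree_levels(int n) {
    int levels = 0;
    while ((1 << levels) < n) {
        levels++;
    }
    return levels;
}

// Hypothetical helper: sums the inputs level by level, adding neighbours in pairs.
int32_t tree_sum(const int16_t data_in[N]) {
    int32_t level[N];
    for (int i = 0; i < N; i++) {
        level[i] = data_in[i];
    }
    // Each outer iteration is one tree level and halves the number of partial sums.
    for (int width = N / 2; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++) {
            level[i] = level[2 * i] + level[2 * i + 1];
        }
    }
    return level[0];
}
```

For N inputs this tree has log2(N) levels (rounded up), against N - 1 additions in series for the single-accumulator loop.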

Jojojojojo
Newbie
Registered: ‎05-25-2021

It works, ah silly me.

Thank you very much!
