UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

cancel
Showing results for 
Search instead for 
Did you mean: 
Visitor mvsoliveira
Visitor
445 Views
Registered: ‎03-21-2017

Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

I have been working in an R&D project at CERN that studies the use of HLS to low-latency firmware.

The first idea is to check if sorting networks can be efficiently implemented using Vivado HLS.

Starting from a minimalistic example, I am implementing a 4-key sorting network as follows:

4-key network.png

The sequence <x1,x2,x3,x4> is sorted after the following 5 compare-exchange operations <x1,x2>, <x3,x4>, <x1,x3>, <x2,x4>, <x2,x3> are executed.

I am using the following code.
comparator.c:

 

#include "comparator.h"

void compare_exchange(element_t data[N], const int a, const int b)
{
#pragma HLS INLINE
	element_t t;
	if (data[a].pt <= data[b].pt) {
		t = data[a];
		data[a] = data[b];
		data[b] = t;
	}
}


void compare_main(element_t data[N]) {
#pragma HLS INTERFACE ap_none port=data
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS ARRAY_PARTITION variable=data complete dim=1
compare_exchange(data,0,1);
compare_exchange(data,2,3);
compare_exchange(data,0,2);
compare_exchange(data,1,3);
compare_exchange(data,1,2);
}

comparator.h:

#ifndef __COMPARATOR_H__
#define __COMPARATOR_H__

#include <ap_int.h>

#define N 4
#define PT_WIDTH 4
#define ID_WIDTH 2

typedef ap_uint<PT_WIDTH> pt_t;
typedef ap_uint<ID_WIDTH> id_t;

typedef struct {
	id_t  id;
    pt_t  pt;
   } element_t;


void compare_exchange(element_t data[N], const int a, const int b);
void compare_main(element_t data[N]);
#endif

The input/output data is an array of struct objects because the original keys (id) for each element have to be propagated through the network in addition to the value to be compared (pt). The value of the pt is compared inside the compare_exchange function, and if the condition is met, the whole struct element is swapped. 

At the top-level function, compare_main, the 5 comparison-exchange operations are called. The following directives are used:

  1. The ARRAY_PARTITION directive is used to make sure a signal exists for each input, instead of using a memory interface.
  2. The INTERFACE directive is used to avoid control signals which are not needed yet. 
  3. The INLINE directive is used to enable cross-boundary optimizations. For example, the first two comparisons can happen at the same schedule cycle. 

Vivado HLS is working efficiently detecting data dependencies and it is splitting the comparison exchange operations into 3 different schedule cycles. Which is equivalent to what is described in the figure at the top of this post.

Although with the required clock_period, all the 5 operations could be implemented in a single clock cycle. I believe Vivado HLS is splitting the 5 comparisons into three cycles in view of a higher maximum frequency. The cycles are separated every time data dependency is detected, i.e., between comparisons <x3,x4>, <x1,x3> and <x2,x4>, <x2,x3>. Notice that the pair of comparisons <x1,x2>, <x3,x4> and <x1,x3>, <x2,x4> are grouped respectively into the same cycle because there is no data dependency within the same comparison pair. 

After generating the RTL, one can see that the slack is very high and in fact, all the 5 operations could have been scheduled to the same schedule cycle. Or if not possible, these 3 schedule cycles could have been implemented cascading their combinational logic and avoiding the register being implemented in between the cycles. This would reduce the latency at the price of a lower maximum frequency. 

In other words, for this minimalistic case, I would like to implement this 4-key sorting network with latency equal to 0, i.e., only a combinational logic. 

For larger networks, i.e., with more comparisons, I would like that Vivado HLS schedules operations with data dependency into the same clock cycle until the worst-case path delay does not exceed the clock_period - clock_uncertainty. Or in other words, I would like that Vivado groups as many data-dependent operations as possible into the same clock cycle and places a register only when timing can not be met. 

How can I do it? 

Many thanks in advance.

 

0 Kudos
1 Solution

Accepted Solutions
Highlighted
Xilinx Employee
Xilinx Employee
336 Views
Registered: ‎09-04-2017

Re: Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

@mvsoliveira 

How about this

void top(int data[2],int dout[2]) {
#pragma HLS INTERFACE ap_none port=data
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS ARRAY_PARTITION variable=data complete dim=1
#pragma HLS ARRAY_PARTITION variable=dout complete dim=1
int t;

int temp[2];
#pragma HLS ARRAY_PARTITION variable=temp complete dim=1

temp[0] = data[0];
temp[1] = data[1];

// Comparison-exchange operation 1.
if (temp[0] <= temp[1]) {
t = temp[0];
temp[0] = temp[1];
temp[1] = t;
}
dout[0] = temp[0];
dout[1] = temp[1];
// Comparison-exchange operation 2.
if (temp[0] <= temp[1]) {
t = temp[0];
temp[0] = temp[1];
temp[1] = t;
}
dout[0] = temp[0];
dout[1] = temp[1];


}

View solution in original post

4 Replies
Visitor mvsoliveira
Visitor
347 Views
Registered: ‎03-21-2017

Re: Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

Information I got from another post makes me believe that sometimes it is impossible to prevent a register from being inserted somewhere you did not intend to. 

Just to be sure, I reduced the whole code to this very minimalistic example: 

void compare_test(int data[2]) {
#pragma HLS INTERFACE ap_none port=data
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS ARRAY_PARTITION variable=data complete dim=1
int t;

// Comparison-exchange operation 1. if (data[0] <= data[1]) { t = data[0]; data[0] = data[1]; data[1] = t; }

// A register is being inserted here.

// Comparison-exchange operation 2.
if (data[0] <= data[1]) { t = data[0]; data[0] = data[1]; data[1] = t; } }

Two identical operations in the same data are executed one after the other. HLS inserts one register between the two operations, despite the fact that the slack is still very high, and the two operations could have been implemented in the same combinational path.

Could someone please confirm that there is indeed no constraint or different way of writing the code that can prevent HLS from inserting the register from the two comparison-exchange operations 1 and 2? 

0 Kudos
Highlighted
Xilinx Employee
Xilinx Employee
337 Views
Registered: ‎09-04-2017

Re: Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

@mvsoliveira 

How about this

void top(int data[2],int dout[2]) {
#pragma HLS INTERFACE ap_none port=data
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS ARRAY_PARTITION variable=data complete dim=1
#pragma HLS ARRAY_PARTITION variable=dout complete dim=1
int t;

int temp[2];
#pragma HLS ARRAY_PARTITION variable=temp complete dim=1

temp[0] = data[0];
temp[1] = data[1];

// Comparison-exchange operation 1.
if (temp[0] <= temp[1]) {
t = temp[0];
temp[0] = temp[1];
temp[1] = t;
}
dout[0] = temp[0];
dout[1] = temp[1];
// Comparison-exchange operation 2.
if (temp[0] <= temp[1]) {
t = temp[0];
temp[0] = temp[1];
temp[1] = t;
}
dout[0] = temp[0];
dout[1] = temp[1];


}

View solution in original post

Visitor mvsoliveira
Visitor
299 Views
Registered: ‎03-21-2017

Re: Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

Hi @nithink, this definitely works. Really, thank you very much for giving a solution. 

I will work in the next days extending your proposed code to larger sorting networks.

Could you please elaborate on why this code prevents the register from being inserted?

Thank you very much!!!!  

 

0 Kudos
Xilinx Employee
Xilinx Employee
244 Views
Registered: ‎09-04-2017

Re: Grouping operations with dependency within the same schedule cycle (avoiding registers between these operations)

Jump to solution

@mvsoliveira  In your original case, you are trying to read from the same variable data. So for the second comparision, HLS needs to store it. what i am doing now is keeping a temporary variable to load the inputs once and working on the temp variables.

Thanks,

Nithin

0 Kudos