Participant stefanoribes
Registered: 07-25-2016

Forcing Execution Order

Hi,

I've just started using Vivado HLS, so I'm by no means an expert. I'm trying to implement an IP module to be integrated into a larger project, so any help would be highly appreciated.

In short, the module receives data from a FIFO queue. After receiving a fixed number of inputs, it processes them and produces a result. Finally, once the computation is done, it is supposed to send the same inputs back out in FIFO order, so as to "emulate" the FIFO at its input.

I set up the input and output ports as 'ap_fifo', but the problem I'm facing is that the module processes my data as soon as it arrives. Moreover, it doesn't wait for the execution to complete before making the inputs available at the output port.

So my question is: how do I ensure the right execution order? Is there a detailed guide or manual I can follow, perhaps one on designing a custom FSM in HLS?

I attached my code for further details about my issue.

Best,

Stefano

My .cpp code:

#include "switchf.hpp"
#include <math.h>

acc_t fix_abs (data_t centroid, din_t pu_din);

dout_t switch_hls (
	din_t pu_din[INPUT_NEURONS_NUM],	// Input
	din_t pu_din_hls[INPUT_NEURONS_NUM]	// Output
	) {

	// Support variables
	int i, j;

	// Distance calculation: variables and accumulators
	static acc_t acc_k0[3] = {0, 0, 0};
	static acc_t acc_k1[3] = {0, 0, 0};
	static acc_t acc_k2[3] = {0, 0, 0};
	static acc_t acc_k3[3] = {0, 0, 0};
	static acc_t dist[K_NUM*2] = {0, 0, 0, 0};

	// Minimum calculation: variables
	static acc_t min_d0 = 0;
	static acc_t min_d1 = 0;
	static dout_t min_i0 = 0;
	static dout_t min_i1 = 0;
	static dout_t min_i = 0;

	// State of the Computation
	static sig_t setup_done = 0;
	static sig_t execution_done = 0;

	// BRAM used to store input data to be processed later.
	// The RAM mapping is required since the input array is modeled as a FIFO interface.
	static din_t point_bram[INPUT_NEURONS_NUM];

	// The centroid array is split in two arrays since they will be modeled as dual port ROMs.
	// For doing this, the K_NUM is therefore halved.
	const data_t centroids_rom_0[INPUT_NEURONS_NUM*K_NUM] = {
		0.200252,
		0.220494,
		0.340129,
		0.290424,
		0.220709,
		0.230335,
		0.200119,
		0.270402,
		0.360048,
		0.294172,
		0.295676,
		0.312707,
		0.277185,
		0.278315,
		0.295683,
		0.282100,
		0.283999,
		0.300054 
	};	

	const data_t centroids_rom_1[INPUT_NEURONS_NUM*K_NUM] = {
		0.231201,
		0.236885,
		0.235270,
		0.246549,
		0.251041,
		0.247211,
		0.249753,
		0.251712,
		0.248351,
		0.263698,
		0.257219,
		0.261968,
		0.257504,
		0.250117,
		0.257047,
		0.257507,
		0.251866,
		0.258178 
	};

	Reset_Step: {
		dist[0] = 0;
		dist[1] = 0;
		dist[2] = 0;
		dist[3] = 0;

		min_d0 = 0;
		min_d1 = 0;
		min_i0 = 0;
		min_i1 = 0;
		min_i = 0;

		acc_k0[0] = 0;
		acc_k0[1] = 0;
		acc_k0[2] = 0;
		acc_k1[0] = 0;
		acc_k1[1] = 0;
		acc_k1[2] = 0;
		acc_k2[0] = 0;
		acc_k2[1] = 0;
		acc_k2[2] = 0;
		acc_k3[0] = 0;
		acc_k3[1] = 0;
		acc_k3[2] = 0;

		setup_done = 0;
		execution_done = 0;
	}

///////////////////////////
// READ INPUTs
//////////////////////////
	FIFO_In_Step: {
		for (i = 0; i < INPUT_NEURONS_NUM; i++) {
			point_bram[i] = pu_din[i];
		}
		setup_done = 1;
	}

/////////////////////
// EXECUTION
////////////////////
	Centroid_Loop: {
		if (setup_done) {
			Centroid_Loop_K0: {
				Distance_Loop0_K0: for (j = 0; j < 3; j++) {
					acc_k0[0] += fix_abs(centroids_rom_0[j], point_bram[j]);
				}
				Distance_Loop1_K0: for (j = 0; j < 3; j++) {
					acc_k0[1] += fix_abs(centroids_rom_0[j+3], point_bram[j+3]);
				}
				Distance_Loop2_K0: for (j = 0; j < 3; j++) {
					acc_k0[2] += fix_abs(centroids_rom_0[j+6], point_bram[j+6]);
				}
				dist[0] = acc_k0[0] + acc_k0[1] + acc_k0[2];
				acc_k0[1] = 0;
				acc_k0[0] = 0;
				acc_k0[2] = 0;
			}

			Centroid_Loop_K1: {
				Distance_Loop0_K1: for (j = 0; j < 3; j++) {
					acc_k1[0] += fix_abs(centroids_rom_0[INPUT_NEURONS_NUM+j], point_bram[j]);
				}
				Distance_Loop1_K1: for (j = 0; j < 3; j++) {
					acc_k1[1] += fix_abs(centroids_rom_0[INPUT_NEURONS_NUM+j+3], point_bram[j+3]);
				}
				Distance_Loop2_K1: for (j = 0; j < 3; j++) {
					acc_k1[2] += fix_abs(centroids_rom_0[INPUT_NEURONS_NUM+j+6], point_bram[j+6]);
				}
				dist[1] = acc_k1[0] + acc_k1[1] + acc_k1[2];
				acc_k1[0] = 0;
				acc_k1[1] = 0;
				acc_k1[2] = 0;
			}

			Centroid_Loop_K2: {
				Distance_Loop0_K2: for (j = 0; j < 3; j++) {
					acc_k2[0] += fix_abs(centroids_rom_1[j], point_bram[j]);
				}
				Distance_Loop1_K2: for (j = 0; j < 3; j++) {
					acc_k2[1] += fix_abs(centroids_rom_1[j+3], point_bram[j+3]);
				}
				Distance_Loop2_K2: for (j = 0; j < 3; j++) {
					acc_k2[2] += fix_abs(centroids_rom_1[j+6], point_bram[j+6]);
				}
				dist[2] = acc_k2[0] + acc_k2[1] + acc_k2[2];
				acc_k2[0] = 0;
				acc_k2[1] = 0;
				acc_k2[2] = 0;
			}

			Centroid_Loop_K3: {
				Distance_Loop0_K3: for (j = 0; j < 3; j++) {
					acc_k3[0] += fix_abs(centroids_rom_1[INPUT_NEURONS_NUM+j], point_bram[j]);
				}
				Distance_Loop1_K3: for (j = 0; j < 3; j++) {
					acc_k3[1] += fix_abs(centroids_rom_1[INPUT_NEURONS_NUM+j+3], point_bram[j+3]);
				}
				Distance_Loop2_K3: for (j = 0; j < 3; j++) {
					acc_k3[2] += fix_abs(centroids_rom_1[INPUT_NEURONS_NUM+j+6], point_bram[j+6]);
				}
				dist[3] = acc_k3[0] + acc_k3[1] + acc_k3[2];
				acc_k3[0] = 0;
				acc_k3[1] = 0;
				acc_k3[2] = 0;
			}
		}

		Comparison_Step: {
			if (dist[0] < dist[1]) {
				min_d0 = dist[0];
				min_i0 = 0;
			} else {
				min_d0 = dist[1];
				min_i0 = 1;
			}
			if (dist[2] < dist[3]) {
				min_d1 = dist[2];
				min_i1 = 2;
			} else {
				min_d1 = dist[3];
				min_i1 = 3;
			}
		}

		///////////////////////
		// EXECUTION IS DONE //
		///////////////////////
		if (min_d0 < min_d1) {
			min_i = min_i0;
			execution_done = 1;
		} else {
			min_i = min_i1;
			execution_done = 1;
		}
	}

	// After everything is processed, copy-back all the received inputs.
	FIFO_Out_Step: {
		if (execution_done) {
			for (i = 0; i < INPUT_NEURONS_NUM; i++) {
				pu_din_hls[i] = point_bram[i];
			}
		}
	}

	return min_i;
}

// Support function used to determine the absolute value of a fixed point number.
acc_t fix_abs (data_t centroid, din_t pu_din) {
	acc_t tmp = centroid - pu_din;
	if (tmp > 0) {
		return tmp;
	} else {
		return -tmp;
	}
}

And here is how I set up the directives:

set_directive_interface -mode ap_ctrl_hs -register "switch_hls"
set_directive_interface -mode ap_fifo -register "switch_hls" pu_din
set_directive_interface -mode ap_fifo -register "switch_hls" pu_din_hls
set_directive_reset "switch_hls" dist
set_directive_reset "switch_hls" min_i
set_directive_reset "switch_hls" min_d0
set_directive_reset "switch_hls" min_i0
set_directive_reset "switch_hls" min_d1
set_directive_reset "switch_hls" min_i1
set_directive_reset "switch_hls" acc_k0
set_directive_reset "switch_hls" acc_k2
set_directive_reset "switch_hls" acc_k3
set_directive_reset "switch_hls" acc_k1
set_directive_unroll "switch_hls/Reset_Step"
set_directive_pipeline "switch_hls/Reset_Step"
set_directive_unroll "switch_hls/Centroid_Loop"
set_directive_pipeline "switch_hls/Centroid_Loop"
set_directive_unroll "switch_hls/Distance_Loop0_K0"
set_directive_unroll "switch_hls/Distance_Loop1_K0"
set_directive_unroll "switch_hls/Distance_Loop2_K0"
set_directive_unroll "switch_hls/Distance_Loop0_K1"
set_directive_unroll "switch_hls/Distance_Loop1_K1"
set_directive_unroll "switch_hls/Distance_Loop2_K1"
set_directive_unroll "switch_hls/Distance_Loop0_K2"
set_directive_unroll "switch_hls/Distance_Loop1_K2"
set_directive_unroll "switch_hls/Distance_Loop2_K2"
set_directive_unroll "switch_hls/Distance_Loop0_K3"
set_directive_unroll "switch_hls/Distance_Loop1_K3"
set_directive_unroll "switch_hls/Distance_Loop2_K3"
set_directive_resource -core ROM_2P "switch_hls" centroids_rom_0
set_directive_resource -core ROM_2P "switch_hls" centroids_rom_1
set_directive_resource -core RAM_2P_BRAM "switch_hls" point_bram
set_directive_pipeline "switch_hls/Init_Step"

 

 

Scholar u4223374
Registered: 04-26-2015

Re: Forcing Execution Order

You can use ap_wait() for this. Anything before the ap_wait() completes before anything after the ap_wait() starts.
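
As a rough illustration, here is a minimal sketch of that idea. It assumes the ap_wait() that Vivado HLS declares in ap_utils.h. The function name three_phase, the constant N, the plain int types and the placeholder sum are all hypothetical and not taken from the code above. Outside synthesis, a no-op stub stands in for ap_wait() so the sketch also builds with an ordinary C++ compiler.

```cpp
#ifdef __SYNTHESIS__
#include "ap_utils.h"            // Vivado HLS header that declares ap_wait()
#else
static inline void ap_wait() {}  // assumption: no-op stand-in for plain C++ builds
#endif

const int N = 9;  // hypothetical number of inputs per block

// Hypothetical three-phase function: read everything, compute, then echo.
int three_phase(const int din[N], int dout[N]) {
	int buf[N];
	int sum = 0;

	// Phase 1: drain the input FIFO into a local buffer.
	Read_Loop: for (int i = 0; i < N; i++) {
		buf[i] = din[i];
	}
	ap_wait();  // intended barrier: phase 1 is done before phase 2 starts

	// Phase 2: placeholder computation on the buffered data.
	Compute_Loop: for (int i = 0; i < N; i++) {
		sum += buf[i];
	}
	ap_wait();  // intended barrier: phase 2 is done before phase 3 starts

	// Phase 3: send the buffered inputs back out in arrival order.
	Write_Loop: for (int i = 0; i < N; i++) {
		dout[i] = buf[i];
	}
	return sum;
}
```

In C simulation the stub does nothing, so the ordering only becomes visible in the synthesized schedule; it is worth checking the schedule viewer to confirm the phases do not overlap.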

 

With that said, is processing the data as soon as it arrives actually a problem? I'm having trouble seeing the advantage of having the block sitting idle until lots of data is available, and then frantically processing it all as fast as possible.
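
To illustrate what processing on arrival could look like: each centroid distance is a running sum, so every input can be folded into the accumulators the moment it is read, leaving only the final comparison for after the last input. The following is only a sketch, using plain float and hypothetical names (nearest_streaming, N, K, centroids) instead of the fixed-point types and ROMs from the original code.

```cpp
#include <cmath>

const int N = 3;  // hypothetical inputs per block
const int K = 2;  // hypothetical number of centroids

// Reads each input once, updates all K running L1 distances immediately,
// echoes the input, and returns the index of the nearest centroid.
int nearest_streaming(const float din[N], float dout[N],
                      const float centroids[K][N]) {
	float dist[K] = {0};

	Stream_Loop: for (int i = 0; i < N; i++) {
		float x = din[i];  // single read per element, as a FIFO port requires
		for (int k = 0; k < K; k++) {
			dist[k] += std::fabs(centroids[k][i] - x);
		}
		dout[i] = x;  // echoed right away, not after the result
	}

	// Only the comparison remains once the last input has arrived.
	int best = 0;
	for (int k = 1; k < K; k++) {
		if (dist[k] < dist[best]) best = k;
	}
	return best;
}
```

Note that this version echoes each input immediately. If the echo really must come after the result, the inputs still need to be buffered as in the original code.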
