xH97acbL3j
Observer
Registered: ‎05-12-2021

Unrolled loop iterations with no inter dependencies not executed in parallel

Consider the following code, which sums all elements in a 32x256 matrix, line by line. The matrix is partitioned by line to ensure multiple read ports.

static int sum_line(int arr[256]) {
	int sum = 0;
	for (unsigned int i = 0; i < 256; ++i) {
		sum += arr[i];
	}
	return sum;
}

int test(int matrix[32][256], int count) {
	static int line_sum[32] = {0};

#pragma HLS array_partition variable=matrix dim=1 complete
#pragma HLS array_partition variable=line_sum dim=1 complete

sum_lines:
	for (int i = 0; i < 32; i++) {
#pragma HLS unroll
		line_sum[i] = sum_line(matrix[i]);
	}

	int total_sum = 0;
sum_total:
	for (int i = 0; i < 32; ++i) {
		total_sum += line_sum[i];
	}
	return total_sum;
}

This works as expected: latency is 547 cycles, meaning all unrolled loop iterations were parallelized correctly. Now I want it to use fewer resources, so I unroll by a factor of 8 only, using "#pragma HLS unroll factor=8 skip_exit_check". I expected latency to be about 4 times the current value (32 / 8 = 4 sequential groups of 8 parallel calls), so ~2200. However, for some reason, HLS fails to parallelize the loop iterations when the loop is not unrolled completely. Actual latency is 24678.
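For reference, the partially unrolled variant described above would presumably look like the sketch below. It assumes that only the unroll pragma on `sum_lines` changes; the name `test_unroll8` is made up here to keep it apart from the original `test`. A plain C compiler ignores the HLS pragmas, so the functional result is unchanged and only the synthesized schedule differs.

```c
/* Sums one 256-element row, same as in the original code. */
static int sum_line(int arr[256]) {
	int sum = 0;
	for (unsigned int i = 0; i < 256; ++i) {
		sum += arr[i];
	}
	return sum;
}

/* Hypothetical name: the original function with the partial-unroll pragma. */
int test_unroll8(int matrix[32][256], int count) {
	static int line_sum[32] = {0};
	(void)count; /* unused, kept to match the original signature */

#pragma HLS array_partition variable=matrix dim=1 complete
#pragma HLS array_partition variable=line_sum dim=1 complete

sum_lines:
	for (int i = 0; i < 32; i++) {
/* Only change from the original: unroll by 8 instead of completely. */
#pragma HLS unroll factor=8 skip_exit_check
		line_sum[i] = sum_line(matrix[i]);
	}

	int total_sum = 0;
sum_total:
	for (int i = 0; i < 32; ++i) {
		total_sum += line_sum[i];
	}
	return total_sum;
}
```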

[Attached image: xilinx.png]

Why would one iteration depend on the previous one like that? What's happening here? Am I missing pragmas? I'm using Vivado 2019.1.3.

Accepted Solutions
xilinxacct
Professor
Registered: ‎10-23-2018

@xH97acbL3j 

Take a look at pragma HLS allocation, pragma HLS inline, and pragma HLS function_instantiate ... I think among those, you can do what you want to do.

Hope that Helps
If so, Please mark as solution accepted. Kudos also welcomed.

2 Replies
xH97acbL3j
Observer
Registered: ‎05-12-2021

Thanks, I was able to get an effective unrolling factor of 8 by limiting `sum_line` function allocation to 8.
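A minimal sketch of what that might look like, assuming the loop stays completely unrolled and the allocation pragma caps the number of `sum_line` instances at 8, so the 32 calls get scheduled in groups. The pragma spelling follows the Vivado HLS form (`instances=... limit=... function`); its placement at the top of the function body, the `inline off` on `sum_line`, and the name `test_alloc8` are assumptions for illustration, not taken from the actual project.

```c
/* Sums one 256-element row. */
static int sum_line(int arr[256]) {
/* Assumption: keep sum_line as a separate function so its instances can be limited. */
#pragma HLS inline off
	int sum = 0;
	for (unsigned int i = 0; i < 256; ++i) {
		sum += arr[i];
	}
	return sum;
}

/* Hypothetical name: full unroll plus an instance limit on sum_line. */
int test_alloc8(int matrix[32][256], int count) {
	static int line_sum[32] = {0};
	(void)count; /* unused, kept to match the original signature */

#pragma HLS array_partition variable=matrix dim=1 complete
#pragma HLS array_partition variable=line_sum dim=1 complete
/* Allow at most 8 hardware instances of sum_line in this function. */
#pragma HLS allocation instances=sum_line limit=8 function

sum_lines:
	for (int i = 0; i < 32; i++) {
#pragma HLS unroll
		line_sum[i] = sum_line(matrix[i]);
	}

	int total_sum = 0;
sum_total:
	for (int i = 0; i < 32; ++i) {
		total_sum += line_sum[i];
	}
	return total_sum;
}
```

The idea is that the tool still sees 32 independent calls, but may only build 8 copies of the function, which gives the resource usage of an 8x unroll without relying on the partial unroll to parallelize the calls.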