UPGRADE YOUR BROWSER

We have detected your current browser version is not the latest one. Xilinx.com uses the latest web technologies to bring you the best online experience possible. Please upgrade to a Xilinx.com supported browser:Chrome, Firefox, Internet Explorer 11, Safari. Thank you!

cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
474 Views
Registered: ‎02-18-2018

SDSoC OpenCL kernel timing constraints

Hi ,

 

I am implementing an OpenCL application on the Zedboard using Xilinx SDSoC version 2018.1. I want my kernel to be fully pipe lined, However when I compile the project, in the implementation phase it encounters this error: 

 

VPL-4: Design failed to meet timing.
  

I have selected 100 MHz clock frequency in the SDSoC settings (the minimum value available). When I check the HLS report, the estimated clock frequency is OK.

 

Screenshot from 2018-06-20 17-46-38.png

But when I check the Vivado project generated by the SDSoC, it shows that the worst case negative slack is around -5 ns. My problem is that I cannot reduce the desired clock frequency, neither from the SDSoC project setting, nor from the generated Vivado project because the minimum available value in SDSoC is 100 MHz and also the timing constraints in the Vivado project are all locked. So is there any other way to change the clock frequency constraint to a lower value?

 

I also put my OpenCL kernel code here. I would also appreciate your suggestions about improving the code.

 

#define NEURONS 512

kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
__kernel void vvmult(__global float* pA, __global float* pB, __global float* pC, __global float* pD)
{
	float sum = 0.0;
	
	local float local_b[NEURONS];
	local float local_c[NEURONS];
	local float local_d[NEURONS];

	//burst read first vector from global memory to local memory
	b_rd: __attribute__((xcl_pipeline_loop))
	for (int j = 0 ; j <  NEURONS; j++){
		local_b[j] = pB [j];
	}

	c_rd: __attribute__((xcl_pipeline_loop))
	for (int j = 0 ; j <  NEURONS; j++){
		local_c[j] = pC [j];
	}

	row_loop: __attribute__((xcl_pipeline_loop))
	for(int i = 0; i < NEURONS; i++)
	{ 
		sum = 0.0;

		mult_loop: __attribute__((xcl_pipeline_loop))
		for(int j = 0; j < NEURONS; j++)
		{
			float tmpA = pA[i*NEURONS+j];
			float tmpSum = tmpA * local_c[j];
			sum += tmpSum;
		}

		float tmpD = sum + local_b[i];
		local_d[i] = tmpD;
	}
	
    nolin_loop:__attribute__((xcl_pipeline_loop))
    for (int j = 0 ; j < NEURONS; j++)
    {
    	if(local_d[j]<0.0)
    		local_d[j] = 0.0;
    	else
    		local_d[j] = local_d[j];

    }

    //burst write the result
    out_wr:__attribute__((xcl_pipeline_loop))
    for (int j = 0 ; j < NEURONS; j++)
        pD[j] = local_d[j];

}

 

Tags (3)
0 Kudos
1 Reply
Moderator
Moderator
360 Views
Registered: ‎10-04-2011

Re: SDSoC OpenCL kernel timing constraints

Hello @mohammad90hgh,

 

What I see in the code is the following:

 

row_loop: __attribute__((xcl_pipeline_loop))
	for(int i = 0; i < NEURONS; i++)
	{ 
		sum = 0.0;

		mult_loop: __attribute__((xcl_pipeline_loop))

 

What the "row_loop" pipeline directive is saying is that every clock cycle, or interval if not 1, perform all statements in the loop block. This implies then that the "mult_loop" would need to be completely unrolled in order to accomplish that. That in turn creates simultaneous memory access issues as well as a large fan-in adder for sum, so I suspect a large state machine to control these is being created. 

 

What I wonder is if removing the PIPELINE directive from "row_loop" would improve your timing performance?

 

Also, just note that while the HLS timing reports 8.90ns, the uncertainty is also 2.70ns, for a total range of (8.90 - 2.70 = 6.20ns) to (8.90 + 2.70 = 11.60ns). So it also fails timing in the worst case scenario. 

 

Can you let me know if that helps?

 

Thank you,
Scott

0 Kudos