09-04-2017 08:28 AM
If I write two optimized functions in C and stream data between them using hls::stream, the output stage infers a burst read with II = 1.
But when I use pipes between kernels, the output stage II becomes very large (I just converted the functions to kernels and used pipes to transfer data between them).
Why is this happening?
09-04-2017 12:46 PM
Ideally, the II should be the same if the kernel code is identical for the OpenCL and HLS C kernels.
Could you please take a look at the following two Git examples, which are identical and give II = 1 in both cases:
09-04-2017 03:28 PM
Thanks for the reply.
Yes, I checked the example code.
When I use the dataflow model in OpenCL, it works fine for my code. It's only when I use pipes that I get a large II.
In one of my designs I get the following II
I thought latency meant the time taken to produce the complete output.
The time taken to complete my kernel is 319 ms (from the profile summary report).
How should I read this report?
09-04-2017 10:19 PM
If possible, can you share sample code for the OpenCL kernel and the equivalent HLS C code for which you are not getting an identical II?
Regarding your second question:
Latency (min and max) is given in clock cycles. It is a static estimate generated by HLS, computed from the loop trip count (the number of loop iterations).
Let's take your report here:
TRIP_COUNT = 4096 --> the loop iteration count
ITERATION_LATENCY = 284 --> one iteration of the loop takes 284 cycles
II achieved = 8 --> each new iteration starts 8 cycles after the previous one started
So the overall loop latency will be = Time taken by first iteration + II x (Remaining iterations)
= ITERATION_LATENCY + II x (TRIP_COUNT - 1)
= 284 + 8 x 4095
= 33044
This is close to the reported 33043. I am not sure about the 1-cycle difference; there might be something missing in my calculation. An HLS expert may be able to confirm.
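The arithmetic above can be reproduced with a few lines of C++. This is only a sketch of the estimate described in this post, not the exact model the HLS tool uses; the function name `pipelined_loop_latency` is made up for illustration and is not part of any Xilinx API.

```cpp
#include <cstdint>

// Hypothetical helper (not a Xilinx API): estimated cycle count of a
// pipelined loop, using the formula from the post above.
constexpr std::uint64_t pipelined_loop_latency(std::uint64_t iteration_latency,
                                               std::uint64_t ii,
                                               std::uint64_t trip_count) {
    // The first iteration takes the full iteration latency; each of the
    // remaining (trip_count - 1) iterations starts II cycles after the
    // previous one. A loop that never runs is counted as 0 cycles.
    return trip_count == 0 ? 0 : iteration_latency + ii * (trip_count - 1);
}

// Values taken from the report discussed above: 284 + 8 * 4095.
static_assert(pipelined_loop_latency(284, 8, 4096) == 33044,
              "estimate for ITERATION_LATENCY=284, II=8, TRIP_COUNT=4096");
```

With the report's values the estimate is 33044 cycles, one more than the 33043 shown in the report. For comparison, the same loop with II = 1 would come out at 284 + 4095 = 4379 cycles, which is why the achieved II dominates the total for large trip counts.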
09-05-2017 05:19 AM
Thanks for the reply. I have a few more questions.
1.) So the latency at the top of the image is also for the loop? I assumed it showed the complete latency (start to finish).
2.) The Sobel filter in the SDAccel examples does not work when B=1 in HW_emu (it says it is unable to partition the array).
3.) I tried implementing a Gaussian filter (3 x 3) using the convolve example, but it is not working. Is it possible to do this using the Sobel example? Can you help me with this?
09-05-2017 07:21 AM