Observer
Registered: 05-18-2017

Streaming in SDAccel

If I write two optimized functions in C and stream data between them using hls::stream, the output stage infers a burst read with II = 1.

But when I use pipes between kernels, the II of the output stage becomes very large. (I just convert the functions to kernels and use pipes to transfer data between them.)

Why is this happening?
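For readers unfamiliar with the setup, here is a minimal host-side C++ model of two functions connected by a stream. It is only a sketch of the structure: `Fifo` is a hypothetical stand-in for `hls::stream` built on `std::queue` (it is not the Xilinx class), `produce`, `consume` and `top` are illustrative names, and the doubling step is a placeholder for the real processing. It says nothing about the II the tools will achieve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical host-side stand-in for hls::stream (NOT the Xilinx class):
// an unbounded FIFO, so the two stages can simply run one after the other.
template <typename T>
class Fifo {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty() && "read from an empty FIFO");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }

private:
    std::queue<T> q_;
};

// Stage 1: move input data into the stream.
// In HLS, this loop is where #pragma HLS PIPELINE II=1 would go.
void produce(const std::vector<std::int32_t>& in, Fifo<std::int32_t>& s) {
    for (std::int32_t v : in) s.write(v);
}

// Stage 2: pop each element, apply a placeholder operation (doubling), store it.
void consume(Fifo<std::int32_t>& s, std::vector<std::int32_t>& out, std::size_t n) {
    out.resize(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = 2 * s.read();
}

// Top level. In HLS, #pragma HLS DATAFLOW here lets both stages overlap;
// in this software model they run sequentially.
void top(const std::vector<std::int32_t>& in, std::vector<std::int32_t>& out) {
    Fifo<std::int32_t> s;
    produce(in, s);
    consume(s, out, in.size());
}
```

Because the model's FIFO is unbounded, running the stages sequentially is safe here; in hardware the stream has a finite depth and the stages run concurrently.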

Xilinx Employee
Registered: 07-18-2014

Re: Streaming in SDAccel

Hi @race,

Ideally, the II should be the same if the kernel code is identical for the OpenCL and HLS C versions.

Can you please take a look at the following two Git examples? They are identical and give II = 1 in both cases:

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/dataflow/dataflow_stream_c

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/dataflow/dataflow_pipes_ocl

-Heera

Observer

Re: Streaming in SDAccel

Hi @heeran,

Thanks for the reply.

Yes, I checked the example code.

When I use the dataflow model in OpenCL, it works fine for my code. It's only when I use pipes that I get a large II.

Also, in one of my designs I get the II shown in the attached report below.

I thought latency meant the time taken to get the complete output.

The time taken to complete my kernel is 319 ms (from the profile summary report).

How should I read this report?

[Attached screenshot: HLS synthesis report]

Xilinx Employee

Re: Streaming in SDAccel

Hi @race,

If possible, can you share sample code for the OpenCL kernel and the equivalent HLS C code for which you are not getting the same II?

Regarding your second question:

Latency (min and max) is given in clock cycles. This is a static report generated by HLS and computed from the loop trip count (the number of loop iterations).

Let's take your report here:

TRIP_COUNT = 4096 --> the loop iteration count

ITERATION_LATENCY = 284 --> one iteration of the loop takes 284 cycles

II achieved = 8 --> each iteration starts 8 cycles after the previous one starts

So the overall loop latency is the time taken by the first iteration plus II x (remaining iterations):

Latency = ITERATION_LATENCY + II x (TRIP_COUNT - 1)
        = 284 + 8 x 4095
        = 33044

This is close to the 33043 in the report. I am not sure about the 1-clock-cycle difference; there might be something missing from my calculation. An HLS expert can confirm.
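The calculation above can be written as a small helper. This is a sketch that encodes only the formula from this post; the function name is illustrative and not part of any Xilinx API.

```cpp
#include <cassert>
#include <cstdint>

// Static latency of a pipelined loop, following the formula above:
// the first iteration takes iteration_latency cycles, and each of the
// remaining (trip_count - 1) iterations starts ii cycles after the previous one.
std::uint64_t pipelined_loop_latency(std::uint64_t iteration_latency,
                                     std::uint64_t ii,
                                     std::uint64_t trip_count) {
    assert(trip_count >= 1 && "loop must run at least once");
    return iteration_latency + ii * (trip_count - 1);
}
```

With the report's numbers, `pipelined_loop_latency(284, 8, 4096)` gives 33044. With II = 1 the same loop would take 284 + 4095 = 4379 cycles, which shows how strongly the achieved II drives the loop latency here. The sketch does not try to explain the one-cycle difference from the report.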

 

-Heera

Observer

Re: Streaming in SDAccel

Hi @heeran,

Thanks for the reply. I have a few more questions.

1.) Is the latency at the top of the image also for the loop? I assumed it showed the complete latency (start to finish).

2.) The Sobel filter in the SDAccel examples does not work when B=1 in HW_emu (it says it is unable to partition the array).

3.) I tried implementing a 3 x 3 Gaussian filter using the convolve example, but it is not working. Is it possible to do this using the Sobel example? Can you help me with this?
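For item 3, one way to debug a hardware Gaussian kernel is to compare its output against a plain software reference. The sketch below is such a reference under stated assumptions: an 8-bit grayscale image stored row-major, the common 1-2-1 binomial approximation of a 3 x 3 Gaussian, and edge clamping at the borders. The function name and these choices are illustrative; they are not taken from the SDAccel convolve or Sobel examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// 3x3 binomial approximation of a Gaussian kernel; the weights sum to 16.
constexpr int kGauss3x3[3][3] = {
    {1, 2, 1},
    {2, 4, 2},
    {1, 2, 1},
};

// Apply the kernel to an 8-bit grayscale image stored row-major.
// Border pixels clamp coordinates to the image edge (one possible policy).
std::vector<std::uint8_t> gaussian3x3(const std::vector<std::uint8_t>& img,
                                      int width, int height) {
    assert(static_cast<int>(img.size()) == width * height);
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int acc = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int yy = std::min(std::max(y + dy, 0), height - 1);
                    int xx = std::min(std::max(x + dx, 0), width - 1);
                    acc += kGauss3x3[dy + 1][dx + 1] * img[yy * width + xx];
                }
            }
            // Divide by the weight sum (16) with rounding; result stays within 0..255.
            out[y * width + x] = static_cast<std::uint8_t>((acc + 8) / 16);
        }
    }
    return out;
}
```

A uniform image passes through unchanged, which makes a quick sanity check before comparing against the kernel's output on real data.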

 

Observer

Re: Streaming in SDAccel

Hi @heeran,

In the report, what does the latency of 33045 at the top mean?

The design takes 319 ms to complete, so this number doesn't make sense to me.
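To compare the report with the 319 ms from the profile summary, the cycle count has to be converted to time using the kernel clock frequency. A minimal sketch, where the function name is illustrative and the clock frequency is an input you must take from your own build:

```cpp
#include <cassert>
#include <cstdint>

// Convert a cycle count from the HLS report into microseconds.
// clock_mhz is the frequency the kernel actually runs at,
// i.e. the number of cycles per microsecond.
double cycles_to_microseconds(std::uint64_t cycles, double clock_mhz) {
    assert(clock_mhz > 0.0 && "clock frequency must be positive");
    return static_cast<double>(cycles) / clock_mhz;
}
```

At an assumed 200 MHz, 33045 cycles is about 165 µs; even at 100 MHz it would be about 330 µs, nearly a thousand times shorter than 319 ms. So the static figure alone does not account for the measured time.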

 

Thanks