I'm just getting started with OpenCL on Vivado HLS. Here's the simple vector adder kernel from the user guide:
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1)))
vadd(__global int* a, __global int* b, __global int* c)
{
    int idx = get_global_id(0);   /* index of this work item */
    c[idx] = a[idx] + b[idx];     /* two global reads, one global write */
}
When I try to synthesize it, one of the steps has a latency of over 100 cycles.
It seems that loading from global memory is taking a long time. Why is this happening? Is there a way to specify how the memory is stored in order to optimize this?
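To make the question concrete, the work being scheduled can be sketched in plain C. This is only a rough model, not what the tool generates: it assumes the whole 16-item work group is synthesized as one unit and that get_global_id(0) simply runs over 0..15 for a single group. The names vadd_model and WORK_GROUP_SIZE are made up for illustration and do not come from the user guide.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match reqd_work_group_size(16, 1, 1) in the kernel. */
#define WORK_GROUP_SIZE 16

/* Hypothetical plain-C model of one work group of vadd.
   The loop index stands in for get_global_id(0). */
static void vadd_model(const int *a, const int *b, int *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (int idx = 0; idx < WORK_GROUP_SIZE; ++idx) {
        /* Each iteration reads a[idx] and b[idx] from "global" memory
           and writes c[idx] back: three separate memory accesses. */
        c[idx] = a[idx] + b[idx];
    }
}
```

Seen this way, every one of the 16 iterations touches global memory three times, so if each access is scheduled individually, that would seem consistent with the long step in the report.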