01-03-2020 01:43 AM - edited 01-03-2020 01:44 AM
Below is a very simple function. However, since each loop iteration takes 100ps to run, even a simple loop of 10 iterations results in a very large latency.
How can I speed it up?
static int vals_dst[200];

void hole_filling(int x1, int loop, int val, int vals_dst[dst_cols])
{
DDX_LOOP:
    for (int idx = 0; idx < loop; idx++)
        vals_dst[idx] = val;
}
Eli
02-05-2020 02:55 AM
Each iteration takes 100ps to run? So your FPGA is running at >10GHz? That seems highly unlikely...
Unless vals_dst is fully partitioned (so you can unroll the loop completely and do the whole lot in one cycle), this is going to be slow. You might need to look at how the rest of the system is designed to try to eliminate this loop.
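To illustrate the full-partition-plus-unroll idea, here is a minimal sketch rather than a definitive fix. It assumes Vivado HLS pragmas, and it assumes the array size is a compile-time constant of 200 (named DST_COLS here, a hypothetical stand-in for the undefined dst_cols in the original post). Because the original loop bound is a runtime value, the sketch iterates over the fixed array size and guards each write with the runtime bound, so the tool has a constant trip count to unroll. Whether this actually reaches a single cycle depends on the tool version, the interface, and the rest of the design.

```cpp
// Assumption: the array size is known at compile time. DST_COLS is a
// hypothetical name standing in for the undefined dst_cols in the post.
constexpr int DST_COLS = 200;

// x1 is unused here; it is kept only to mirror the original signature.
void hole_filling(int x1, int loop, int val, int vals_dst[DST_COLS])
{
// Split the array into individual elements so that all of them can be
// written in the same clock cycle.
#pragma HLS ARRAY_PARTITION variable=vals_dst complete dim=1

DDX_LOOP:
    for (int idx = 0; idx < DST_COLS; idx++) {
// The trip count is now constant, so the loop can be unrolled completely.
#pragma HLS UNROLL
        // The runtime bound becomes a per-element write enable.
        if (idx < loop) {
            vals_dst[idx] = val;
        }
    }
}
```

The cost is area: every element gets its own register and comparator, which is the resource increase mentioned elsewhere in this thread. A regular C++ compiler ignores the pragmas, so the function can still be checked in C simulation.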
02-04-2020 07:01 AM
Hi @eewse,
Have you tried applying array partitioning and loop unrolling to your code? This may improve performance, at the cost of increased hardware resource usage.
02-04-2020 08:22 AM
I have tried both array partitioning and loop unrolling. Correct me if I am wrong, but I find that they might improve the throughput while the latency does not change much.