Registered: ‎05-23-2017

If-else question in for loop


dist_calc_pcaf1: for(Dtype_uint i=0; i<(data_size/NUM_512_R_PCA_R)/CHANNEL_WID_PCA; i++){
    PRAGMA_HLS(HLS LOOP_TRIPCOUNT min=20 max=20)  //NUM_512_R_PCA_R*200
    dist_pca = dist_pca_in.read();
    Dtype_uint skip_flag = 0;
    dist_calc_pcaf2: for(Dtype_uint j=0; j<LOOP_NUM; j++){  // Line 7; 7clocks
#pragma HLS UNROLL //PIPELINE II=1
        skip_flag += Dtype_uint((dist_pca.x[j]>worst_pca) ? 1 : 0);
    }
    if(skip_flag==NUM_512_R_PCA_R*CHANNEL_WID_PCA)
        filternumber_temp += NUM_512_R_PCA_R*CHANNEL_WID_PCA;  // Line 12
    else{  // Line 13
        for(int j=0; j<LOOP_NUM; j++){
            if(dist_pca.x[j] < worst_pca){
                compute_in_or(feature_or, query_or, data_size, i*NUM_512_R_PCA_R*CHANNEL_WID_PCA+j, &dist_or);
                if(dist_or < worst_or){
                    k_max(i*NUM_512_R_PCA_R*CHANNEL_WID_PCA+j, dist_pca.x[j], index_pca_heap, dists_pca_heap, KNN*HEAP_D);
                    k_max(i*NUM_512_R_PCA_R*CHANNEL_WID_PCA+j, dist_or, index_or_heap, dists_or_heap, KNN*HEAP_D);
                    worst_pca = dists_pca_heap[KNN*HEAP_D-1];
                    worst_or = dists_or_heap[KNN*HEAP_D-1];
                }
            }else
                ++filternumber_temp;
        }
    }
}

Latency Information (clock cycles)
Compute Unit  Kernel Name  Module Name            Start Interval  Best Case  Avg Case  Worst Case
------------  -----------  ---------------------  --------------  ---------  --------  ----------
pcaf_fpga_1   pcaf_fpga    dist_calc_pcaf         102 ~ 3822      102        922       3822

In this function block, most iterations go through line 7 to line 12 and skip the else block after line 13.

The code in the else block after line 13 runs sequentially.


Line 7 to line 12 takes around 2 clocks. If execution never enters the else block after line 13, only around 2x20=40 clocks should be needed.

Why is the best case 102 instead of 40?


Another thing: if I change the LOOP_NUM value from 2 to 8, I get the following latency result.

The latency for line 7 to line 12 should not change no matter what the LOOP_NUM value is, but the best-case latency changed from 102 to 382.

In my understanding, the best-case latency only covers line 7 to line 12, right?

If so, why did the best-case latency change from 102 to 382?


Latency Information (clock cycles)
Compute Unit  Kernel Name  Module Name            Start Interval  Best Case  Avg Case  Worst Case
------------  -----------  ---------------------  --------------  ---------  --------  ----------
pcaf_fpga_1   pcaf_fpga    dist_calc_pcaf         382 ~ 15262     382        3662      15262




Registered: ‎10-04-2011

Re: If-else question in for loop

Hello @mathmaxsean ,

It's a bit hard to look at this code and understand what the variable bounds might be doing. But I think the answer to your second question is that unrolling the loop at line 7 results in multiple simultaneous memory read requests for the variable : 


With LOOP_NUM = 2, you would have 2 reads when unrolled, and those can most likely be handled by the dual-port memory. However, when it is set to 8, there would be 8 read requests, which cannot be handled simultaneously. I would expect that to affect the latency, and I would expect you to have received warnings about being unable to schedule those requests. The fact that the best case increased by about 4x matches this idea. 
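
As a rough sketch of the port-limit reasoning above: if a memory has 2 read ports, N unrolled reads need at least ceil(N/2) cycles, so 2 reads fit in one cycle while 8 reads need four. One common remedy in Vivado HLS is the ARRAY_PARTITION pragma, which splits an array into registers so that every element can be read in the same cycle. Everything named below (kLoopNum, min_read_cycles, count_above, and the plain float array standing in for the distance vector) is hypothetical and only for illustration; whether the pragma can be applied to the array in the posted kernel depends on how that struct is declared, which the thread does not show.

```cpp
// Hypothetical unroll width, chosen to mirror the LOOP_NUM = 8 case.
constexpr int kLoopNum = 8;

// Lower bound on the cycles needed to issue `reads` accesses to one
// memory that offers `ports` read ports per cycle (ceiling division).
int min_read_cycles(int reads, int ports) {
    return (reads + ports - 1) / ports;
}

// Counts how many entries of x exceed `worst`. With the array fully
// partitioned, the unrolled comparisons no longer compete for two ports.
// A regular C++ compiler ignores the HLS pragmas.
int count_above(const float x[kLoopNum], float worst) {
#pragma HLS ARRAY_PARTITION variable=x complete dim=1
    int count = 0;
    for (int j = 0; j < kLoopNum; j++) {
#pragma HLS UNROLL
        count += (x[j] > worst) ? 1 : 0;
    }
    return count;
}
```

Here min_read_cycles(2, 2) gives 1 and min_read_cycles(8, 2) gives 4, which is the same 4x ratio seen between the two reports.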

To answer the first part of the question, I think you would have to look at the analysis view of the design to see what the latency of each section actually is. 
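
Before opening the analysis view, the reported totals can be split into a per-iteration cost. This is a minimal sketch, assuming the report follows the usual pattern for a non-pipelined loop: total = trip count x iteration latency + a small fixed overhead. The trip count of 20 comes from the LOOP_TRIPCOUNT pragma in the posted code; the overhead of 2 cycles is an assumption, chosen because it makes every figure in both reports divide evenly by 20. The helper names are hypothetical.

```cpp
// Result of splitting a reported loop latency.
struct LoopLatency {
    int per_iteration;  // cycles attributed to one loop iteration
    int remainder;      // nonzero means the assumed model does not fit
};

// Assumes total = trip_count * per_iteration + overhead.
LoopLatency split_latency(int total, int trip_count, int overhead) {
    int body = total - overhead;
    return { body / trip_count, body % trip_count };
}
```

Under these assumptions, split_latency(102, 20, 2) gives 5 cycles per iteration and split_latency(382, 20, 2) gives 19, both with no remainder. So the best-case iteration appears to cost more than the 2 clocks estimated in the question, and the analysis view should show where the extra cycles in each iteration go.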

OK, I hope this helps,


