08-10-2017 11:12 PM
I am trying to run the SDAccel example of the k-means algorithm, which is provided at the link below.
I changed the line below in the makefile:
#Select the number of Compute units.
It compiles successfully.
I am running hardware emulation, and my command is as follows:
./host_kmeans -i ./data/100 -c ./data/100.gold_c5 -m 5 -n 5 -g 6
It also runs fine. But when I look at sdaccel_profile_summary.html, the output is as below.
| Device | Compute Unit | Kernel | Global Work Size | Local Work Size | Number Of Calls | Total Time (ms) | Minimum Time (ms) | Average Time (ms) | Maximum Time (ms) |
It does not show the 6 compute units that I configured in my makefile.
Please help me get this example running with more compute units ASAP.
08-14-2017 07:54 AM
08-14-2017 05:12 AM
Please help me with this. I am stuck.
Thanks in advance.
08-14-2017 07:54 AM
08-17-2017 10:01 PM
Thank you very much for the solution; it worked for me.
But after I changed COMPUTE_UNITS to 6, the build failed with the error below.
"Compiling (bitstream) opencl binary: kmeans.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr Log file: /data/examples_new/acceleration/kmeans/_xocc_link_kmeans.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr_kmeans.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr.dir/impl/build/system/kmeans.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr/bitstream/kmeans.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr_ipi/ipiimpl/ipiimpl.runs/impl_1/runme.log :
ERROR: [XOCC-1] design did not meet timing, auto frequency scaling failed because an unscalable system clock did not meet the target frequency. Please try specifying a lower clock frequency using '--kernel_frequency 300' for the next compilation
ERROR: [XOCC 60-704] Integration error, problem implementing OCL region, route_design ERROR"
So I used --kernel_frequency 100, just to be on the safe side, and with this change the build was successful.
However, I did not see the improvement in kernel execution time for 6 compute units versus 1 compute unit that is mentioned in your makefile, as below.
"NOTE: Kmeans can give better results with more compute units but for all Devices
#More than 2 compute units are not feasible so it is restricted to 2 compute units.
#User can increase the number of Compute units for bigger Devices and can get better
I attached the results for 1 compute unit vs. 6 compute units. Can you please help me get better performance as the number of compute units increases?
These are the results I am getting as of now.
| kernel_frequency | compute units | kernel execution time |
I would also like to know: what is the default kernel_frequency for this KU115 device if nothing is specified in the makefile?
08-18-2017 05:50 AM
The default kernel frequency depends on the target platform. The platform developer fixes it to a particular value, typically 200 or 250 MHz. So, by running the kernel at 100 MHz, you are potentially losing half the performance (assuming the computation is not memory bound). One needs to balance kernel frequency against the number of compute units when increasing the latter.
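To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the kernel is compute bound and scales perfectly with both clock frequency and the number of compute units, which real designs rarely achieve. The 200 MHz baseline is only an assumed platform default, and `relative_throughput` is a hypothetical helper, not part of SDAccel.

```python
def relative_throughput(compute_units, freq_mhz, base_units=1, base_freq_mhz=200.0):
    """Ideal speedup over a baseline configuration.

    Assumes a compute-bound kernel whose throughput is proportional to
    (number of compute units) x (kernel clock frequency). Memory-bound
    kernels will see less than this.
    """
    return (compute_units * freq_mhz) / (base_units * base_freq_mhz)


# Compare a few configurations against 1 compute unit at an assumed 200 MHz default.
for cus, freq in [(1, 200.0), (6, 100.0), (6, 200.0)]:
    speedup = relative_throughput(cus, freq)
    print(f"{cus} CU @ {freq:.0f} MHz -> {speedup:.1f}x ideal speedup")
```

Under these assumptions, 6 compute units at 100 MHz can give at most about 3x over 1 compute unit at 200 MHz, not 6x. If the measured times show even less than that, the kernel is probably limited by memory bandwidth or by how the host distributes work across the compute units.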