Newbie hebingsheng
983 Views
Registered: 12-30-2018

Compute Unit Limit

Hi all,

I am currently implementing a design with Xilinx OpenCL.

My problem is that the number of CUs in my design exceeds the maximum number of CUs (10) allowed on the platform.

My design does not use data-level parallelism, so I don't think I can use the NDRange model.

I also know that only 10 master/slave interfaces are allowed to connect to AXI. However, I connect the kernels using pipes, and only two kernels access global memory.

Any ideas?

Also, does Xilinx OpenCL support pipe arrays? They would be useful when there are many pipes.

BR

Accepted Solutions
Xilinx Employee
798 Views
Registered: 07-18-2014

Re: Compute Unit Limit

Hi @hebingsheng,

Another option is to merge the functionality of multiple kernels into a single kernel and use dataflow to run the sub-functions concurrently.

You can refer to the example below for this case:

https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/dataflow/dataflow_func_ocl/src/adder.cl

In the above example, three sub-functions run concurrently and share data through stream interfaces. The stream depth can be specified with an xocc option, as below (see the Makefile in the same example):

--xp param:compiler.xclDataflowFifoDepth=32
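
To make the merge-and-dataflow idea concrete, here is a minimal sketch in plain C. It is not the code from the linked adder.cl. The function names, MAX_N, and the array-based "streams" are hypothetical and chosen only for illustration. In an SDAccel OpenCL kernel, the top-level function would be the __kernel and would carry the xcl_dataflow attribute so that the three stages can overlap; in plain C they simply run one after another.

```c
#include <assert.h>

#define MAX_N 16 /* hypothetical upper bound on the data size */

/* Stage 1: previously a separate kernel. Reads "global memory" into a stream. */
static void read_input(const int *in, int *stream_a, int n) {
    for (int i = 0; i < n; i++) {
        stream_a[i] = in[i];
    }
}

/* Stage 2: previously a separate kernel. Touches no global memory at all. */
static void compute_add(const int *stream_a, int *stream_b, int inc, int n) {
    for (int i = 0; i < n; i++) {
        stream_b[i] = stream_a[i] + inc;
    }
}

/* Stage 3: previously a separate kernel. Writes the stream to "global memory". */
static void write_result(const int *stream_b, int *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = stream_b[i];
    }
}

/*
 * Merged top-level function: one compute unit instead of three.
 * Assumption: in OpenCL C for SDAccel this would be declared as a __kernel
 * with the xcl_dataflow attribute; both are left out so this builds as C.
 */
void merged_kernel(const int *in, int *out, int inc, int n) {
    int stream_a[MAX_N]; /* stands in for the stream between stages 1 and 2 */
    int stream_b[MAX_N]; /* stands in for the stream between stages 2 and 3 */

    assert(n >= 0 && n <= MAX_N);

    read_input(in, stream_a, n);
    compute_add(stream_a, stream_b, inc, n);
    write_result(stream_b, out, n);
}
```

Calling merged_kernel(in, out, 10, 4) with in = {1, 2, 3, 4} fills out with {11, 12, 13, 14}. Only the first and last stages touch the in and out pointers, which mirrors the original design, where only two kernels accessed global memory.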

 

-Heera
7 Replies
Newbie hebingsheng
974 Views
Registered: 12-30-2018

Re: Compute Unit Limit

One more thing: the tool reports "Max number of compute units in OpenCL binary exceeded."

A kernel still counts as a CU whether or not it uses global memory, right?

So, as I understand it, Xilinx OpenCL allows at most 10 kernels, right?

 

Moderator
947 Views
Registered: 11-04-2010

Re: Compute Unit Limit

Hi @hebingsheng,

1. Please refer to the post below:

https://forums.xilinx.com/t5/SDAccel/10-compute-unit-limit/m-p/693901#M135

2. Pipe arrays are currently not supported.

-------------------------------------------------------------------------
Don't forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Newbie hebingsheng
927 Views
Registered: 12-30-2018

Re: Compute Unit Limit

I have seen that discussion, but I don't think it helps.
I have 16 asynchronous kernels, and I cannot find a way to merge them into fewer kernels. Any ideas?
Newbie hebingsheng
926 Views
Registered: 12-30-2018

Re: Compute Unit Limit

Hi @hongh,

Thanks for your reply.

I have seen that discussion, but I don't think it helps.
I have 16 asynchronous kernels, and I cannot find a way to merge them into fewer kernels. Any ideas?
Moderator
851 Views
Registered: 11-04-2010

Re: Compute Unit Limit

Hi @hebingsheng,

You can try assigning some of the CUs to different SLRs.

422 Views
Registered: 04-15-2019

Re: Compute Unit Limit

FYI, I have 16 kernels and 16 queues running (as well as 16 threads running on the ARMs). Each kernel has 3 global arguments. This is in 18.2.

How many global arguments do the 2 kernels that have them take? At the end of my compile, the compiler adds the axi_interconnects.

Trying to think of why mine works but yours does not: I have multiple queues, one for each kernel, and each kernel has only 3 global variables. I have just compiled 24 kernels, but I can't get this to run.

I use xocc to compile from the command line, and that might make a difference.

Poking around, I found this. There might be a setting in a .ini file you can change, or you can just put this on the command line:

param:compiler.maxComputeUnits

Type: Int

Default Value: -1

Maximum compute units allowed in the system. Any positive value will overwrite the numComputeUnits setting in the hardware platform (.dsa). The default value of -1 preserves the setting in the DSA.

 
