Newbie
Registered: ‎12-30-2018

Compute Unit Limit


Hi, all 

I am currently implementing a design with Xilinx OpenCL.

My design exceeds the maximum number of compute units (CUs) allowed on the platform, which is 10.

The design does not exhibit data-level parallelism, so I don't think I can use the NDRange model.

I also know that at most 10 master/slave interfaces are allowed to connect to AXI. However, I connect the kernels to each other using pipes, and only two kernels access global memory.

Any ideas?

Also, does Xilinx OpenCL support pipe arrays? They would be useful when a design has many pipes.

BR

Accepted Solutions
Xilinx Employee
Registered: ‎07-18-2014

Re: Compute Unit Limit


Hi @hebingsheng

Another option is to merge the functionality of multiple kernels into a single kernel and use dataflow to run them concurrently.

You can refer to the example below for this case:

https://github.com/Xilinx/SDAccel_Examples/blob/master/getting_started/dataflow/dataflow_func_ocl/src/adder.cl

In the example above, three sub-functions run concurrently and share data through a stream interface. The stream depth can be specified with an xocc option, as shown below (see the Makefile in the same example):

--xp param:compiler.xclDataflowFifoDepth=32
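
To make the idea concrete, here is a minimal sketch of what merging three stages into one dataflow kernel could look like. It is written as HLS-style C++ rather than the OpenCL C of the linked adder.cl, and it is not a copy of that example. The `Stream` class is a hypothetical host-side stand-in for a hardware stream (an unbounded FIFO), and the function names and the `inc` parameter are illustrative assumptions. An ordinary compiler ignores the `#pragma HLS DATAFLOW` line and simply runs the stages one after another.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical stand-in for a hardware stream between dataflow stages.
// On the host it is an unbounded FIFO; in hardware the depth is finite
// and is what an option such as xclDataflowFifoDepth controls.
template <typename T>
class Stream {
public:
    void write(const T& v) { fifo_.push(v); }
    T read() {
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }
private:
    std::queue<T> fifo_;
};

// Stage 1: the only stage that reads global memory.
static void read_input(const int* in, Stream<int>& out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out.write(in[i]);
}

// Stage 2: pure compute, talks to its neighbours only through streams.
static void compute_add(Stream<int>& in, Stream<int>& out, int inc, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out.write(in.read() + inc);
}

// Stage 3: the only stage that writes global memory.
static void write_result(Stream<int>& in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = in.read();
}

// One top-level kernel, so one compute unit instead of three.
void adder(const int* in, int* out, int inc, std::size_t n) {
    // Assumption: an HLS tool uses this pragma to overlap the three stages.
#pragma HLS DATAFLOW
    Stream<int> s1, s2;
    read_input(in, s1, n);
    compute_add(s1, s2, inc, n);
    write_result(s2, out, n);
}
```

Calling `adder(in, out, 10, 4)` on the host with `in = {1, 2, 3, 4}` fills `out` with each input plus 10. The point of the structure is that stages which were separate kernels connected by pipes become sub-functions connected by streams, so they count as a single CU.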

 

-Heera

7 Replies
Newbie
Registered: ‎12-30-2018

Re: Compute Unit Limit


One more thing: the tool reports "Max number of compute units in OpenCL binary exceeded."

A kernel still counts as a CU whether or not it uses global memory, right?

So, as I understand it, Xilinx OpenCL allows at most 10 kernels, right?

 

Moderator
Registered: ‎11-04-2010

Re: Compute Unit Limit


Hi, @hebingsheng ,

1. Please refer to the post below:

https://forums.xilinx.com/t5/SDAccel/10-compute-unit-limit/m-p/693901#M135

2. Pipe arrays are currently not supported.

Newbie
Registered: ‎12-30-2018

Re: Compute Unit Limit


Hi @hongh 

Thanks for your reply. 

I have seen that discussion, but I don't think it helps.
I have 16 asynchronous kernels, and I cannot find a way to merge them into fewer kernels. Any ideas?

Moderator
Registered: ‎11-04-2010

Re: Compute Unit Limit


Hi, @hebingsheng ,

You can try placing some of the CUs in different SLRs.


Registered: ‎04-15-2019

Re: Compute Unit Limit


FYI, I have 16 kernels and 16 queues running (as well as 16 threads running on the ARMs). Each kernel has 3 global arguments. This is in 18.2.

How many global arguments do the 2 kernels that have them take? At the end of my compile, the compiler adds the axi_interconnects.

Trying to think of why mine works but yours does not: I have multiple queues, one for each kernel, and each kernel has only 3 global variables. I have also just compiled 24 kernels, but I can't get that build to run.

I use xocc to compile from the command line, and that might make a difference.

Poking around, I found the following. There might be a setting in an .ini file you can change, or you can just put this on the command line:

param:compiler.maxComputeUnits

Type: Int

Default Value: -1

Maximum compute units allowed in the system. Any positive value will overwrite the numComputeUnits setting in the hardware platform (.dsa). The default value of -1 preserves the setting in the DSA.

 
