cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Highlighted
Participant
Participant
6,147 Views
Registered: ‎12-30-2015

OpenCL kernel with async_work_group_copy()

Jump to solution


Hello,
I'm trying to synthesize an OpenCL kernel for matrix multiplication. The kernel works, but if I change the memory copy loop:

__kernel void __attribute__ ((reqd_work_group_size(WG_SIZE_0, 1, 1)))
mmul(__global const data_t *ma, __global const data_t *mb, __global data_t *mc) {

	// Input and output rows buffer
	__local data_t a_row[WG_SIZE_0];
	__local data_t c_row[WG_SIZE_0];

	// Matrix A row index
	const int i = get_global_id(0);

	// Copy row i (global_id index) form matrix A
	for (int j = 0; j < WG_SIZE_0; ++j) {
		a_row[j] = ma[i * WG_SIZE_0 + j];
	}


with:

	async_work_group_copy(a_row, ma + i * WG_SIZE_0, WG_SIZE_0, 0);
	barrier(CLK_LOCAL_MEM_FENCE);


The design becomes un-synthesizable, with the following errors:

@E [SYNCHK-11] mmul_OpenCL/srcs/mmul.cl:5: in function 'mmul': Variable '.01.reg2mem.priv.i.i' has an unsynthesizable type '[8 x i32 addrspace(1)*]*' (possible cause(s): pointer to pointer or global pointer).
@E [SYNCHK-22] mmul_OpenCL/srcs/mmul.cl:5: in function 'mmul': memory copy is not supported unless used on bus interface possible cause(s): non-static/non-constant local array with initialization).
@I [SYNCHK-10] 2 error(s), 0 warning(s).
@E [HLS-70] Synthesizability check failed.

Any ideas why this isn't working?


 

0 Kudos
1 Solution

Accepted Solutions
Highlighted
Xilinx Employee
Xilinx Employee
10,914 Views
Registered: ‎11-28-2007
const int i = get_global_id(0);
async_work_group_copy(a_row, ma + i * WG_SIZE_0, WG_SIZE_0, 0);

The way async_work_group_copy is used in your code violates the OpenCL Spec. Below statement is copied from OpenCL 1.2 spec (page 278 https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf ). In your code, each work-item passes a new source pointer values to async_work_group_copy fucntion.

 
async copy is performed by all work-items in a workgroup and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined.
Cheers,
Jim

View solution in original post

0 Kudos
4 Replies
Highlighted
Xilinx Employee
Xilinx Employee
6,117 Views
Registered: ‎11-28-2007

What is the actual data type of data_t? Can you please attach the complete code so we can try it out?

 


@twoism_ wrote:


Hello,
I'm trying to synthesize an OpenCL kernel for matrix multiplication. The kernel works, but if I change the memory copy loop:

__kernel void __attribute__ ((reqd_work_group_size(WG_SIZE_0, 1, 1)))
mmul(__global const data_t *ma, __global const data_t *mb, __global data_t *mc) {

	// Input and output rows buffer
	__local data_t a_row[WG_SIZE_0];
	__local data_t c_row[WG_SIZE_0];

	// Matrix A row index
	const int i = get_global_id(0);

	// Copy row i (global_id index) form matrix A
	for (int j = 0; j < WG_SIZE_0; ++j) {
		a_row[j] = ma[i * WG_SIZE_0 + j];
	}


with:

	async_work_group_copy(a_row, ma + i * WG_SIZE_0, WG_SIZE_0, 0);
	barrier(CLK_LOCAL_MEM_FENCE);


The design becomes un-synthesizable, with the following errors:

@E [SYNCHK-11] mmul_OpenCL/srcs/mmul.cl:5: in function 'mmul': Variable '.01.reg2mem.priv.i.i' has an unsynthesizable type '[8 x i32 addrspace(1)*]*' (possible cause(s): pointer to pointer or global pointer).
@E [SYNCHK-22] mmul_OpenCL/srcs/mmul.cl:5: in function 'mmul': memory copy is not supported unless used on bus interface possible cause(s): non-static/non-constant local array with initialization).
@I [SYNCHK-10] 2 error(s), 0 warning(s).
@E [HLS-70] Synthesizability check failed.

Any ideas why this isn't working?


 


 

Cheers,
Jim
0 Kudos
Highlighted
Participant
Participant
6,107 Views
Registered: ‎12-30-2015
#define WG_SIZE_0 8
typedef int data_t;

0 Kudos
Highlighted
Participant
Participant
6,103 Views
Registered: ‎12-30-2015


The problem arises when I substitute the memory copy loop with the async_work_group_copy call. The rest of the code is a plain matrix multiplier. You can replicate (and isolate) the error by synthesizing with the following snippet:

mmul.h

#ifndef MMUL_H
#define MMUL_H

#define WG_SIZE_0 8
typedef int data_t;

#endif



mmul.cl

#include <clc.h>
#include "mmul.h"

__kernel void __attribute__ ((reqd_work_group_size(WG_SIZE_0, 1, 1)))
mmul(__global const data_t *ma, __global const data_t *mb, __global data_t *mc) {

	// Input and output rows buffer
	__local data_t a_row[WG_SIZE_0];
	__local data_t c_row[WG_SIZE_0];

	// Matrix A row index
	const int i = get_global_id(0);

	// Unused
	//data_t j = get_global_id(1);
	
	// Copy row i (global_id index) form matrix A
//	for (int j = 0; j < WG_SIZE_0; ++j) {
//		a_row[j] = ma[i * WG_SIZE_0 + j];
//	}

	async_work_group_copy(a_row, ma + i * WG_SIZE_0, WG_SIZE_0, 0);
	// ensure all values have been read before begin the next stage
	barrier(CLK_LOCAL_MEM_FENCE);

}


Errors:

@I [HLS-10] Checking synthesizability ...
@E [SYNCHK-11] mmul_compute_unit:62: in function '__AESL_work_groupA': Variable '.0.reg2mem.priv.i.i' has an unsynthesizable type '[8 x i32 addrspace(1)*]*' (possible cause(s): pointer to pointer or global pointer).
@W [SYNCHK-77] The top function '__AESL_work_groupA' (mmul_compute_unit:62) has no outputs. Possible cause(s) are: (1) Output parameters are passed by value; (2) intended outputs (parameters or global variables) are never written; (3) there are infinite loops.
@I [SYNCHK-10] 1 error(s), 1 warning(s).
@E [HLS-70] Synthesizability check failed.


Thanks

 

0 Kudos
Highlighted
Xilinx Employee
Xilinx Employee
10,915 Views
Registered: ‎11-28-2007
const int i = get_global_id(0);
async_work_group_copy(a_row, ma + i * WG_SIZE_0, WG_SIZE_0, 0);

The way async_work_group_copy is used in your code violates the OpenCL Spec. Below statement is copied from OpenCL 1.2 spec (page 278 https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf ). In your code, each work-item passes a new source pointer values to async_work_group_copy fucntion.

 
async copy is performed by all work-items in a workgroup and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined.
Cheers,
Jim

View solution in original post

0 Kudos