daryan
Visitor
4,689 Views
Registered: ‎10-25-2016

FPGA compilation not working like CPU simulation


Hello,

I'm implementing a matrix multiply with OpenCL in SDAccel. To do this I'm using 3 kernels: one for reading from global memory, one for writing to global memory, and one that multiplies by blocks. The first kernel reads blocks and writes each block into a pipe; the multiply kernel reads from this pipe, multiplies, and writes the resulting block into another pipe; the last kernel reads from that pipe and writes the block back to global memory.
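Schematically, the dataflow looks roughly like this (a sketch only; the pipe names, depths, and loop bounds are placeholders, not my actual code):

```c
// Sketch of the 3-kernel pipe dataflow (placeholder names and depths).
pipe float16 aPipe __attribute__((xcl_reqd_pipe_depth(32)));
pipe float16 cPipe __attribute__((xcl_reqd_pipe_depth(32)));

__attribute__((reqd_work_group_size(1, 1, 1)))
kernel void inputData(const global float16* A, const uint n)
{
    for (uint i = 0; i < n; i++) {
        float16 v = A[i];
        write_pipe_block(aPipe, &v);   // stream blocks in from global memory
    }
}

__attribute__((reqd_work_group_size(1, 1, 1)))
kernel void mul1(const uint n)
{
    for (uint i = 0; i < n; i++) {
        float16 a, c;
        read_pipe_block(aPipe, &a);    // consume a block from the input pipe
        c = a;                         // (the real kernel multiplies blocks here)
        write_pipe_block(cPipe, &c);   // stream the result out
    }
}

__attribute__((reqd_work_group_size(1, 1, 1)))
kernel void outputData(global float16* C, const uint n)
{
    for (uint i = 0; i < n; i++) {
        float16 v;
        read_pipe_block(cPipe, &v);    // drain results back to global memory
        C[i] = v;
    }
}
```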

What is happening is that CPU emulation works well, but on the FPGA the multiplication is incorrect. The Vivado HLS log for the multiply kernel says that the port m_axi_gmem has no fanin or fanout and is left dangling. That makes sense, because the kernel doesn't need a port to global memory; it only needs the pipe ports.

The curious fact is that if I use only 2 kernels, using the pipe only for writing to global memory and doing the reading inside the multiply kernel, it works well.

Any idea why this is happening?

 


Accepted Solutions
daryan
Visitor
6,812 Views
Registered: ‎10-25-2016

It works! I only upgraded SDAccel from version 2016.1 to 2016.2, and now the same code works perfectly on the FPGA. Thanks for all your support!


6 Replies
vallina
Xilinx Employee
4,600 Views
Registered: ‎03-18-2011

Can you provide the function signature of the 3 kernels in the design? Also, have you verified the correctness of the generated design by running the hardware emulation flow?

daryan
Visitor
4,546 Views
Registered: ‎10-25-2016

__attribute__((reqd_work_group_size(1, 1, 1)))
kernel void inputData(const global float16* A, const global float16* B, const uint K)

__attribute__((reqd_work_group_size(1, 1, 1)))
kernel void mul1(const uint K)

__attribute__((reqd_work_group_size(1, 1, 1)))
outputData(global float16* C, const uint N)

I didn't run hardware emulation because it takes longer than CPU emulation, and 99% of the time, if CPU emulation gives good results, the FPGA will too.

0 Kudos
daryan
Visitor
4,545 Views
Registered: ‎10-25-2016
I forgot to put kernel void outputData in the last signature, sorry.
ywu
Xilinx Employee
4,376 Views
Registered: ‎11-28-2007

Can you attach your kernel code? Do you use blocking or non-blocking pipe write/read functions?
Cheers,
Jim
daryan
Visitor
3,962 Views
Registered: ‎10-25-2016

Hello,

 

First of all, I want to apologize; I know I'm replying long after this was posted. I assumed Xilinx would send me an e-mail when someone replied to my messages, but in this case it didn't send me anything, so I didn't look.

 

Yes, I use blocking pipe reads and writes, to ensure every kernel reads and writes all values. You can see the kernel attached to this post.
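For reference, the blocking accesses look roughly like this (a sketch; the pipe name and depth are placeholders, not my actual code):

```c
// Sketch only: placeholder pipe name and depth.
pipe float16 p0 __attribute__((xcl_reqd_pipe_depth(32)));

// Blocking variants (Xilinx extensions) stall until data/space is available,
// so there is no status code to check:
float16 v;
read_pipe_block(p0, &v);     // blocks until a value arrives
write_pipe_block(p0, &v);    // blocks until there is room in the pipe

// The standard OpenCL 2.0 calls are non-blocking and return 0 on success:
// if (read_pipe(p0, &v) == 0) { /* got a value */ }
```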

 

Thanks
