channinglan (Visitor, registered 08-21-2017)

2017.2 SDAccel + GitHub example: "Data transfer between kernel(s) and global memory(s)"?

What does "Data transfer between kernel(s) and global memory(s)" mean?

Why does the run take so long, printing this message many times?





3.10.0-514.el7.x86_64
Using built-in specs.
COLLECT_GCC=/opt/Xilinx/SDx/2017.2/Vivado/tps/lnx64/gcc-6.2.0/bin/g++
COLLECT_LTO_WRAPPER=/home/test/share/Xilinx_tool/SDx/2017.2/Vivado/tps/lnx64/gcc-6.2.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../src/lnx64/configure --prefix=/tools/batonroot/rodin/devkits/lnx64/gcc-6.2.0 --enable-languages=c,c++ --with-ppl=/tools/batonroot/rodin/devkits/lnx64/ppl-0.11 --with-cloog=/tools/batonroot/rodin/devkits/lnx64/cloog-ppl-0.15.11 LDFLAGS=-L/tools/batonroot/rodin/devkits/lnx64/cloog-ppl-0.15.11/lib
Thread model: posix
gcc version 6.2.0 (GCC)
-----------------------------------
make all TARGETS=hw_emu DEVICES=xilinx_kcu1500_4ddr-xpr_4_0
-----------------------------------
make: Nothing to be done for `all'.
-----------------------------------
emconfigutil --od . --nd 1  --platform xilinx_kcu1500_4ddr-xpr_4_0
-----------------------------------

****** configutil v2017.2_sdx (64-bit)
  **** SW Build 1972098 on Wed Aug 23 11:34:38 MDT 2017
    ** Copyright 1986-2017 Xilinx, Inc. All Rights Reserved.

INFO: [ConfigUtil 60-895]    Target platform: /opt/Xilinx/SDx/2017.2/platforms/xilinx_kcu1500_4ddr-xpr_4_0/xilinx_kcu1500_4ddr-xpr_4_0.xpfm
emulation configuration file `emconfig.json` is created in ./. directory
-----------------------------------
./rsa
XCL_EMULATION_MODE=hw_emu
-----------------------------------
INFO: [main.cpp:48] TIME: [Tue Oct 31 03:53:39 2017] Xilinx 2048 bit RSA Application
xcl_mode=hw_emu
xcl_target=(null)
if xcl_mode is set then check if it's equal to true
if it's not equal to true then it should be whatever XCL_EMULATION_MODE is set to
world.mode=hw_emu
Linux:3.10.0-514.el7.x86_64:#1 SMP Tue Nov 22 16:42:41 UTC 2016:x86_64
---
XILINX_OPENCL=""
LD_LIBRARY_PATH="/opt/Xilinx/SDx/2017.2/runtime/lib/x86_64:/opt/Xilinx/SDx/2017.2/lib/lnx64.o"
---
INFO: Importing xclbin/krnl_rsa.hw_emu.xilinx_kcu1500_4ddr-xpr_4_0.xclbin
INFO: Loaded file,krnl_size=4914975 krnl_bin=0x7f9e14f7f010 world.device_id=7602144
INFO: [SDx-EM 01] Hardware emulation runs detailed simulation underneath. It may take long time for large data set. Please use a small dataset for faster execution. You can still get performance trend for your kernel with smaller dataset.
program=0x73d1b0,err=0
INFO: Created Binary
INFO: Built Program
read in ciphertext
ciphertext = 37F51321ECB6C1270C7E7922DD2DCFB1E455A6E468762C99E9D443885D7E1F3011A92060BB4AE066742CF238681B4D86A372E76F96096D6CFB1380D6F391F1850D86D7F900E3F2629CDB72FE148A9DE9F132521C6D89C4982879803FA33A00A95F3FDA26C62E24067A9C0B57078184A9D201495496E508C6DB1221B70FEF9DBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
n is
B931076E954DE3891AF2B6395F3701ADA15E80CA250A13CD52B8B29BCD9E65509E452298CD8BCBC8CC3D33B43B60BEA5E5AF87CEE8222BA95F8B91204974210C67BF331025A77A92483B3AF7D228536DE98CD9A875005ED43FC5BFD464ABAF5D1F5540D1C5EBAD519514417634109F7A46191A09278F8B89ACF2CB86D6FA6543
d is
85039232FB4A5683C3B750EB24587DFC184BA87588E5141405B6639344BCE048676580D3FFCEC93010826500AF256DC9FA8F791C43DF473D00435E99B2289712F70779C4915EAA718627741D2968285108383A3DE422E012765A5EC2FD66AEB58CBE0C5A0278A5242C716EDD1E29D2AF4957640D5E972EE257130E73C9E36701
 C mod p = 74000000000000000000000000000000015D6C0000000000685D6C0000000000090000000000000061705F6D656D6F727900760000000000015D6C0000000000

 C mod q = 74000000000000000000000000000000015D6C0000000000685D6C0000000000090000000000000061705F6D656D6F727900760000000000015D6C0000000000

 p = F4ADFDBC733C3D802736D1D27A59F000D7C035A1E6C1ACC0C8FC2D271D7FFE63195D19E8DD4E42E773868D7E52A37A6C61CAF0758E75624BF5DF27EFABF61C5D

 q = C1C2746F4EAE6FF9538644EC19A5EAAF75A499A3314EDB8DAFA036746401DBA282A4A78FE5E729DDC427BF06F35056844133A3064531C18DBCB410D84C216E1F
start running kernel
INFO: [rsa_app.cpp:547] TIME: [Tue Oct 31 03:53:39 2017] Invoking rsa
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
WARNING: unaligned host pointer detected, this leads to extra memcpy
INFO: [rsa_app.cpp:359] TIME: [Tue Oct 31 03:53:39 2017] EX1: to make sure all buffers are migrated to device
INFO: [SDx-EM 22] [Wall clock time: 03:58, Emulation time: 2.91175 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:03, Emulation time: 5.80783 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:09, Emulation time: 8.58319 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:17, Emulation time: 10.8768 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:22, Emulation time: 13.6825 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:27, Emulation time: 16.2188 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:32, Emulation time: 19.0495 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:37, Emulation time: 21.6041 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:43, Emulation time: 24.3718 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.875 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:48, Emulation time: 27.1564 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 1.375 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:53, Emulation time: 29.7433 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 1.375 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       

INFO: [SDx-EM 22] [Wall clock time: 04:58, Emulation time: 32.0847 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 1.375 KB               WR = 0.000 KB       
BANK1          RD = 0.000 KB               WR = 0.000 KB       
BANK2          RD = 0.000 KB               WR = 0.000 KB       
BANK3          RD = 0.000 KB               WR = 0.000 KB       


^CTerminated

Reply from loldaa (Visitor, registered 12-05-2017):

Hi,

1. What is a transfer between the kernel and global memory?

Global memory is the on-board memory. If you are using the KCU1500, it is the 16 GB of DDR4 memory on the FPGA board.

The kernel is the computation logic that your code creates inside the FPGA chip; on the KCU1500, that is the KU115 chip.

A transfer means the FPGA chip (KU115) is reading from or writing to global memory (the on-board DDR4). The BANK0 to BANK3 lines in your log show how much data was read (RD) and written (WR) in each of the four DDR banks.
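To make "transfer" concrete from the kernel side, here is a minimal sketch, assuming an SDAccel-style C++ kernel: pointer arguments mapped to an m_axi interface live in global memory, so each read through such a pointer is a global-memory read and each write is a global-memory write. krnl_copy is a hypothetical name (it is not the RSA kernel from the example), and the pragmas follow the interface style of SDAccel 2017.x examples; a regular C++ compiler ignores them (at most warning about unknown pragmas), so the logic can also be checked on the host.

```cpp
// Hypothetical SDAccel-style kernel, NOT the RSA example's kernel.
// `in` and `out` point into global memory (the DDR on the board); the
// m_axi pragmas ask the tool to access them through an AXI master port.
extern "C" {
void krnl_copy(const int* in, int* out, int n)
{
#pragma HLS INTERFACE m_axi port=in offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=in bundle=control
#pragma HLS INTERFACE s_axilite port=out bundle=control
#pragma HLS INTERFACE s_axilite port=n bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < n; ++i) {
        // Reading in[i] is a kernel -> global memory read (the RD side);
        // writing out[i] is a kernel -> global memory write (the WR side).
        out[i] = in[i];
    }
}
}
```

On the host, krnl_copy(src, dst, n) can be called as a plain function to test its logic; in SDAccel it would instead be launched through the OpenCL runtime.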


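2. Why does it take so long?

Because you are running hardware emulation, which means the CPU simulates the FPGA to do the computation. Your log warns about this as well: "Hardware emulation runs detailed simulation underneath. It may take long time for large data set."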

Reply from channinglan (Visitor, registered 08-21-2017):

Why do some of the examples take so long to execute?

Other examples also run in hw_emu, and their complexity is similar.

Why is RSA so different?

Reply from loldaa (Visitor, registered 12-05-2017):

Hi,

1. Why so long?

In your log I found "WARNING: unaligned host pointer detected, this leads to extra memcpy", which means the runtime has to make an extra copy of each unaligned host buffer.

This may come from the way you create the device buffers, with a call similar to the following:

    cl::Buffer buffer_input(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                            image_size_bytes, host_data.data());

When you use the CL_MEM_USE_HOST_PTR flag, make sure your host code allocates that memory with an aligned allocator, for example:

    std::vector<int, aligned_allocator<int>> host_data;

To use aligned_allocator<int>, you also need to add the following code to your host code:

//-----------------------------------------------------------------------------------------------------

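#include <cstddef>  // std::size_t
#include <cstdlib>  // posix_memalign, free
#include <new>      // std::bad_alloc

// Customized buffer allocation for 4K boundary alignment
template <typename T>
struct aligned_allocator
{
  using value_type = T;
  aligned_allocator() = default;
  // Lets containers rebind the allocator to other element types
  template <typename U>
  aligned_allocator(const aligned_allocator<U>&) {}
  T* allocate(std::size_t num)
  {
    void* ptr = nullptr;
    // Request a 4096-byte aligned block; posix_memalign returns non-zero on failure
    if (posix_memalign(&ptr, 4096, num * sizeof(T)))
      throw std::bad_alloc();
    return reinterpret_cast<T*>(ptr);
  }
  void deallocate(T* p, std::size_t /*num*/)
  {
    free(p);
  }
};

// The allocator is stateless, so any two instances compare equal
template <typename T, typename U>
bool operator==(const aligned_allocator<T>&, const aligned_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const aligned_allocator<T>&, const aligned_allocator<U>&) { return false; }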

//-------------------------------------------------------------------------------------------------
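Before creating a CL_MEM_USE_HOST_PTR buffer, you can check on the host whether the pointer you pass really starts on a 4 KB boundary, which is what the allocator above is for. A minimal sketch; is_aligned is a hypothetical helper, not part of the Xilinx runtime, and the 4096-byte default simply mirrors the allocator's 4K alignment rather than a value queried from the runtime.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `p` starts on an `alignment`-byte boundary.
// The 4096 default mirrors the 4K alignment used by aligned_allocator;
// it is an assumption here, not a value read from the OpenCL runtime.
inline bool is_aligned(const void* p, std::size_t alignment = 4096)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For example, is_aligned(host_data.data()) should hold for a std::vector<int, aligned_allocator<int>>, while a plain std::vector<int> gives no such guarantee, which is the situation the "unaligned host pointer" warning points at.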


That's only my opinion; I hope it helps.