Observer jyerra2
Registered: ‎11-01-2017

Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

Hi,

My work requires me to create RTL kernels, and I used the RTL Kernel Wizard to create one. I have verified its functionality, and a single instance of the kernel works. However, my final project requires this kernel to run continuously on data that the host receives through a NIC (via socket programming). After going through the SDAccel examples on GitHub, I concluded that I need to overlap host and kernel execution. Using the overlap_ocl example as a reference, I modified the host code generated with the RTL Kernel example to use asynchronous memory-copy operations, so that the buffer copies and the kernel can run in an out-of-order command queue. However, I have run into a few issues with this approach:

1. Will this design be able to run for as long as required? That is, can a while loop continuously send data from the host to the device and back while the kernel is processing it?

2. The overlap_ocl example has its host code in C++, while the generated host code is in C. How do I ensure compatibility? For instance, the C++ code uses classes and namespace std.

3. The biggest problem is an unaligned host pointer; the error I encounter is attached below. To work around it, I copied the aligned allocator functions from the overlap_ocl host code, but I still get the same error. I believe that, because the memory copy takes more time, no buffer has been allocated on the device by the time the host tries to copy something into it.

Please let me know how to solve this issue as soon as possible. I will be happy to provide more information about the problem if required.

overlap_exec_error.PNG

Accepted Solutions
Xilinx Employee
Registered: ‎03-24-2010

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

1. When overlapping host and kernel execution, it is essential to keep the data transfers and the kernel synchronized. That is, the input data must be ready before the kernel executes, and the kernel must have written its output before the host reads it back. Events can be used to order commands within the command queue, and clWaitForEvents can be used to synchronize the host with the kernel. Refer to the OpenCL standard for the usage of events, clWaitForEvents, and so on.

2. Choose either C++ or C for the host code, whichever you are more comfortable with.

3. Use aligned_allocator in C++ or posix_memalign in C to align the host pointers. Search the Xilinx GitHub examples or Google for usage examples.

Regards,
brucey
----------------------------------------------------------------------------------------------
Kindly note- Please mark the Answer as "Accept as solution" if information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
----------------------------------------------------------------------------------------------
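Point 3 of the answer above can be sketched in C++ roughly as follows. This is a minimal illustration, not the allocator shipped with the Xilinx examples: the `aligned_allocator` template, the `is_page_aligned` helper, and `make_host_buffer` are written here for illustration, and the 4096-byte alignment is an assumption based on the page size discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)
#include <vector>

// Illustrative allocator (an assumption, not the Xilinx original):
// every allocation starts on a 4096-byte boundary.
template <typename T>
struct aligned_allocator {
    using value_type = T;

    aligned_allocator() = default;
    template <typename U>
    aligned_allocator(const aligned_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
        if (posix_memalign(&p, 4096, n * sizeof(T)) != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const aligned_allocator<T>&, const aligned_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const aligned_allocator<T>&, const aligned_allocator<U>&) { return false; }

// True if the address is a multiple of 4096.
bool is_page_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 4096 == 0;
}

// Hypothetical helper: a byte buffer whose data() pointer is page aligned.
std::vector<std::uint8_t, aligned_allocator<std::uint8_t>> make_host_buffer(std::size_t n) {
    std::vector<std::uint8_t, aligned_allocator<std::uint8_t>> v(n);
    assert(is_page_aligned(v.data()));
    return v;
}
```

With something like this, the pointer returned by `data()` on the vector is what would be handed to the OpenCL buffer creation call as the host pointer.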
7 Replies
Observer jyerra2
Registered: ‎11-01-2017

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

Hi,

Thank you! Your advice has helped so far.

I believe I am using clWaitForEvents the same way it is used in overlap_ocl, and there does not seem to be a problem with that.

I have added the following to my code:

// posix_memalign takes a void**, so allocate through void* temporaries first.
void* h_axi00_ptr0_input1 = nullptr;
void* h_axi00_ptr0_output1 = nullptr;
uint8_t* h_axi00_ptr0_input;
uint8_t* h_axi00_ptr0_output;

// Allocate both host buffers on a 4096-byte boundary.
if (posix_memalign(&h_axi00_ptr0_input1, 4096, MAX_LENGTH * sizeof(uint8_t))) {
    printf("Bad allocation of input host pointer\n");
    throw std::bad_alloc();
}
if (posix_memalign(&h_axi00_ptr0_output1, 4096, MAX_LENGTH * sizeof(uint8_t))) {
    printf("Bad allocation of output host pointer\n");
    throw std::bad_alloc();
}

// The casts change only the pointer type; the addresses stay the same.
h_axi00_ptr0_input = (uint8_t*)(h_axi00_ptr0_input1);
h_axi00_ptr0_output = (uint8_t*)(h_axi00_ptr0_output1);

This seems to align only h_axi00_ptr0_input1 and h_axi00_ptr0_output1. Since they need to be void* pointers for posix_memalign to align the host pointers to 4 kB boundaries, I am not able to directly align the uint8_t* host pointers that are used throughout my code. Attached below is the error I get.

I think this is because, when I declare them as uint8_t* pointers, they are assigned addresses that are not necessarily multiples of 4096. I believe the extra memcpy slows down the queue until eventually a buffer cannot be found: either one that should not have been deleted or one that should already have been allocated. I am using C, so I tried posix_memalign, which requires an argument of type void**. Casting the void* to uint8_t* later on breaks my code, since the compiler reports an invalid conversion. Please let me know how to move forward.

 

Thanks!

Capture.PNG
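On the alignment worry raised in this post: a cast from void* to uint8_t* changes only the pointer's type, not its address, so a buffer obtained from posix_memalign stays 4096-byte aligned after the cast. An "invalid conversion" message typically means the file is being compiled as C++, where the conversion from void* must be written as an explicit cast (C performs it implicitly). The sketch below illustrates this; the helper name `alloc_page_aligned` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: allocate `bytes` bytes on a 4096-byte boundary and
// return the block as uint8_t*. Returns nullptr on failure.
// The caller releases the block with free().
std::uint8_t* alloc_page_aligned(std::size_t bytes) {
    void* raw = nullptr;
    if (posix_memalign(&raw, 4096, bytes) != 0) return nullptr;

    // C++ requires an explicit cast from void*; the address is unchanged.
    std::uint8_t* typed = static_cast<std::uint8_t*>(raw);
    assert(static_cast<void*>(typed) == raw);
    return typed;
}
```

The returned uint8_t* can then be used directly throughout the host code; no second, separately aligned pointer is needed.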
Xilinx Employee
Registered: ‎03-24-2010

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

That looks like correct usage.

Judging from the messages, the number of warnings has decreased.

Please check the other parts of your code, and pay attention to the other messages, such as the error.

Regards,
brucey
Observer jyerra2
Registered: ‎11-01-2017

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

Thank you, it seems to work after some debugging.

My project needs to continuously read data from a socket opened in the host code, process it (run the kernel), and send the result back to the host. Would a simple while loop suffice to run the kernel continuously and achieve this?

 

Sincerely,

Janish Yerra
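As a rough illustration of the loop being asked about, the sketch below models a double-buffered ("ping-pong") loop on the host only. It does not use OpenCL: `fake_kernel` is a made-up stand-in for the write/execute/read-back sequence, and `std::future` plays the role that events and clWaitForEvents would play in real host code. The point it shows is the ordering rule from the accepted answer: wait for the job that last used a buffer set before refilling that set, while the other set is still in flight.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical stand-in for "copy in, run kernel, copy out": adds 1 to each byte.
static void fake_kernel(const std::vector<std::uint8_t>& in,
                        std::vector<std::uint8_t>& out) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<std::uint8_t>(in[i] + 1);
}

// Processes `blocks` blocks using two buffer sets and returns the sum of all
// output bytes, so the result can be checked.
std::uint64_t run_ping_pong(int blocks, std::size_t block_size) {
    std::vector<std::uint8_t> in[2], out[2];
    std::future<void> pending[2];  // one in-flight job per buffer set
    for (int s = 0; s < 2; ++s) {
        in[s].resize(block_size);
        out[s].resize(block_size);
    }

    std::uint64_t sum = 0;
    // Wait for the job on set `s` (if any) and consume its output.
    auto collect = [&](int s) {
        if (!pending[s].valid()) return;
        pending[s].get();  // analogous to waiting on the read-back event
        for (std::uint8_t v : out[s]) sum += v;
    };

    for (int b = 0; b < blocks; ++b) {
        const int s = b % 2;
        collect(s);  // set `s` must be idle before it is refilled
        // Stand-in for receiving a block from the socket.
        for (std::uint8_t& v : in[s]) v = static_cast<std::uint8_t>(b);
        // Launch the job; the other buffer set may still be in flight.
        pending[s] = std::async(std::launch::async,
                                [&in, &out, s] { fake_kernel(in[s], out[s]); });
    }
    collect(0);  // drain whatever is still running
    collect(1);
    return sum;
}
```

In a long-running service the `for` loop would become the while loop from the question, and the final drain would run on shutdown.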

Xilinx Employee
Registered: ‎03-24-2010

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/host/overlap_ocl would be a good reference.

Regards,
brucey
Xilinx Employee
Registered: ‎07-18-2014

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

Hi @jyerra2,

You can also refer to the host code of the LZ4 data compression application below.

https://github.com/Xilinx/Applications/blob/master/data_compression/xil_lz4/src/xil_lz4.cpp

It sends multiple blocks back to back to the kernel in an overlapped way to achieve better throughput.

 

-Heera

Observer jyerra2
Registered: ‎11-01-2017

Re: Overlapping host and kernel execution - Kernel created by RTL Kernel Wizard

Yes, thank you for the reference materials. I will go through those examples.

One important requirement for my project is that the kernel remains the same between iterations; that is, the host cannot enqueue a new kernel instance every iteration. The ideal scenario would be a single kernel instance maintaining the database it creates in on-chip memory. While overlapping host and kernel execution might help reduce latency, it is not a requirement, since the hardware is pretty fast.

Therefore, I want to make sure that I will be able to run the same kernel instance through multiple iterations while maintaining a database in on-chip memory. Please let me know if that is the case.

 

Thank you both for your assistance!

 

Sincerely,

Janish Yerra
