02-06-2021 02:30 PM - edited 02-06-2021 02:41 PM
I have written an application that uses GMIO at the top-level simulation::platform, and I have implemented an in-graph PL kernel to replicate and move data between AIE kernels. The PL kernel uses HLS streams, and the AIE kernels use windows. An earlier version of the application ran flawlessly in x86simulator when using streams or windows exclusively (and AIE tiles only). Introducing a stream-based PL kernel between the window-based AIE kernels has apparently exposed an x86simulator bug related to CDNOx86Sim::Window2StreamsAdapter. This is how the simulation crashes:
Thread 9 "sim.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x155553158700 (LWP 1736844)]
0x00001555551a952a in CDNOx86Sim::Window2StreamsAdapter::execute() ()
#0 0x00001555551a952a in CDNOx86Sim::Window2StreamsAdapter::execute() ()
#1 0x0000155554affcbf in std::execute_native_thread_routine (__p=0x6e4b80)
#2 0x00001555554eb609 in start_thread (arg=<optimized out>)
#3 0x0000155554610293 in clone ()
Before posting any code, I need to recreate the issue in a non-proprietary test application. However, there is always a chance that the issue will not manifest itself in a simplified system.
02-08-2021 04:17 AM
Unfortunately, x86simulation is not yet as mature as aiesimulation. This should be improved in 2021.1.
If you can send a test case, that would be great, so we can make sure this is fixed in 2021.1.
I will also try to reproduce on my side.
02-08-2021 07:52 AM - edited 02-08-2021 05:00 PM
@florentw I tried and was unable to reproduce the error in a simplified application. One thing that's unclear is whether window sizes need to be adjusted to be a multiple of the stream width. That is, if an HLS stream reads/writes 32-, 64- or 128-bit wide data, does the sending/receiving window for the connected AIE kernel need to be a multiple of 4, 8 or 16 bytes, respectively, or will this automatically be handled by the compiler? For example, let's say that a 128-bit PL data mover is feeding an AIE kernel with a smaller window that is not a multiple of 128 bits:
    // Internally, pl_kernel moves 64 ap_int<128> values = 1024 bytes
    void pl_kernel(const ap_int<128>* mem, dir::out<hls::stream<ap_axis<128, 0, 0, 0>>&> s);
    // The window expects 510 int16 values = 1020 bytes
    connect<stream, window<510 * sizeof(int16)>>(pl_kernel, aie_kernel);
In my implementation, I added padding with additional window_read/write calls to consume/produce the extra bytes just to be on the safe side, but I don't know whether this was necessary. The compiler sees that a 128-bit HLS stream is connected to a 1020-byte window, but it can't possibly know how much data will flow from the stream to the window. My assumption was that I need to handle any extra data in the window-based AIE kernel so that the stream-based PL kernel does not stall trying to push the last few bytes to my window. Therefore, I must process 510 int16 values and discard 2 int16 values per invocation, and my window size needs to be 512 * sizeof(int16) instead of 510. Is this correct?
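To make the padding arithmetic concrete, here is a minimal sketch in plain C++ (no ADF or HLS headers). The helper round_up_bytes and the constant names are hypothetical and only illustrate the calculation; they are not part of any Xilinx API. It assumes a 128-bit stream word and the 510-sample int16 payload from the example above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an ADF/HLS function): round a payload size in
// bytes up to the next multiple of the stream word size in bytes.
constexpr std::size_t round_up_bytes(std::size_t payload_bytes,
                                     std::size_t word_bytes) {
    return ((payload_bytes + word_bytes - 1) / word_bytes) * word_bytes;
}

// Assumed values taken from the example in this post.
constexpr std::size_t kStreamWordBytes = 128 / 8;                   // 16 bytes per stream word
constexpr std::size_t kPayloadBytes = 510 * sizeof(std::int16_t);   // 1020 bytes of useful data

// Window size that absorbs every full stream word, and the number of
// int16 samples the AIE kernel would read and discard per invocation.
constexpr std::size_t kWindowBytes = round_up_bytes(kPayloadBytes, kStreamWordBytes);
constexpr std::size_t kPadSamples =
    (kWindowBytes - kPayloadBytes) / sizeof(std::int16_t);

static_assert(kWindowBytes == 1024, "1020 bytes rounds up to 64 words of 16 bytes");
static_assert(kPadSamples == 2, "two int16 samples of padding per invocation");
```

With these numbers the padded window is 512 * sizeof(int16) bytes, which matches the 64 ap_int<128> values the PL kernel moves.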
02-09-2021 08:03 AM
I do not know how this is handled, and I am not sure the compiler takes care of it. Let me check internally.
02-10-2021 08:53 AM
I confirmed that the compiler is not taking care of adapting the data to fit the window.
If you are sending too much data, then the extra data should be written on the pong buffer.
02-22-2021 02:02 AM
You do not need to create an SR. We can do the debugging through the forums (or at least share the updates here) so the next user facing the issue can see the outcome. When your test case is ready, just let me know and I will send you an EZmove link so you can upload your project in private.
02-22-2021 07:46 AM
@florentw Sorry, I should have explained a bit more. The SR was already open for another issue, and the test case demonstrates both issues. In any case, you're welcome to post the outcome here for the x86 crash.
02-25-2021 03:47 AM
I can reproduce a segmentation fault with the test case you sent, and I have reported this to the development team.
03-03-2021 07:52 PM
@florentw While debugging another issue, it became apparent that windows between AIE kernels must be at least 32 bytes, but this requirement was excluded from UG1076 for Vitis 2020.2 dated 11/24/2020, whereas it was stated in the 2020.1 version. Indeed, my example contained windows smaller than 32 bytes, and after I increased the window size, x86simulator stopped crashing. Of course, crashing is not a good way to handle this, and I think the compiler should at the very least refuse to compile the ADF graph if the window size is too small or not a multiple of 16 bytes, which is yet another requirement excluded from the latest UG1076.
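As a stopgap until the tools reject such graphs, the two constraints mentioned above (at least 32 bytes, and a multiple of 16 bytes) can be checked at compile time in user code. This is a minimal sketch in plain C++; is_valid_window_size and checked_window_size are hypothetical names, not part of the ADF API, and the limits are the ones quoted in this post from the 2020.1 version of UG1076.

```cpp
#include <cstddef>

// Limits as described in this thread (assumed from the 2020.1 UG1076).
constexpr std::size_t kMinWindowBytes = 32;
constexpr std::size_t kWindowAlignBytes = 16;

// Hypothetical helper: true if a window size meets both constraints.
constexpr bool is_valid_window_size(std::size_t bytes) {
    return bytes >= kMinWindowBytes && bytes % kWindowAlignBytes == 0;
}

// Hypothetical wrapper: fails to compile when the size is invalid, so a bad
// window size is caught by the host compiler instead of at simulation time.
template <std::size_t Bytes>
struct checked_window_size {
    static_assert(Bytes >= kMinWindowBytes,
                  "window must be at least 32 bytes");
    static_assert(Bytes % kWindowAlignBytes == 0,
                  "window size must be a multiple of 16 bytes");
    static constexpr std::size_t value = Bytes;
};

// Example: 1024 bytes passes; checked_window_size<1020>::value or
// checked_window_size<16>::value would stop compilation with a clear message.
constexpr std::size_t kOkWindow = checked_window_size<1024>::value;
```

The resulting value could then be used wherever the graph specifies a window size, for example as the argument of window<...> in a connect declaration.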
03-04-2021 01:14 AM
Thanks for the update.
Yes, I did not investigate the test case in much depth, but this should not fail with a segmentation fault.
I had an internal discussion. I do not think this information should have been removed from UG1076. We will work on adding it back in 2020.2, along with any other limitations.