I am porting an embedded application from FreeRTOS to Petalinux. The target is a custom-made board with a 1G PHY and a Zynq-7020 chip. The app generates about 16 KB of data and repeatedly sends it out the PHY to a host computer as fragmented UDP messages at a fairly high rate. This all works very well with FreeRTOS and lwIP, achieving up to 800 Mbps sustained throughput. With Petalinux I am achieving about 400 Mbps, which seems to be pretty good judging from what I have read in other posts, but I need to squeeze some more speed out of it.
The FPGA builds the data into one of two ping-pong buffers in on-chip memory (OCM) and then kicks the ARM CPU. On interrupt, the ARM app reads a register that states which ping-pong buffer is complete, adds some header information to the OCM buffer, and then calls sendto to send the buffer to the host.
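A minimal sketch of the send path described above, to make the per-interrupt work concrete. The header layout (`struct frame_hdr`), the exact frame size, the meaning of the register's low bit, and the function name are all assumptions for illustration; the real buffers would be the mapped OCM regions rather than ordinary arrays.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define FRAME_BYTES (16 * 1024)   /* "about 16 KB" per frame; exact size assumed */

/* Hypothetical header written at the start of each buffer. */
struct frame_hdr {
    uint32_t seq;   /* frame sequence number, network byte order */
    uint32_t len;   /* payload bytes that follow, network byte order */
};

/*
 * Called after the FPGA interrupt. done_reg is the value read from the
 * status register; bit 0 is assumed to select the completed buffer.
 * Returns the sendto() result (bytes sent, or -1 with errno set).
 */
static ssize_t send_completed_buffer(int sock, const struct sockaddr_in *dst,
                                     uint8_t *const bufs[2],
                                     uint32_t done_reg, uint32_t seq)
{
    uint8_t *buf = bufs[done_reg & 1u];
    struct frame_hdr hdr;

    hdr.seq = htonl(seq);
    hdr.len = htonl((uint32_t)(FRAME_BYTES - sizeof hdr));
    memcpy(buf, &hdr, sizeof hdr);          /* header goes in place, in the buffer */

    /* One 16 KB datagram; the IP layer fragments it to fit the MTU. */
    return sendto(sock, buf, FRAME_BYTES, 0,
                  (const struct sockaddr *)dst, sizeof *dst);
}
```

In this form every call copies the whole frame from user space into kernel socket buffers, which is the copy the rest of the question is trying to avoid.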
I believe the slowdown is at the user-kernel boundary. I have seen posts that suggest using vmsplice/splice to move data from memory to the socket via a pipe, but I have not been able to get this to work. I think the problem is that the user-level code cannot mmap the OCM correctly, so the vmsplice call fails with an EFAULT error (bad address). I have mmap-ed the OCM via /dev/mem, and I think that might be one place where I am going wrong.
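For reference, the vmsplice/splice sequence mentioned above looks roughly like the sketch below when the source is ordinary page-backed user memory. The helper names are hypothetical, and this does not by itself address the EFAULT: vmsplice has to pin the user pages, and a /dev/mem mapping is typically not backed by pages the kernel can pin that way, which is one likely (but here unverified) explanation for the error. The socket is connected first because splice gives no way to pass a destination address.

```c
#define _GNU_SOURCE             /* for vmsplice() and splice() */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket connected to dst, or return -1. */
static int open_connected_udp(const struct sockaddr_in *dst)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;
    if (connect(sock, (const struct sockaddr *)dst, sizeof *dst) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/*
 * Hypothetical helper: hand len bytes at buf to a connected socket through
 * a pipe. Returns the number of bytes passed to the socket, or -1 with
 * errno set. len is assumed to fit in the pipe (64 KB by default).
 */
static ssize_t splice_buffer_to_socket(int pipefd[2], int sock,
                                       void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    /* Map the user pages into the pipe instead of copying them. */
    ssize_t queued = vmsplice(pipefd[1], &iov, 1, 0);
    if (queued < 0)
        return -1;

    /* Move the pipe contents to the socket. If more than one splice()
     * call is needed, the data may leave as more than one datagram. */
    ssize_t sent = 0;
    while (sent < queued) {
        ssize_t n = splice(pipefd[0], NULL, sock, NULL,
                           (size_t)(queued - sent), SPLICE_F_MOVE);
        if (n <= 0)
            return -1;
        sent += n;
    }
    return sent;
}
```

One caveat with this approach in a ping-pong scheme: because the pages are referenced rather than copied, the buffer should not be rewritten until the network stack has finished with it, and vmsplice offers no direct completion notification.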
I have also tried creating a kernel module that ioremaps the OCM and provides it to the user application, but I have not been able to get the user app to access the OCM buffer properly (the kernel module itself appears to get the buffer from the OCM driver). I need to be able to modify the buffer in OCM at the user level and then have it sent to a socket without the kernel making an extra copy of it, that is, to get closer to zero-copy.
Is what I am trying to do even possible in Petalinux? Can someone please point me in a direction that might work? I am fairly new to Petalinux so I might be missing something trivial.