I'm testing GPU performance on a ZU5EV, using DRM buffers as the GPU input and output buffers.
I found that copying data from a DRM buffer to the heap (about 0.14 GB/s) is much slower than copying data from the heap to a DRM buffer (2.02 GB/s). The test code is attached.
My question is: why is copying data out of the DRM buffer so slow, and how can I speed it up?
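On arm64, the Xilinx DRM driver sets vm_page_prot to "Write Combine" when the GEM buffer is mapped with mmap().
With vm_page_prot set to "Write Combine", the CPU data cache is not used for the buffer. As a result, reads from the buffer are slow, because every load has to go out to memory, while sequential writes stay fast, because the CPU can merge them into larger bursts before they reach memory. That matches the asymmetry you measured.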
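To check this asymmetry on your own board, a small timing helper along the following lines may be useful. It is only a sketch, not the attached test code: the function name copy_bandwidth_gbps and its parameters are made up for illustration, it assumes a POSIX system with clock_gettime() and a GCC-compatible compiler, and it simply times memcpy() between two pointers you pass in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not part of any driver or library):
 * copies `len` bytes from src to dst `iterations` times and returns
 * the measured throughput in GB/s (1 GB = 1e9 bytes).
 */
static double copy_bandwidth_gbps(void *dst, const void *src,
                                  size_t len, int iterations)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL && len > 0 && iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier so the repeated copies are not optimised away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    return ((double)len * (double)iterations) / secs / 1e9;
}
```

Call it once with the mmap()ed GEM buffer as src and a heap buffer as dst, then once with the two swapped. If the mapping is write-combined, the first direction should come out much slower than the second, in line with the numbers in the question.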
Unfortunately, speeding up buffer access means modifying the Xilinx DRM driver.
That fix is not easy, so I am hoping Xilinx will add support for it in the future.