08-09-2019 05:09 AM
I am attempting to verify the maximum QDMA throughput detailed in https://www.xilinx.com/support/answers/71453.html. I am using the example design provided in PG302 v3.0, which is the one indicated in the performance report, together with the dmaperf tool (the dmautils tool described in the report doesn't seem to exist?). The drivers and tools are from https://github.com/Xilinx/dma_ip_drivers, and I used Vivado 2019.1 to build the design.
I have verified that I can set up a streaming queue and send/receive data using dma_from_device and dma_to_device. The FPGA is connected via PCIe Gen3 x8 (instead of the x16 used in the report).
However, when using dmaperf with the included configs used in the performance report, I am not getting anywhere close to the throughput percentages shown. I should be expecting a maximum of 7.88GB/s, but I am getting ~600MB/s when writing with the largest (4096-byte) packet size and 8 queues. I have yet to verify read, as it appears the FPGA is not sending any data back; I am currently looking into this.
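For reference, the 7.88GB/s figure is the raw Gen3 x8 line rate after 128b/130b encoding, before any TLP or DMA descriptor overhead. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not part of dmaperf or the QDMA driver.

```python
def pcie_gen3_bandwidth_gbs(lanes: int) -> float:
    """Raw one-direction PCIe Gen3 bandwidth in GB/s (decimal).

    Accounts only for the 128b/130b line encoding; packet (TLP) headers,
    flow control, and DMA descriptor traffic reduce the usable figure further.
    """
    gt_per_lane = 8.0          # Gen3 signalling rate, GT/s per lane
    encoding = 128.0 / 130.0   # 128b/130b encoding efficiency
    return gt_per_lane * encoding * lanes / 8.0  # bits -> bytes


def link_utilisation(measured_gbs: float, lanes: int) -> float:
    """Measured throughput as a fraction of the raw Gen3 ceiling."""
    return measured_gbs / pcie_gen3_bandwidth_gbs(lanes)


for lanes in (4, 8, 16):
    print(f"Gen3 x{lanes}: {pcie_gen3_bandwidth_gbs(lanes):.2f} GB/s")

# ~600MB/s measured on a x8 link, as in the post above
print(f"write utilisation on x8: {link_utilisation(0.6, 8):.1%}")
```

At ~600MB/s the write path is using well under a tenth of the x8 ceiling, which points at something other than the link itself.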
Is there anything that would cause such low numbers or can anyone see any mistakes I've made? I've attempted to copy the performance report setup as closely as possible.
Dev Kit: Zynq UltraScale+ ZCU111 Evaluation Platform
If more information is needed let me know.
08-09-2019 07:24 AM
Could you try with the performance example design? You will need to use a tcl option to generate the design. Please find more details on that in the link below:
Let us know how that goes.
08-12-2019 05:39 AM
Thanks for your response.
I've tried the performance example design, which also resolved my problems with reading from the FPGA (C2H direction).
I've attached the outputs of dmaperf for read and write when using 1 queue and 4096 transfer size.
I am getting ~3GB/s for read and ~650MB/s for write.
I should mention that the FPGA is being detected as Gen3 x4, so the read is at around 75% of the available bandwidth, which is not far off the percentage shown in the performance report, but the write is still much slower. I see some conflicting information about which mode the driver should be in, so right now it is loaded in auto mode. Are there any other values in the configuration registers that could be optimised for write performance? The end goal is Gen3 x16, so I am worried about the rather slow write, which doesn't appear to change regardless of the number of PCIe lanes used.
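Since the card trains at x4 rather than x8, it can help to confirm what the host actually negotiated. On Linux the PCI sysfs attributes `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` report this per device. The sketch below reads them; the helper names and the BDF `0000:01:00.0` are placeholders (take the real address from `lspci`), and `max_link_width` reflects what the endpoint advertises, not what the slot or cabling supports.

```python
from pathlib import Path

LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_pcie_link(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Return the PCIe link attributes for one device as stripped strings.

    Attributes that are missing for the device are reported as None.
    """
    dev = Path(sysfs_root) / bdf
    info = {}
    for name in LINK_ATTRS:
        attr = dev / name
        info[name] = attr.read_text().strip() if attr.exists() else None
    return info


def width_downtrained(info: dict) -> bool:
    """True if the negotiated width is below the width the device advertises."""
    cur, cap = info["current_link_width"], info["max_link_width"]
    if cur is None or cap is None:
        return False
    return int(cur) < int(cap)
```

Calling `read_pcie_link("0000:01:00.0")` on the host and passing the result to `width_downtrained` shows whether the endpoint advertises more lanes than the link is using. That would explain a lower read ceiling, although not a write rate that stays flat across lane counts.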
I will do some more testing in the meantime.
09-12-2019 08:08 AM - edited 09-12-2019 08:09 AM
09-12-2019 09:07 AM
Hey, for us I believe this came down to debug messages generated by the driver. Once they were disabled, we got full speed in both directions.
Good luck looking into QDMA. The documentation is confusing at best, and the software/drivers are mostly undocumented. Xilinx don't appear to respond very frequently on the forums; I have had several basic questions go unanswered when seeking simple clarification.
09-13-2019 01:51 AM
09-13-2019 01:56 PM