Visitor
Registered: ‎08-09-2019

QDMA Performance Verification

Hi All,

I am attempting to verify the maximum QDMA throughput detailed in https://www.xilinx.com/support/answers/71453.html . I am using the example design from PG302 v3.0, which the performance report also references, together with the dmaperf tool (the dmautils tool described in the report doesn't seem to exist?). The drivers and tools are from https://github.com/Xilinx/dma_ip_drivers , and I built the design with Vivado 2019.1.

I have verified that I can set up a streaming queue and send and receive data using dma_from_device and dma_to_device. The FPGA is connected via PCIe Gen3 x8 (instead of the x16 used in the report).

However, when using dmaperf with the included configs from the performance report, I am not getting anywhere close to the throughput percentages shown. The theoretical maximum for a Gen3 x8 link is 7.88 GB/s, but I am getting ~600 MB/s when writing with the largest (4096-byte) packet size and 8 queues. I have yet to verify reads, as the FPGA does not appear to be sending any data back; I am currently looking into this.
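For reference, the 7.88 GB/s figure can be reproduced from the PCIe Gen3 line rate. This is only a minimal sketch: it assumes 8 GT/s per lane with 128b/130b encoding and ignores TLP/DLLP protocol overhead, so real payload throughput will always be lower. The function names are illustrative and not part of any Xilinx tool.

```python
# Theoretical PCIe Gen3 bandwidth per direction (assumption: raw line rate
# only, no TLP/DLLP/flow-control overhead is subtracted).
GEN3_GT_PER_S = 8.0        # transfers per second per lane, in GT/s
GEN3_ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def gen3_bandwidth_gbytes(lanes: int) -> float:
    """Raw Gen3 bandwidth in GB/s (1 GB = 1e9 bytes) for a given lane count."""
    return GEN3_GT_PER_S * GEN3_ENCODING * lanes / 8  # 8 bits per byte


def efficiency(measured_gbytes: float, lanes: int) -> float:
    """Measured throughput as a fraction of the raw Gen3 bandwidth."""
    return measured_gbytes / gen3_bandwidth_gbytes(lanes)


for lanes in (4, 8, 16):
    print(f"x{lanes}: {gen3_bandwidth_gbytes(lanes):.2f} GB/s")
# x4: 3.94 GB/s
# x8: 7.88 GB/s
# x16: 15.75 GB/s

print(f"0.6 GB/s on x8: {efficiency(0.6, 8):.1%}")
# 0.6 GB/s on x8: 7.6%
```

By this estimate, ~600 MB/s is under 8% of what a Gen3 x8 link could carry.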

Is there anything that would cause such low numbers or can anyone see any mistakes I've made? I've attempted to copy the performance report setup as closely as possible. 

Dev Kit : Zynq Ultrascale+ ZCU111 Evaluation Platform

If more information is needed let me know.

Thanks,

Jas

6 Replies
Xilinx Employee
Registered: ‎08-06-2008

Could you try with the performance example design? You will need to use a tcl option to generate the design. Please find more details on that in the link below:

https://www.xilinx.com/support/answers/72352.html

Let us know how that goes.

Thanks.

Visitor
Registered: ‎08-09-2019

Thanks for your response.

I've tried the performance example design, which also resolved my problems with reading from the FPGA (C2H direction).

I've attached the outputs of dmaperf for read and write when using 1 queue and 4096 transfer size.

I am getting ~3 GB/s for read and ~650 MB/s for write.

I should mention that the FPGA is being detected as Gen3 x4, so the read is at around 75% of the link bandwidth, which is not far off the percentage shown in the performance report, but the write is still much slower. I see some conflicting information about which mode the driver should be in, so right now it is loaded in auto mode. Are there any other values in the configuration registers that could be optimised for write performance? The end goal is Gen3 x16, so I am worried about the rather slow write, which doesn't appear to change regardless of the number of PCIe lanes used.
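The negotiated link can be cross-checked from Linux sysfs, which exposes `current_link_speed` and `current_link_width` for each PCI device. The sketch below parses those two attributes and compares a measured rate against the raw Gen3 line rate. It assumes a Linux host and a Gen3 link; the device address in the commented usage line is a placeholder, not the actual address of the board.

```python
import re
from pathlib import Path
from typing import Tuple


def parse_link(speed_text: str, width_text: str) -> Tuple[float, int]:
    """Parse sysfs link attributes, e.g. '8.0 GT/s PCIe' and '4'."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", speed_text)
    if match is None:
        raise ValueError(f"unrecognised link speed: {speed_text!r}")
    return float(match.group(1)), int(width_text.strip())


def read_link(bdf: str, root: str = "/sys/bus/pci/devices") -> Tuple[float, int]:
    """Read the negotiated speed (GT/s) and lane count for one PCI device."""
    dev = Path(root) / bdf
    return parse_link((dev / "current_link_speed").read_text(),
                      (dev / "current_link_width").read_text())


def gen3_efficiency(measured_gbytes: float, lanes: int) -> float:
    """Fraction of raw Gen3 bandwidth (8 GT/s, 128b/130b, no TLP overhead)."""
    return measured_gbytes / (8.0 * (128 / 130) * lanes / 8)


# On the host this would be, with a placeholder address:
#   speed, width = read_link("0000:01:00.0")
# Here the sample strings stand in for the sysfs file contents.
speed, width = parse_link("8.0 GT/s PCIe\n", "4\n")
print(f"{speed} GT/s x{width}, read efficiency {gen3_efficiency(3.0, width):.0%}")
# 8.0 GT/s x4, read efficiency 76%
```

The same helper applied to the write figure gives `gen3_efficiency(0.65, 4)`, roughly 17% of the x4 line rate.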

I will do some more testing in the meantime.

Observer
Registered: ‎03-01-2017

Hi @jasvinderm,

Any progress? Have you found the cause of the slow write?

We are also looking into QDMA design with Gen 3 x4.

Br,

Nikola

Visitor
Registered: ‎08-09-2019

@nikola.stojkov

Hey, for us I believe this came down to debug messages generated by the driver. Once they were disabled, we got full speed in both directions.

Good luck looking into QDMA. The documentation is confusing at best, and the software/drivers are mostly undocumented. Xilinx don't appear to respond very frequently on the forums; I have had several basic questions go unanswered when seeking simple clarification.

Observer
Registered: ‎03-01-2017

Hi @jasvinderm 

Thanks for the reply.

Yeah, information about QDMA is very thin in general. I hope you folks are going to shed light on some issues in future :)

Best Regards,

Nikola

Visitor
Registered: ‎09-13-2019

Nikola, we are just starting to check out the QDMA module as well. Please post any updates to your work or information you gather. We will do the same.