h0-h0 (Observer)
Registered: 05-10-2017

DMA Subsystem throughput using descriptor bypass

Hello all,

I am using the DMA Subsystem (v2.0) with Vivado 2016.2 and PCIe 3.0 x8, with a 256-bit AXI-Stream interface at 250 MHz; the host PC runs CentOS 6.9 (kernel 2.6.32). My design uses the descriptor bypass interface to send data straight to the host's DDR4 memory. I reserved contiguous memory for DMA with the mem="XX" boot parameter, at a known physical address, so that I can set that address as the starting destination address on the FPGA (VCU108 eval board). The data is a continuous ADC stream at 2 GB/s. I need about 200 GB of data per transfer, which is why I use the mem boot parameter.
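In descriptor-bypass mode the user logic, rather than a host driver, supplies each descriptor's address and length. Below is a minimal Python sketch of how a reserved host region could be carved into back-to-back (destination address, length) descriptors. `BASE_ADDR` and the per-descriptor limit `MAX_LEN` are hypothetical placeholders, not values from this thread or from PG195; the real length limit depends on your IP version, and the real base address depends on your board's physical memory map.

```python
# Hypothetical values for illustration only; substitute your own.
BASE_ADDR = 0x10_0000_0000   # hypothetical physical start of the reserved region
TOTAL_BYTES = 200 * 10**9    # ~200 GB per capture, as in the post
MAX_LEN = 256 * 2**20        # hypothetical per-descriptor length limit (256 MiB)


def plan_descriptors(base_addr: int, total_bytes: int,
                     max_len: int) -> list[tuple[int, int]]:
    """Split [base_addr, base_addr + total_bytes) into consecutive
    (destination address, length) pairs, each no longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    descriptors = []
    addr, remaining = base_addr, total_bytes
    while remaining > 0:
        length = min(max_len, remaining)
        descriptors.append((addr, length))
        addr += length          # next descriptor starts where this one ends
        remaining -= length
    return descriptors


if __name__ == "__main__":
    descs = plan_descriptors(BASE_ADDR, TOTAL_BYTES, MAX_LEN)
    print(f"{len(descs)} descriptors, first at {descs[0][0]:#x}")
    # At a constant 2 GB/s input the whole capture lasts TOTAL_BYTES / 2e9 seconds.
    print(f"capture time: {TOTAL_BYTES / 2e9:.0f} s")
```

At 2 GB/s, a 200 GB capture is about 100 s of uninterrupted streaming, so whatever feeds the bypass interface has to keep issuing descriptors for that whole time.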

What I am seeing is that for the first 16 µs the transfer runs fine. After that, s_axis_rq_tready starts going low for long periods (~300 ns) before going high again, and the throughput drops way down, to less than 100 MB/s if I calculated it correctly.
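To sanity-check that throughput estimate, the AXI-side peak and the effect of periodic tready stalls can be computed directly. A minimal sketch, assuming tvalid is held high and tready follows a simple repeating pattern of `ready_cycles` high and `stall_cycles` low (the real pattern in the capture is probably less regular). The 8 GB/s AXI peak is only an upper bound; a PCIe Gen3 x8 link carries somewhat less after 128b/130b encoding and TLP overhead.

```python
def peak_bytes_per_s(bus_width_bits: int, clk_hz: float) -> float:
    """Peak AXI4-Stream bandwidth when one beat transfers on every clock."""
    return bus_width_bits / 8 * clk_hz


def effective_bytes_per_s(bus_width_bits: int, clk_hz: float,
                          ready_cycles: int, stall_cycles: int) -> float:
    """Average bandwidth when tready repeats ready_cycles high, then
    stall_cycles low (tvalid assumed always high)."""
    duty = ready_cycles / (ready_cycles + stall_cycles)
    return peak_bytes_per_s(bus_width_bits, clk_hz) * duty


if __name__ == "__main__":
    clk = 250e6
    stall = round(300e-9 * clk)  # ~300 ns low is 75 cycles at 250 MHz
    print(peak_bytes_per_s(256, clk) / 1e9, "GB/s peak")
    for ready in (1, 4, 25):
        mbps = effective_bytes_per_s(256, clk, ready, stall) / 1e6
        print(f"{ready:>2} ready cycles per stall -> {mbps:.0f} MB/s")
```

With 75 stall cycles, a single ready cycle between stalls gives about 105 MB/s, close to the roughly 100 MB/s reported. Sustaining the 2 GB/s ADC rate would need tready high for at least 25 cycles (100 ns) per stall, since 25 / (25 + 75) × 8 GB/s = 2 GB/s.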

Now I have searched the forums and found the following thread: https://forums.xilinx.com/t5/PCI-Express/Throughput-issues-with-DMA-and-tx-buf-av-in-7-Series-Integrated/m-p/383997/highlight/true#M4854

@markzak stated that this is likely caused by the link partner's poor response time, which stalls transmission. Basically, am I running out of transmit credits? The only solution he suggested was a huge buffer to ride through the stalls.
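Whether a huge buffer can help depends on whether the stalls are transient. A rough sizing sketch, assuming a constant 2 GB/s input, 32-byte (256-bit) beats, and an output that alternates between `ready_s` seconds draining at `out_bytes_per_s` and `stall_s` seconds with no drain. The function names are illustrative only, not part of any Xilinx IP.

```python
import math


def beats_to_absorb_stall(in_bytes_per_s: float, stall_s: float,
                          beat_bytes: int = 32) -> int:
    """FIFO depth (in beats) that fills up while the output is fully stalled."""
    return math.ceil(in_bytes_per_s * stall_s / beat_bytes)


def fifo_stays_bounded(in_bytes_per_s: float, out_bytes_per_s: float,
                       ready_s: float, stall_s: float) -> bool:
    """True if, over one ready+stall period, the output drains at least as
    many bytes as the input produces; otherwise the FIFO grows without bound."""
    drained = out_bytes_per_s * ready_s
    produced = in_bytes_per_s * (ready_s + stall_s)
    return drained >= produced


if __name__ == "__main__":
    print(beats_to_absorb_stall(2e9, 300e-9), "beats to ride out one 300 ns stall")
    print(fifo_stays_bounded(2e9, 8e9, 200e-9, 300e-9))  # long ready windows
    print(fifo_stays_bounded(2e9, 8e9, 4e-9, 300e-9))    # one beat per stall
```

A single 300 ns stall at 2 GB/s is only 600 bytes (19 beats), so a modest FIFO covers it. If the link instead stays in the sub-100 MB/s regime for the rest of the transfer, the average drain rate is below the input rate and no finite buffer is enough; a buffer only absorbs transient stalls.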

Is this the only way to work around this issue? The DMA Subsystem does not expose the flow control interface, so I cannot check whether that is actually the problem.
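On the credit question: in PCIe flow control, each posted memory-write TLP consumes one posted-header (PH) credit plus one posted-data (PD) credit per 16 bytes (4 DW) of payload, and the transmitter has to wait once either pool is exhausted until the link partner returns credits. A minimal sketch of that accounting; the advertised credit counts and the 256-byte max payload below are hypothetical, since the real values depend on the root port.

```python
import math

DATA_CREDIT_BYTES = 16  # one PCIe flow-control data credit = 4 DW = 16 bytes


def credits_per_write(payload_bytes: int) -> tuple[int, int]:
    """(posted header credits, posted data credits) for one memory-write TLP."""
    return 1, math.ceil(payload_bytes / DATA_CREDIT_BYTES)


def writes_in_flight(ph_credits: int, pd_credits: int, payload_bytes: int) -> int:
    """Writes that can be sent before the transmitter must wait for new credits."""
    ph_each, pd_each = credits_per_write(payload_bytes)
    return min(ph_credits // ph_each, pd_credits // pd_each)


if __name__ == "__main__":
    ph, pd = 32, 256  # hypothetical credits advertised by the link partner
    mps = 256         # assumed max payload size in bytes
    n = writes_in_flight(ph, pd, mps)
    window = n * mps
    print(f"{n} writes = {window} bytes in flight")
    print(f"drained in {window / 8e9 * 1e9:.0f} ns at 8 GB/s")
```

With these assumed numbers only 16 writes (4 KB) fit, which the AXI side could emit in about 512 ns; if the root complex takes longer than that to return credits, the core would likely have to deassert tready, which would look like the stalls described above.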

(Attached screenshot: slowdown.png)

Explorer
Registered: 12-01-2010

Re: DMA Subsystem throughput using descriptor bypass

Over time, I have realized that the motherboard itself is mostly to blame for this type of behavior. Some chipsets and manufacturers show these issues, and worse, yet placing the exact same card into another system gets you flawless performance. It's very strange, and it has forced us to try numerous motherboard/processor combinations until we found one that worked consistently. ASUS seems to be one of the better ones.

Also, I would highly recommend that you populate all DDR memory slots on the motherboard evenly. Memory controller throughput suffers greatly if not all ranks and channels are populated.

h0-h0 (Observer)
Registered: 05-10-2017

Re: DMA Subsystem throughput using descriptor bypass

I'm using the ASUS X99-E-10G WS motherboard. It has two PEX 8747 PCIe switches. I made sure to connect the FPGA to a slot on a different switch from the video card, but that could still be part of the problem. Which motherboard ended up working best for you?

As for populating the DDR memory slots equally, what do you mean by that? All 8 slots on the motherboard are filled, with 32 GB each. Do you mean I should split my DMA target address range so that it fills different blocks of RAM?

Xilinx Employee
Registered: 05-07-2015

Re: DMA Subsystem throughput using descriptor bypass

Hi @markzak,

Could you tell us which motherboard you were using before, the one you think severely affected the throughput? Did changing to a host machine with a different motherboard alone solve your low-throughput issue?

Thanks
Bharath

Explorer
Registered: 12-01-2010

Re: DMA Subsystem throughput using descriptor bypass

The only server motherboard I found that works at extremely high PCIe bandwidth (>4 GB/s) was the ASUS Z9PA-U8. Unfortunately it's DDR3, so obviously this was a few years back.

Last year, we attempted to build a new DDR4 system using a SuperMicro MBD-X10SRA-O ATX server motherboard. That FAILED. I then attempted the same with the next generation of the exact same ASUS motherboard, the Z10PA-U8. Surprisingly, that also FAILED.

So I went back to Amazon, ordered another ASUS Z9PA-U8, built another DDR3 system, and that worked flawlessly. All three motherboards were tried with exactly the same Xilinx PCIe card, the same FPGA firmware, etc. There's got to be a deeper root cause somewhere, but I don't have the $200K to purchase a PCIe bus analyzer.

As far as memory goes, I was specifically referring to physically populating all DDR slots on the motherboard. It sounds like you are doing that correctly.

Adventurer
Registered: 11-24-2017

Re: DMA Subsystem throughput using descriptor bypass

Hello @h0-h0,

I have a quite similar setup: DMA Subsystem for PCIe (v4.0) with a 256-bit AXI-Stream and the descriptor bypass interface. Unlike you, I need to send data to another FPGA board (using the Bridge Subsystem for PCIe and DDR4 memory for data storage).

Since I'm new to the PCI Express world, I'm having difficulty getting started with the design. The Xilinx design example (dma_stream0 test, page 97 of PG195) gives me an error (---***ERROR*** C2H Transfer Data MISMATCH ---) when the simulation finishes, which only adds to my confusion.

Would you be so kind as to share your project design with me, or at least give me some hints on how to start?

Thanks in advance for your time and effort. I really appreciate it.

Sincerely,

Bojan
