Visitor akhvostov
Visitor
1,036 Views
Registered: ‎11-28-2017

Very slow PS DDR performance on ZCU102

Hello!
I am developing a project that needs high-speed memory copies (more than 2 GB/s). After running into trouble (~100 MB/s for an AXI DMA transfer into the driver's kernel space plus a memcpy to user space), I wrote a simple memcpy benchmark and got only 1 GB/s for a pure memcpy! Even "time dd if=/dev/mem of=/dev/zero bs=1M count=100" gives 2.4 GB/s.
The board has a DDR4-2133 module, which gives 17 GB/s of raw performance.
I created the standard Vivado 2017.4 block design for this board and did not change the memory settings (I just verified that they are correct). In PetaLinux I changed nothing related to DDR4 in what it produces from the HDF.
Please explain where I lose ~10 times the memory speed.
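For reference, a minimal sketch of the kind of copy benchmark described above. This is not the poster's original program: the function name `copy_bandwidth_gb_s`, the 64 MiB buffer size, and the repeat count are assumptions chosen for illustration. It times one bulk copy between two preallocated buffers and reports the best run.

```python
import time


def copy_bandwidth_gb_s(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Return the best observed bulk-copy rate in GB/s (1 GB = 1e9 bytes)."""
    # Fill the source with non-zero data so its pages are really backed by RAM.
    src = bytearray(b"\xa5" * size_bytes)
    dst = bytearray(size_bytes)
    src_view = memoryview(src)  # slicing a memoryview does not create a temporary copy
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src_view  # one bulk copy of size_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # the first run also pays for page faults on dst
    return size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy rate: {copy_bandwidth_gb_s():.2f} GB/s")
```

Every copied byte is read once and written once, so the DRAM traffic behind a reported copy rate is at least twice that figure. Buffers should be much larger than the last-level cache, otherwise the result measures the cache and not the DDR.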

7 Replies
Visitor akhvostov
Visitor
969 Views
Registered: ‎11-28-2017

Re: Very slow PS DDR performance on ZCU102

anyone?
:(
Anyone who has a Zynq UltraScale+ board with a DDR4 DIMM on the PS: please post the speed you get from "dd if=/dev/mem of=/dev/zero bs=1M count=100" here. Even that would help a bit.
Teacher xilinxacct
Teacher
960 Views
Registered: ‎10-23-2018

Re: Very slow PS DDR performance on ZCU102

@akhvostov

I have an Ultra96 (so not the same chip), but my dd command reports 1.1 GB/s

Hope that helps

If so, mark as solution accepted. Kudos also welcomed. :-)

Scholar jg_bds
Scholar
953 Views
Registered: ‎02-01-2013

Re: Very slow PS DDR performance on ZCU102

 

The maximum rate of DDR (e.g., 17 GB/s) is only guaranteed for 64 bytes at a time. That's the amount of data that can be transferred across a DDR3/4 memory interface in the most basic transaction: a single burst access. How long you can sustain that rate is up to you. You cannot haphazardly access DDR memory and expect a throughput like that. If you're trying to use a CPU to move data out of one spot of DDR memory and then write it into another spot, you can give up on getting anywhere close to the maximum throughput.
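The two figures above (about 17 GB/s and 64 bytes) follow from simple arithmetic, sketched here under the assumption of a 64-bit data bus and a burst length of 8 transfers. The function names are hypothetical helpers, not part of any Xilinx tool.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak rate in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


def burst_bytes(bus_width_bits: int, burst_length: int = 8) -> int:
    """Bytes moved by one burst: bus width in bytes times the number of transfers."""
    return (bus_width_bits // 8) * burst_length


if __name__ == "__main__":
    # DDR4-2133 on a 64-bit bus: 2133 MT/s * 8 bytes per transfer
    print(f"peak: {peak_bandwidth_gb_s(2133, 64):.2f} GB/s")
    print(f"burst: {burst_bytes(64)} bytes")
```

With the same transfer rate, an interface half as wide (x32) has half the peak rate.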

The key to maximizing bandwidth is maintaining continuous flows of data into or out of the memory. Once you stop a flow and then change course, you pay a throughput penalty. These penalties add up quickly when you try to move small groups of data using a CPU. Instead, use DMA to burst data from a source DDR memory location to OCM, and then from OCM to a destination DDR memory location. You'll see a substantial increase in throughput.

-Joe G.

 

Visitor akhvostov
Visitor
923 Views
Registered: ‎11-28-2017

Re: Very slow PS DDR performance on ZCU102

@xilinxacct

Thank you very much.
Your board has the PS DDR interface in an x32 configuration, which is half the width of the SO-DIMM interface, and your dd result is ~2 times slower.
This shows the same situation as mine.

Visitor akhvostov
Visitor
912 Views
Registered: ‎11-28-2017

Re: Very slow PS DDR performance on ZCU102

@jg_bds
Thank you! I hadn't thought of it this way. Could you please point me to a good example of such a transfer?

Scholar jg_bds
Scholar
900 Views
Registered: ‎02-01-2013

Re: Very slow PS DDR performance on ZCU102

 

It sounds like you're looking for a software-centric solution. I'm a hardware guy by trade, so I'm not sure where you can find such information.

This looks to be a decent tutorial on using AXI DMA:

     http://www.fpgadeveloper.com/2014/08/using-the-axi-dma-in-vivado.html

The goal in the tutorial should be transferable to using GDMA in the PSU, but the tutorial does use an AXI DMA IP in the PL. Most people who target Zynq/Zynq MPSoC want to move data between the PL and PSU, so it seems most information available deals with using AXI DMA. An AXI DMA IP can still move data from PSU DDR to PSU DDR, though.

Best of luck.

-Joe G.

Visitor akhvostov
Visitor
895 Views
Registered: ‎11-28-2017

Re: Very slow PS DDR performance on ZCU102

@jg_bds

Thank you!
Yes, I'm looking for something software-centric.
I have read many topics related to AXI DMA, including the one at your link.

I have already implemented a working AXI DMA and get 1 GB/s of data into Linux kernel space through the driver.

The problem is that the contiguous memory allocation (CMA) block is limited in size (above ~1 GB, PetaLinux fails to build), so I need to move the data to user space during the transfer.

P.S. Yes, I have a backup plan: restrict the memory available to Linux and manage the rest manually. For now, though, I want a clean driver-based solution.
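The backup plan mentioned above could look roughly like the following sketch, which maps a physical address range into user space through a device file. Everything specific here is an assumption: the helper name `map_region`, the base address `0x70000000`, and the 16 MiB length are placeholders, and the range must first be kept away from the kernel by some boot-time reservation. Access to /dev/mem normally requires root and may be restricted by the kernel configuration.

```python
import mmap
import os


def map_region(path: str, offset: int, length: int, writable: bool = False) -> mmap.mmap:
    """Map `length` bytes of `path` starting at `offset` into this process."""
    if offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be a multiple of mmap.ALLOCATIONGRANULARITY")
    flags = os.O_RDWR if writable else os.O_RDONLY
    access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
    fd = os.open(path, flags)
    try:
        return mmap.mmap(fd, length, access=access, offset=offset)
    finally:
        os.close(fd)  # the mapping holds its own duplicate of the descriptor


if __name__ == "__main__":
    # Hypothetical reserved range; adjust to the actual reservation on the board.
    RESERVED_BASE = 0x70000000
    RESERVED_SIZE = 16 * 1024 * 1024
    buf = map_region("/dev/mem", RESERVED_BASE, RESERVED_SIZE)
    print(buf[:16].hex())
    buf.close()
```

A DMA engine could then write into the reserved range and user space would read the data in place, with no extra copy out of kernel space. Cache coherency between the DMA writes and the CPU view of that memory is not handled by this sketch.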
