10-25-2016 08:42 AM
Our design moves data from the fabric to the DDR RAM via AXI DMA. We noticed that the TREADY line of the DMA IP stays low for some time, which leads to FIFO-full conditions in our design.
Do you have any ideas on how we could improve the performance of the data path?
We have already tried configuring the DDR arbiter to allow high-priority writes from the HP port.
We also noticed that the memory access time in the PS is sporadically very high, so it looks like something is blocking the memory access.
We use Vivado 2015.4. Device: XC7Z030-2FFG676I
The FIFO size cannot be increased any further due to block RAM limitations.
10-25-2016 07:30 PM
10-26-2016 12:12 AM
Thanks for the reply.
We only need about 250MByte/s of bandwidth.
The width is 64 bits. We also use separate HP ports.
The data sources are two CSI-2 Cameras, each one is connected to its own HP Port.
The other two HP Ports are connected to a Xilinx 10G Subsystem, which is the data sink.
So the DDR arbitration is between all 4 HP ports, which is probably reducing the performance.
Any other hints would be very appreciated... :)
10-26-2016 10:50 PM
@amoser_bplus Is 250 MB/s the total for the 2 cameras plus the 10Gb link? That's about 2 Gb/s of bandwidth. Even with a 16-bit DDR3 you have about 17 Gb/s, and even if the PL only got half of that, it should be plenty. What burst length are you using? Try to increase it if you are generating very small bursts. You can also try to play with the QoS controls on the HPx ports to control how the embedded interconnect between PS and PL behaves.
10-27-2016 12:02 AM
thanks for the reply.
250 MB/s is only for the 2 cameras. The bandwidth of the 10Gb link will probably be about the same.
We have attached a 32-bit DDR3 RAM. The max burst size is already set to 256.
We have already tried the QoS controls and increased the write priority for the HP ports the cameras are attached to.
How about the DDR ARB bypass port?
Maybe we could use this port to handle urgent transfers when our FIFOs in the fabric are almost full?
I think there is something within the PS that blocks the RAM access from time to time.
The DDR memory controller is arbitrated between the 4 HP ports, the L2 cache, and the central interconnect.
Can we somehow debug which one is the most active?
10-27-2016 12:27 AM - edited 10-27-2016 12:39 AM
I think your application might benefit from address reordering. Review the following two documents: https://www.xilinx.com/Attachment/Zynq_DDRC_Addressing.pdf
and search for reorder in the second document (this one is also a good indication of what can be accomplished).
I am assuming you read section 5.2.2 of ug585 specifically the Rationale portion.
10-27-2016 07:36 AM
Thanks again for your reply.
Your hint regarding address reordering is very interesting...
I will check that on our system.
As I understand it from the documents, we need to do the following to implement address reordering:
- insert an IP core on the memory-mapped side of the AXI DMA that handles the address reordering
- make sure the software can handle the reordered addresses
The aim is to ensure that each frame buffer stays within one DDR bank to avoid overhead.
Am I right?
10-27-2016 10:06 AM
Basically yes. Opening new banks is quite expensive in DDR.
But I'd suggest you first try to balance out the PS DDR accesses through the QoS block, as that's easier to try.
06-08-2018 09:42 AM
I know this is an old thread, but I found it while researching DDR issues on Zynq, and I found the linked paper about reordering quite interesting. The bandwidth they achieve in that paper is quite impressive, and it made me realize the requirements for the design I'm working on are not as close to the limit as I feared.
I'm a little confused by your last post though - isn't the issue that opening a new *row* is expensive? I thought that accessing a different bank could be quite cheap, *if* that bank already has the row that you want open. I believe the point of the reordering in the paper is that with the way their data is coming in, row changes happen in parallel on multiple banks, instead of multiple row changes happening sequentially on a single bank. Is that correct?