Visitor
6,439 Views
Registered: 04-17-2014

Zynq DDR RAM Performance limitations?

Hello Guys,

 

Our design moves data from the fabric to the DDR RAM via AXI DMA. We noticed that the TREADY line of the DMA IP stays low for some time, which leads to FIFO-full conditions in our design.

Do you have any ideas on how we could improve the performance of the data path?

We have already tried configuring the DDR arbiter to allow high-priority writes from the HP port.

We also noticed that the memory access time in the PS is sporadically very high, so it looks like something is blocking the memory access.
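
For reference, spikes like this can be made visible with a minimal userspace probe along the lines of the sketch below (assumptions: Linux running on the PS; the block size and loop count are arbitrary):

/* Sketch of a PS-side latency probe: repeatedly copy a block larger
 * than the Zynq-7000's 512 KB L2 cache, so the copies actually hit
 * DDR, and report the worst-case time. Sizes/counts are arbitrary. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

#define BLOCK (1024 * 1024)   /* 1 MB: larger than the 512 KB L2 */

int main(void)
{
    static uint8_t src[BLOCK], dst[BLOCK];
    struct timespec t0, t1;
    long worst_ns = 0;

    for (int i = 0; i < 10000; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        memcpy(dst, src, BLOCK);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                + (t1.tv_nsec - t0.tv_nsec);
        if (ns > worst_ns) {
            worst_ns = ns;
            printf("new worst case: %ld ns\n", worst_ns);
        }
    }
    return 0;
}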

We use Vivado 2015.4. Device: XC7Z030-2FFG676I

The FIFO size cannot be increased further due to block RAM limitations.

Any ideas?

Thanks. :)

Teacher
6,407 Views
Registered: 03-31-2012

What is the average DDR bandwidth you need?
One way to get more bandwidth is to use two HP ports (the PL has 2 ports into the DDRC) and use them at 64-bit width. Are you doing that?
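
Back of the envelope (a sketch only; the 150 MHz PL clock is an assumption, plug in your own design's clock):

/* Rough HP-port bandwidth estimate. The PL clock value is an
 * assumption; 64 bit is the HP port's maximum data width. */
#include <stdio.h>

int main(void)
{
    const double pl_clk_hz  = 150e6;  /* assumed PL clock */
    const double port_bytes = 8.0;    /* 64-bit HP port data width */
    const int    ports      = 2;

    double per_port = pl_clk_hz * port_bytes;   /* bytes per second */
    printf("per port : %.0f MB/s\n", per_port / 1e6);           /* ~1200 */
    printf("two ports: %.0f MB/s\n", ports * per_port / 1e6);   /* ~2400 */
    return 0;
}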
Visitor
6,386 Views
Registered: 04-17-2014

Hi muzaffer,

 

thanks for the reply.

We only need about 250 MByte/s of bandwidth.

The width is 64 bit, and we already use different HP ports.

The data sources are two CSI-2 cameras; each one is connected to its own HP port.

The other two HP ports are connected to a Xilinx 10G subsystem, which is the data sink.

So the DDR arbitration is between all four HP ports, which is probably what reduces the performance.

 

Any other hints would be very appreciated. :)

 

 

Teacher
6,345 Views
Registered: 03-31-2012

@amoser_bplus, is 250 MB/s the total for the 2 cameras + the 10Gb link? That's 2 Gb/s of bandwidth. Even with 16-bit DDR3 you have about 17 Gb/s, and even if you got half of that for the PL, it should be plenty.

What burst length are you using? Try to increase it if you are generating very small bursts. You can also try playing with the QoS controls on the HPx ports to control how the embedded interconnect between PS and PL behaves.
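
To make the QoS suggestion concrete: a sketch like the one below sets a static write QoS on AXI_HP0 from Linux via /dev/mem. The base address and register offset are my reading of UG585 (AFI0 at 0xF8008000, AFI_WRQOS at offset 0x1C); verify them against your UG585 revision before trying this.

/* Sketch: raise the static write-channel QoS of AXI_HP0.
 * Addresses from UG585 (verify!): AFI0..AFI3 sit at
 * 0xF8008000..0xF800B000, AFI_WRQOS is at offset 0x1C.
 * On 32-bit targets compile with -D_FILE_OFFSET_BITS=64 so the
 * high physical address fits in off_t. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define AFI0_BASE 0xF8008000UL   /* AXI_HP0 AFI block */
#define AFI_WRQOS 0x1C           /* static write QoS, bits [3:0] */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *afi = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, AFI0_BASE);
    if (afi == MAP_FAILED) { perror("mmap"); return 1; }

    printf("HP0 WRQOS before: 0x%08x\n", afi[AFI_WRQOS / 4]);
    afi[AFI_WRQOS / 4] = 0xF;    /* highest static QoS value */
    printf("HP0 WRQOS after : 0x%08x\n", afi[AFI_WRQOS / 4]);

    munmap((void *)afi, 0x1000);
    close(fd);
    return 0;
}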

Visitor
6,341 Views
Registered: 04-17-2014

Hi muzaffer,

 

thanks for the reply.

250 MB/s is only for the 2 cameras. The bandwidth of the 10Gb link will probably be about the same.

We have a 32-bit DDR3 RAM attached. The maximum burst size is already set to 256.

We have already tried the QoS controls and increased the write priority on the HP ports where the cameras are attached.

 

How about the DDR ARB bypass port?

Maybe we could use this port to handle urgent transfers when our FIFOs in the fabric are almost full?

 

I think there is something within the PS that blocks the RAM access from time to time.

The DDR memory controller is arbitrated between the four HP ports, the L2 cache, and the central interconnect.

Can we somehow debug which one is the most active?
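
One idea we had is to at least watch the HP-side pressure by polling the AFI write-data FIFO level registers (a sketch; the offsets are our reading of UG585 and not verified on our board, and this only shows the HP ports, not L2 or central interconnect traffic -- those would need, e.g., an AXI Performance Monitor in the PL):

/* Sketch: poll the write-data FIFO occupancy of all four AXI_HP
 * ports to see which PL masters are backing up. Offsets from UG585
 * (AFI_WRDATAFIFO_LEVEL at 0x20) -- verify before relying on this.
 * On 32-bit targets compile with -D_FILE_OFFSET_BITS=64. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define AFI_WRDATAFIFO_LEVEL 0x20

int main(void)
{
    static const unsigned long base[4] = {
        0xF8008000, 0xF8009000, 0xF800A000, 0xF800B000  /* HP0..HP3 */
    };
    volatile uint32_t *afi[4];

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    for (int i = 0; i < 4; i++) {
        afi[i] = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, base[i]);
        if (afi[i] == MAP_FAILED) { perror("mmap"); return 1; }
    }

    for (int n = 0; n < 1000; n++) {   /* ~10 s of samples */
        printf("HP0..3 write FIFO levels: %u %u %u %u\n",
               afi[0][AFI_WRDATAFIFO_LEVEL / 4],
               afi[1][AFI_WRDATAFIFO_LEVEL / 4],
               afi[2][AFI_WRDATAFIFO_LEVEL / 4],
               afi[3][AFI_WRDATAFIFO_LEVEL / 4]);
        usleep(10000);
    }
    return 0;
}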

 

 

Teacher
6,338 Views
Registered: 03-31-2012

I think your application might benefit from address reordering. Review the following two documents:
https://www.xilinx.com/Attachment/Zynq_DDRC_Addressing.pdf
https://www.xilinx.com/support/documentation/application_notes/xapp792-high-performance-video-zynq.pdf
and search for "reorder" in the second one (it is also a good indication of what can be accomplished).

 

I am assuming you have read section 5.2.2 of UG585, specifically the Rationale portion.

Visitor
6,320 Views
Registered: 04-17-2014

Hi muzaffer,

 

thanks again for your reply.

Your hint regarding address reordering is very interesting...

I will check that on our system.

As I understand it from the documents, we need to do the following to implement address reordering:

- insert an IP core on the memory-mapped side of the AXI DMA that handles the address reordering

- make the software able to handle the reordered addresses

 

The aim is to ensure that each frame buffer is within one DDR bank to avoid overhead.

Am I right?
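
In the meantime, a quick sketch like the one below could decode where a frame buffer lands. The column/bank/row bit positions are placeholders only; the real map for the device comes from Zynq_DDRC_Addressing.pdf:

/* Sketch: decode which DDR row/bank/column a physical address hits,
 * to check how a frame buffer spreads across banks. The bit slices
 * below are ILLUSTRATIVE ONLY -- take the real mapping for your
 * device and interface width from Zynq_DDRC_Addressing.pdf. */
#include <stdio.h>
#include <stdint.h>

#define COL(a)  (((a) >> 2)  & 0x3FF)   /* assumed addr[11:2]  */
#define BANK(a) (((a) >> 12) & 0x7)     /* assumed addr[14:12] */
#define ROW(a)  (((a) >> 15) & 0x3FFF)  /* assumed addr[28:15] */

int main(void)
{
    uint32_t frame_base = 0x10000000;        /* hypothetical frame buffer */
    uint32_t frame_size = 1920 * 1080 * 2;   /* e.g. 16-bit pixels */

    /* sample eight addresses spread across the buffer */
    for (int i = 0; i < 8; i++) {
        uint32_t a = frame_base + (frame_size / 8) * (uint32_t)i;
        printf("0x%08x: row %5u bank %u col %4u\n",
               a, ROW(a), BANK(a), COL(a));
    }
    return 0;
}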

 

 

 

Teacher
6,314 Views
Registered: 03-31-2012

Basically yes. Opening new banks is quite expensive in DDR.

 

But I'd suggest you first try to balance out the PS DDR accesses through the QoS block, as that is easier to try.

Observer
2,135 Views
Registered: 01-19-2018

I know this is an old thread, but I found it while researching DDR issues on Zynq, and the linked paper about reordering is quite interesting. The bandwidth they achieve in that paper is impressive, and it made me realize that the requirements for the design I'm working on are not as close to the limit as I feared.

 

I'm a little confused by your last post, though: isn't the issue that opening a new *row* is expensive? I thought that accessing a different bank could be quite cheap, *if* that bank already has the row you want open. I believe the point of the reordering in the paper is that, with the way their data comes in, row changes happen in parallel on multiple banks instead of multiple row changes happening sequentially on a single bank. Is that correct?

 

Thanks!
