10-18-2017 03:38 PM
I have run tests to determine the maximum data transfer rate between the PL and the PS DDR memory on a custom XCZU2 board. The PL section runs at 150 MHz and implements a write-only AXI master that issues bursts of 16 * 128-bit words to the S_AXI_HPn_FPD slave interface on the Zynq. The master writes a total of 4 MB of data (sequentially) to an area in the DDR memory. It simply attempts to issue a new burst as soon as it has completed the previous one; it does not wait for the bvalid responses. I have played around with the burst lengths, but the transfer rate maxes out at 266.6 MB/s. I attach an ILA capture of the AXI bus for one case with a burst length of 16. It is clear that initially the writes occur at the maximum rate, with no back-pressure, but once the FIFOs in the AXI slave fill up, back-pressure is applied and the write rate is throttled to about one write every 9 clock cycles.
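As a quick sanity check, the measured rate matches the throttling seen in the capture. The sketch below assumes that "one write every 9 clock cycles" means one 128-bit data beat accepted per 9 AXI clock cycles on average, and that MB means 10^6 bytes; the names are illustrative only.

```python
# Throughput of a 128-bit AXI write channel clocked at 150 MHz.
# Assumption: one "write" is one 16-byte data beat accepted by the slave.
AXI_CLOCK_HZ = 150e6
BYTES_PER_BEAT = 128 // 8  # 128-bit data bus -> 16 bytes per beat


def write_rate_mb_per_s(cycles_per_beat):
    """Sustained write rate in MB/s (1 MB = 10**6 bytes)."""
    return AXI_CLOCK_HZ * BYTES_PER_BEAT / cycles_per_beat / 1e6


peak_rate = write_rate_mb_per_s(1)       # no back-pressure
throttled_rate = write_rate_mb_per_s(9)  # one beat every 9 cycles

print(f"peak:      {peak_rate:.1f} MB/s")
print(f"throttled: {throttled_rate:.1f} MB/s")
```

Under these assumptions the throttled figure comes out at about 266.7 MB/s, in line with the 266.6 MB/s measured.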
My question is, is this the maximum attainable data rate from the PL to the PS DDR? It almost looks like the data transfer rate is limited to this value "by design" - I see no fluctuation in the number of clock cycles required to complete the transfer. Each run uses exactly the same number of cycles. Is there something I can do to increase this rate on the PS side? Is this a limit per AXI slave interface and would I be able to get a higher rate by using two interfaces simultaneously?
The DDR memory consists of 2 x 16-bit DDR4 devices running at a 1067 MHz clock. The theoretical memory bandwidth is therefore 2*16*2*1067 = 68.3 Gbit/s, or about 8.5 GB/s, and it feels to me that 266.6 MB/s is a relatively small percentage of the maximum, even when allowing for a realistic maximum.
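The bandwidth arithmetic above can be reproduced with a short Python sketch. It only restates the figures from the post (two 16-bit devices, double data rate, 1067 MHz clock) and ignores refresh, bank conflicts and other real-world overheads; the variable names are illustrative.

```python
# Theoretical peak DDR4 bandwidth from the figures quoted in the post.
DEVICES = 2              # two DDR4 devices side by side
BITS_PER_DEVICE = 16     # each device is 16 bits wide
TRANSFERS_PER_CLOCK = 2  # double data rate
DDR_CLOCK_MHZ = 1067

peak_mbit_per_s = DEVICES * BITS_PER_DEVICE * TRANSFERS_PER_CLOCK * DDR_CLOCK_MHZ
peak_gbit_per_s = peak_mbit_per_s / 1000
peak_mbyte_per_s = peak_mbit_per_s / 8

MEASURED_MBYTE_PER_S = 266.6
fraction_of_peak = MEASURED_MBYTE_PER_S / peak_mbyte_per_s

print(f"peak: {peak_gbit_per_s:.1f} Gbit/s = {peak_mbyte_per_s / 1000:.2f} GB/s")
print(f"measured rate is {fraction_of_peak:.1%} of the theoretical peak")
```

With these inputs the measured rate works out to roughly 3% of the theoretical peak.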
10-23-2017 03:02 AM
DDR bandwidth from the PL depends on many things, including:
1. AXI bus data width.
2. Frequency of the AXI interface. You can try increasing this.
3. Port type of PS
4. State of other PS peripherals and OS
5. No. of AXI PL ports used
There are multiple designs available that demonstrate higher bandwidth.
10-23-2017 06:50 AM
Thanks for the reply! I realize that there are many factors affecting the maximum rate. From the PL side I think I understand most of them. To address the list you provided:
1. My AXI bus width is set to 128-bits (the maximum)
2. The AXI clock frequency is set to 150MHz (thus allowing a theoretical max transfer rate of 2400MB/s)
3. I use the HP port, which is supposed to be good for high bandwidth applications.
4. This is the one aspect I understand the least. While doing the test, I was running U-Boot. I am pretty sure the DDR memory is correctly configured. I can boot into Linux and repeat the test there, and I get the same results. I have no other major applications running and do not make use of any other high-speed peripherals during the test, so the system is effectively idle. I have not done any special configuration of the AXI interconnect fabric (for instance the QoS settings in the AFIFM module).
5. I am using a single port (S_AXI_HP1_FPD), and no other ports are active.
It is interesting to note that the maximum speed I get is exactly 1/32 of the theoretical maximum DDR bandwidth (DDR clock speed is 1066 MHz and the physical data width is 32 bits, i.e. 4 bytes, thus 1066*2*4/266.6 ≈ 32). Such ratios usually don't occur at random... Is there any reason I should see this? I have been looking through UG1085 but have not come across anything related. There is a lot of documentation to go through, though, so I might have missed something.
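For reference, the two ratios discussed in this thread can be checked with a few lines of Python. This is only arithmetic on the numbers already quoted (1066 MHz DDR clock, 32-bit DDR bus, 150 MHz 128-bit AXI port, 266.6 MB/s measured); it does not explain where the limit comes from.

```python
# Ratios of the theoretical limits to the measured write rate.
MEASURED_MB_PER_S = 266.6

# DDR side: 1066 MHz clock, double data rate, 4 bytes per transfer.
ddr_peak_mb_per_s = 1066 * 2 * 4

# AXI side: 150 MHz clock, 16 bytes (128 bits) per beat.
axi_peak_mb_per_s = 150 * 16

ddr_ratio = ddr_peak_mb_per_s / MEASURED_MB_PER_S
axi_ratio = axi_peak_mb_per_s / MEASURED_MB_PER_S

print(f"DDR peak / measured = {ddr_ratio:.2f}")
print(f"AXI peak / measured = {axi_ratio:.2f}")
```

The first ratio rounds to 32 and the second to 9, which is the same factor of 9 seen as the clock cycles per write in the ILA capture.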