08-08-2017 10:05 AM
I am using a Zynq MPSoC xczu3eg-1.
On the PL side I have a Datamover IP block which is connected to the HP0 AXI port of the Zynq processing system. The HP0 bus is configured for a 64-bit data width and is clocked at 200MHz. When I have the Datamover transfer 64kB of data from DDR, everything works correctly except that the throughput is much lower than expected. As shown in the ILA waveforms below, I see a continuous pattern of two data beats with 10 idle clock cycles in between. The system's DDR is DDR3 with a 32-bit bus running at 533MHz (DDR3-1066), so it should be able to provide much more throughput than what I am seeing.
Any ideas on what could be limiting the throughput from DDR on the HP0 AXI port?
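For reference, here is a rough back-of-the-envelope comparison of the observed bandwidth against the theoretical peaks, using only the numbers quoted above (the 2-beats-per-12-cycles figure comes from the ILA pattern I described; treat this as a sketch, not a measurement):

```python
# Rough bandwidth estimate from the figures quoted in this post.
# Observed ILA pattern: 2 data beats followed by 10 idle cycles.
axi_clock_hz = 200e6      # HP0 AXI clock
beat_bytes = 8            # 64-bit AXI data width
beats_per_period = 2      # data beats per repeating pattern
cycles_per_period = 12    # 2 beats + 10 idle cycles

observed_bw = axi_clock_hz * beats_per_period * beat_bytes / cycles_per_period

# Theoretical peak of the HP0 port itself (one beat every cycle):
hp0_peak_bw = axi_clock_hz * beat_bytes

# Theoretical peak of the 32-bit DDR3-1066 interface (1066 MT/s x 4 bytes):
ddr_peak_bw = 1066e6 * 4

print(f"observed: {observed_bw / 1e6:.0f} MB/s")
print(f"HP0 peak: {hp0_peak_bw / 1e6:.0f} MB/s")
print(f"DDR peak: {ddr_peak_bw / 1e6:.0f} MB/s")
```

So the port is delivering roughly a sixth of what the HP0 interface alone should sustain, and a small fraction of what the DDR3 can supply.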
08-08-2017 02:12 PM
I don't know the Datamover IP, but the trace looks like you are reading one byte at a time.
If that is the case, chances are good that this creates a major bandwidth bottleneck.
Not sure it helps,
08-08-2017 03:33 PM
Thanks for the suggestion, but data is being read 8 bytes at a time as expected. It is not possible to see this from the screenshot I provided, so I have added some annotations to the image. The problem is the large gaps between the pairs of data beats.
08-10-2017 07:19 AM
Could you please share a snapshot of the IPI connections so that we can see the masters and slaves involved in this transaction?
You can also expand the axi_interconnect IP and check whether helper IPs like a data width converter or protocol converter are involved.
08-10-2017 12:07 PM
Please see the attached snapshot of the IPI connection. You can see that the interconnect is just a pass-through.
The S00_AXI side of the interconnect connects to a custom IP block, which I cannot share here. However, internally the AXI signals are directly connected to an AXI Datamover IP instance for which I have attached the .xci file.
The ILA snapshot shown in my previous post was captured from the System ILA that is shown in the IPI connection image. You can see that this System ILA is connected right at the boundary of the Zynq system. You can also see from the ILA snapshot that the data is being throttled by the RVALID signal which comes directly from the Zynq. Therefore, it seems that it must be something outside of the PL that is causing the throttling and low throughput.
08-10-2017 05:35 PM - edited 08-10-2017 05:36 PM
Are you using the memory interface in the PS or is this 32-bit DDR3 interface configured in the PL?
08-10-2017 07:17 PM
I am using the PS memory interface. The AXI Datamover IP in the PL is connected to the S_AXI_HP0_FPD port of the Zynq system. The S_AXI_HP0_FPD port is clocked at 200MHz and the PS DDR3 memory is running at 533MHz.
08-10-2017 07:23 PM
Thanks @rjbohnert for clarifying. I might not be that much help in this case.
08-11-2017 04:50 AM
Are there heavy AXI transactions on the AXI HPC0 port that you have enabled in this design?
Also, it is a little strange that in the ILA snapshot I can see signals named ps8_0_axi_periph....
Most of the time an axi_interconnect IP with that name is used to connect the CPU master to peripheral slaves.
08-11-2017 10:18 AM
I see this issue even when there are no AXI transactions on the HPC0 port. I even rebuilt the design with the HPC0 port disconnected, just to be sure, and I still see the issue.
I am not sure why the ILA signals are named as they are.
Could this be related to the QoS functions that the Zynq MP has? I have tried maxing out the RDQoS value for the HP0 port, but it did not help. Do you have any suggestions on other settings for the DDR controller or Interconnect that I could try adjusting?
04-12-2018 06:30 AM
In my personal opinion, it is quite normal to see a latency of around 10 cycles when you do single-beat accesses to DRAM. To better utilize the DRAM bandwidth you have to do burst reads/writes; usually, the larger the burst, the better.
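To put rough numbers on this, here is a sketch of bus utilization under a simple model where each read transaction pays a fixed latency of idle cycles (the 10-cycle figure is taken from the pattern reported earlier in the thread; real DRAM latency varies with bank state, refresh, and arbitration):

```python
# Hypothetical utilization model: `burst_len` data beats followed by
# a fixed `latency` idle cycles per transaction. This loosely matches
# the 2-beat / 10-idle pattern described earlier in this thread.
def efficiency(burst_len, latency=10):
    return burst_len / (burst_len + latency)

for burst in (2, 16, 64, 256):
    print(f"burst length {burst:3d}: {efficiency(burst):6.1%} utilization")
```

Under this model the observed 2-beat pattern yields about 17% utilization, while longer bursts amortize the fixed latency and approach full bus utilization, which is why increasing the Datamover's burst size (e.g. its maximum burst length parameter) is usually the first thing to try.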