12-17-2015 07:46 AM
I would like to know the effective sequential write bandwidth of a 64b DDR4 connected to the PS part of a Zynq UltraScale+.
In other words, what is the data bus efficiency for sequential write-only operation?
Furthermore, what is the maximum bandwidth from the PL to the DDR4 in the PS?
I saw that the AXI interfaces are 128b; can I use/aggregate 2 of them to obtain better bandwidth?
12-17-2015 08:07 AM
From the overview:
"Twelve dedicated AXI 32-bit, 64-bit, or 128-bit ports connect the PL to high-speed interconnect and DDR
in the PS via a FIFO interface."
And, from the data sheet:
"FAXICLK Maximum AXI interface performance. – 333 MHz"
"FAPUMAX Maximum APU clock frequency. 1500-3, 1200-1 MHz"
More details on page 20 of ds891 regarding the 12 AXI interfaces.
And, the DDR4max is 2400 Mb/s for the -3 (32 bit mode only).
The devil is in the details, of course.
As you know, DDR memories are not 100% efficient for anything: there are refresh cycles, and there are modes to burst read/write blocks of various lengths at a time to improve efficiency (get closer to the DDR4max).
It also depends on what the four processors are doing (other than the one trying to write), along with any other devices contending for bandwidth.
For a reasonable estimate, I would take the slowest operation (DDR4 write), base it on the DDR4 memory with a 32-bit bus (not the Zynq MPSoC), and divide that in half.
One can add a new, wider DDR4 controller in the programmable logic and get much higher bandwidth from a much wider memory bus if the built-in DDR4 controller is not fast enough.
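As a rough check on the figures quoted in this post, here is a minimal Python sketch of the peak numbers. It assumes one AXI data beat per clock with no protocol overhead, and the function names are purely illustrative. It is back-of-the-envelope arithmetic, not a measured result.

```python
# Peak figures derived from the datasheet values quoted above.
# Assumption: one data beat per clock cycle, no protocol or refresh overhead.

def axi_peak_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of a single AXI port in Gb/s."""
    return width_bits * clock_mhz / 1000.0

def ddr_peak_gbps(bus_bits: int, rate_mbps_per_pin: float) -> float:
    """Peak throughput of the DDR data bus in Gb/s."""
    return bus_bits * rate_mbps_per_pin / 1000.0

per_port = axi_peak_gbps(128, 333)   # one 128-bit port at the FAXICLK maximum
ddr_32b = ddr_peak_gbps(32, 2400)    # 32-bit DDR4 at 2400 Mb/s per pin
rough_estimate = ddr_32b / 2         # the "divide that in half" rule of thumb

print(f"One 128-bit AXI port, peak: {per_port:.1f} Gb/s")
print(f"32-bit DDR4-2400, peak:     {ddr_32b:.1f} Gb/s")
print(f"Halved rough estimate:      {rough_estimate:.1f} Gb/s")
```

Under these assumptions a single 128-bit port tops out well below the DDR4 bus peak, so several ports would have to be used in parallel to approach it.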
12-17-2015 12:24 PM
Thank you for your quick answer.
I do not plan to use any APU or RTPU. I am using a Zynq to benefit from the lower power consumption of hardened functions.
That is why I plan to use the DDR4 64b controller in the PS instead of a MIG instance in the PL (there I know that I can go up to 80b at 2666 Mb/s).
Thank you for extracting the FAXICLK from the datasheet. I was assuming something around 600 MHz, so the bottleneck may not be the DDR controller.
I need 120 to 140Gbps write bandwidth from PL to the DDR.
Maximum DDR4 64b write bandwidth is 153.6Gbps so I need 79 to 92% efficiency.
Is it something achievable in the PS?
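The arithmetic behind this requirement can be sketched as follows. This is a hedged illustration only: it assumes a 64-bit bus at 2400 Mb/s per pin and 128-bit AXI ports clocked at the 333 MHz FAXICLK maximum with one beat per cycle, and the helper names are hypothetical.

```python
import math

# Assumed peak figures (no refresh, turnaround, or protocol overhead).
DDR_PEAK_GBPS = 64 * 2400 / 1000    # 64-bit DDR4 at 2400 Mb/s per pin
AXI_PORT_GBPS = 128 * 333 / 1000    # one 128-bit AXI port at 333 MHz

def required_efficiency(target_gbps: float, peak_gbps: float = DDR_PEAK_GBPS) -> float:
    """Fraction of the DDR peak that must be sustained to hit the target."""
    return target_gbps / peak_gbps

def ports_needed(target_gbps: float, port_gbps: float = AXI_PORT_GBPS) -> int:
    """Minimum number of AXI ports whose combined peak covers the target."""
    return math.ceil(target_gbps / port_gbps)

for target in (120, 140):
    eff = required_efficiency(target)
    print(f"{target} Gb/s needs {eff:.1%} of DDR peak "
          f"and at least {ports_needed(target)} AXI ports")
```

The required fractions come out at roughly 78% and 91%, consistent with the rounded-up figures above, and the port count suggests that the PL-to-PS interface has to be spread over three or four ports at minimum before the DDR controller itself becomes the limit.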
12-17-2015 01:00 PM
The MIG in PL has a hardened part, and a soft logic part, so maybe that is still the best way to go?
12-18-2015 12:26 AM
From the power consumption point of view: the DDR controller in the PS consumes 50% less than the MIG in the PL.
That is why I am considering this solution (instead of a Kintex UltraScale+ based solution).
Any chance of getting PS DDR4 bandwidth figures measured on the evaluation board Xilinx has?
12-18-2015 06:51 AM
In the PL you can go wider than 32 bits: 2667 Mb/s in US+ -3.
12-18-2015 07:32 AM
I am interested in the PS DDR controller.
As you may know, it supports DDR4, 64 bits, at 2400 Mb/s.
Let me refocus the discussion on my original question:
What is the effective write only bandwidth of DDR4 memory using PS controller?
12-18-2015 07:48 AM
I thought I read that it is only 32 bits wide. Go check.
Write efficiency seems to be slightly less than 90% at best, from what I have also read.
As all this is in the documentation, I encourage you to go read them.
What isn't in the documentation is what else is going on that will take away from the total bandwidth.
As you have stated you will not be using the processors, I wonder how the DDR4 controller can be used at all: it requires setup and control from a processor, and at least enough software code to allow it to be used with the programmable logic.
If all you wanted was to get data to and from the PL, I would not even use a Zynq MPSoC, but rather a Virtex US+ device. Not taking advantage of the processors means a great deal of wasted power, so an all-logic device is perhaps a better (lower power) choice, and bandwidth is guaranteed (you will get exactly what you designed).
12-18-2015 10:12 AM
Thank you for the answer.
DDR4 bus width is 32b or 64b according to DS891. 32b limitation is for LPDDR4.
Can you tell me where to find the 90% efficiency figure for the PS controller? The only figure I found was for the UltraScale MIG in PG150, and it was 100% for sequential write in the text and 89% in table 2-1.
APU and RTPU are not used at all. DDR controller is initialised by the CSU during the FSBL.
VUS+ and KUS+ were considered but were not selected due to planning.
ZUS+ also gives the low-power advantage of a hard PCIe controller with integrated DMA.
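To put the two PG150 figures in context, the sketch below applies them to the 64-bit DDR4-2400 peak and compares the result with the 120 to 140 Gb/s requirement stated earlier in the thread. Note the loud assumption: those efficiency figures are documented for the UltraScale MIG in the PL, and nothing in this thread confirms that they carry over to the PS controller.

```python
# Assumed peak: 64-bit DDR4 at 2400 Mb/s per pin.
PEAK_GBPS = 64 * 2400 / 1000

def effective_gbps(efficiency: float, peak_gbps: float = PEAK_GBPS) -> float:
    """Sustained bandwidth for a given data-bus efficiency (0.0 to 1.0)."""
    return efficiency * peak_gbps

# Efficiencies quoted from PG150 (MIG in the PL, NOT the PS controller).
for eff in (0.89, 1.00):
    bw = effective_gbps(eff)
    print(f"efficiency {eff:.0%}: {bw:.1f} Gb/s, "
          f"meets 120: {bw >= 120}, meets 140: {bw >= 140}")
```

At 89% the low end of the requirement would be covered but the high end would not, which is why the exact PS figure matters here.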
12-21-2015 04:40 AM - edited 12-21-2015 04:40 AM
Virtex US+ devices are not shipping, right?
It seems that no UltraScale+ device with PL PCIe is shipping or sampling. I bet this is due to PCIe Gen 4, so we may have to wait and wait for the first US+ Virtex or Kintex devices.
12-21-2015 08:20 AM
Contact your Xilinx Distributor for engineering sample availability.
12-21-2015 10:10 AM
What do you mean when you say the MIG has a "hardened" part in the PL? Is there something MORE than before?
I think only some phasers, FIFOs, and other "small bits and pieces" in the PL are used by the MIG, but there is no hardened DDR4 IP?
At least I am failing to find any info about this, and AFAIK there are no new things in ZU+ for the PL MIG.
05-09-2016 12:21 AM
Given that it's been more than 5 months since this question was asked, I'd like to ask whether there are any real performance tests and results for the case of the PS part (random access) sharing the single DDR4 RAM with the PL (buffered serial stream).
I understand that it can be highly application specific, but I'm especially curious about the rates achievable when using the PL part, since most boards are produced with DDRs connected only to the PS part.
07-25-2017 04:29 AM
I'm also interested to know if there's any figure regarding the typical efficiency of the DDR4 controller,
say for an application that writes/reads a few video streams.
07-25-2017 10:17 AM
Consult the documentation (yes, all the thousands of pages of it). Once you understand the device, and the options, and choose how to configure your system you may begin to look at performance. There is no simple answer to ANY SoC related question of performance, regardless of the vendor.
It is the ONE issue that gets an immense amount of resources thrown at it, and it is often the reason a project fails to deliver, as the device or architecture cannot deliver the desired performance.
Xilinx has addressed this issue with a performance testing/estimation/simulation technology:
Our methodology allows you to benchmark the complete system WITHOUT YOUR CODE OR HARDWARE by using traffic sources and sinks, performance monitors, and choices of interfaces and flows.
This is unique - no one else has these capabilities.
09-13-2017 04:31 PM
Here's a link to the ZCU102 Eval Kit homepage:
If you go to the "Documents and Designs" tab here:
You'll see all the prebuilt examples and they should have what you're looking for.
09-15-2017 01:40 PM
02-07-2018 05:53 AM
I need to find the Zynq UltraScale+ DDR4 effective bandwidth from PL as well, BUT nobody gives a clear answer.
In my design I have the following block design (I use AXI Stream and AXI Data Movers), and in every read/write stream I need 140 pixels * 170 pixels * 17 images * (8 bytes). Each pixel is a 64-bit double. I use six 128-bit HP ports (4 HP ports + 2 HPC ports) without cache coherency.
My design works perfectly, but I need to know the theoretical bandwidth to compare against my throughput!
What is the theoretical effective bandwidth of DDR4 at 533 MHz?
The target device is xczu9eg-ffvc900-1-i-es1
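For comparison purposes, the theoretical peak can be sketched from the numbers in this post. The sketch rests on two assumptions that the post does not confirm: that 533 MHz is the DDR4 memory clock (so data moves on both clock edges) and that the PS DDR4 bus is 64 bits wide. It gives only the raw bus peak; the effective figure will be lower because of refresh, bank management, and contention.

```python
# Payload per stream, taken from the post: 140 x 170 pixels, 17 images, 8 bytes each.
PAYLOAD_BYTES = 140 * 170 * 17 * 8

# ASSUMPTIONS (not stated in the post): 533 MHz is the memory clock,
# data transfers on both edges, and the bus is 64 bits (8 bytes) wide.
DDR_CLOCK_HZ = 533e6
BUS_BYTES = 8

def ddr_peak_bytes_per_s(clock_hz: float, bus_bytes: int) -> float:
    """Raw double-data-rate bus peak in bytes per second."""
    return 2 * clock_hz * bus_bytes

def transfer_time_us(payload_bytes: int, bytes_per_s: float) -> float:
    """Best-case time to move the payload at the given rate, in microseconds."""
    return payload_bytes / bytes_per_s * 1e6

peak = ddr_peak_bytes_per_s(DDR_CLOCK_HZ, BUS_BYTES)
best_case_us = transfer_time_us(PAYLOAD_BYTES, peak)

print(f"Payload per stream: {PAYLOAD_BYTES} bytes")
print(f"Raw DDR4 peak:      {peak / 1e9:.3f} GB/s")
print(f"Best-case transfer: {best_case_us:.1f} us")
```

Dividing the measured transfer time into this best-case time gives a rough efficiency figure to compare against the values discussed earlier in the thread.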