06-08-2021 09:49 AM
I'm implementing a DDR4 interface using Vivado 2020.1 to generate a 32-bit DDR4 interface (two 16-bit 256M Micron parts) running at 1066 MHz; see attached screenshot. All settings not shown are in their "default" state.
I have the basic design working in my target device (Kintex UltraScale xcku025-ffva1156-1-c), but it's not achieving the throughput that I'm expecting.
On closer inspection of the simulated example design I can see that there are gaps between bursts on the DDR4 physical interface, even though successive accesses are to incrementing addresses in the same bank and of the same type (i.e. read or write); see attached screenshot. In addition, the IP keeps de-asserting c0_ddr4_app_rdy, indicating that it is busy, but it only does this after it appears to have filled up its internal buffers with the outstanding accesses.
I suspect that either
Any suggestions gratefully received.
06-15-2021 06:25 AM
Can you configure the address map to ROW_COLUMN_BANK and see if there is any improvement? PG150, Chapter 4 (Performance) provides more detailed information.
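The effect of the two MAP orderings can be sketched as follows. This is an illustration only, not the MIG RTL: the field widths are assumptions for a typical x16 DDR4 part, and the low-order column bits that the controller keeps at the LSBs for the BL8 burst are ignored.

```python
# Illustrative sketch of how the same app_addr splits into DRAM fields
# under the two address-map orderings. Bit widths are ASSUMED values
# for a typical x16 DDR4 part, not taken from any specific datasheet.
ROW_BITS, COL_BITS, BANK_BITS, BG_BITS = 15, 10, 2, 1

def decode(addr, order):
    """Split addr into named fields, consuming bits LSB-first per `order`."""
    fields = {}
    for name, width in order:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

# ROW_COLUMN_BANK: bank and bank-group bits sit in the LSBs, so
# sequential addresses rotate through banks first.
rcb = [("bank_group", BG_BITS), ("bank", BANK_BITS),
       ("column", COL_BITS), ("row", ROW_BITS)]

# BANK_ROW_COLUMN: column bits sit in the LSBs, so sequential addresses
# stay in one bank until the column field wraps.
brc = [("column", COL_BITS), ("row", ROW_BITS),
       ("bank", BANK_BITS), ("bank_group", BG_BITS)]

for a in range(4):
    print(a, "RCB bank_group:", decode(a, rcb)["bank_group"],
             "BRC bank_group:", decode(a, brc)["bank_group"])
```

With ROW_COLUMN_BANK, even small address increments change the bank-group bits; with BANK_ROW_COLUMN they only change the column.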
06-15-2021 09:32 AM
Thank you very much for your response; see attached screenshots of the simulation, which does indeed show some improvement. The one where c0_ddr4_act_n is asserted once is my original BankRowColumn setup, while the one with c0_ddr4_act_n asserted for every transaction is the RowColumnBank version.
However, it is still not achieving full bandwidth (i.e. there are still gaps between each DDR4 access), and it's completely counter-intuitive that having to Activate each bank prior to use is quicker than streaming continuously through the columns of an already Activated bank.
Is there any way to achieve the full bandwidth, or is this a design limitation of the IP and/or DDR4?
06-15-2021 07:40 PM
Glad that the provided information has improved the bandwidth.
MIG has four group FSMs; to hit maximum efficiency the user logic should utilize all four group FSMs. PG150, Chapter 4 (Performance) provides more detailed information.
06-16-2021 02:41 AM
I've re-read the section on Performance in Chapter 4 and I think I understand how the group FSMs affect throughput, but I'm struggling to see how I have any control over the mapping between app_addr and the FSMs. Is the expectation that I must re-order app_addr (i.e. some form of logical-to-physical address mapping) to match the group FSMs for any given memory configuration? That is, do I need to make some changes to app_addr because I'm using a 32-bit physical DDR4 interface?
If not, then my interpretation of Table 4-83, when using the ROW_COLUMN_BANK setting you suggested, is that sequential addresses (as generated by the Xilinx example design) should give optimal performance, but I still see gaps between cycles on the DDR4 physical interface. In my design I need to get the maximum performance available from my physical memory.
Can you explain in more detail how to do this?
06-22-2021 12:59 AM
Yes, your understanding is correct: app_addr should map to ROW_COLUMN_BANK. Table 4-83 provides more details.
Figure 4-24 explains how the MIG group FSMs map to bank and bank group.
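As a rough illustration of why the ordering matters for keeping all four group FSMs busy. Note this is a simplification: the FSM-to-bank assignment and the bit positions below are assumptions for illustration, not the exact mapping from Figure 4-24.

```python
# ASSUMED simplification: each of the four group FSMs services one
# two-bit (bank_group, bank) combination. With ROW_COLUMN_BANK those
# two bits are in the LSBs of app_addr (ignoring the low column bits
# reserved for the BL8 burst); with BANK_ROW_COLUMN they are near the
# MSBs (position 25 is an assumed width, for illustration only).
from collections import Counter

def group_fsm(addr, row_column_bank=True):
    if row_column_bank:
        return addr & 0b11           # bank bits in the LSBs
    return (addr >> 25) & 0b11       # bank bits near the MSBs

seq = range(16)  # 16 sequential commands
print("ROW_COLUMN_BANK :", Counter(group_fsm(a, True) for a in seq))
print("BANK_ROW_COLUMN :", Counter(group_fsm(a, False) for a in seq))
```

Under the ROW_COLUMN_BANK ordering the sequential commands spread evenly across all four FSMs; under BANK_ROW_COLUMN they all land on a single FSM, which serializes the stream and leaves gaps on the DDR4 bus.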
06-23-2021 05:44 AM
The bus utilization is calculated at the User Interface, taking the total number of reads and writes into consideration, using the following equation:

bw_cumulative = ((rd_command_cnt + wr_command_cnt) × (BURST_LEN / 2) × 100) / ((end_of_stimulus − calib_done) / tCK)
Simulation is very useful for finding the bus utilization.
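As a worked example of the equation above, plugging in assumed numbers: BL8, a 1066 MHz memory clock, and read/write counts and timestamps from a hypothetical simulation run.

```python
# Worked example of the bus-utilization equation. All the numbers
# below are ASSUMED for illustration, not from a real simulation.
BURST_LEN = 8                  # BL8
tCK_ns = 0.938                 # 1066 MHz memory clock period in ns
rd_command_cnt = 500
wr_command_cnt = 500
calib_done_ns = 100_000.0      # time calibration completed
end_of_stimulus_ns = 104_000.0 # time the stimulus finished

# (end_of_stimulus - calib_done) / tCK = elapsed memory clocks
memory_clocks = (end_of_stimulus_ns - calib_done_ns) / tCK_ns

# Each BL8 command occupies BURST_LEN / 2 = 4 memory clocks on the bus
bw_cumulative = ((rd_command_cnt + wr_command_cnt)
                 * (BURST_LEN / 2) * 100) / memory_clocks
print(f"bus utilization = {bw_cumulative:.1f}%")  # prints "bus utilization = 93.8%"
```

With these numbers, 1000 commands × 4 clocks each = 4000 busy clocks out of roughly 4264 elapsed clocks, i.e. about 93.8% utilization; the remaining ~6% corresponds to the gaps visible on the DDR4 bus.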