MBB
Visitor
Registered: 06-08-2021

DDR4 Back to Back Cycles

Hi,

I'm implementing a DDR4 interface using Vivado 2020.1, generating a 32-bit DDR4 interface (two 16-bit 256M Micron parts) running at 1066 MHz; see the attached screenshot.  All settings not shown are in their default state.

I have the basic design working in my target device (Kintex UltraScale xcku025-ffva1156-1-c), but it's not achieving the throughput that I'm expecting.
On closer inspection of the simulated example design I can see gaps between bursts on the DDR4 physical interface, even though successive accesses are to incrementing addresses in the same bank and are of the same type (i.e. all reads or all writes); see the attached screenshot.  In addition, the IP keeps de-asserting c0_ddr4_app_en to indicate that it is busy, but only after it appears to have filled its internal buffers with the outstanding accesses.

I suspect that either

  1. There is a problem with my configuration of the DDR4 interface,
  2. This is a known problem with the Xilinx IP,
  3. A configuration setting is not suitable for the memory devices chosen,
  4. I've completely misunderstood how DDR4 works, or
  5. Some or all of the above.

Any suggestions gratefully received.

Thanks

DDR4_Simulation.JPG
DDR4_Setup1.JPG
rpr
Moderator
Registered: 11-09-2017

Hi

Can you configure the address map to ROW_COLUMN_BANK and see if there is any improvement?  PG150, Chapter 4 (Performance) provides more detailed information.
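To see why the address-map ordering matters, here is a minimal Python sketch of how the same sequential addresses land on banks under the two orderings.  The field widths are illustrative assumptions, not taken from this configuration, and the low-order column bits that sit below the bank field in the real Table 4-83 map are ignored.

```python
# Hypothetical field widths for illustration only; a real part's widths
# come from the MIG configuration.  The bank field here lumps together
# the bank and bank-group bits.
ROW_W, COL_W, BANK_W = 15, 10, 4

def decode(addr, order):
    """Split a linear address into named fields per the map order.

    `order` lists fields from most significant to least significant,
    so we peel them off from the bottom by iterating in reverse.
    """
    fields = {}
    for name, width in reversed(order):
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

BANK_ROW_COLUMN = [("bank", BANK_W), ("row", ROW_W), ("col", COL_W)]
ROW_COLUMN_BANK = [("row", ROW_W), ("col", COL_W), ("bank", BANK_W)]

# Sequential addresses: BANK_ROW_COLUMN keeps hammering a single bank,
# while ROW_COLUMN_BANK rotates through the banks on every step.
for a in range(4):
    print(decode(a, BANK_ROW_COLUMN)["bank"], decode(a, ROW_COLUMN_BANK)["bank"])
```

With the bank bits at the bottom of the address, consecutive accesses spread across banks (and bank groups), which is what lets the controller overlap operations.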

Regards
Pratap

Please mark the answer as "Accept as solution" if the information provided is helpful.

Give Kudos to a post which you think is helpful.
MBB
Visitor
Registered: 06-08-2021

Hi Pratap,

 

Thank you very much for your response; the attached simulation screenshots do indeed show some improvement.  The one where c0_ddr4_act_n is asserted only once is my original BANK_ROW_COLUMN setup, while the one with c0_ddr4_act_n asserted for every transaction is the ROW_COLUMN_BANK version.

However, it is still not achieving full bandwidth (i.e. there are still gaps between DDR4 accesses), and it seems completely counter-intuitive that having to Activate each bank prior to use is quicker than using an already-Activated bank and streaming to it continuously.

Is there any way to achieve the full bandwidth?  (Or is this a design limitation of the IP and/or DDR4?)

 

Thanks

DDR4_RowColumnBank.JPG
DDR4_BankRowColumn.JPG
rpr
Moderator
Registered: 11-09-2017

Hi

Glad that the information provided has improved the bandwidth.

MIG has four group FSMs; to hit maximum efficiency, the user logic should utilize all of the group FSMs.  PG150, Chapter 4 (Performance) provides more detailed information.
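As a toy illustration of that point (the one-FSM-per-bank-group mapping is a simplification on my part; the controller's real arbitration is more involved), traffic confined to a single bank group can only ever keep one of the four group FSMs busy:

```python
# Toy model: assume each of the four group FSMs services one DDR4 bank
# group.  This is a simplification for illustration, not the controller's
# actual arbitration logic.
def fsm_for(bank_group):
    return bank_group % 4

def fsm_coverage(bank_groups):
    """Fraction of the four group FSMs touched by a traffic pattern."""
    return len({fsm_for(bg) for bg in bank_groups}) / 4

same_group = [0] * 16                      # every access hits bank group 0
interleaved = [i % 4 for i in range(16)]   # rotate across all four groups
print(fsm_coverage(same_group), fsm_coverage(interleaved))  # 0.25 1.0
```

Sequential addresses under ROW_COLUMN_BANK behave like the interleaved pattern, which is why that mapping engages all four FSMs.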

 

Regards
Pratap

MBB
Visitor
Registered: 06-08-2021

Hi Pratap,

I've re-read the Performance section in Chapter 4 and I think I understand how the group FSMs affect throughput, but I'm struggling to see how I have any control over the mapping between app_addr and the FSMs.  Is the expectation that I must re-order app_addr (i.e. apply some form of logical-to-physical address mapping) to match the group FSMs for any given memory configuration?  That is, do I need to make changes to app_addr because I'm using a 32-bit physical DDR interface?

If not, then my interpretation of Table 4-83 when using the ROW_COLUMN_BANK setting you suggested is that sequential addresses (as generated by the Xilinx Example Design) should give optimal performance, but I still see gaps between cycles on the DDR4 physical interface.  In my design I need the maximum performance available from the physical memory.

Can you explain in more detail how to do this?

Thanks

rpr
Moderator
Registered: 11-09-2017

Hi

Yes, your understanding is correct: app_addr should map as ROW_COLUMN_BANK.  Table 4-83 provides more details.

rpr_0-1624348579532.png

Figure 4-24 explains how the MIG group FSMs map to bank and bank group.

rpr_1-1624348728703.png

Regards
Pratap

rpr
Moderator
Registered: 11-09-2017

Hi Martin,

  • Sequential Read, Sequential Write – can hit the maximum efficiency.
  • Burst Read/Write Mix – can hit 80 to 90%.
  • Short Burst Read/Write Mix – can hit 40 to 50%.
  • Random Address Read/Write Mix – low efficiency.

 

The bus utilization is calculated at the User Interface, taking the total number of reads and writes into consideration, using the following equation:

bw_cumulative = ((rd_command_cnt + wr_command_cnt) × (BURST_LEN / 2) × 100) / ((end_of_stimulus − calib_done) / tCK)
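For reference, the equation can be wrapped in a small helper; this is a sketch, and the example numbers are made up rather than taken from this design:

```python
def bw_cumulative(rd_command_cnt, wr_command_cnt, burst_len,
                  end_of_stimulus, calib_done, tck):
    """Bus utilization (%) at the User Interface, per the equation above.

    Each command occupies BURST_LEN/2 memory-clock cycles on the data bus;
    the denominator is the number of memory clocks elapsed between
    calibration completing and the end of stimulus.
    """
    busy_cycles = (rd_command_cnt + wr_command_cnt) * (burst_len / 2)
    total_clocks = (end_of_stimulus - calib_done) / tck
    return busy_cycles * 100 / total_clocks

# Made-up example: 1000 BL8 commands over 5000 memory clocks -> 80.0%
print(bw_cumulative(600, 400, 8, 5000.0, 0.0, 1.0))
```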

 

Simulation is very useful for finding the bus utilization.

 

Vivado simulation setting: `define BEHV

rpr_0-1624452248955.jpeg

rpr_1-1624452248959.jpeg

 

Regards
Pratap
