Observer drewboud
Registered: ‎10-30-2014

ZU19EG Board Bringup with x72 DDR4-2400

We are currently bringing up a new ZU19EG (XCZU19EG-2FFVC1760I) based board using the 2017.2 tools. There are 5x MT40A512M16HA-083E:A devices comprising the x72 DDR bank on the PS side. To support debug, ECC is disabled and the speed is set to 1600 MT/s. While running the FSBL via JTAG, we discovered that the SDRAM write/read eye training was failing; because the result is checked in a while loop, the FSBL never exited.
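A minimal sketch of the bounded-poll idea, so a failed training step returns instead of hanging. This is host-side Python illustrating the logic only; `read_reg`, `mask`, `expected`, and `max_polls` are hypothetical names, and the generated psu_init.c itself is C and is not reproduced here.

```python
from typing import Callable, Optional


def poll_until(read_reg: Callable[[], int], mask: int, expected: int,
               max_polls: int = 1000) -> Optional[int]:
    """Poll a status register until (value & mask) == expected.

    read_reg is a hypothetical callable that returns the current
    32-bit register value (for example, a JTAG memory read).
    Returns the matching value, or None if every poll missed.
    """
    for _ in range(max_polls):
        value = read_reg()
        if (value & mask) == expected:
            return value
    return None  # caller can now dump status registers instead of hanging
```

A `None` result is the point at which the per-byte status registers can be dumped for inspection.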

 

The psu_init.c code was updated to provide breaks from the training loops for debugging. After stepping completely through psu_ddr_phybringup_data, PGSR0 reads 0x84c844ff, which indicates a read eye training error, a write leveling adjustment error, and a DQS gate training error. The layout complies with the UG583 routing rules. The power distribution also looks good. Given the board fab, I can probe hardly anything other than the termination resistors.
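A small decoder for the PGSR0 value can make these status words easier to read. The bit positions below are an assumption and should be checked against the DDR_PHY register reference. Only the four error flags named in this thread are listed (the VREF flag is mentioned later in the thread for the same value), and `decode_pgsr0_errors` is a hypothetical helper name.

```python
from typing import List

# Assumed PGSR0 error-flag bit positions; verify against the register
# reference before relying on them. Other PGSR0 bits are not listed.
PGSR0_ERROR_BITS = {
    26: "REERR (read eye training error)",
    23: "WLAERR (write leveling adjustment error)",
    22: "QSGERR (DQS gate training error)",
    19: "VERR (VREF training error)",
}


def decode_pgsr0_errors(value: int) -> List[str]:
    """Return the names of the listed error flags set in a PGSR0 value."""
    return [name
            for bit, name in sorted(PGSR0_ERROR_BITS.items(), reverse=True)
            if value & (1 << bit)]


if __name__ == "__main__":
    for flag in decode_pgsr0_errors(0x84C844FF):
        print(flag)
```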

 

Before I go digging into every controller register I wanted to ask if anyone else has experienced anything similar with the MPSoC?  I also wanted to comment on the psu_ddr_phybringup_data function.  

 

There is a prog_reg call to address 0xFD080014U, but I cannot find what this register is. The TRM describes a particular training sequence, but it appears that all of the steps are issued at once in psu_ddr_phybringup_data, including the init bit, which the TRM recommends setting in a separate register write. It appears this way across multiple ZU boards, so I assume it's standard. There is also an MR write command issued to 0xFD070014U; how does the data value map to the addr/bg/ba etc. on the bus?

 

My plan of action is to issue individual training commands to isolate the sequences for debug. Any advice is greatly appreciated. Thank you.

18 Replies
Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

I also wanted to include the DPLL configuration and the DDR configuration setup information.  

 

Thank you.

DRAM_Clk_Config.PNG
DRAM_Config.PNG

Xilinx Employee
Registered: ‎07-30-2007

Re: ZU19EG Board Bringup with x72 DDR4-2400

Please enable DM/No DBI, as long as you have DM pins. Otherwise, please try a later Vivado version, as we have improved NO DM support.

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Dylan,

 

Our current implementation has the DM pins tied low through 39.2 ohm resistors at the devices, given the intent to use ECC on the PS side (https://forums.xilinx.com/t5/Memory-Interfaces/x72-DDR4-NO-DM-NO-DBI-Pins-still-connected/td-p/763918). The termination resistors are physical (not buried) components and could be removed. Is the recommendation to enable DM intended to drive that IO or to change the controller configuration? They are 0201s in a densely populated area, so I want to verify before we rework.

 

I am also installing the 2017.4 tools on a separate machine to avoid DLC pod driver conflicts. I will respond once that is tested.

 

Thank you.          

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

If I re-enable ECC what should the DM/DBI settings be at that point?   Back to NO DM and NO DBI? 

 

Thanks!

Xilinx Employee
Registered: ‎07-30-2007

Re: ZU19EG Board Bringup with x72 DDR4-2400

I suggest keeping the same DM settings in all configurations, as that is the default and what we test most.

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Alright, my plan is to run with "DM no DBI" in the 2017.4 tools and see if that improves anything. It looks like we will want to remove the DM pull-downs, but I am still confused about the proper connections. If ECC is enabled the DM pins shouldn't be used, but I should still set the core configuration to "DM no DBI"? On the ZCU102 I see that "DM no DBI" is the default and that the DIMM is a x72 with ECC, so if it works there hopefully it will work here.

 

Thanks!       

Xilinx Employee
Registered: ‎07-30-2007

Re: ZU19EG Board Bringup with x72 DDR4-2400

You may have luck with NO DM in 2017.4, as we disabled calibration on DM pins then. But yes, I'd recommend treating DM as normal pins and removing the pulldowns.

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Looking at the layout, I see that the DMs were not length matched to the rest of the DQ bus, since they were not expected to be used once ECC was enabled (per the discussion I posted earlier). How sensitive is the DM line to length matching given the expected operation? Some are within tolerance, but the DM signals for a few of the bytes are off by ~250 mils, outside the 107 mil derated value for 1600 MT/s operation.
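To put the mismatch in timing terms, a rough conversion from length to delay can be sketched as follows. The 170 ps/inch propagation delay is a placeholder assumption, not a value from this board's stackup; the real figure depends on the dielectric and routing layer. Both function names are hypothetical.

```python
def mils_to_ps(length_mils: float, delay_ps_per_inch: float = 170.0) -> float:
    """Convert a trace length in mils to delay in picoseconds.

    delay_ps_per_inch is an assumed placeholder; substitute the value
    from the actual board stackup.
    """
    return (length_mils / 1000.0) * delay_ps_per_inch


def unit_interval_ps(data_rate_mts: float) -> float:
    """Width of one data bit in picoseconds at the given rate in MT/s."""
    return 1.0e6 / data_rate_mts


if __name__ == "__main__":
    print(mils_to_ps(250.0))         # mismatch on the worst DM lines
    print(mils_to_ps(107.0))         # derated matching limit
    print(unit_interval_ps(1600.0))  # bit time at 1600 MT/s
```

Under that assumed delay, the 250 mil mismatch is a small fraction of the bit time at 1600 MT/s, though it still exceeds the stated matching limit.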

 

When attempting to run with "no DM no DBI", I see that PSU_DDR_PHY_DTCR0_DTWBDDM is still set to 1. Is this intended? PSU_DDR_PHY_DX(x)2GCR1_DMEN is also still set. What should those be set to in a "no DM no DBI" config? I also don't see an entry for DX1GCR1. Once the DM pull-down resistors are removed, I am going to try the "no DM no DBI" setting again, but with the above registers updated.

 

Thanks!

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

With the DM pull-downs removed and "DM no DBI" set in the HDF, I am still seeing 0x84c844ff in the PGSR0 register. The errors indicate a Read Eye Training Error, Write Leveling Adjustment Error, DQS Gate Training Error, and VREF Training Error. When I look at the DQS gate error (read leveling?) register DX0RSR1, I don't see any errors asserted. The DQS gating, latency, and delay registers all have similar values. Since PGSR0.QSGERR is set, I would expect a byte to be flagged in DX(#)RSR1.

 

For the Read Eye training, I do see DX0GSR2.REERR set, with an ESTAT value of 0 (Initial read data miscompare before centering). The VREF ESTAT indicates "Final check for DRAM VREF failed".

 

This may still indicate a write issue.  What other actions can I take here?

 

Thank you.  

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

By adding some DQS gating system latency (b'1) at 0xFD0807C0 [4:0] the QSGERR error bit is no longer asserted.  The 3 remaining failures are VREF training, Write Level Adjustment, and Read Eye Training.     
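Changing only bits [4:0] while leaving the rest of the register alone is a read-modify-write. A minimal sketch of the bit manipulation, with `set_field` as a hypothetical helper; the actual register access (JTAG or a masked write in psu_init.c) is not shown.

```python
def set_field(reg_value: int, msb: int, lsb: int, field: int) -> int:
    """Return reg_value with bits [msb:lsb] replaced by field."""
    width = msb - lsb + 1
    if field < 0 or field >> width:
        raise ValueError("field does not fit in bits [%d:%d]" % (msb, lsb))
    mask = ((1 << width) - 1) << lsb
    return (reg_value & ~mask) | (field << lsb)


if __name__ == "__main__":
    current = 0x00000000  # placeholder; use the value read back from 0xFD0807C0
    print(hex(set_field(current, 4, 0, 0x1)))
```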

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

For the Write Level Adjustment I see that it is failing on byte 7.  I've captured the write leveling registers but I cannot determine why it is happening or how to correct for this particular byte.  It doesn't seem out of the ordinary compared to the others. 

 

Layout (x16 parts):

Byte  Length      WLPRD  WLD  WDQD  WLSL  WDQSL  CAC length
0     2380 mils   75     74   39    0     1      CAC1 = 3200 mils
1     2083 mils   73     72   37    0     1
2     2169 mils   75     78   3D    0     1      CAC2 = 3621 mils
3     2880 mils   76     76   3B    0     1
4     2237 mils   75     74   39    0     1      CAC3 = 4130 mils
5     1753 mils   75     74   38    0     1
6     3043 mils   75     74   39    0     1      CAC4 = 4545 mils
7     2298 mils   73     DA   2A    3     2
8     3315 mils   ECC NOT USED                   CAC5 = 4980 mils + Term
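A quick way to confirm which lanes stand out is to compare each byte against the median of the group. The sketch below assumes the WLD and WDQD values above are hexadecimal (suggested by entries such as 3D and DA); the tolerances are arbitrary illustration values, and `outliers` is a hypothetical helper.

```python
from statistics import median
from typing import List, Sequence

# Values for bytes 0..7 from the table, assumed to be hexadecimal.
WLD = [0x74, 0x72, 0x78, 0x76, 0x74, 0x74, 0x74, 0xDA]
WDQD = [0x39, 0x37, 0x3D, 0x3B, 0x39, 0x38, 0x39, 0x2A]


def outliers(values: Sequence[int], tolerance: int) -> List[int]:
    """Return the indices whose value differs from the median by more than tolerance."""
    mid = median(values)
    return [i for i, v in enumerate(values) if abs(v - mid) > tolerance]


if __name__ == "__main__":
    print("WLD outliers:", outliers(WLD, tolerance=16))
    print("WDQD outliers:", outliers(WDQD, tolerance=8))
```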

 

Any ideas on how I can adjust/correct byte 7?  

 

Thank you

    

Xilinx Employee
Registered: ‎07-30-2007

Re: ZU19EG Board Bringup with x72 DDR4-2400

You can try to change to use 32-bit width so that 7th byte is not used, and then run the read/write eye tests.

 

It looks like you are running fairly slow, which worries me. What does full speed look like?

 

Also, there were a few improvements to some registers in 2018.1, which may be worthwhile to test with.

 

The individual skew of the DM pins should not be too much of a factor, since per-bit write deskewing is done. Termination matters, however.

 

Generally, DDR4 issues end up being board issues at this point of the silicon/software. Power supply issues seem to be the most common.

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Dylan,

 

Thanks for the feedback. Switching to 32-bit did not fix the issue. I am running slow at 1600 MT/s to minimize any SI effects that may be inherent in the design, given the initial issues. We are in the process of upgrading our licenses to support the 2018.1 tools.

 

Unfortunately, May is conference/WG meeting month, so testing will be limited over the coming weeks. I did want to share the attached images in the hope that someone might have an "ah-ha" moment; I certainly haven't. I captured the attached signals on byte 1, which is suspected to be the worst byte lane. "Worst" meaning it is the first chip for CAC, the DQ bus is on through vias (tiny stubs), the stitching vias (circled) are more sparse than for the other data bytes, and it is at the end of the VREF pour. The stitching vias for the other data bytes are blind vias matched to the signal layer; byte 1 is the only byte going completely through the board. This byte fails training in the same way as the others.

 

The captured signals are with the psu_init default settings, ODT set to 40 ohm on the DRAMs, the Vref level set to 76% of VDDQ (~0.91 V), and ZCAL completed on the PHY. The DQ11 read transaction Vil looks high, but this may also be a function of the measurement setup (which we are also refining). The same goes for the DQ11 write, where the Vih looks low. The DQS looks accurate in both cases given the Vref settings. Perhaps I can just twist some Vref/ODT knobs to get this working (wishful thinking)?

 

I know this is not nearly the complete picture, and I don't expect Xilinx to help me debug my board, but any additional direction on things to check is much appreciated. I haven't found a smoking gun to correct in the layout yet and, as seen in the picture, updates will be very impactful for the entire board given the density.

 

We are in the process of updating our IBIS simulations, as a recent tool update (AD18) broke our original simulation. We are now implementing the design in ADS. I am also working to measure the PDN and ensure everything remains in spec. Everything was overdesigned, and our initial measurements showed VDDQ, VTT, VREF, and VPP all within spec during training, but we are going to measure again.

 

Any additional advice is greatly appreciated!!!  Thank you.                  

udrt_dq11_read.jpg
udrt_dq11_write.jpg
udrt_dqs_read.jpg
udrt_dqs_write.jpg
Byte_1_Layout.PNG

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Dylan,

 

We are back on this debug and hoping you can steer us in the right direction. In summary, we get through PHY init and through Write Leveling. We are failing on Read Leveling (1600 MT/s) on all bytes but cannot determine why. We have probed all of byte 0, and a subset of the images is attached. You can see that bit 6 and the clock both look good (the rest of the bits look just as good). We have also checked our PDN with a very fast scope probe, and nothing is out of regulation during training.

 

What I can't make sense of is why the read leveling is failing. As far as we can tell, the data and CAC bus look good. One thing we noticed is the delay between when the controller ODT turns on and when the DRAM drives the MPR data. For an MPR read command, Micron gives the Command->DQS turn time as PL(0) + AL(0) + CL(11). The image shows the issuing of a read command (CAS_n) and the 13.75 ns of CL (11 cycles at a 1.25 ns period). You will notice in the Long DQ6 capture that the ODT is on for some time before the DRAM starts sending data. Not knowing exactly how the Read Leveling algorithm uses the MPR, this may be expected.
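The latency figure can be checked with a short calculation: at 1600 MT/s the clock period is 1.25 ns, and PL(0) + AL(0) + CL(11) is 11 clocks. The function name below is hypothetical.

```python
def read_latency_ns(data_rate_mts: float, cl: int, al: int = 0, pl: int = 0) -> float:
    """Command-to-data latency in ns for a DDR interface.

    DDR transfers two data words per clock, so the clock period
    in ns is 2000 / (data rate in MT/s).
    """
    tck_ns = 2000.0 / data_rate_mts
    return (pl + al + cl) * tck_ns


if __name__ == "__main__":
    print(read_latency_ns(1600.0, cl=11))  # 11 clocks at a 1.25 ns period
```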

 

 

All of our hardware measurements have looked good.  Is there perhaps a configuration issue we are overlooking?   

 

Thanks in advance for any additional comments.   

DQ6_Long.jpg
DQ6.jpg
CAS_DQ6.jpg

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

Just wanted to provide an update. If we set a DGSL value (based on layout) prior to running Read Leveling, we see no errors in QSGERR and QSGDONE is asserted, but the resulting DGSL value is reduced by 1. We've tested with different delay values, and the resulting DGSL is always reduced by 1 once training completes. Assuming that the QSGDONE assertion is valid, we continue with training and pass Read Deskew but fail on Write Deskew. We are investigating further, but find it odd that the DQSGATE sequence cannot complete on its own without DGSL being set first. The results are very repeatable, so we are inclined to trust what is reported in the PGSR0 register. I will update on what we find regarding write deskew.

 

Thank you.          

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

One more update.  The P/N polarity from the controller to the DRAM is wrong.  The _P is going to the _c and vice versa.  I don't suppose there is any polarity flexibility on the PS side DRAM controller??

Xilinx Employee
Registered: ‎07-30-2007

Re: ZU19EG Board Bringup with x72 DDR4-2400

Good catch. Generally no, but which signal?

Observer drewboud
Registered: ‎10-30-2014

Re: ZU19EG Board Bringup with x72 DDR4-2400

It is on all DQS pins. I assume the t/c designation on the DRAM threw me off but what a rookie mistake. Given how clean all the other facets of the interface are I assume once this is corrected there will not be an issue operating at rate. The unfortunate part is it requires a respin.