02-06-2019 11:14 AM
I'm working on a custom Zynq MPSoC board where external DDR4 memories are MT40A512M16JY-075E.
The 256Mb version of the memory component has a preset I use in Vivado. Now I'm facing corrupted data when accessing this memory.
Bare-metal memory tests pass, but the same tests fail when run under Linux. We've found this post which seems to describe the same issue.
I'd like to know how to relax the DDR timing in Vivado.
Should I lower the "Memory Interface Device frequency"? Should I use another speed bin?
Thanks for any help you may provide.
02-06-2019 11:58 AM
Fundamentally you will have problems if you're targeting the MT40A256M16DE-083E memory device in the Zynq PS DDR configuration GUI while a MT40A512M16JY-075E is installed on the board. There are two big issues with your current settings. First, the DRAMs have different die densities, which means some of their protocol timing and addressing differ. Second, they're fundamentally different speed grades, so the base timing requirements for the two parts are different. Because the base speed bins are different, it's also possible that you're programming the MT40A512M16JY-075E to operate at a point it doesn't support. Overall this is an invalid configuration.
Based on your current part and an assumed operating point of 1200MHz (DDR4-2400Mbps), the MT40A512M16JY-075E is backwards compatible with the DDR4 2400T (17-17-17) speed bin:
Here's a snapshot of the Zynq PS DDR configuration GUI settings that will work for this part assuming 2400Mbps operation:
Please give those settings a try and let me know if it works.
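For reference, the relationship between the 1200MHz clock, the 2400Mbps data rate and the 17-cycle latencies of the 2400T bin mentioned above can be sketched in a few lines of Python. This is only illustrative arithmetic; the function names are made up for this example and are not part of any Xilinx or Micron tool.

```python
def data_rate_mbps(clock_mhz):
    """DDR transfers data on both clock edges, so the per-pin rate is 2x the clock."""
    return 2 * clock_mhz


def tck_ns(clock_mhz):
    """Clock period in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def cycles_to_ns(cycles, clock_mhz):
    """Duration of a latency expressed in clock cycles (hypothetical helper)."""
    return cycles * tck_ns(clock_mhz)


print(data_rate_mbps(1200))               # data rate in Mbps for a 1200MHz clock
print(round(tck_ns(1200), 3))             # clock period in ns
print(round(cycles_to_ns(17, 1200), 2))   # 17 cycles (as in 17-17-17) in ns
```

The actual minimum timings for a given part should always be taken from the speed bin tables in the datasheet, not derived this way.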
02-06-2019 12:39 PM
Many thanks for your answer. I'll give it a try. May I ask which document the first screenshot in your post comes from? Could you send a link so I can have a look myself?
02-06-2019 01:11 PM
The screenshot came from the latest version of Micron's 8Gbit DDR4 device specification:
02-07-2019 01:35 AM - edited 02-07-2019 04:38 AM
I was using an older version of the datasheet. We did try the settings you've suggested and it does not work any better.
I realized my description of the problem was incomplete. We actually have two versions of the board:
- one with MT40A512M16JY-083E IT:B
- one with MT40A512M16JY-075E AIT:B
The tests with your config were done on a -083E memory. Unfortunately, I can't run them on the -075E for now, so I will focus on the -083E part.
My questions are:
1. Is there a common configuration that supports both memory parts? How do I find it?
2. The configuration used for the -083E part is the following:
We've also been using an alternate config with tFAW set to 46.60ns.
The memory test results (under Linux):
Moreover, when incorrect data is detected, we were able to determine that the read is wrong and not the write. Also, several consecutive data words show the same incorrect pattern.
Note that no problems such as these were detected on bare metal tests.
The timing parameters above look ok to me. Switching to an effective DRAM bus width of 32 bits improved things but still led to a crash in the end. So maybe we have a PCB delay issue. Is there a way to relax the DDR4 PS memory timings while still being compatible with the part on the board?
02-07-2019 11:20 AM
The settings in the screenshot in my reply are compatible with both parts. In the Speed Bin table for the -075E part I highlighted that it's compatible with the -083 and -083D speed grades and NOT the -083E speed grade. The -083E part is compatible with the other -083x speed grades because these are all slower than the -083E. The -083 speed grade is the fastest setting compatible with both parts.
One of the first things you can try in these cases is lowering the memory interface clock rate. Right now you're running at 1200MHz for 2400Mbps operation, so I would try setting it to 800MHz for 1600Mbps operation. When you're operating that slow both parts will be compatible and you just need to update the CAS and CWL settings as the GUI indicates. You can keep or change the tRCD or tRP settings, but it really doesn't matter at this point.
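To illustrate why the cycle-based settings change when the clock is lowered, here is a minimal Python sketch that converts a timing given in nanoseconds into a whole number of clock cycles. The 14.16ns value is only an illustrative number (roughly 17 cycles at 1200MHz); real values must come from the datasheet, and the helper name is hypothetical.

```python
import math
from fractions import Fraction


def ns_to_cycles(t_ns, clock_mhz):
    """Smallest whole number of clock cycles that covers t_ns.

    cycles = t_ns / tCK with tCK = 1000 / clock_mhz (in ns).
    Fractions avoid floating-point rounding right at a cycle boundary.
    """
    return math.ceil(Fraction(str(t_ns)) * Fraction(clock_mhz) / 1000)


# Illustrative timing only, not taken from a datasheet table.
t_example_ns = 14.16

print(ns_to_cycles(t_example_ns, 1200))  # cycles needed at 1200MHz
print(ns_to_cycles(t_example_ns, 800))   # fewer cycles needed at 800MHz
```

The same analog delay needs fewer cycles at a slower clock, which is why the latency settings have to be revisited after changing the frequency.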
02-08-2019 06:57 AM
Thanks. I've tried lowering the controller frequency from 1200MHz to 800MHz. Tests did not show any improvement.
However, the actual frequency computed by Vivado in the configuration window was ~749MHz, so the actual frequency is not that close to the requested one.
Do you think that could be an issue? If so, I believe I need to change the PLL configuration. Could you please give me a hint on this as well?
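To put a number on the gap between requested and actual frequency, a quick Python calculation (a sketch only; the function name is made up for this example):

```python
def freq_deviation_pct(requested_mhz, actual_mhz):
    """Relative deviation of the actual frequency from the requested one, in percent."""
    return 100.0 * (actual_mhz - requested_mhz) / requested_mhz


requested = 800  # MHz, value entered in the GUI
actual = 749     # MHz, approximate value reported by Vivado

print(freq_deviation_pct(requested, actual))  # negative means slower than requested
print(2 * actual)                             # resulting data rate in Mbps
```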
02-11-2019 09:39 AM
For DDR4 at 800MHz or below the exact frequency doesn't really matter, so running the interface at 749MHz is still a valid test even though 800MHz was requested. Overall this is telling me you most likely have a hardware issue in your design. My recommendation is to review your layout against the guidance in UG583, double check that the package pin skews were accounted for in your layout, double check the schematic connections, termination resistors, decoupling capacitors for the PS and DDR4 rails, etc. In parallel I would also make power rail measurements for the PS and DDR4 power supplies to make sure they're stable during operation. Any measurements made here can't be made across a decoupling cap since they'll give overly optimistic results. Use the Zynq MP DRAM Diagnostic Tests to try and isolate whether the data errors happen on certain bits or byte lanes.
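If you also log expected and read-back values from your own memory test, the idea of isolating errors to bits or byte lanes can be sketched as below. This is a generic Python illustration, not what the Zynq MP DRAM Diagnostic Tests do internally. It assumes data bits 8n to 8n+7 map to byte lane n, which may not hold if DQ bits or bytes are swapped on the board, and the example words are made up.

```python
def failing_bits(expected, actual, width=64):
    """Bit positions where the read-back word differs from the expected word."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


def failing_byte_lanes(expected, actual, width=64):
    """Byte lanes containing at least one failing bit (assumes lane = bit // 8)."""
    return sorted({bit // 8 for bit in failing_bits(expected, actual, width)})


# Made-up example: a single bit differs between what was written and what was read.
expected = 0xAAAA5555AAAA5555
actual = 0xAAAA5515AAAA5555

print(failing_bits(expected, actual))
print(failing_byte_lanes(expected, actual))
```

Collecting these lane numbers over many failing reads shows whether the errors cluster on one lane, which would point at routing or termination of that lane.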
02-12-2019 08:46 AM
I am working with sebo on this issue.
I ran the Zynq MP DRAM Diagnostic Tests.
DDR ECC is DISABLED
1-Rank 64-bit DDR4 2399
Bus Width = 64, D-cache is enable,
When launching the test on the first 16MB region of DDR, it hangs after MT0(8). Trying to stop the debugger reports "cannot halt processor core, timeout".
The same happens when launching the test on the first 2GB region of DDR: it hangs after MT0(8).
This is interesting because it is the first time we have observed a freeze in bare metal. Until now our problem only happened when running Linux. By the way, there are no errors when launching the other SDK memory test (I ran it over the full DDR range).
The results of the read eye test are in the attached file. I guess we have a problem with lane 2, but I would like your interpretation. Is the eye width too small for lane 2? If this is a problem, is there a possible software correction?
The results of the write eye test are in the attached file.
Can I find relevant information in the PHY debug registers?