Visitor mhuff84
Registered: 02-06-2018

MIG DDR ECC Error Status But Data is Correct

I have a strange problem that I'm not able to find helpful information on...

I'm using a Kintex UltraScale with the DDR3 MIG controller. I've built a memory test design using the MicroBlaze processor and the example memory test available in SDK, and I made some tweaks to that example memory test (written in C).

Essentially, here is what I'm doing:

1. Write to DDR memory
2. Read the ECC status register in the MIG controller
3. Report the ECC UE and SE indicators and clear the ECC status register
4. Read the same memory location
5. Read the ECC status register in the MIG controller
6. Report the ECC UE and SE indicators and clear the ECC status register
7. Compare the written data with the read data
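The seven steps above could look roughly like this in C. This is a minimal sketch, not the poster's memorytest.c: it assumes, as in code posted later in this thread, that bit 0 of the ECC status register is UE and bit 1 is CE, and that writing a set bit back as 1 clears it. The names `ecc_check`, `ecc_test_range` and `ecc_counts` are hypothetical. Memory and the status register are passed in as pointers so the same logic can also be exercised on a host with plain arrays.

```c
#include <stdint.h>
#include <stdio.h>

#define ECC_STATUS_UE 0x1u /* assumed: bit 0 = uncorrectable error (UE) */
#define ECC_STATUS_CE 0x2u /* assumed: bit 1 = correctable error (CE/SE) */

/* Counters accumulated over one pass of the test. */
typedef struct {
    unsigned ue_cnt;
    unsigned ce_cnt;
    unsigned data_errors;
} ecc_counts;

/* Steps 2-3 and 5-6: read the status register, report the set flags,
 * then write the value back so only those bits are cleared (assumes W1C). */
static void ecc_check(volatile uint32_t *stat_reg, const char *phase,
                      uintptr_t addr, ecc_counts *c)
{
    uint32_t stat = *stat_reg;
    if (stat == 0)
        return;
    if (stat & ECC_STATUS_UE) {
        printf("%s UE at 0x%lX\n", phase, (unsigned long)addr);
        c->ue_cnt++;
    }
    if (stat & ECC_STATUS_CE) {
        printf("%s CE at 0x%lX\n", phase, (unsigned long)addr);
        c->ce_cnt++;
    }
    *stat_reg = stat;
}

/* Steps 1-7 for every 32-bit word in mem[0 .. nwords-1]. */
static ecc_counts ecc_test_range(volatile uint32_t *mem, size_t nwords,
                                 volatile uint32_t *stat_reg)
{
    ecc_counts c = {0, 0, 0};
    for (size_t i = 0; i < nwords; i++) {
        uint32_t val = 0xA5A50000u ^ (uint32_t)i;             /* test pattern */
        mem[i] = val;                                         /* 1. write     */
        ecc_check(stat_reg, "Write", (uintptr_t)&mem[i], &c); /* 2-3.         */
        uint32_t actual = mem[i];                             /* 4. read back */
        ecc_check(stat_reg, "Read", (uintptr_t)&mem[i], &c);  /* 5-6.         */
        if (actual != val) {                                  /* 7. compare   */
            printf("Data fail at 0x%lX\n", (unsigned long)(uintptr_t)&mem[i]);
            c.data_errors++;
        }
    }
    return c;
}
```

On the board the call would use the real addresses, e.g. `ecc_test_range((volatile uint32_t *)0x80000000u, nwords, (volatile uint32_t *)0x70000000u)`, where 0x70000000 is the status register address used later in this thread. Re-reading the status after each individual access, rather than once per loop, is what lets a reported error be tied to a specific write or read.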

With the steps above I get SE and UE errors on both the write and read transactions, but the data is not corrupted. This only happens above a certain address, and above it every read and write reports ECC errors.

In my design this happens for any data written above 0x8801_0000. I don't understand how I can get a UE and an SE while the data is correct, or why it would only happen above a certain address.

Note that I've configured the DDR memory range to be 0x8000_0000 to 0xFFFF_FFFF.

I must be missing something silly, or maybe we have a routing problem on the board with a particular address line.
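For reference, 0x8801_0000 is offset 0x0801_0000 from the 0x8000_0000 base, which has byte-address bits 27 and 16 set. If a routing fault on an address line is suspected, a classic walking-ones address-bus test can help localize it. The sketch below is that generic technique, not anything from the MIG example; `addr_bus_test` is a hypothetical name, and it only exercises the address bits that select distinct 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones address-bus test over nwords 32-bit words starting at base.
 * Returns 0 if no fault is found, otherwise the (power-of-two) word offset
 * whose address bit appears stuck high, stuck low, or shorted. */
static size_t addr_bus_test(volatile uint32_t *base, size_t nwords)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;
    size_t offset, test;

    /* Seed word 0 and every power-of-two word offset with the pattern. */
    base[0] = pattern;
    for (offset = 1; offset < nwords; offset <<= 1)
        base[offset] = pattern;

    /* Stuck-high check: writing word 0 must not change any of the
     * power-of-two locations. */
    base[0] = antipattern;
    for (offset = 1; offset < nwords; offset <<= 1)
        if (base[offset] != pattern)
            return offset;
    base[0] = pattern;

    /* Stuck-low / shorted check: writing one power-of-two location must
     * not change word 0 or any other power-of-two location. */
    for (test = 1; test < nwords; test <<= 1) {
        base[test] = antipattern;
        if (base[0] != pattern)
            return test;
        for (offset = 1; offset < nwords; offset <<= 1)
            if (offset != test && base[offset] != pattern)
                return test;
        base[test] = pattern;
    }
    return 0;
}
```

On the board, something like `addr_bus_test((volatile uint32_t *)0x80000000u, 0x80000000u / 4)` would cover the whole 2 GB window; a non-zero return value is the word offset whose address bit looks stuck or shorted.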

Any suggestions would be most appreciated.

 

Thanks.

12 Replies
Teacher xilinxacct
Registered: 10-23-2018

Re: MIG DDR ECC Error Status But Data is Correct

@mhuff84

Did you also get the error before the 'tweaking'? (i.e., do you think the error is native to the example?)

If not associated with the example, can you share your Vivado project (and highlight the tweaks to make it easier to review)?

I don't have the exact same hardware, so I may not be able to duplicate/resolve it. 

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

You can't see the UE or CE status without the tweaks. The test is a pretty straightforward write/read/compare written in C, so I doubt that's the problem.

I tried uploading my project, but it is over the 20 MB size limit. I've attached my SDK directory with the C source code, my pin constraints, and an image of the block diagram.

I followed the example MicroBlaze project from:

https://www.xilinx.com/video/hardware/creating-a-simple-microblaze-design-in-ip-integrator.html

Here's what I ended up with.

[Image: block_diagram.jpg]

Teacher xilinxacct
Registered: 10-23-2018

Re: MIG DDR ECC Error Status But Data is Correct

@mhuff84

Visibility is a bit limited outside of Vivado, but here is a stab, just peeking at memorytest.c:

ddr_stat is set via ddr_stat_reg[0].

If ddr_stat is ever non-zero, ddr_stat_reg[0] is set to all FFs... subsequent accesses in the loop would then see the 0x1 and 0x2 bits set, so you would start to see the messages.

Hope that helps

If it does, please mark as solution accepted. (Kudos are also welcomed.)

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

Thanks for the reply.

Am I reading the guide wrong? Table 4-42 in pg150-ultrascale-memory-ip.pdf says:

[Screenshot: excerpt from Table 4-42]

I'm writing all FFs to clear it whenever it is non-zero. Is that not correct?
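If the status bits are write-1-to-clear, as the poster reads Table 4-42, a tiny host-side model shows what each kind of clear should do: writing 1 to a bit that is already 0 has no effect, so writing 0xFFFFFFFF and writing back the value just read both end at zero. `w1c_write` and `hw_event` are a hypothetical model of the register, not Xilinx code.

```c
#include <stdint.h>

/* Model of the hardware latching an event: status bits are sticky. */
static uint32_t hw_event(uint32_t reg, uint32_t event_bits)
{
    return reg | event_bits;
}

/* Model of a software write to a write-1-to-clear (W1C) register:
 * each bit written as 1 is cleared; bits written as 0 are left unchanged. */
static uint32_t w1c_write(uint32_t reg, uint32_t written)
{
    return reg & ~written;
}
```

Under this model the all-FF write is harmless on its own; whether the real MIG register behaves exactly like this is an assumption worth checking against PG150.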

Thanks.

Teacher xilinxacct
Registered: 10-23-2018

Re: MIG DDR ECC Error Status But Data is Correct

@mhuff84

My point to check is this:

you are 'always' setting the 0x1 and 0x2 bits, even if you did not get a CE_STATUS or UE_STATUS...

That is, if 'any' value came in, you are guaranteed that the next time through the loop those two bits will be set. So I think it is a self-fulfilling prophecy.

It would be good to check which comes first, you setting the flags to all FFs or an actual CE/UE status, since the ddr_status_check is outside the loop.
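A simplified model of the failure mode described above: if the status word is captured once outside the loop, every iteration acts on that same stale value, so a single early error is reported on every access, and an error that arrives later is never seen. `count_reports` and the sample values are hypothetical; `samples[i]` stands for what the register would read on iteration i.

```c
#include <stddef.h>
#include <stdint.h>

/* samples[i] models what the status register would read on iteration i.
 * reread == 0: status captured once before the loop (stale value reused).
 * reread != 0: status read again on every iteration.
 * Returns how many iterations would report a UE or CE (bits 0x1 / 0x2). */
static unsigned count_reports(const uint32_t *samples, size_t n, int reread)
{
    if (n == 0)
        return 0;
    unsigned reports = 0;
    uint32_t stat = samples[0]; /* the single read before the loop */
    for (size_t i = 0; i < n; i++) {
        if (reread)
            stat = samples[i];  /* fresh read for this iteration */
        if (stat & 0x3u)
            reports++;
    }
    return reports;
}
```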

Hope that helps

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

I see.

So, as written, it sets ddr_stat_reg only when it is first read as non-zero. I changed it a bit to use pointers instead of array notation:

 

 

    u32 *ddr_stat_reg = (u32*)(0x70000000);   // ECC status register

    // Write data
    *(addr+i) = Val;
    ddr_stat = (u32)*(ddr_stat_reg);
    if (ddr_stat != 0) {
        if (ddr_stat & 0x1) {
            //printf("Write UE Detected\n\r");
            print_UE_stats();
            ue_cnt++;
            errors++;
        }
        if (ddr_stat & 0x2) {
            //printf("CE Detected\n\r");
            print_CE_stats();
            se_cnt++;
        }
        *ddr_stat_reg = 0x01;
    }

    // Now check the data back again
    Actual = *(addr+i);
    if (Actual != Val) {
        print("Data Fail\n\r");
        data_error++;
    }
    // ddr_stat still holds the value read after the write; it is not re-read here
    if (ddr_stat != 0) {
        if (ddr_stat & 0x1) {
            //printf("Read UE Detected\n\r");
            print_UE_stats();
            ue_cnt++;
            errors++;
        }
        if (ddr_stat & 0x2) {
            //printf("UE Detected\n\r");
            print_CE_stats();
            se_cnt++;
        }
        *ddr_stat_reg = 0x02;
    }

 

Teacher xilinxacct
Registered: 10-23-2018

Re: MIG DDR ECC Error Status But Data is Correct

@mhuff84

That seems closer, now that the status is checked in the loop.

However, I see a couple of remaining things that 'may' not be quite right.

In the first if block you always write 0x1... Do you only want to do that in the 0x1 case?

Likewise for 0x2 in the second if block.
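A sketch of the suggestion above: handle each flag on its own and clear only the bit that was actually seen. It assumes write-1-to-clear bits with UE at 0x1 and CE at 0x2, as used in this thread; `handle_ecc_status` is a hypothetical helper.

```c
#include <stdint.h>

#define ECC_STATUS_UE 0x1u /* assumed bit positions, as used in the thread */
#define ECC_STATUS_CE 0x2u

/* Read the status once, count each flag that is set, and clear only that
 * flag's bit (assumes write-1-to-clear). Returns the value that was read. */
static uint32_t handle_ecc_status(volatile uint32_t *stat_reg,
                                  unsigned *ue_cnt, unsigned *ce_cnt)
{
    uint32_t stat = *stat_reg;
    if (stat & ECC_STATUS_UE) {
        (*ue_cnt)++;
        *stat_reg = ECC_STATUS_UE; /* clear UE only */
    }
    if (stat & ECC_STATUS_CE) {
        (*ce_cnt)++;
        *stat_reg = ECC_STATUS_CE; /* clear CE only */
    }
    return stat;
}
```

Writing back the whole value that was read clears the same set of bits in a single write; clearing per flag just keeps each clear next to the condition that justified it.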

Anyway, if it is now working without these changes, then hopefully you are good to go.

Please mark as solution accepted... and since the thread was so long, feel free to mark as many of the helpful posts along the way with a Kudo.

 

 

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

Instead of just writing ones to the register, I now write back the value that was read. I'm still testing, but as near as I can tell, if you write 1s to clear and there's nothing to clear, it can mess things up and start reporting counts that don't exist. This seems strange, and it is hard for me to believe the HDL in the MIG controller was written this way, but maybe that's the case? It would be nice to have that confirmed by a Xilinx person. I also consulted a software engineer (I'm an HDL guy, so C is not my strength), and he pointed out that I should be using u8 pointers and type casting to u32 as needed. Apparently my pointer math wasn't quite right.

 

    u8 *ddr_stat_reg = (u8*)0x70000000;   // DDR Status reg starting address
    ddr_stat = *((u32*)(ddr_stat_reg));   // Read Status address 0x0
    if (ddr_stat != 0) {
        if (ddr_stat & 0x1) {
            print("Write UE Detected\n\r");
            print_UE_stats();
            ue_cnt++;
            errors++;
        }
        if (ddr_stat & 0x2) {
            print("CE Detected\n\r");
            printf("Address Value = 0x%X\n\r", (int)(addr+i));
            print_CE_stats();
            ce_cnt++;
        }
        *(u32*)ddr_stat_reg = ddr_stat;   // Clear Status
    }
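The pointer-math point is about units: register offsets in a register map are given in bytes, but adding n to a `u32 *` advances 4n bytes. A minimal sketch of byte-offset register access, with hypothetical helper names; on a Xilinx standalone BSP, `Xil_In32()` and `Xil_Out32()` from `xil_io.h` do the same job given a full address.

```c
#include <stdint.h>

/* Read a 32-bit register at base + byte_offset. The arithmetic is done on a
 * byte pointer so the offset is in bytes, as register maps list it; adding
 * the same number to a uint32_t pointer would move 4x as far. */
static uint32_t reg_read32(volatile void *base, uintptr_t byte_offset)
{
    volatile uint8_t *p = (volatile uint8_t *)base + byte_offset;
    return *(volatile uint32_t *)p;
}

/* Write a 32-bit register at base + byte_offset. */
static void reg_write32(volatile void *base, uintptr_t byte_offset, uint32_t v)
{
    volatile uint8_t *p = (volatile uint8_t *)base + byte_offset;
    *(volatile uint32_t *)p = v;
}
```

For example, `reg_read32((volatile void *)0x70000000u, 0x0)` would read the status word at offset 0x0 used above.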

Once I've done more testing to confirm this has fixed the spurious UE/CE counts while the data is correct, I'll mark it as resolved. The only other thing I can think of is that the ECC check bits (64-71) don't get written correctly. It seems odd that it would be just those bits, but maybe that DDR chip has something going on. Hard to say.
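To test the idea that only bits 64-71 are affected, it can help to reduce any observed bit differences to byte lanes. This sketch assumes the conventional numbering in which DQ[8k+7:8k] form byte lane k, so the ECC check bits DQ[71:64] are lane 8 (with x8 components, a DRAM device of their own). How the difference masks are obtained is left open; `failing_byte_lanes` is a hypothetical helper.

```c
#include <stdint.h>

/* Given which of the 72 DQ bits differed (64 data bits + 8 check bits),
 * return a 9-bit mask of the byte lanes involved: bit k set means lane k
 * had at least one flipped bit. Lane 8 is the ECC check byte (DQ 64-71). */
static uint16_t failing_byte_lanes(uint64_t data_diff, uint8_t ecc_diff)
{
    uint16_t lanes = 0;
    for (unsigned lane = 0; lane < 8; lane++)
        if ((data_diff >> (8u * lane)) & 0xFFu)
            lanes |= (uint16_t)(1u << lane);
    if (ecc_diff)
        lanes |= (uint16_t)(1u << 8);
    return lanes;
}
```

If every difference maps to lane 8, that points at the ECC byte lane's routing or DRAM device rather than at the data lanes.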

 

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

After a lot more debugging, we did find a problem with the board layout: we had the wrong DM pull-down resistors installed. After fixing those, we now see the correct data in all tests run so far. However, the ECC status register still routinely reports UEs and CEs. I can only surmise that it is falsely reporting these errors. There's a known issue (https://www.xilinx.com/support/answers/71531.html), but I'm not sure it's the same thing.

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

I'm not sure whether this is real, but it seems that enabling the option

"Force Read and Write commands to use AutoPrecharge when Column Address bit A3 is asserted high" in the MIG controller settings is what causes the erroneous status bits. Once I turned it off, things started working much better. This looks like it might be a bug in the controller.

Visitor mhuff84
Registered: 02-06-2018

Re: MIG DDR ECC Error Status But Data is Correct

For reference, here are my MIG controller settings with "Force Read and Write commands to use AutoPrecharge..." disabled:

[Four screenshots of the MIG controller settings]

Moderator
Registered: 11-28-2016

Re: MIG DDR ECC Error Status But Data is Correct

Hello @mhuff84,

I don't think there's an issue with the IP. I'm more concerned that you have other board layout issues that were causing problems when A3 auto-precharge was enabled. Depending on your access pattern, more precharges and activates are issued on the memory interface, which adds toggling on the CAC (command/address/control) bus signals as well as load on your power rails. I would review the layout guidelines in UG583 and make sure your layout accounts for package pin delays. Next, I would double-check the FPGA and DRAM power rails, then try running the example design with the same configuration and see whether the data errors occur or go away.