Visitor hugo.k
Registered: 03-08-2018

NAND ECC errors on xilinx-v2017.3 kernel

Hi,

I'm facing an issue with NAND ECC after upgrading the kernel to the xilinx-v2017.3 tag.

The board is a custom design with a Zynq Z020, using the PL353 controller with an MT29F1G08ABADAH4-IT NAND device.

nand: device found, Manufacturer ID: 0x01, Chip ID: 0xf1
nand: AMD/Spansion S34ML01G1
nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Bad block table found at page 65408, version 0x01
Bad block table found at page 65344, version 0x01
nand_bbt: ECC error in BBT at 0x000007fc0000
nand_bbt: ECC error in BBT at 0x000007fa0000
Scanning device for bad blocks
Bad eraseblock 1023 at 0x000007fe0000
Bad block table written to 0x000007fc0000, version 0x01
Bad block table written to 0x000007fa0000, version 0x01
3 ofpart partitions found on MTD device pl35x-nand
Creating 3 MTD partitions on "pl35x-nand":
0x000000000000-0x000000040000 : "spl"
0x000000040000-0x000000060000 : "u-env"
0x000000060000-0x000008000000 : "ubi"
netem: version 1.3
u32 classifier
    Actions configured
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 32768 max)
nf_tables: (c) 2007-2009 Patrick McHardy <kaber@trash.net>
ipip: IPv4 and MPLS over IPv4 tunneling driver
ip_tables: (C) 2000-2006 Netfilter Core Team
arp_tables: arp_tables: (C) 2002 David S. Miller
NET: Registered protocol family 10
NET: Registered protocol family 17
bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
8021q: 802.1Q VLAN Support v1.8
zynq_pm_ioremap: no compatible node found for 'xlnx,zynq-ddrc-a05'
zynq_pm_late_init: Unable to map DDRC IO memory.
Registering SWP/SWPB emulation handler
ubi0: attaching mtd2
ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes

Every PEB accessed during ubiattach generates ECC errors.
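The addresses in the log above can be cross-checked against the geometry the driver reports (2048-byte pages, 128 KiB erase blocks, 128 MiB in total). A minimal C sketch of that arithmetic follows; the geometry constants are taken from the probe log, and the helper names are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry printed by the NAND probe in the log above (assumed exact). */
#define NAND_PAGE_SIZE  2048u                  /* bytes per page        */
#define NAND_BLOCK_SIZE (128u * 1024u)         /* bytes per erase block */
#define NAND_CHIP_SIZE  (128u * 1024u * 1024u) /* 128 MiB               */

/* Byte offset of a page number, as printed by nand_bbt ("at page N"). */
uint64_t page_to_offset(uint32_t page)
{
    return (uint64_t)page * NAND_PAGE_SIZE;
}

/* Erase block that contains a given byte offset. */
uint32_t offset_to_block(uint64_t offset)
{
    assert(offset < NAND_CHIP_SIZE);
    return (uint32_t)(offset / NAND_BLOCK_SIZE);
}

/* Pages per erase block and erase blocks per chip for this geometry. */
uint32_t pages_per_block(void) { return NAND_BLOCK_SIZE / NAND_PAGE_SIZE; }
uint32_t total_blocks(void)    { return NAND_CHIP_SIZE / NAND_BLOCK_SIZE; }
```

With these numbers, page_to_offset(65408) is 0x7fc0000 and page_to_offset(65344) is 0x7fa0000, the two BBT copies reported with ECC errors. They sit in erase blocks 1022 and 1021, just below block 1023, which is the last of the chip's 1024 blocks (64 pages each).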


An older tag (xilinx-v2016.4) works without any ECC errors.

Diffing the old tag against xilinx-v2017.3, I found several changes in the MTD layer and in how the ECC layout is defined.

Commit 008b7ab777938a486110c7108fc56156815caecb introduces a new ECC layout definition.

I tried forcing the ECC mode to none in the pl35x driver (diff below). With that change the flash works with the xilinx-v2017.3 kernel, so I suspect the ECC layout changes introduced the error:


diff --git a/drivers/mtd/nand/pl35x_nand.c b/drivers/mtd/nand/pl35x_nand.c
index 736b0c0ee367..3b7ad3307de5 100644
--- a/drivers/mtd/nand/pl35x_nand.c
+++ b/drivers/mtd/nand/pl35x_nand.c
@@ -1006,7 +1006,7 @@ static void pl35x_nand_ecc_init(struct mtd_info *mtd, struct nand_ecc_ctrl *ecc,
 {
        struct nand_chip *nand_chip = mtd_to_nand(mtd);

-       ecc->mode = NAND_ECC_HW;
+       ecc->mode = NAND_ECC_NONE;
        ecc->read_oob = pl35x_nand_read_oob;
        ecc->read_page_raw = pl35x_nand_read_page_raw;
        ecc->strength = 1;
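One plausible way an ECC layout change produces exactly these symptoms: if the software that wrote the pages (for example an older kernel or the bootloader) and the software reading them disagree on which OOB bytes hold the ECC, every read checks ECC against the wrong bytes and fails, while forcing ECC off hides the problem entirely. The sketch below is a userspace model of that idea, loosely shaped like the ecc() callback of the kernel's struct mtd_ooblayout_ops (one offset/length region per section, -ERANGE past the last section). Layout A and layout B are hypothetical and are not the real pl35x layouts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define OOB_SIZE 64   /* OOB bytes per 2048-byte page, from the probe log */

/* Userspace stand-in for the kernel's struct mtd_oob_region. */
struct oob_region {
    uint32_t offset;
    uint32_t length;
};

/* Stand-in for an ecc() layout callback: fills *r for a section and
 * returns 0, or returns -ERANGE once there are no more sections. */
typedef int (*ecc_region_fn)(int section, struct oob_region *r);

/* Hypothetical layout A: 12 ECC bytes at the end of the OOB area. */
int layout_a_ecc(int section, struct oob_region *r)
{
    if (section > 0)
        return -ERANGE;
    r->offset = 52;
    r->length = 12;
    return 0;
}

/* Hypothetical layout B: the same 12 bytes, right after the
 * 2-byte bad-block marker. */
int layout_b_ecc(int section, struct oob_region *r)
{
    if (section > 0)
        return -ERANGE;
    r->offset = 2;
    r->length = 12;
    return 0;
}

/* map[i] becomes true if OOB byte i holds ECC under the given layout. */
static void ecc_byte_map(ecc_region_fn ecc, bool map[OOB_SIZE])
{
    struct oob_region r;

    for (int i = 0; i < OOB_SIZE; i++)
        map[i] = false;
    for (int section = 0; ecc(section, &r) == 0; section++)
        for (uint32_t i = r.offset; i < r.offset + r.length && i < OOB_SIZE; i++)
            map[i] = true;
}

/* Count OOB bytes that one layout treats as ECC and the other does not.
 * A nonzero result means pages written under one layout will generally
 * fail ECC checks when read back under the other. (Simplified: the byte
 * order inside the ECC regions is not compared.) */
int ecc_layout_mismatch(ecc_region_fn a, ecc_region_fn b)
{
    bool ma[OOB_SIZE], mb[OOB_SIZE];
    int diff = 0;

    ecc_byte_map(a, ma);
    ecc_byte_map(b, mb);
    for (int i = 0; i < OOB_SIZE; i++)
        if (ma[i] != mb[i])
            diff++;
    return diff;
}
```

Here ecc_layout_mismatch(layout_a_ecc, layout_b_ecc) returns 24, since the two 12-byte regions do not overlap at all, while comparing a layout with itself returns 0.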

 

Below is the flash device-tree node; it comes from a device tree generated with PetaLinux 2015.1:


            flash@e1000000 {
                reg = <0xe1000000 0x1000000>;
                compatible = "arm,pl353-nand-r2p1";
                arm,nand-cycle-t0 = <0x4>;
                arm,nand-cycle-t1 = <0x4>;
                arm,nand-cycle-t2 = <0x2>;
                arm,nand-cycle-t3 = <0x2>;
                arm,nand-cycle-t4 = <0x1>;
                arm,nand-cycle-t5 = <0x1>;
                arm,nand-cycle-t6 = <0x2>;
                status = "okay";
                #address-cells = <0x1>;
                #size-cells = <0x1>;

                partition@0 {
                    reg = <0x0 0x40000>;
                    label = "spl";
                };

                partition@40000 {
                    reg = <0x40000 0x20000>;
                    label = "u-env";
                };

                partition@60000 {
                    reg = <0x60000 0x0>;
                    label = "ubi";
                };
            };
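The reg pairs above are <offset size>, and the kernel log earlier in this post shows the "ubi" entry, with size 0, extending from 0x60000 to the end of the 128 MiB device. A quick sanity check of such a partition table can be sketched in C. The chip and erase-block sizes are taken from the probe log; struct part, part_end() and parts_valid() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHIP_SIZE  0x8000000ull   /* 128 MiB, from the probe log   */
#define ERASE_SIZE 0x20000ull     /* 128 KiB erase blocks          */

struct part {
    const char *label;
    uint64_t offset;
    uint64_t size;   /* 0 = extend to the end of the device */
};

/* End offset of a partition; size 0 means "rest of the chip", which is
 * how the kernel log shows the "ubi" entry being resolved. */
uint64_t part_end(const struct part *p)
{
    return p->size == 0 ? CHIP_SIZE : p->offset + p->size;
}

/* Partitions must be erase-block aligned at both ends, non-empty,
 * inside the chip, and must not overlap the previous partition. */
bool parts_valid(const struct part *parts, int n)
{
    uint64_t prev_end = 0;

    for (int i = 0; i < n; i++) {
        uint64_t end = part_end(&parts[i]);

        if (parts[i].offset % ERASE_SIZE != 0 || end % ERASE_SIZE != 0)
            return false;
        if (parts[i].offset < prev_end || end > CHIP_SIZE || end <= parts[i].offset)
            return false;
        prev_end = end;
    }
    return true;
}
```

Fed the three entries above ({0, 0x40000}, {0x40000, 0x20000}, {0x60000, 0}), parts_valid() accepts the table and part_end() resolves "ubi" to 0x8000000, matching the "0x000000060000-0x000008000000" line in the log.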

 

Am I missing some device-tree parameters (or kernel configuration options) that were unnecessary in older kernels but became mandatory with xilinx-v2017.3 and the new ECC layout definition, or is this a bug?

Registered: 01-08-2012

Re: NAND ECC errors on xilinx-v2017.3 kernel


@hugo.k wrote:

MT29F1G08ABADAH4-IT NAND device.
nand: device found, Manufacturer ID: 0x01, Chip ID: 0xf1


Are you sure about that?  I would have expected 0x2c for Micron as the manufacturer of the MT29F1G08ABADAH4-IT.
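For reference, the first two bytes of the READ ID response decode as follows for the parts in this thread. This is a small hand-written table, not the kernel's nand_ids.c: 0x01 (AMD/Spansion) and 0x2c (Micron) are the manufacturer codes the Linux NAND core uses, and the 0xf1/0xda device codes correspond to the 128 MiB and 256 MiB sizes printed in the logs. nand_mfr_name() and nand_dev_size_mib() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct nand_mfr { uint8_t id; const char *name; };
struct nand_dev { uint8_t id; unsigned size_mib; };

/* First READ ID byte: manufacturer code. */
static const struct nand_mfr mfr_table[] = {
    { 0x01, "AMD/Spansion" },
    { 0x2c, "Micron" },
};

/* Second READ ID byte for the two parts seen in this thread (3.3 V, x8). */
static const struct nand_dev dev_table[] = {
    { 0xf1, 128 },  /* 1 Gbit, S34ML01G1 in the first log */
    { 0xda, 256 },  /* 2 Gbit, S34ML02G1 in a later log   */
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the manufacturer name, or NULL if the code is not in the table. */
const char *nand_mfr_name(uint8_t id)
{
    for (size_t i = 0; i < ARRAY_LEN(mfr_table); i++)
        if (mfr_table[i].id == id)
            return mfr_table[i].name;
    return NULL;
}

/* Returns the device size in MiB, or 0 if the code is not in the table. */
unsigned nand_dev_size_mib(uint8_t id)
{
    for (size_t i = 0; i < ARRAY_LEN(dev_table); i++)
        if (dev_table[i].id == id)
            return dev_table[i].size_mib;
    return 0;
}
```

nand_mfr_name(0x01) returns "AMD/Spansion", so the "Manufacturer ID: 0x01" line in the original log already points to a Spansion part rather than a Micron MT29F.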

Visitor hugo.k
Registered: 03-08-2018

Re: NAND ECC errors on xilinx-v2017.3 kernel

That is what the board manufacturer told me...

But when I checked the flash chip on the board, surprise: it is a different chip.

Sorry for the wrong information: the chip is a Spansion ML01G100BHI00.

http://www.cypress.com/file/207521/download

Voyager
Registered: 09-14-2016

Re: NAND ECC errors on xilinx-v2017.3 kernel

Visitor hugo.k
Registered: 03-08-2018

Re: NAND ECC errors on xilinx-v2017.3 kernel

Hi @trigger,

Thanks for the reply.

The NAND works well with the old kernel version, and with ECC disabled it works without timeouts.

Do you think tuning the timings could resolve this error?

(I don't have Vivado or the hardware design; it was delivered to me, so I cannot modify the timings in the hardware design.)

Voyager
Registered: 09-14-2016

Re: NAND ECC errors on xilinx-v2017.3 kernel

Hi @hugo.k,


@hugo.k wrote:

The NAND works well with the old kernel version, and with ECC disabled it works without timeouts.


My bad: if the NAND works well with an old kernel, the trouble is probably in the driver :S

You should probably check the differences between the old kernel version and your newer one...

Cheers,

Trigger

Visitor ptynan3
Registered: 03-27-2018

Re: NAND ECC errors on xilinx-v2017.3 kernel

Hi @hugo.k, did you ever figure this out? We seem to have a very similar issue bringing up a board with the 2 Gbit version of this NAND flash, and we see the same ECC errors when mounting a filesystem on the device. If possible, we would rather not modify the kernel code directly, as you did above.

@derekparks do you have anything to add?

Observer derekparks
Registered: 09-14-2017

Re: NAND ECC errors on xilinx-v2017.3 kernel

@ptynan3, I was able to find a suitable workaround; hopefully the underlying issue will be fixed in a future release of the linux-xlnx tree.

The documentation for the pl35x SMC and NAND drivers (http://www.wiki.xilinx.com/Zynq+Pl353+SMC+and+NAND+drivers) states that on-die ECC support is only available for Micron chips. This was the clue. I agree that if ECC is disabled the errors go away, but I wanted to leverage the chip's 1-bit on-die ECC correction, so the workaround lives in the pl35x_nand.c code. Attached is a not-very-elegant but simple patch that initializes the SMC driver with hardcoded on-die ECC support. I haven't had a chance to run a full battery of tests, but I got a clean "nandtest /dev/mtdX" run and could mount JFFS2 without ECC errors, with data that survives a cold reboot.

I hacked the patch together by diffing the original pl35x_nand.c against a manually edited copy, capturing the output to a file, and replacing the first two lines of the output with git-style patch headers. I then copied the patch to <my-peta-dir>/project-spec/meta-user/recipes-kernel/linux/linux-xlnx/ and added SRC_URI_append += " file://spansionS34MLondie.patch" to the bbappend file.

Using:

Zynq 7100

Spansion S34ML02G1

SMC pl353

petalinux 2017.3

linux-xlnx commit f1b1e077d641fc83b54c1b8f168cbb58044fbd4e

Observer derekparks
Registered: 09-14-2017

Re: NAND ECC errors on xilinx-v2017.3 kernel

Having had time to test the workaround, I'm not convinced it is actually a solution. Hardcoding the pl35x driver has a couple of unwanted side effects:

  1. The pl35x driver probe prints a warning, possibly indicating that ECC is not performed correctly:
    1. nand: device found, Manufacturer ID: 0x01, Chip ID: 0xda
      nand: AMD/Spansion S34ML02G1
      nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
      nand: WARNING: pl35x-nand: the ECC used on your system is too weak compared to the one required by the NAND chip
  2. nandwrite doesn't update the ECC in the OOB/spare area:
    1. The Linux image.ub MTD partition is field-upgraded with nandwrite, which bypasses the OOB bytes. After a reboot, U-Boot uses a different OOB layout and therefore fails to load the image from NAND.
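The warning in item 1 comes from a sanity check in the NAND core that compares the correction density the driver provides with what the chip asks for. Below is a userspace model of that check as I read it in the 4.x nand_base.c (nand_ecc_strength_good()). The parameter names are mine, and the example values are hypothetical, not figures from the S34ML02G1 datasheet.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the NAND core's ECC strength check.
 *   writesize    page size in bytes
 *   ecc_strength bits the driver can correct per ECC step
 *   ecc_size     ECC step size in bytes used by the driver
 *   strength_ds  bits the chip requires per step (datasheet/ONFI)
 *   step_ds      step size in bytes the requirement refers to */
bool ecc_strength_good(int writesize, int ecc_strength, int ecc_size,
                       int strength_ds, int step_ds)
{
    if (ecc_size == 0 || step_ds == 0)
        return true;   /* not enough information to judge */

    /* Compare correctable bits per page, plus the per-step strength. */
    int corr = (writesize * ecc_strength) / ecc_size;
    int ds_corr = (writesize * strength_ds) / step_ds;

    return corr >= ds_corr && ecc_strength >= strength_ds;
}
```

For a 2048-byte page and a requirement of 1 bit per 512 bytes, a driver correcting 1 bit per 512-byte step passes (4 bits per page on both sides), whereas 1 bit per 2048-byte step, or a reported strength of 0, fails and would trigger the "too weak" warning.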

 

I suspect the pl35x driver shouldn't be modified, and that the issue lies in how JFFS2 and now U-Boot handle the OOB/spare area of the NAND chip.

Adventurer
Registered: 12-02-2014

Re: NAND ECC errors on xilinx-v2017.3 kernel

Actually, I'm fairly sure that warning should be ignored in the on-die ECC case. The OS has no insight into the ECC state when on-die ECC is used. There should probably be a flag to check the on-die ECC status, so that the warning is only printed when on-die ECC is not enabled.

Xilinx Employee
Registered: 10-11-2011

Re: NAND ECC errors on xilinx-v2017.3 kernel

The NAND flashes S34ML02G and S34ML01G are meant to be used with 1-bit HW ECC on Zynq-7000.

On-die ECC should not be used (or faked) for those flashes.

The latest report I have is that a similar error was fixed back in 2014.2.

See https://github.com/Xilinx/linux-xlnx/commit/35281f9fd986d75fb4429cf203f0dc8d63b5c123
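To make "1-bit ECC" concrete: per ECC step, the code can correct one flipped bit and detect, but not fix, two. The sketch below shows a textbook single-error-correct, double-error-detect scheme over an assumed 512-byte step: the XOR of the bit positions of all set bits plus one overall parity bit. It is an illustration only, not the PL353's actual ECC encoding or OOB byte layout, and ecc_calc()/ecc_correct() are hypothetical names. The uncorrectable case returns -EBADMSG, which is the error Linux reports as -74 in the ubi log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ECC_STEP 512   /* assumed bytes protected by one ECC code word */

/* Illustrative SEC-DED code word (NOT the PL353 format). */
struct ecc_code {
    uint16_t pos;     /* XOR of indices of all 1 bits (12 bits used) */
    uint8_t parity;   /* overall parity of the data bits             */
};

struct ecc_code ecc_calc(const uint8_t data[ECC_STEP])
{
    struct ecc_code c = { 0, 0 };

    for (uint32_t bit = 0; bit < ECC_STEP * 8; bit++) {
        if ((data[bit / 8] >> (bit % 8)) & 1) {
            c.pos ^= (uint16_t)bit;
            c.parity ^= 1;
        }
    }
    return c;
}

/* Returns 0 if the data is clean, 1 if a single bit was corrected in
 * place, or -EBADMSG if an even number of bits flipped (detected, not
 * fixable). Errors in the stored code word itself are not handled. */
int ecc_correct(uint8_t data[ECC_STEP], struct ecc_code stored)
{
    struct ecc_code now = ecc_calc(data);
    uint16_t syndrome = now.pos ^ stored.pos;       /* position of a lone flip */
    uint8_t parity_diff = now.parity ^ stored.parity;

    if (syndrome == 0 && parity_diff == 0)
        return 0;
    if (parity_diff == 1) {
        /* Odd number of flips: assume exactly one, at bit 'syndrome'. */
        data[syndrome / 8] ^= (uint8_t)(1u << (syndrome % 8));
        return 1;
    }
    return -EBADMSG;
}
```

Flipping one bit of a buffer after computing its code word makes ecc_correct() restore it and return 1; flipping two bits makes it return -EBADMSG, the same class of failure UBI logs as "error -74 (ECC error)".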

 

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Visitor hugo.k
Registered: 03-08-2018

Re: NAND ECC errors on xilinx-v2017.3 kernel

Hi,

Thanks for the replies.

I tried the workaround: the kernel boots and can load data from the flash (rootfs) without errors.

I then tried to write a new rootfs to another UBI volume (several UBI volumes store all the resources: U-Boot, kernel, FPGA and rootfs). The write completed without errors, but after a reboot all UBI volumes seemed to be corrupted (or the whole flash; I haven't had time to investigate further). U-Boot did not load, and I had to use JTAG to restore a working flash image.

Is there an ECC incompatibility between U-Boot and the kernel? (An old U-Boot version, 2015.1, is used with a new kernel, 2017.3.)

Xilinx Employee
Registered: 10-11-2011

Re: NAND ECC errors on xilinx-v2017.3 kernel

There should be an AR released soon (in a few days) with a patch that might be related to this post:

71078 - 2017.x Zynq-7000 - Embedded Linux: ubi ECC error on S34ML02G1

Visitor hugo.k
Registered: 03-08-2018

Re: NAND ECC errors on xilinx-v2017.3 kernel

Hi,

The commit below (on the master branch of linux-xlnx) fixes the issue:

https://github.com/Xilinx/linux-xlnx/commit/59e8b54e301b89d4f05b0cb02dc1777ee6bd3a1f

I generated a patch from this commit and applied it to my kernel (2017.3); with the patched kernel the NAND works.

Thanks

Xilinx Employee
Registered: 10-11-2011

Re: NAND ECC errors on xilinx-v2017.3 kernel

Great! This is the same patch as in https://www.xilinx.com/support/answers/71078.html
