Registered: ‎05-19-2017

Zynq-7000 ethernet broken in Petalinux 2018.1?

After upgrading from Petalinux 2017.4 to Petalinux 2018.1, the Linux kernel crashes during boot while initializing the Zynq-7000 integrated Ethernet device. I've bisected the Xilinx Linux kernel between 2017.4 and 2018.1 and I think I found the commit causing the problem.
 
I am not sure whether this is caused by my own setup or is a general problem, so it would be nice if others could have a look and rule out that I am the only one affected. Possible causes on my side:
  • Z-Turn peculiarities (I've got the version with the KSZ9031N instead of the Atheros PHY)
  • My hardware design/device tree
  • Z-Turn U-Boot init code I ported from their outdated Ubuntu release to the recent U-Boot releases (worked with 2017.4)
  • Anything else? 
 
Here's my error description. During boot the kernel oopses in the Cadence Ethernet driver (macb_interrupt is called from __free_irq), probably because an interrupt occurs while the IRQ is being released:
[    0.875121] libphy: Fixed MDIO Bus: probed
[    0.881203] CAN device driver interface
[    0.886178] libphy: MACB_mii_bus: probed
[    0.890434] Unable to handle kernel paging request at virtual address 6b6b6b73
[    0.897577] pgd = c0004000
[    0.900266] [6b6b6b73] *pgd=00000000
[    0.903829] Internal error: Oops - BUG: 5 [#1] PREEMPT SMP ARM
[    0.909642] Modules linked in:
[    0.912684] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc4-xilinx #1
[    0.919536] Hardware name: Xilinx Zynq Platform
[    0.924051] task: ef0cf7c0 task.stack: ef0e4000
[    0.928573] PC is at macb_interrupt+0x24/0x3a8
[    0.932997] LR is at __free_irq+0x17c/0x298
[    0.937158] pc : [<c04d6674>]    lr : [<c016b920>]    psr: 60000193
[    0.943407] sp : ef0e5d18  ip : ef0e5d58  fp : ef0e5d54
[    0.948615] r10: 20000113  r9 : ef2f2538  r8 : ef215c14
[    0.953824] r7 : 0000001b  r6 : ef215ce4  r5 : 6b6b6b6b  r4 : ef2f2538
[    0.960333] r3 : c04d6650  r2 : 00000000  r1 : 6b6b6b6b  r0 : 0000001b
[    0.966845] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    0.974048] Control: 18c5387d  Table: 0000404a  DAC: 00000051
[    0.979776] Process swapper/0 (pid: 1, stack limit = 0xef0e4210)
[    0.985766] Stack: (0xef0e5d18 to 0xef0e6000)
[    0.990108] 5d00:                                                       ef0e5d54 ef0e5d28
[    0.998273] 5d20: c016b1e0 c016aefc ef0e5d44 ef215c00 eea4df80 ef215ce4 0000001b ef215c14
[    1.006431] 5d40: ef2f2538 20000113 ef0e5d8c ef0e5d58 c016b920 c04d665c eea66a80 ef2f2538
[    1.014590] 5d60: ef0e5d7c 0000001b ef2f2538 ef214810 00000006 eea66a80 00000000 c0464db8
[    1.022749] 5d80: ef0e5da4 ef0e5d90 c016bad8 c016b7b0 eea4ddc0 ef0e5db8 ef0e5db4 ef0e5da8
[    1.030909] 5da0: c016f4d0 c016ba8c ef0e5dec ef0e5db8 c0464f98 c016f4c0 eea66e00 eea4ddc0
[    1.039067] 5dc0: a0000113 ef214810 c0b8756c c0b87570 ffffffed 00000000 c0b429f0 00000000
[    1.047227] 5de0: ef0e5e04 ef0e5df0 c0465598 c0464df4 ef214810 c0b8756c ef0e5e34 ef0e5e08
[    1.055386] 5e00: c04616e0 c0465564 00000000 ef214810 ef214844 c0b429f0 c0b3a6d8 c0a32ef8
[    1.063545] 5e20: c0a82f9c 00000000 ef0e5e54 ef0e5e38 c04618dc c046159c 00000000 c0b429f0
[    1.071705] 5e40: c0461850 c0b3a6d8 ef0e5e7c ef0e5e58 c045fbdc c046185c ef12fe70 ef18cac0
[    1.079863] 5e60: c06e609c c0b429f0 eea69880 00000000 ef0e5e8c ef0e5e80 c0461154 c045fb58
[    1.088023] 5e80: ef0e5eb4 ef0e5e90 c0460c98 c0461138 c08b0b66 ef0e5ea0 c0b429f0 c0b6ab00
[    1.096182] 5ea0: 000000b2 00000000 ef0e5ecc ef0e5eb8 c04624b4 c0460b20 ffffe000 c0b6ab00
[    1.104342] 5ec0: ef0e5edc ef0e5ed0 c046331c c0462410 ef0e5eec ef0e5ee0 c0a32f18 c04632e8
[    1.112500] 5ee0: ef0e5f5c ef0e5ef0 c0101d18 c0a32f04 00000000 c08b556e ef0e5f00 ef0e5f08
[    1.120659] 5f00: c013dcb4 c0a00654 00000000 c091f380 000000b1 c091f380 00000006 00000006
[    1.128817] 5f20: 000000b2 c091e598 efffcde4 00000000 00000000 00000007 c0b6ab00 00000007
[    1.136978] 5f40: c0b6ab00 000000b2 c0a51840 c0b6ab00 ef0e5f94 ef0e5f60 c0a00f50 c0101c14
[    1.145136] 5f60: 00000006 00000006 00000000 c0a00648 00000000 c06def9c 00000000 00000000
[    1.153296] 5f80: 00000000 00000000 ef0e5fac ef0e5f98 c06defb4 c0a00dc0 00000000 c06def9c
[    1.161454] 5fa0: 00000000 ef0e5fb0 c0107c08 c06defa8 00000000 00000000 00000000 00000000
[    1.169614] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.177772] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    1.185942] [<c04d6674>] (macb_interrupt) from [<c016b920>] (__free_irq+0x17c/0x298)
[    1.193662] [<c016b920>] (__free_irq) from [<c016bad8>] (free_irq+0x58/0x6c)
[    1.200695] [<c016bad8>] (free_irq) from [<c016f4d0>] (devm_irq_release+0x1c/0x20)
[    1.208246] [<c016f4d0>] (devm_irq_release) from [<c0464f98>] (release_nodes+0x1b0/0x1cc)
[    1.216403] [<c0464f98>] (release_nodes) from [<c0465598>] (devres_release_all+0x40/0x4c)
[    1.224566] [<c0465598>] (devres_release_all) from [<c04616e0>] (driver_probe_device+0x150/0x2c0)
[    1.233418] [<c04616e0>] (driver_probe_device) from [<c04618dc>] (__driver_attach+0x8c/0xb8)
[    1.241835] [<c04618dc>] (__driver_attach) from [<c045fbdc>] (bus_for_each_dev+0x90/0xa0)
[    1.249994] [<c045fbdc>] (bus_for_each_dev) from [<c0461154>] (driver_attach+0x28/0x30)
[    1.257980] [<c0461154>] (driver_attach) from [<c0460c98>] (bus_add_driver+0x184/0x1ec)
[    1.265966] [<c0460c98>] (bus_add_driver) from [<c04624b4>] (driver_register+0xb0/0xf0)
[    1.273955] [<c04624b4>] (driver_register) from [<c046331c>] (__platform_driver_register+0x40/0x54)
[    1.282983] [<c046331c>] (__platform_driver_register) from [<c0a32f18>] (macb_driver_init+0x20/0x28)
[    1.292095] [<c0a32f18>] (macb_driver_init) from [<c0101d18>] (do_one_initcall+0x110/0x130)
[    1.300433] [<c0101d18>] (do_one_initcall) from [<c0a00f50>] (kernel_init_freeable+0x19c/0x1e0)
[    1.309116] [<c0a00f50>] (kernel_init_freeable) from [<c06defb4>] (kernel_init+0x18/0x11c)
[    1.317358] [<c06defb4>] (kernel_init) from [<c0107c08>] (ret_from_fork+0x14/0x2c)
[    1.324900] Code: e8bd4000 e5915000 e1a04001 e5911008 (e5953008)
[    1.330980] ---[ end trace 1ebb284224a5896a ]---
[    1.335680] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.335680]
[    1.344743] CPU1: stopping
[    1.347431] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.14.0-rc4-xilinx #1
[    1.355497] Hardware name: Xilinx Zynq Platform
[    1.360033] [<c011151c>] (unwind_backtrace) from [<c010c3ec>] (show_stack+0x20/0x24)
[    1.367752] [<c010c3ec>] (show_stack) from [<c06cc750>] (dump_stack+0xa8/0xdc)
[    1.374953] [<c06cc750>] (dump_stack) from [<c010fa64>] (handle_IPI+0x238/0x330)
[    1.382330] [<c010fa64>] (handle_IPI) from [<c01014cc>] (gic_handle_irq+0x94/0xa0)
[    1.389879] [<c01014cc>] (gic_handle_irq) from [<c010cef0>] (__irq_svc+0x70/0xb0)
[    1.397336] Exception stack(0xef0fff40 to 0xef0fff88)
[    1.402377] ff40: 00000001 00000000 00000000 00000000 00000000 00000000 ffffe000 c0b054a8
[    1.410537] ff60: 0000406a 413fc090 00000000 ef0fff9c ef0fff90 ef0fff90 c01086e4 c01086e8
[    1.418690] ff80: 60000113 ffffffff
[    1.422172] [<c010cef0>] (__irq_svc) from [<c01086e8>] (arch_cpu_idle+0x30/0x4c)
[    1.429555] [<c01086e8>] (arch_cpu_idle) from [<c06e5f84>] (default_idle_call+0x40/0x48)
[    1.437621] [<c06e5f84>] (default_idle_call) from [<c015aa60>] (do_idle+0x110/0x1c8)
[    1.445345] [<c015aa60>] (do_idle) from [<c015ac84>] (cpu_startup_entry+0x28/0x2c)
[    1.452897] [<c015ac84>] (cpu_startup_entry) from [<c010f5ac>] (secondary_start_kernel+0x130/0x154)
[    1.461923] [<c010f5ac>] (secondary_start_kernel) from [<0010194c>] (0x10194c)
[    1.469131] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.469131]
This does not happen on every boot, but on almost every attempt; it may be related to the randomness of interrupts.
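For anyone reading the oops, here is a small sketch of how the faulting address can be cross-checked against the register dump. It decodes the instruction word shown in parentheses on the "Code:" line (e5953008) and adds its offset to the value of r5 from the dump. The helper name decode_ldr_str_imm is made up for this example and only handles the plain immediate-offset LDR/STR form of the ARM (A32) encoding. The remark about 0x6b being the kernel's free-poison byte (POISON_FREE in include/linux/poison.h) is an interpretation of the trace, not something confirmed against the driver source.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR with immediate offset and no writeback.

    Hypothetical helper for illustration; other addressing modes are rejected.
    """
    if (word >> 28) == 0xF:
        raise ValueError("unconditional instruction space is not handled")
    if ((word >> 26) & 0b11) != 0b01 or ((word >> 25) & 1):
        raise ValueError("not an immediate-offset LDR/STR")
    pre_indexed = (word >> 24) & 1
    writeback = (word >> 21) & 1
    if not pre_indexed or writeback:
        raise ValueError("only plain offset addressing is handled here")

    offset = word & 0xFFF
    if not ((word >> 23) & 1):      # U bit clear: offset is subtracted
        offset = -offset

    mnemonic = "ldr" if (word >> 20) & 1 else "str"
    if (word >> 22) & 1:            # B bit set: byte access
        mnemonic += "b"

    return {
        "op": mnemonic,
        "rd": (word >> 12) & 0xF,   # destination/source register
        "rn": (word >> 16) & 0xF,   # base register
        "offset": offset,
    }


# Values copied from the oops above.
FAULTING_WORD = 0xE5953008          # "(e5953008)" on the Code: line
REGS = {5: 0x6B6B6B6B}              # "r5 : 6b6b6b6b" in the register dump

insn = decode_ldr_str_imm(FAULTING_WORD)
fault_addr = (REGS[insn["rn"]] + insn["offset"]) & 0xFFFFFFFF

print(f"{insn['op']} r{insn['rd']}, [r{insn['rn']}, #{insn['offset']}]")
print(f"computed fault address: {fault_addr:08x}")
```

The computed address matches the "virtual address 6b6b6b73" reported in the oops, so the fault is a load through r5, which itself holds 0x6b6b6b6b. If slab poisoning is enabled in this kernel configuration, that pattern would fit a structure that had already been freed when macb_interrupt ran, but this should be treated as a hypothesis.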
 
I ran a git bisect on the Xilinx kernel from GitHub. This commit appears to be the culprit:
$ git bisect bad
66bdede495c71da9c5ce18542976fae53642880b is the first bad commit
commit 66bdede495c71da9c5ce18542976fae53642880b
Author: Geert Uytterhoeven <geert+renesas@glider.be>
Date:   Wed Oct 18 13:54:03 2017 +0200

    of_mdio: Fix broken PHY IRQ in case of probe deferral

    If an Ethernet PHY is initialized before the interrupt controller it is
    connected to, a message like the following is printed:

        irq: no irq domain found for /interrupt-controller@e61c0000 !

    However, the actual error is ignored, leading to a non-functional (POLL)
    PHY interrupt later:

        Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

    Depending on whether the PHY driver will fall back to polling, Ethernet
    may or may not work.

    To fix this:
      1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
         of_irq_get().
         Unlike the former, the latter returns -EPROBE_DEFER if the
         interrupt controller is not yet available, so this condition can be
         detected.
         Other errors are handled the same as before, i.e. use the passed
         mdio->irq[addr] as interrupt.
      2. Propagate and handle errors from of_mdiobus_register_phy() and
         of_mdiobus_register_device().

    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 19c3e6ee64cbeebdb85ae4b2935c28cac70f2557 c2a372a9a35c51c0a565e1559d2dcb79d4612487 M      drivers
 
Git bisect log between tags/xilinx-v2017.4 and tags/xilinx-v2018.1:
git bisect start
# good: [b450e900fdb473a53613ad014f31eedbc80b1c90] imx274: Fix error handling
git bisect good b450e900fdb473a53613ad014f31eedbc80b1c90
# bad: [15b23f7fa80ed8166af46fb4dd971dbc12d46ad2] drm: xlnx: zynqmp: Disable a plane when the fb format changes
git bisect bad 15b23f7fa80ed8166af46fb4dd971dbc12d46ad2
# good: [0be75179df5e20306528800fc7c6a504b12b97db] Merge tag 'driver-core-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 0be75179df5e20306528800fc7c6a504b12b97db
# good: [2cd648c110b5570c3280bd645797658cabbe5f5c] include/linux/sem.h: correctly document sem_ctime
git bisect good 2cd648c110b5570c3280bd645797658cabbe5f5c
# good: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good aae3dbb4776e7916b6cd442d00159bea27a695c1
# good: [ae46654bcff303b33facbbd04a3ad9c21d303f9b] Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good ae46654bcff303b33facbbd04a3ad9c21d303f9b
# good: [2569e7e1d684e418ba7ffc9d0ad9a5f5247df0a0] Merge commit 'keys-fixes-20170927' into fixes-v4.14-rc3
git bisect good 2569e7e1d684e418ba7ffc9d0ad9a5f5247df0a0
# bad: [b5ac3beb5a9f0ef0ea64cd85faf94c0dc4de0e42] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad b5ac3beb5a9f0ef0ea64cd85faf94c0dc4de0e42
# good: [8d473320eebf938e9c2e3ce569e524554006362c] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 8d473320eebf938e9c2e3ce569e524554006362c
# good: [e7a36a6ec9cf1b60273e48ee980b8920f333bd4d] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e7a36a6ec9cf1b60273e48ee980b8920f333bd4d
# good: [545ea16f7c42969f94c769d0c2267cf4a65e5850] Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 545ea16f7c42969f94c769d0c2267cf4a65e5850
# good: [c92e8c02fe664155ac4234516e32544bec0f113d] tcp/dccp: fix ireq->opt races
git bisect good c92e8c02fe664155ac4234516e32544bec0f113d
# good: [54d431176429e9cf064461589e5174349a9f73da] sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
git bisect good 54d431176429e9cf064461589e5174349a9f73da
# good: [e5f468b3f23313994c5e6c356135f9b0d76bcb94] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good e5f468b3f23313994c5e6c356135f9b0d76bcb94
# good: [98870943a561c64aca22d10820a881aa4fa728e4] net: stmmac: Fix stmmac_get_rx_hwtstamp()
git bisect good 98870943a561c64aca22d10820a881aa4fa728e4
# good: [7433a8d6fa60a2f6910206fa10f3550c8f11f45f] textsearch: fix typos in library helpers
git bisect good 7433a8d6fa60a2f6910206fa10f3550c8f11f45f
# bad: [864e2a1f8aac05effac6063ce316b480facb46ff] ipv6: flowlabel: do not leave opt->tot_len with garbage
git bisect bad 864e2a1f8aac05effac6063ce316b480facb46ff
# bad: [66bdede495c71da9c5ce18542976fae53642880b] of_mdio: Fix broken PHY IRQ in case of probe deferral
git bisect bad 66bdede495c71da9c5ce18542976fae53642880b
# first bad commit: [66bdede495c71da9c5ce18542976fae53642880b] of_mdio: Fix broken PHY IRQ in case of probe deferral
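To make a log like the one above easier to review, it can be summarized with a few lines of Python. This is only a sketch: the function name summarize is made up for the example, the embedded text is a shortened excerpt of the log above (first two and last two verdicts), and it assumes the comment lines keep the "# verdict: [sha] subject" layout shown here.

```python
import re

# Shortened excerpt of the bisect log from this post.
BISECT_LOG = """\
git bisect start
# good: [b450e900fdb473a53613ad014f31eedbc80b1c90] imx274: Fix error handling
git bisect good b450e900fdb473a53613ad014f31eedbc80b1c90
# bad: [15b23f7fa80ed8166af46fb4dd971dbc12d46ad2] drm: xlnx: zynqmp: Disable a plane when the fb format changes
git bisect bad 15b23f7fa80ed8166af46fb4dd971dbc12d46ad2
# bad: [66bdede495c71da9c5ce18542976fae53642880b] of_mdio: Fix broken PHY IRQ in case of probe deferral
git bisect bad 66bdede495c71da9c5ce18542976fae53642880b
# first bad commit: [66bdede495c71da9c5ce18542976fae53642880b] of_mdio: Fix broken PHY IRQ in case of probe deferral
"""

# Matches the comment lines that carry the verdict, hash and subject.
COMMENT = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize(log_text):
    """Return ({'good': [...], 'bad': [...]}, first_bad) from a bisect log."""
    marks = {"good": [], "bad": []}
    first_bad = None
    for line in log_text.splitlines():
        match = COMMENT.match(line)
        if match is None:
            continue                # skip the "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            marks[verdict].append((sha, subject))
    return marks, first_bad


marks, first_bad = summarize(BISECT_LOG)
print(f"good: {len(marks['good'])}, bad: {len(marks['bad'])}")
if first_bad is not None:
    print(f"first bad commit: {first_bad[0][:12]} {first_bad[1]}")
```

Because the crash is intermittent, any commit marked good after a single clean boot is worth re-testing a few times before trusting the result.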
 
Can somebody please try to reproduce the issue?
 
Any help is appreciated.