07-21-2020 08:43 AM
Hi,
I have an Avnet Ultra96v2 dev board with a Xilinx ZCU100 ZYNQMP+ SoC on it. I built a Xilinx 2020.1 kernel (5.4.0) and a Buildroot rootfs for it. It boots fine, but it freezes frequently when I execute programs on it. This is much worse when using a Xilinx 2019.1 rootfs (there is currently no 2020.1 BSP that would allow me to build a 2020.1 rootfs).
Whenever I execute a "problematic" program, the board freezes. These programs include "ifconfig eth0 up", "ip link set eth0 up" and others. Sometimes it even hangs during boot, shortly before the login prompt. With OpenOCD and gdb attached I read out the program counter, and with
objdump -S --start-address=0xffffffc0105ded54 vmlinux | awk '{print $0} $3~/retq?/{exit}'
I found the function in which I am stuck: cdns_uart_console_putchar in drivers/tty/serial/xilinx_uartps.c. To be precise, I am stuck in the second while loop of that function:
static void cdns_uart_console_putchar(struct uart_port *port, int ch)
{
	unsigned int ctrl_reg;

	/* Wait until the transmitter is enabled (TX_DIS cleared in CR) */
	ctrl_reg = readl(port->membase + CDNS_UART_CR);
	while (ctrl_reg & CDNS_UART_CR_TX_DIS) {
		ctrl_reg = readl(port->membase + CDNS_UART_CR);
		cpu_relax();
	}

	/* Wait until the TX FIFO has room (TXFULL cleared in SR) */
	while (readl(port->membase + CDNS_UART_SR) & CDNS_UART_SR_TXFULL)
		cpu_relax();

	/* Push the character into the TX FIFO */
	writel(ch, port->membase + CDNS_UART_FIFO);
}
At least that is my reading of the assembly shown in gdb: I am looping between ffffffc0105ded54 and ffffffc0105ded74 (see below), which re-reads the status register (offset 0x2c) and tests bit 4. Thus, I inferred that the TX FIFO must be full (the condition of the second while loop above). What I don't understand is why that would happen. Any hints are appreciated.
ffffffc0105ded10 <cdns_uart_console_putchar>:
ffffffc0105ded10:  f9400802  ldr   x2, [x0, #16]
ffffffc0105ded14:  b9400042  ldr   w2, [x2]
ffffffc0105ded18:  d50331bf  dmb   oshld
ffffffc0105ded1c:  2a0203e3  mov   w3, w2
ffffffc0105ded20:  ca030063  eor   x3, x3, x3
ffffffc0105ded24:  b5000003  cbnz  x3, ffffffc0105ded24 <cdns_uart_console_putchar+0x14>
ffffffc0105ded28:  36280182  tbz   w2, #5, ffffffc0105ded58 <cdns_uart_console_putchar+0x48>
ffffffc0105ded2c:  d503201f  nop
ffffffc0105ded30:  f9400802  ldr   x2, [x0, #16]
ffffffc0105ded34:  b9400042  ldr   w2, [x2]
ffffffc0105ded38:  d50331bf  dmb   oshld
ffffffc0105ded3c:  2a0203e3  mov   w3, w2
ffffffc0105ded40:  ca030063  eor   x3, x3, x3
ffffffc0105ded44:  b5000003  cbnz  x3, ffffffc0105ded44 <cdns_uart_console_putchar+0x34>
ffffffc0105ded48:  d503203f  yield
ffffffc0105ded4c:  372fff22  tbnz  w2, #5, ffffffc0105ded30 <cdns_uart_console_putchar+0x20>
ffffffc0105ded50:  14000002  b     ffffffc0105ded58 <cdns_uart_console_putchar+0x48>
ffffffc0105ded54:  d503203f  yield
ffffffc0105ded58:  f9400802  ldr   x2, [x0, #16]
ffffffc0105ded5c:  9100b042  add   x2, x2, #0x2c
ffffffc0105ded60:  b9400042  ldr   w2, [x2]
ffffffc0105ded64:  d50331bf  dmb   oshld
ffffffc0105ded68:  2a0203e3  mov   w3, w2
ffffffc0105ded6c:  ca030063  eor   x3, x3, x3
ffffffc0105ded70:  b5000003  cbnz  x3, ffffffc0105ded70 <cdns_uart_console_putchar+0x60>
ffffffc0105ded74:  3727ff02  tbnz  w2, #4, ffffffc0105ded54 <cdns_uart_console_putchar+0x44>
ffffffc0105ded78:  d50332bf  dmb   oshst
ffffffc0105ded7c:  f9400800  ldr   x0, [x0, #16]
ffffffc0105ded80:  9100c000  add   x0, x0, #0x30
ffffffc0105ded84:  b9000001  str   w1, [x0]
ffffffc0105ded88:  d65f03c0  ret
ffffffc0105ded8c:  d503201f  nop
I already looked at the version history in the git repository, but it did not give me any insight into why this is happening.
Furthermore, using OpenOCD and gdb I was unable to set any breakpoints in the offending function (or in any function, for that matter). For reference, I started OpenOCD (version 0.10.0.r1193.g5c8de6a72-1) as follows:
openocd -f board/avnet_ultra96v2.cfg
and gdb (version 9.2-1) as follows:
aarch64-linux-gnu-gdb
then, inside gdb:
tar ext :3333
symbol-file vmlinux
layout split
b cdns_uart_console_putchar
c
All I got was the following error message:
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffc0105ded10
So any hints as to why this does not work either are also appreciated.
Thanks in advance y'all,
Alex
07-21-2020 09:45 AM - edited 07-21-2020 09:46 AM
Rolling back the driver to the tag xilinx-v2019.1 (just the driver not the whole kernel) gets rid of this problem. Would still like to know what caused it though.
07-23-2020 07:01 AM - edited 07-23-2020 12:45 PM
Maybe you could do a comparison of the driver source code between 5.4 and the xilinx-v2019.1 tag, to see what has changed.
From a quick look at the stable-linux-5.4.y tree, it seems that a number of recent commits have been reverted: https://lore.kernel.org/linux-serial/20190523091839.GC568@localhost/
If you are using vanilla linux-5.4 kernel, it might be worthwhile to try switching to stable-5.4.y, to get these (and other) fixes.
[Edit: per the reply from @sandeepg below, this is indeed the issue. The specific patch that causes the problem was fixed in the stable-5.4.y tree here. It is a one-line change in the xilinx_uartps.c driver.]
07-23-2020 12:04 PM
Hi @tangboshi ,
We are shipping an Answer Record (AR) with the attached patch to the Avnet team, and this patch should be part of their Ultra96v2 OOB BSP. To apply it to a PetaLinux project:
1) Copy the attached patch from the Attachments section to the linux-xlnx directory as shown below.
If this directory and its recipes do not exist, create <plnx-proj-root>/project-spec/meta-user/recipes-kernel/linux/linux-xlnx manually.
$ cp 0001-Revert-tty-xilinx_uartps-Add-the-id-to-the-console.patch <plnx-proj-root>/project-spec/meta-user/recipes-kernel/linux/linux-xlnx
2) Modify the linux-xlnx_%.bbappend file with the below content using a text editor:
$ vim <plnx-proj-root>/project-spec/meta-user/recipes-kernel/linux/linux-xlnx_%.bbappend
# linux-xlnx_%.bbappend content
SRC_URI += " \
            file://0001-Revert-tty-xilinx_uartps-Add-the-id-to-the-console.patch \
            "
FILESEXTRAPATHS_prepend := "${THISDIR}/${PN}:"
3) Clean the kernel sstate cache and rebuild the kernel recipes:
$ petalinux-build -c kernel -x cleansstate
$ petalinux-build -c kernel
07-23-2020 02:06 PM - edited 07-23-2020 02:07 PM
No, this is NOT the ONLY problem. I should have mentioned that I use the Xilinx/linux-xlnx fork of the Linux kernel. That version of the driver (revision history) has already contained the revert since July 8. Without that revert I could not get any output at all past "Starting Kernel". Even with the revert I get freezes with the current head (6d965ab3773403618a66c6cd026e954b2513dba8). There must be at least one more problem!
07-23-2020 04:48 PM - edited 07-23-2020 06:29 PM
I am also using Ultra96v2.
I made the following modifications to drivers/tty/serial/xilinx_uartps.c.
Postscript: It seems that a newer patch has been posted, which may be better:
https://www.spinics.net/lists/linux-serial/msg39343.html
07-23-2020 05:02 PM
Hi All,
For the 2020.1 release you need to use the xlnx_rebase_v5.4_2020.1 tag; the corresponding commit ID is 22b71b41620dac13c69267d2b7898ebfb14c954e.
It looks like you are using the master branch, which is always a work in progress.
For more details, refer to the 2020.1 release notes AR: https://www.xilinx.com/support/answers/73686.html
11-17-2020 06:49 AM
@kawazome Thanks a lot! I am also using an Ultra96 v2 with the 2020.1 tools, with UART1 as serial0, and PetaLinux was hanging after these lines:
console [tty0] enabled
bootconsole [cdns0] disabled
My device tree and configs were all correct.
I found Xilinx answer record 75417 and tried it, but, as someone here mentioned, it was not enough: even with the Xilinx patch, I was still hanging after those lines. Then I tried your second patch, and it worked; I was able to boot all the way to the login prompt.
This was the patch that worked for me: https://www.spinics.net/lists/linux-serial/msg39343.html
Jeff