09-16-2019 10:53 AM
Hello, I received the following error on the serial console
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 9
Unhandled Exception in EL3.
x30 = 0x0000000000003140
x0 = 0x000000000000aa40
x1 = 0x000000000000fba8
x2 = 0x000000000000fba4
x3 = 0x0000000000000001
x4 = 0x000000000000006c
x5 = 0x000000000000005b
x6 = 0x000000000000fba0
x7 = 0x0000000000000001
x8 = 0xffffff8010c942f0
x9 = 0x0000000000000000
x10 = 0x0000000000000000
x11 = 0x000000000000bc50
x12 = 0x000000000000fc00
x13 = 0x000000000000000d
x14 = 0x0000000000011470
x15 = 0x0000000000001dd8
x16 = 0x00000000a0000005
x17 = 0xffffff8010098ce4
x18 = 0x0000000000000731
x19 = 0x0000000000000000
x20 = 0x000000000000006c
x21 = 0x000000000000fbf0
x22 = 0x000000000000006c
x23 = 0x0000000000000000
x24 = 0x000000000000006c
x25 = 0x0000000007735940
x26 = 0xffffff8010a92b78
x27 = 0x0000000000000000
x28 = 0x0000000000000000
x29 = 0x000000000000fb70
scr_el3 = 0x0000000000000731
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000003c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000000002cc
elr_el3 = 0x0000000000001678
ttbr0_el3 = 0x0000000000011400
esr_el3 = 0x0000000096000061
far_el3 = 0x000000000000fba4
spsr_el1 = 0x0000000060000005
elr_el1 = 0xffffff801091ee90
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000034d4d91d
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000300000
csselr_el1 = 0x0000000000000000
sp_el1 = 0xffffff8010e3b890
esr_el1 = 0x000000009200000b
ttbr0_el1 = 0x000000006aed7000
ttbr1_el1 = 0x0166000006b89000
mair_el1 = 0x0000bbff440c0400
amair_el1 = 0x0000000000000000
tcr_el1 = 0x00000032b5593519
tpidr_el1 = 0x000000405ef17000
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000080000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0xffffff8010082000
cntp_ctl_el0 = 0x0000000000000005
cntp_cval_el0 = 0x000000083ea7c51a
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x8820a00280000100
cntkctl_el1 = 0x00000000000000d6
sp_el0 = 0x000000000000fb70
isr_el1 = 0x0000000000000080
cpuectlr_el1 = 0x0000000000000040
cpumerrsr_el1 = 0x0000000001000008
l2merrsr_el1 = 0x0000000010080040
cpuactlr_el1 = 0x00001000080ca000
gicc_hppir = 0x00000000000003fe
gicc_ahppir = 0x0000000000000400
gicc_ctlr = 0x00000000000005eb
gicd_ispendr regs (Offsets 0x200 - 0x278)
Offset: value
0000000000000200: 0x0000000040000001
0000000000000208: 0x0000000200000000
0000000000000210: 0x0000000000000000
0000000000000218: 0x0000000000000000
0000000000000220: 0x0000000000000000
0000000000000228: 0x0000000000000000
0000000000000230: 0x0000000000000000
0000000000000238: 0x0000000000000000
0000000000000240: 0x0000000000000000
0000000000000248: 0x0000000000000000
0000000000000250: 0x0000000000000000
0000000000000258: 0x0000000000000000
0000000000000260: 0x0000000000000000
0000000000000268: 0x0000000000000000
0000000000000270: 0x0000000000000000
0000000000000278: 0x0000000000000000
cci_snoop_ctrl_cluster0 = 0x00000000c0000003
cci_snoop_ctrl_cluster1 = 0x00000000c0000000
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
It looks like this error comes from the ATF (Arm Trusted Firmware). I am using the 2019.1 tagged ATF release from the Xilinx git repository. The PMUFW and FSBL are also built from the 2019.1 SDK. The Linux kernel is 5.2.10.
So I have a couple of questions related to this.
First, obviously, what is causing it and how do I fix it?
Second, is there any way to see this information other than from the serial console directly?
All I was really doing on the Linux side was testing eth0 by bringing the interface down and up (ifdown eth0 & ifup eth0). The ifup produced the DHCPDISCOVER lines.
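For readers trying to interpret the dump, the esr_el3 value can be decoded by hand using the AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The following is a minimal sketch of that decoding, not an ATF tool; the function name decode_esr and the two lookup tables are hypothetical and deliberately cover only a few encodings.

```python
# Minimal ESR_ELx decoder sketch. The tables are intentionally partial:
# they only name the encodings relevant to this dump.
EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}
DFSC_NAMES = {
    0x21: "Alignment fault",
}

def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into EC, IL and ISS, plus data-abort fields."""
    ec = (esr >> 26) & 0x3F        # exception class, bits 31:26
    il = (esr >> 25) & 0x1         # instruction length bit
    iss = esr & 0x1FFFFFF          # instruction specific syndrome, bits 24:0
    out = {"ec": ec, "ec_name": EC_NAMES.get(ec, "unknown"), "il": il, "iss": iss}
    if ec in (0x24, 0x25):         # data aborts carry WnR and DFSC in the ISS
        dfsc = iss & 0x3F
        out["wnr"] = (iss >> 6) & 0x1   # 1 = fault caused by a write
        out["dfsc"] = dfsc
        out["dfsc_name"] = DFSC_NAMES.get(dfsc, "unknown")
    return out

if __name__ == "__main__":
    for key, value in decode_esr(0x96000061).items():   # esr_el3 from the dump
        print(key, value)
```

Under this decoding, esr_el3 = 0x96000061 reads as a data abort taken at EL3 itself, caused by a write, with an alignment fault status. The faulting address would then be far_el3 = 0x000000000000fba4. This narrows down what kind of fault occurred but does not by itself say why the ATF performed that access.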
09-19-2019 11:16 PM
@pthomas, if you are using the 2019.1 ATF/FSBL/PMUFW, you should use the 2019.1 Linux (4.19 kernel). Mixing versions is neither supported nor tested. AFAIK, Xilinx has never tested the 5.2 kernel, so please use the 4.19 kernel (2019.1 Linux) instead.
09-20-2019 06:05 AM
Hi longley, thank you for the response, but this is not very helpful. First, the zynqmp firmware driver has been upstream since September 2018.
Second, it would be a horrible architecture design to allow a layer higher in the stack to cause an error lower in the stack. As an analogy, this would be like saying a userspace application always has to be correct, and if it does something wrong, such as trying to access memory outside of its virtual address space (seg fault), then the Linux kernel would panic. No, this is not how it works: if a userspace application seg faults, the kernel hums along just fine. So, bringing it back to this issue, if the upstream kernel wasn't up to date and did something like call arm_smccc_smc() incorrectly, and that resulted in an error being returned, then we could debug that. But that is not what happened.
Third, the whole point of the forum is to bring issues into the open so that people with varied and broad experience can see them. We ARE using the mainline kernel in our application, both because there are features that are not supported in 4.19 and because this is a best practice from a long-term support point of view. Here's a hypothetical (although I have experienced real examples similar to this). In three years, when the kernel community has moved on to 6.x.x, we have an issue with a USB serial driver. Is someone from Xilinx going to help or be responsible for a bug that has nothing to do with Xilinx? No. This is why customers insist on mainline support, and why Xilinx spends so much time and effort pushing their drivers upstream. It is an unpleasant and slow process, but it is the only way to have a vibrant long-term ecosystem.
So let's focus on debugging the actual issue.
10-04-2019 11:02 AM
I have the same error message when booting the Linux kernel from the SD card. The FSBL, u-boot, ATF, and PMU firmware were all built using the 2018.1 repositories and 2018.1 SDK. The Linux version is 4.14.0-xilinx-v2018.1.
The boot sequence has reached the loading of the root filesystem when the unhandled exception occurs. It encounters two earlier errors but is able to continue until the unhandled exception. The earlier errors are "ERROR: could not get clock /amba_pl@0/dma@a0030000:m_axi_s2mm_aclk(3)" and "ERROR: could not get clock /amba_pl@0/dma@a0031000:m_axi_s2mm_aclk(3)". I am not sure if the unhandled exception and the clock errors are connected, but I am debugging the two clock issues. I am not sure how to proceed with solving the unhandled exception.
I've uploaded the output of the boot process. Thank you in advance for any help.
11-19-2019 10:18 AM
I am not sure if this was your problem, but hopefully this helps someone. I was receiving many different error messages including the Unhandled Exception. My problem was the location of the ARM trusted firmware (ATF). According to https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842107/Arm+Trusted+Firmware:
"By default, the Arm-trusted firmware builds for OCM space at address 0xFFFEA000. But, with DEBUG flag set to 1, it can't fit in OCM, so by default with DEBUG=1, it builds for DDR location 0x1000 with build flag DEBUG=1 mentioned while building."
My ATF was compiled with DEBUG=1, so it was loaded into DDR and was subsequently being overwritten, which caused the unhandled exception. Switching to a release build solved my problem.
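One rough way to check whether a given dump came from a DDR-resident ATF is to look at where elr_el3 points. The sketch below is only a heuristic: the two base addresses are taken from the wiki text quoted above, the helper name looks_like_ddr_build is hypothetical, and the assumption that the OCM region extends to the top of the 32-bit address space is mine.

```python
# Heuristic sketch: guess from elr_el3 whether the ATF was running from DDR.
ATF_OCM_BASE = 0xFFFEA000   # default ATF base in OCM, per the quoted wiki text
ATF_DDR_BASE = 0x1000       # ATF base in DDR when built with DEBUG=1, per the wiki
OCM_TOP = 0xFFFFFFFF        # assumption: OCM ends at the top of 32-bit space

def looks_like_ddr_build(elr_el3: int) -> bool:
    """Return True if the EL3 return address lies outside the default OCM
    region and at or above the DDR load address used for DEBUG=1 builds."""
    in_ocm = ATF_OCM_BASE <= elr_el3 <= OCM_TOP
    return (not in_ocm) and elr_el3 >= ATF_DDR_BASE

if __name__ == "__main__":
    # elr_el3 from the first dump in this thread
    print(looks_like_ddr_build(0x0000000000001678))
```

For the first dump in this thread, elr_el3 = 0x0000000000001678 falls just above 0x1000 rather than in OCM, which would be consistent with a DEBUG=1 build loaded into DDR. That is a hint worth checking against the actual build flags, not proof of the cause.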