pthomas
Adventurer
1,212 Views
Registered: ‎04-22-2015

Unhandled Exception in EL3.

Hello, I received the following error on the serial console

DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 9
Unhandled Exception in EL3.
x30 =           0x0000000000003140
x0 =            0x000000000000aa40
x1 =            0x000000000000fba8
x2 =            0x000000000000fba4
x3 =            0x0000000000000001
x4 =            0x000000000000006c
x5 =            0x000000000000005b
x6 =            0x000000000000fba0
x7 =            0x0000000000000001
x8 =            0xffffff8010c942f0
x9 =            0x0000000000000000
x10 =           0x0000000000000000
x11 =           0x000000000000bc50
x12 =           0x000000000000fc00
x13 =           0x000000000000000d
x14 =           0x0000000000011470
x15 =           0x0000000000001dd8
x16 =           0x00000000a0000005
x17 =           0xffffff8010098ce4
x18 =           0x0000000000000731
x19 =           0x0000000000000000
x20 =           0x000000000000006c
x21 =           0x000000000000fbf0
x22 =           0x000000000000006c
x23 =           0x0000000000000000
x24 =           0x000000000000006c
x25 =           0x0000000007735940
x26 =           0xffffff8010a92b78
x27 =           0x0000000000000000
x28 =           0x0000000000000000
x29 =           0x000000000000fb70
scr_el3 =               0x0000000000000731
sctlr_el3 =             0x0000000030cd183f
cptr_el3 =              0x0000000000000000
tcr_el3 =               0x0000000080803520
daif =          0x00000000000003c0
mair_el3 =              0x00000000004404ff
spsr_el3 =              0x00000000000002cc
elr_el3 =               0x0000000000001678
ttbr0_el3 =             0x0000000000011400
esr_el3 =               0x0000000096000061
far_el3 =               0x000000000000fba4
spsr_el1 =              0x0000000060000005
elr_el1 =               0xffffff801091ee90
spsr_abt =              0x0000000000000000
spsr_und =              0x0000000000000000
spsr_irq =              0x0000000000000000
spsr_fiq =              0x0000000000000000
sctlr_el1 =             0x0000000034d4d91d
actlr_el1 =             0x0000000000000000
cpacr_el1 =             0x0000000000300000
csselr_el1 =            0x0000000000000000
sp_el1 =                0xffffff8010e3b890
esr_el1 =               0x000000009200000b
ttbr0_el1 =             0x000000006aed7000
ttbr1_el1 =             0x0166000006b89000
mair_el1 =              0x0000bbff440c0400
amair_el1 =             0x0000000000000000
tcr_el1 =               0x00000032b5593519
tpidr_el1 =             0x000000405ef17000
tpidr_el0 =             0x0000000000000000
tpidrro_el0 =           0x0000000000000000
dacr32_el2 =            0x0000000000000000
ifsr32_el2 =            0x0000000000000000
par_el1 =               0x0000000000000000
mpidr_el1 =             0x0000000080000000
afsr0_el1 =             0x0000000000000000
afsr1_el1 =             0x0000000000000000
contextidr_el1 =                0x0000000000000000
vbar_el1 =              0xffffff8010082000
cntp_ctl_el0 =          0x0000000000000005
cntp_cval_el0 =         0x000000083ea7c51a
cntv_ctl_el0 =          0x0000000000000000
cntv_cval_el0 =         0x8820a00280000100
cntkctl_el1 =           0x00000000000000d6
sp_el0 =                0x000000000000fb70
isr_el1 =               0x0000000000000080
cpuectlr_el1 =          0x0000000000000040
cpumerrsr_el1 =         0x0000000001000008
l2merrsr_el1 =          0x0000000010080040
cpuactlr_el1 =          0x00001000080ca000
gicc_hppir =            0x00000000000003fe
gicc_ahppir =           0x0000000000000400
gicc_ctlr =             0x00000000000005eb
gicd_ispendr regs (Offsets 0x200 - 0x278)
 Offset:                        value
0000000000000200:               0x0000000040000001
0000000000000208:               0x0000000200000000
0000000000000210:               0x0000000000000000
0000000000000218:               0x0000000000000000
0000000000000220:               0x0000000000000000
0000000000000228:               0x0000000000000000
0000000000000230:               0x0000000000000000
0000000000000238:               0x0000000000000000
0000000000000240:               0x0000000000000000
0000000000000248:               0x0000000000000000
0000000000000250:               0x0000000000000000
0000000000000258:               0x0000000000000000
0000000000000260:               0x0000000000000000
0000000000000268:               0x0000000000000000
0000000000000270:               0x0000000000000000
0000000000000278:               0x0000000000000000
cci_snoop_ctrl_cluster0 =               0x00000000c0000003
cci_snoop_ctrl_cluster1 =               0x00000000c0000000
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15

It looks like this error comes from the ATF (Arm Trusted Firmware). This is using the 2019.1 ATF tagged release from the Xilinx git. The PMUFW and FSBL are also built from the 2019.1 SDK. The Linux kernel is 5.2.10.
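
For anyone reading the dump, the esr_el3 value can be decoded by hand to see what kind of exception EL3 took. Below is a minimal Python sketch, assuming the standard ARMv8-A ESR layout (EC in bits 31:26, IL in bit 25, and for data aborts WnR in bit 6 and DFSC in bits 5:0). The function name decode_esr and the lookup tables are illustrative only, and the tables cover just a handful of encodings.

```python
# Partial lookup tables: only a few encodings are listed, not the full set.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

DFSC_NAMES = {
    0x04: "Translation fault, level 0",
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
    0x09: "Access flag fault, level 1",
    0x0A: "Access flag fault, level 2",
    0x0B: "Access flag fault, level 3",
    0x0D: "Permission fault, level 1",
    0x0E: "Permission fault, level 2",
    0x0F: "Permission fault, level 3",
    0x10: "Synchronous External abort",
    0x21: "Alignment fault",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into its main fields (hypothetical helper)."""
    ec = (esr >> 26) & 0x3F          # exception class
    result = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this partial table"),
        "il": (esr >> 25) & 1,       # 1 = 32-bit instruction length
    }
    if ec in (0x24, 0x25):           # data aborts carry WnR and DFSC
        dfsc = esr & 0x3F
        result["write"] = bool((esr >> 6) & 1)
        result["dfsc"] = dfsc
        result["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in this partial table")
    return result


if __name__ == "__main__":
    for name, value in (("esr_el3", 0x96000061), ("esr_el1", 0x9200000B)):
        print(name, decode_esr(value))
```

Under that layout, esr_el3 = 0x96000061 decodes to EC 0x25 (a data abort taken at the current exception level, i.e. inside EL3 itself) with DFSC 0x21 (alignment fault) on a write, and far_el3 = 0xfba4 is the faulting address. That describes the symptom only; it does not say why the firmware made that access.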

So I have a couple of questions related to this.

First, obviously what is causing it and how to fix it?

Second, is there any way to see this information other than from the serial console directly?

All I was really doing on the Linux side was testing eth0: bringing the interface down and up (ifdown eth0 & ifup eth0). The ifup resulted in the DHCPDISCOVER lines.

thanks,

Paul

 

4 Replies
longley
Xilinx Employee
1,166 Views
Registered: ‎04-15-2011

@pthomas if you are using the 2019.1 ATF/FSBL/PMUFW, you should also use 2019.1 Linux (the 4.19 kernel). Mixing versions is neither supported nor tested. AFAIK, Xilinx has never tested the 5.2 kernel, so please use the 4.19 kernel (2019.1 Linux) instead.

Thanks,

Longley


pthomas
Adventurer
1,150 Views
Registered: ‎04-22-2015

Hi longley, thank you for the response, but this is not very helpful. First, the zynqmp firmware driver has been upstreamed since September of 2018.

Second, it would be a horrible architecture design to allow a layer higher in the stack to cause an error lower in the stack. As an analogy, this would be like saying a userspace application always has to be correct, and if it does something wrong, such as trying to access memory outside of its virtual address space (seg fault), then the Linux kernel would panic. No, this is not how it works: if a userspace application seg faults, the kernel hums along just fine. So, bringing it back to this issue, if the upstream kernel wasn't up to date and did something like call arm_smccc_smc() incorrectly, and that resulted in an error being returned, then we could debug that. But that is not what happened.

Third, the whole point of the forum is to bring issues into the open so that people with varied and broad experience can see them. We ARE using the mainline kernel in our application, both because there are features that are not supported in 4.19 and because this is a best practice from a long-term support point of view. Here's a hypothetical (although I have experienced real examples similar to this). In three years, when the kernel community has moved on to 6.x.x, we have an issue with a USB serial driver. Is someone from Xilinx going to help or be responsible for a bug that has nothing to do with Xilinx? No. This is why customers insist on mainline support, and why Xilinx spends so much time and effort pushing their drivers upstream. It is an unpleasant and slow process, but it is the only way to have a vibrant long-term ecosystem.

So let's focus on debugging the actual issue.

thanks,

Paul

davidlucking
Visitor
1,079 Views
Registered: ‎09-20-2019

I have the same error message when booting the Linux kernel from the SD card. The FSBL, u-boot, ATF, and PMU firmware were all built using the 2018.1 repositories and 2018.1 SDK. The Linux version is 4.14.0-xilinx-v2018.1.

The boot sequence has reached the loading of the root filesystem when the unhandled exception occurs. It encounters two earlier errors but is able to continue until the unhandled exception. The earlier errors are "ERROR: could not get clock /amba_pl@0/dma@a0030000:m_axi_s2mm_aclk(3)" and "ERROR: could not get clock /amba_pl@0/dma@a0031000:m_axi_s2mm_aclk(3)". I am not sure whether the unhandled exception and the clock errors are connected, but I am debugging the two clock issues. I am not sure how to proceed with solving the unhandled exception.

I've uploaded the output of the boot process.  Thank you in advance for any help.

davidlucking
Visitor
936 Views
Registered: ‎09-20-2019

I am not sure if this was your problem, but hopefully this helps someone. I was receiving many different error messages, including the Unhandled Exception. My problem was the location of the Arm Trusted Firmware (ATF). According to https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842107/Arm+Trusted+Firmware:

"By default, the Arm-trusted firmware builds for OCM space at address 0xFFFEA000. But, with DEBUG flag set to 1, it can't fit in OCM, so by default with DEBUG=1, it builds for DDR location 0x1000 with build flag DEBUG=1 mentioned while building."

My ATF was compiled with DEBUG=1, so it was loaded into DDR, where it was subsequently overwritten, causing the unhandled exception. Switching to a release build solved my problem.
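
A quick way to check whether a given dump might be the same situation is to look at where EL3 was executing (elr_el3) and compare it with the two load addresses quoted from the wiki. Here is a rough Python sketch of that check. The function name and ASSUMED_MAX_IMAGE_SIZE are hypothetical; the real image size depends on the build, so treat the result as a hint rather than proof.

```python
# Load addresses quoted from the Xilinx wiki passage above.
OCM_LOAD_BASE = 0xFFFEA000   # default (release) build, placed in OCM
DDR_LOAD_BASE = 0x00001000   # DEBUG=1 build, placed in DDR

# ASSUMPTION: hypothetical upper bound on the image footprint, used only to
# turn a base address into a range. The real size depends on the build.
ASSUMED_MAX_IMAGE_SIZE = 0x80000

# OCM sits at the top of the 32-bit address map on this device.
ADDRESS_SPACE_TOP = 0xFFFFFFFF


def guess_atf_region(el3_pc: int) -> str:
    """Guess where the firmware was running from, given elr_el3."""
    if OCM_LOAD_BASE <= el3_pc <= ADDRESS_SPACE_TOP:
        return "ocm"
    if DDR_LOAD_BASE <= el3_pc < DDR_LOAD_BASE + ASSUMED_MAX_IMAGE_SIZE:
        return "ddr"
    return "unknown"


if __name__ == "__main__":
    # elr_el3 from the dump in the first post of this thread.
    print(hex(0x1678), guess_atf_region(0x1678))
```

With the elr_el3 value from the first post (0x1678), this returns "ddr", which would be consistent with a DEBUG=1 build running from low DDR, although it does not confirm that the image was overwritten.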
