07-22-2021 02:29 PM
I am stuck in my application using an XCZU4CG and Vivado/Vitis 2020.1, where the code always ends up in the infinite loop of the Xil_SyncAbortHandler function in Platform/psu_cortexa53_0/standalone_domain/bsp/psu_cortexa53_0/libsrc/standalone_v7_2/src/xil_exception.c.
I am struggling with understanding how to debug this.
Is there any Xilinx document that would help in debugging a situation like this? And if not Xilinx, any other documentation that can help?
07-23-2021 09:06 AM
To narrow down where the problem is, I do the following:
1. Stop at a breakpoint near where the issue seems to occur.
2. Right-click the Cortex-A53 in the Vitis Debug view and select Instruction Stepping Mode.
3. Press F5 to "step into" so that no instructions are skipped. Here are my last couple of F5s before it lands in the Abort Handler:
4. Press F8 to continue through the Abort Handler and land in its infinite loop, then press the Pause (Suspend) button to stop there.
5. Look at the Registers tab. Hopefully this will tell me something.
07-24-2021 07:53 PM
@tim_severance, thanks for providing all that info; we can see everything. You indicate it traps on ".word 0x00000000". The issue is right there: 0x00000000 is not a valid AArch64 instruction. This is reported in the syndrome register ESR_EL3 (0x02000000): the exception class field (EC, bits [31:26]) is 0, meaning "unknown reason", and bit 25 (IL) is 1, meaning the trapping instruction is 32 bits long. Although there are many triggers for EC = 0, in your case it's surely that illegal instruction. You are facing a compiler/optimizer bug.
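Splitting an ESR_ELx value into its fields makes this kind of diagnosis repeatable. A minimal sketch, assuming you have copied the raw ESR_EL3 value from the Registers tab; the type and function names are illustrative, and the EC table covers only a few common classes (the Arm Architecture Reference Manual lists them all).

```c
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx (Exception Syndrome Register) value. */
typedef struct {
    uint32_t ec;   /* Exception Class, bits [31:26] */
    uint32_t il;   /* Instruction Length, bit [25]: 1 = 32-bit instruction */
    uint32_t iss;  /* Instruction Specific Syndrome, bits [24:0] */
} esr_fields;

static esr_fields decode_esr(uint64_t esr)
{
    esr_fields f;
    f.ec  = (uint32_t)((esr >> 26) & 0x3Fu);
    f.il  = (uint32_t)((esr >> 25) & 0x1u);
    f.iss = (uint32_t)(esr & 0x1FFFFFFu);
    return f;
}

/* Short description of a few common EC values (not exhaustive). */
static const char *ec_name(uint32_t ec)
{
    switch (ec) {
    case 0x00: return "unknown reason (e.g. undefined instruction)";
    case 0x15: return "SVC from AArch64";
    case 0x20: return "instruction abort, lower EL";
    case 0x21: return "instruction abort, same EL";
    case 0x24: return "data abort, lower EL";
    case 0x25: return "data abort, same EL";
    default:   return "other (see the ESR_ELx description in the Arm ARM)";
    }
}
```

For the value in this thread, decode_esr(0x02000000) yields EC = 0, IL = 1 and ISS = 0.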
There is no magic bullet to solve that. Here's a short list of things you could try:
- Compile that function with a lower optimization level.
- Rework the code; sprinkle static or volatile on local variables here and there.
- Put asm("orr x0, x0, x0") right at the beginning of the function: an asm() statement sort of makes the optimizer reset itself, and that particular instruction is a NOP.
- Write that function in assembly language: take the disassembled code and remove the .word 0x00000000.
- Try a different compiler version.
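The first and third suggestions can be applied to a single function. A minimal sketch, assuming GCC (the Vitis Cortex-A53 toolchain is GCC-based); suspect_function and BUILD_AT_O0 are hypothetical names. GCC's documentation describes the optimize attribute as meant for debugging only, which is exactly this use.

```c
/* GCC-specific: compile only the marked function as if at -O0.
 * Other compilers (including Clang) get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define BUILD_AT_O0 __attribute__((optimize("O0")))
#else
#define BUILD_AT_O0
#endif

/* Hypothetical stand-in for the function that traps. */
BUILD_AT_O0
static int suspect_function(int a, int b)
{
#if defined(__aarch64__)
    /* "orr x0, x0, x0" ORs x0 with itself and writes it back, so it is a
     * NOP. Note: no comma between the mnemonic and the first operand. */
    __asm__ volatile("orr x0, x0, x0");
#endif
    return a * 2 + b;
}
```

The #if guard keeps the AArch64 instruction out of builds for other targets, so the same file still compiles on a host machine.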
07-25-2021 02:33 PM
@tim_severance, looking more carefully at the disassembly, a wild pointer seems more probable, because there are two pairs of 0x00000000 words. Right after loading the code (before running it), check whether these 0x00000000 words are there. If they are not, these locations are being overwritten at run time; set write watchpoints on these addresses so that execution stops when they are written.
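The "are they there right after loading?" check amounts to comparing a snapshot of the code region with its contents later on. A minimal sketch, assuming you can copy the region into a buffer (for example from the debugger's memory view, or with a memcpy at startup); first_changed_word is a hypothetical helper. A write watchpoint is still the better tool, because it stops on the offending store itself, whereas this only tells you that a word changed.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare a snapshot of a code region taken right after loading
 * ("golden") with the same region read later at run time ("live").
 * Returns the index of the first word that differs, or -1 if none. */
static long first_changed_word(const uint32_t *golden,
                               const uint32_t *live, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (golden[i] != live[i])
            return (long)i;
    }
    return -1;
}
```

Multiplying the returned index by 4 and adding the region's start address gives the address to put a watchpoint on.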
07-25-2021 04:56 PM
If you debug your code at the disassembly level rather than in C, I suggest referring to the AArch64 ABI (the procedure call standard).
It might be helpful for you.
Hope this helps.
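The part of the ABI that matters most when reading disassembly is where arguments and results live. A minimal sketch; combine is a hypothetical function, and the comments describe the AAPCS64 register assignment, not guaranteed compiler output.

```c
/* AAPCS64: the first eight integer/pointer arguments are passed in
 * x0-x7 (w0-w7 when 32 bits or narrower) and an integer result is
 * returned in x0/w0. */
static int combine(short a, int b)  /* a arrives in w0, b in w1 */
{
    /* Because a is a 16-bit value, GCC typically emits a sign extension
     * (sxth) of w0 before the OR, leaving the result in w0. The exact
     * instruction sequence depends on the compiler and its flags. */
    return a | b;
}
```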
07-26-2021 07:16 AM
@ericv Thank you for the great response; this is very helpful for an ARM newbie!
Regarding a lower optimization level, I usually set the extra compiler flags to "-g3 -O0 -DDEBUG" for all 3 BSPs shown below. I believe "-O0" is the lowest optimization setting, right?
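Yes, -O0 is GCC's lowest level (no optimization), and it is the default. To confirm that the flags actually reach a given source file, a sketch like the one below can help, assuming GCC or Clang, both of which predefine __OPTIMIZE__ whenever optimization is enabled. As far as I know, the BSP's extra compiler flags only apply to BSP sources, and the application project has its own optimization setting; the helper names are hypothetical.

```c
/* Reports whether the translation unit this is compiled in was built
 * with optimization: GCC and Clang define __OPTIMIZE__ at any level
 * above -O0. */
static int built_with_optimization(void)
{
#if defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}

/* Reports whether -DDEBUG reached this translation unit. */
static int built_with_debug_define(void)
{
#if defined(DEBUG)
    return 1;
#else
    return 0;
#endif
}
```

Calling these from a suspect application file and printing the results shows which settings that file was really compiled with.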
07-26-2021 07:51 AM
@ericv, I have another version of my code that breaks at a different point, and lo and behold, it also breaks at ".word 0x00000000", in another function!
Wow, I am shocked that the compiler is doing this!
07-26-2021 08:05 AM
@ericv, sorry for so many responses. One more.
In my latest version, which breaks in a similar fashion, the problem occurs in the function shown below.
Even with the asm() statement added at the top of that function, the .word 0x00000000 is still there further down.
Did I add the asm() statement correctly? Below I show where I added it in the C code:
07-26-2021 08:40 AM
07-26-2021 09:20 AM
So when I launch the debugger and stop just before it would hit the .word 0x00000000, I can indeed see it in the Assembly view of the debugger (see below):
But in the disassembly I generated from the ELF file before launching the debugger, I don't see those words at addresses 0x8000, 0x8004, 0x8010, and 0x8014 (see below):
Do you know why these would be different?
07-26-2021 10:04 AM
@tim_severance, as the freshly loaded code doesn't contain these 0x00000000 words, it's clearly not a compiler problem. This confirms what I indicated in a previous post: in all likelihood you have a wild pointer, i.e. an uninitialized or corrupted one, and when you run your application, 0x00000000 gets written at these locations. The easiest way to find that pointer (or those pointers) is to use watchpoints.
07-26-2021 10:39 AM
@ericv Thanks again for all your assistance, you have been a life saver.
It appears that the AXI Timer setup is overwriting that code section. If you look at my xparameters.h below, you will see that the BASEADDR is set to that area. I am guessing this is an incorrect base address.
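A quick way to catch this class of mistake is to check each peripheral's address range from xparameters.h against the memory range the linker script assigns to the program. A minimal sketch; ranges_overlap is a hypothetical helper, and the actual base/high address macro names depend on what your generated xparameters.h defines.

```c
#include <stdint.h>

/* Returns 1 if [a_base, a_base + a_size) overlaps [b_base, b_base + b_size).
 * Assumes neither range wraps past the top of the 64-bit address space. */
static int ranges_overlap(uint64_t a_base, uint64_t a_size,
                          uint64_t b_base, uint64_t b_size)
{
    return a_base < b_base + b_size && b_base < a_base + a_size;
}
```

For instance, a peripheral window starting at 0x8000 overlaps program memory spanning 0x0 to 0x100000, so ranges_overlap returns 1 and the base address should be questioned before any register is written.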
07-26-2021 01:43 PM
07-26-2021 02:47 PM
>So when I launch the debugger and stop just before it would hit the .WORD 0x00000000 I can indeed see it in the Assembly view of the debugger (see below):
It seems that the debugger replaces instructions to raise an NMI for a breakpoint.
That is, since the A53 fetches the instructions at 0x8000 and 0x8004 at the same time, the debugger changes both of them, "sxth w0, w0" and "orr w0, w1, w0", into undefined instructions (0x00000000 and 0x00000000).
Hope this helps.
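To check this kind of reasoning against a memory dump, it helps to know what a debugger-planted software breakpoint looks like. A sketch, assuming the debugger uses the A64 BRK instruction for software breakpoints (the usual choice on AArch64; hardware breakpoints do not modify memory at all); the helper names are hypothetical and only classify a 32-bit word.

```c
#include <stdint.h>

/* A64 BRK #imm16: fixed bits 0xD4200000, imm16 in bits [20:5]. */
static int is_a64_brk(uint32_t insn)
{
    return (insn & 0xFFE0001Fu) == 0xD4200000u;
}

/* A64 UDF #imm16: bits [31:16] are zero, imm16 in bits [15:0].
 * 0x00000000 is therefore UDF #0, a permanently undefined instruction. */
static int is_a64_udf(uint32_t insn)
{
    return (insn & 0xFFFF0000u) == 0;
}
```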
07-26-2021 02:50 PM