tim_severance
Scholar
776 Views
Registered: ‎03-03-2017

Zynq UltraScale+ Xil_SyncAbortHandler - How to debug?

I am stuck in my application using an XCZU4CG and Vivado / Vitis 2020.1: the code always ends up in the infinite loop of the Xil_SyncAbortHandler function in Platform/psu_cortexa53_0/standalone_domain/bsp/psu_cortexa53_0/libsrc/standalone_v7_2/src/xil_exception.c.

I am struggling with understanding how to debug this.

Is there any Xilinx document that would help in debugging a situation like this?   And if not Xilinx, any other documentation that can help?

Thanks.

Tim

14 Replies
tim_severance
Scholar
616 Views
Registered: ‎03-03-2017

To narrow down where the problem is, I do the following:

1. Stop at a breakpoint near where the issue seems to occur.

2. Right click the Cortex-A53 in the Vitis Debug view and select Instruction Stepping Mode:

tim_severance_0-1627055757041.png

3. F5 to "step into" so no commands are skipped.   Here are my last couple of F5s before it lands in the Abort Handler:

tim_severance_1-1627055906809.png

tim_severance_2-1627055970766.png

tim_severance_3-1627056133996.png

4. Hit F8 to continue through the Abort Handler and land in the infinite loop.   Then hit the Pause (Suspend) button to stop in the Abort Handler infinite loop.

tim_severance_4-1627056257327.png

5. Look at Registers tab.   Hopefully this will tell me something??

tim_severance_5-1627056339233.png

tim_severance_6-1627056369964.png

ericv
Scholar
568 Views
Registered: ‎04-13-2015

@tim_severance , thanks for providing all that info, as we can see everything. You indicate it traps on ".word 0x00000000". The issue is right there: 0x00000000 is not a valid AArch64 instruction. This is reported in the syndrome register ESR_EL3 (0x02000000): the exception class field (bits 31:26) is 0, "unknown reason", which is what an undefined instruction reports, and bit #25 (instruction length) is 1. Many things can trigger that exception class, but in your case it's surely that illegal instruction. You are facing a compiler/optimizer bug.
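For readers who want to decode a syndrome value themselves, here is a minimal sketch in C that splits an ESR_ELx value into its fields. The field positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) follow the Arm architecture; the names `esr_fields`, `decode_esr` and `esr_ec_name` are hypothetical helpers, not part of the Xilinx BSP, and only a handful of exception classes are named.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx syndrome value. */
typedef struct {
    uint32_t ec;  /* Exception Class, bits [31:26] */
    uint32_t il;  /* Instruction Length, bit 25 (1 = 32-bit instruction) */
    uint32_t iss; /* Instruction Specific Syndrome, bits [24:0] */
} esr_fields;

/* Hypothetical helper (not a Xilinx BSP function): split an ESR value. */
static esr_fields decode_esr(uint32_t esr)
{
    esr_fields f;
    f.ec  = (esr >> 26) & 0x3Fu;
    f.il  = (esr >> 25) & 0x1u;
    f.iss = esr & 0x01FFFFFFu;
    return f;
}

/* Hypothetical helper: name a few common exception classes. */
static const char *esr_ec_name(uint32_t ec)
{
    assert(ec <= 0x3Fu); /* EC is a 6-bit field */
    switch (ec) {
    case 0x00: return "Unknown reason (includes undefined instructions)";
    case 0x20: return "Instruction Abort from a lower Exception level";
    case 0x21: return "Instruction Abort without a change in Exception level";
    case 0x22: return "PC alignment fault";
    case 0x24: return "Data Abort from a lower Exception level";
    case 0x25: return "Data Abort without a change in Exception level";
    case 0x26: return "SP alignment fault";
    default:   return "Other (see the Arm Architecture Reference Manual)";
    }
}
```

With the value from this thread, `decode_esr(0x02000000)` gives EC = 0, IL = 1 and ISS = 0, which `esr_ec_name` reports as the "unknown reason" class.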

There is no magic bullet to solve that. Here's a short list of things that could be tried:

- Compile that function with a lower optimization level.

- Rework the code; throw static and volatile here and there on local variables.

- Put asm("orr x0, x0, x0") right at the beginning of the function. An asm() statement sort of makes the optimizer reset itself, and that instruction is a NOP.

- Write that function in assembly language: take the disassembled code and remove the .word 0x00000000.

- Try a different compiler version.
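As a rough illustration of the first three suggestions, the sketch below disables optimization for a single function, uses volatile locals, and places the no-op asm() at the top. It assumes GCC or Clang (the `optimize("O0")` and `optnone` attributes are compiler-specific), and `combine_halves` is a hypothetical stand-in for the suspect function, not code from this thread. The inline assembly is guarded so it is only emitted when building for AArch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-function "no optimization" attribute; GCC and Clang spell it differently. */
#if defined(__clang__)
#define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

/* Hypothetical stand-in for the function that appears to be miscompiled. */
NO_OPT uint32_t combine_halves(const uint16_t *src)
{
    assert(src != NULL);
#if defined(__aarch64__)
    /* OR a register with itself: leaves x0 unchanged, i.e. a no-op. */
    __asm__ volatile("orr x0, x0, x0");
#endif
    /* volatile forces the compiler to really load and store these locals. */
    volatile uint32_t hi = src[0];
    volatile uint32_t lo = src[1];
    return (hi << 16) | lo;
}
```

If the whole project is already built with -O0, none of this changes the generated code much, so it is mainly useful when only one function is suspect in an optimized build.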

ericv
Scholar
531 Views
Registered: ‎04-13-2015

@tim_severance , looking more carefully at the disassembled code, a wild pointer seems more probable because there are two pairs of 0x00000000 words. Check right after loading the code (before running) whether these 0x00000000 words are there. If not, these locations are overwritten at run time; in that case, set write watchpoints on these addresses, which will stop the code when they are written.
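Hardware watchpoints are the direct way to catch the write. As a complementary software check, one could keep a copy of a small code region taken right after load and compare it at suspicious points in the program. The sketch below is only an illustration of that idea; `first_changed_word` is a hypothetical helper, and pointing it at a real code address is an assumption about the target's memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare a memory region against a snapshot taken earlier.
 * Returns the index of the first 32-bit word that differs, or -1 if
 * the region still matches the snapshot.
 */
static ptrdiff_t first_changed_word(const volatile uint32_t *region,
                                    const uint32_t *snapshot,
                                    size_t words)
{
    assert(region != NULL && snapshot != NULL);
    for (size_t i = 0; i < words; i++) {
        if (region[i] != snapshot[i]) {
            return (ptrdiff_t)i; /* first overwritten word */
        }
    }
    return -1; /* region is intact */
}
```

Calling this between initialization steps narrows down which step clobbers the code, after which a watchpoint on the reported address pins down the exact store.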

watari
Professor
510 Views
Registered: ‎06-16-2013

Hi @tim_severance 

 

If you are debugging at the disassembly level rather than in C, I suggest referring to the AArch64 ABI.

It might be helpful for you.

 

https://developer.arm.com/documentation/ihi0055/b/

 

Hope this helps.

 

Best regards,

tim_severance
Scholar
471 Views
Registered: ‎03-03-2017

@ericv Thank you for the great response, this is very helpful for an ARM newbie!   

Regarding setting a lower optimization level, I tend to set the extra compiler flags to "-g3 -O0 -DDEBUG" for all 3 BSPs shown below.   I believe "-O0" is the lowest optimization setting, right?

tim_severance_1-1627308966999.png

Tim

 

tim_severance
Scholar
458 Views
Registered: ‎03-03-2017

@ericv, I have another version of my code that breaks at a different point in the code, and lo and behold, it also breaks at ".word 0x00000000" in another function!!

Wow, I am shocked that the compiler is doing this!

tim_severance
Scholar
448 Views
Registered: ‎03-03-2017

@ericv , Sorry for so many responses.  One more.

In my latest one that is breaking in a similar fashion it is occurring in the function shown below.

tim_severance_0-1627311710522.png

Even with the ASM command added at the top of that function, there is still the .WORD 0x00000000 command below.

tim_severance_1-1627311790805.png

Did I do the ASM command correctly?   Below I show where I added it in the C code:

tim_severance_2-1627311891499.png

Thanks!

Tim

 

 

ericv
Scholar
437 Views
Registered: ‎04-13-2015

@tim_severance , yes, -O0 is the lowest optimization level; in fact it's no optimization at all. As you are at -O0, the asm() is not useful.  Please confirm whether these 0x00000000 words are there before you run the code.

tim_severance
Scholar
414 Views
Registered: ‎03-03-2017

@ericv 

Interesting.

So when I launch the debugger and stop just before it would hit the .WORD 0x00000000 I can indeed see it in the Assembly view of the debugger (see below):

tim_severance_0-1627316165807.png

But if I look at the disassembly I generated from the ELF file before launching the debugger, I don't see those words at addresses 0x8000, 0x8004, 0x8010, 0x8014 (see below):

tim_severance_1-1627316432765.png

Do you know why these would be different?

Thanks.

Tim

 

 

ericv
Scholar
390 Views
Registered: ‎04-13-2015

@tim_severance , as the freshly loaded code doesn't have these 0x00000000 words, it's clearly not a compiler problem.  This confirms what I indicated in a previous post: you have in all likelihood a wild pointer, i.e. uninitialized or corrupted, and when you run your application, 0x00000000 gets written at these locations.  The easiest way to find that pointer (or those pointers) is to use watchpoints.

 

tim_severance
Scholar
382 Views
Registered: ‎03-03-2017

@ericv Thanks again for all your assistance, you have been a life saver.

It appears that the AXI Timer setup is overwriting that code section.   If you look at my xparameters.h below you will see that the BASEADDR is set to that area.   I am guessing this is an incorrect base address.

tim_severance_0-1627321148205.png
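The suspicion in this post, that a peripheral's register window lands on top of program code, can be checked mechanically by testing whether two address ranges overlap. The sketch below is a generic helper under that assumption; `ranges_overlap` is a hypothetical name, and the base address and size of the real AXI Timer and code section would have to be taken from xparameters.h and the linker map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the half-open ranges [a_base, a_base + a_size) and
 * [b_base, b_base + b_size) share at least one address.
 */
static bool ranges_overlap(uint64_t a_base, uint64_t a_size,
                           uint64_t b_base, uint64_t b_size)
{
    assert(a_size > 0 && b_size > 0); /* empty ranges are not meaningful here */
    return (a_base < b_base + b_size) && (b_base < a_base + a_size);
}
```

For example, with purely hypothetical numbers, a 0x20-byte register block at 0x8000 overlaps a code section spanning 0x0 to 0x20000, whereas a block at 0xA0000000 does not.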

 

tim_severance
Scholar
332 Views
Registered: ‎03-03-2017

@ericv I seem to have found a solution, and it appears to be a Xilinx driver issue.   I have detailed the solution at the forum HERE.

Thanks again for your assistance, it was much needed and is much appreciated!

Tim

watari
Professor
310 Views
Registered: ‎06-16-2013

Hi @tim_severance 

 

>So when I launch the debugger and stop just before it would hit the .WORD 0x00000000 I can indeed see it in the Assembly view of the debugger (see below):

 

It seems that the debugger changes an instruction in order to raise an NMI for the breakpoint.

That is, since the A53 fetches the instructions at 0x8000 and 0x8004 at the same time, the debugger changes the instructions "sxth w0, w0" and "orr w0, w1, w0" to undefined instructions ("0x00000000" and "0x00000000").

 

Hope this helps.

 

Best regards,

tim_severance
Scholar
309 Views
Registered: ‎03-03-2017

@watari The DP TX driver was overwriting the memory due to a “bug” in the timer initialization code.   I have detailed the issue in my response above.
Thanks.  
Tim
