09-07-2017 11:18 AM
I am having a tricky issue related to Xilinx's xscugic driver and the UART PS. The bug is very repeatable in the sense that a given build of the program fails every time it is run. However, it is very unrepeatable in that if I change small, unrelated parts of the code (like declaring an unused variable) the bug appears or disappears. This makes me think it has something to do with timing, i.e. an interrupt arriving at an inopportune moment for the interrupt handler.
The actual bug is that the program will halt execution in the function Xil_DataAbortHandler(). The stack trace looks like this:
some random memory location
It happens repeatably after an interrupt, but which interrupt it happens after completely depends on the program.
I'm pretty confident that the bug has something to do with calling XUartPs_Send() inside of a handler function that is called as a result of XUartPs_Recv().
Any thoughts on why this could happen??
Thanks so much.
09-11-2017 01:56 PM
A data abort will occur when a peripheral is accessed and the core gives a SLVERR response. In the case of a UART, my guess is that it involves a bad access to the FIFO. Perhaps the interrupt is not being handled correctly and the FIFO is underflowing or overflowing? Check the UART peripheral documentation.
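To check which kind of abort it is, the Data Fault Status Register (DFSR) can be inspected when the program stops in Xil_DataAbortHandler(), for example from the debugger's coprocessor register view if it has one. Below is a minimal sketch of decoding that value, assuming the ARMv7-A short-descriptor layout used on the Cortex-A9. The function names are made up for illustration and only the most common encodings are listed. A SLVERR from a peripheral would typically show up as one of the external-abort encodings.

```c
#include <stdint.h>

/* Extract the 5-bit fault status from a DFSR value (ARMv7-A,
 * short-descriptor format): FS[4] is bit 10, FS[3:0] are bits 3:0. */
uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
}

/* Bit 11 (WnR) is set when the aborting access was a write. */
int dfsr_is_write(uint32_t dfsr)
{
    return (int)((dfsr >> 11) & 1u);
}

/* Map a fault status to a readable name. Only common encodings are
 * covered; anything else needs the ARM Architecture Reference Manual. */
const char *dfsr_fault_name(uint32_t fs)
{
    switch (fs) {
    case 0x01: return "alignment fault";
    case 0x05: return "translation fault (section)";
    case 0x07: return "translation fault (page)";
    case 0x08: return "synchronous external abort";
    case 0x0D: return "permission fault (section)";
    case 0x0F: return "permission fault (page)";
    case 0x16: return "asynchronous external abort";
    default:   return "other";
    }
}
```

For a value read from the target, dfsr_fault_name(dfsr_fault_status(value)) gives the fault type and dfsr_is_write(value) tells whether a read or a write triggered it. The Data Fault Address Register (DFAR) holds the faulting address for synchronous aborts, which can be compared against the UART register range.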
09-11-2017 05:38 PM
These are always a bit tricky to pinpoint.
One way I zoom in on the culprit is to use the stack trace-back in the debugger.
Click on the stack location in the trace-back window for the interrupt handler where the trap was triggered.
This will bring the editor to the line where it occurred.
The registers, local variables & disassembly windows will reflect the state of the processor when the trap occurred.
You'll quite likely have to remove all optimization (set gcc to -O0) to see all local variables.
You should also check where the IRQ stack pointer is.
09-11-2017 05:48 PM
Is there something inherently wrong with trying to send data to the UART from its interrupt handler?
I talked to one of my computer architecture professors and they suggested it might be that I'm spending too much time in the interrupt handler. Is that a possible explanation?
09-11-2017 07:19 PM
Yes, spending too much time could be a possible explanation.
But it's a matter of priority.
I don't think the time you spend getting the interrupts to work with the UART is wasted.
Today, most systems handle their UARTs through interrupts.
Interrupts and DMA are among the trickiest to debug.
But when you get them to work it's a "WOW! THAT WAS ONLY THIS" moment and it's an asset because you've figured a way to debug interrupts and it remains with you forever.
You are not alone; read Steve Wozniak's autobiography (iWoz): in it he talks about his problems debugging interrupts for the Apple II floppy disk drive controller.
09-11-2017 08:43 PM
Thanks @ericv appreciate the tips.
What I meant with the 'spending too much time in the interrupts' comment was that the processor was potentially spending too much time in the ISR :) E.g. if it is in the ISR too long other interrupts occur and throw off the program in some way. Is it possible for the interrupt handler to be corrupted if interrupts occur while an ISR is still being serviced?
09-12-2017 12:58 AM
By default the A9 interrupts are not nested; i.e. an IRQ can't interrupt another IRQ handler (except that an FIQ can interrupt an IRQ).
A second interrupt stays pending until the first one is done.
The UART interrupt rate is quite slow compared to the processor speed, so there should be plenty of CPU time to process the UART.
If the UART handling took too long, your application would not fault; it would miss interrupts instead.
Because your stack trace-back shows the data abort happening in the UART interrupt handler, the handler itself is doing something wrong and that's why analyzing the variable & register values is a key thing to do.
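Time spent in the handler would not by itself explain the abort. Still, if you want to rule out sending from inside the receive callback, a common pattern is to have the callback only queue the reply and let the main loop do the actual send. A minimal sketch of such a queue follows, assuming a single core with one producer (the ISR) and one consumer (the main loop). The names txq_t, txq_push and txq_pop are hypothetical and not part of any Xilinx driver.

```c
#include <stddef.h>
#include <stdint.h>

#define TXQ_SIZE 64u  /* must be a power of two */

/* Single-producer (ISR) / single-consumer (main loop) byte queue. */
typedef struct {
    volatile uint32_t head;   /* written only by the ISR */
    volatile uint32_t tail;   /* written only by the main loop */
    uint8_t data[TXQ_SIZE];
} txq_t;

/* Called from the receive callback: queue a byte instead of sending it.
 * Returns 0 on success, -1 if the queue is full (the byte is dropped). */
int txq_push(txq_t *q, uint8_t byte)
{
    uint32_t head = q->head;

    if (head - q->tail == TXQ_SIZE)
        return -1;
    q->data[head & (TXQ_SIZE - 1u)] = byte;
    q->head = head + 1u;
    return 0;
}

/* Called from the main loop: copy up to max queued bytes into out and
 * return how many were copied. The caller then passes them to the UART
 * send routine outside interrupt context. */
size_t txq_pop(txq_t *q, uint8_t *out, size_t max)
{
    size_t n = 0;

    while (n < max && q->tail != q->head) {
        out[n++] = q->data[q->tail & (TXQ_SIZE - 1u)];
        q->tail = q->tail + 1u;
    }
    return n;
}
```

With this in place the receive callback would call txq_push() for each byte it wants to echo, and the main loop would call txq_pop() and hand the result to XUartPs_Send(). If the abort disappears with this change, that points at the send-inside-callback path; if not, it has at least been eliminated.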
09-12-2017 12:34 PM
Is it possible that it's a bug in Xilinx's UART or ScuGic driver? If that's the case I'm not sure that I know enough about how the interrupt processing works to even attempt debugging that. Just to clarify: the data abort is happening inside XUartPs_InterruptHandler() and not inside my interrupt handler.
When the device starts up, there are pending UART interrupts (I believe they are RECV and TX error). So although you're right that under normal operation the UART interrupt rate is slow, it is possible to get 2 interrupts occurring in close proximity at startup.
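One way to take stale start-up events out of the picture is to clear the latched status before the interrupt sources are enabled. The sketch below is a host-side model only, assuming the status register is write-1-to-clear (as I understand the UART's channel interrupt status register to be). The struct and function names are hypothetical stand-ins, not driver API.

```c
#include <stdint.h>

/* Host-side model of a write-1-to-clear (W1C) interrupt status register
 * together with its enable mask. */
typedef struct {
    uint32_t status;   /* latched event bits */
    uint32_t mask;     /* enabled interrupt sources */
} irq_regs_t;

/* Writing 1 to a status bit clears it; writing 0 leaves it unchanged. */
void irq_status_write(irq_regs_t *r, uint32_t value)
{
    r->status &= ~value;
}

/* Sources that are both latched and enabled would raise the IRQ line. */
uint32_t irq_pending(const irq_regs_t *r)
{
    return r->status & r->mask;
}

/* Start-up sequence: discard stale events first, then enable sources. */
void irq_enable_clean(irq_regs_t *r, uint32_t sources)
{
    irq_status_write(r, r->status);  /* clear everything latched so far */
    r->mask |= sources;
}
```

On the real hardware the equivalent would be a write to the UART's interrupt status register before the interrupt is enabled in the GIC, so that the first interrupt the handler sees corresponds to an event that happened after initialization.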
07-18-2019 02:21 AM
I am trying to debug the pcam5c project for the Zybo Z7-20, which I have retargeted to the Zybo Z7-10.
I don't understand this error; what I've read online says it's an interrupt problem.
I am using SDK 2016.4. Let me know how to overcome this if anyone has faced it before.