1,491 Views
Registered: ‎09-30-2011

AXI DMA Interrupt Behavior - Can it be explained?

We are having an issue in which an MM2S DMA transfer in SG mode on the AXI DMA IP core hangs. We are running Linux on a Zynq, and what we see is that the call to the transmit operation randomly hangs after a large number of successful operations. When we look at the DMA registers, they indicate that the transfer is complete and no error occurred, but it seems that no completion interrupt is generated. Our IRQThreshold is set to 1 and we are sending 1024 bytes of data at a time in a single SG buffer. (All of our transfers use a single SG buffer, so it is a degenerate case.)
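For anyone trying to reproduce the register check described above, here is a minimal sketch of decoding a raw MM2S_DMASR value in Python. The bit positions are an assumption based on my reading of the AXI DMA product guide register map and should be verified against the version of the core in use. The function name and the sample value are hypothetical, not something taken from our system.

```python
# Bit positions assumed from the AXI DMA product guide; verify for your core version.
MM2S_DMASR_BITS = {
    "Halted": 0,
    "Idle": 1,
    "SGIncld": 3,
    "DMAIntErr": 4,
    "DMASlvErr": 5,
    "DMADecErr": 6,
    "SGIntErr": 8,
    "SGSlvErr": 9,
    "SGDecErr": 10,
    "IOC_Irq": 12,
    "Dly_Irq": 13,
    "Err_Irq": 14,
}


def decode_mm2s_dmasr(value):
    """Return the flag bits and the two status counters from a raw DMASR word."""
    status = {name: bool((value >> bit) & 1) for name, bit in MM2S_DMASR_BITS.items()}
    status["IRQThresholdSts"] = (value >> 16) & 0xFF  # bits 23:16
    status["IRQDelaySts"] = (value >> 24) & 0xFF      # bits 31:24
    return status


# Hypothetical sample word: Idle, SGIncld and IOC_Irq set, threshold status = 1.
sample = 0x0001100A
decoded = decode_mm2s_dmasr(sample)
for name, state in decoded.items():
    print(f"{name:16s} {state}")
```

If IOC_Irq reads 1 while the caller is still blocked, the core has latched the completion event, and the next thing to check is whether IOC_IrqEn is set in MM2S_DMACR and whether the interrupt reached the CPU. If IOC_Irq reads 0 with Idle set, it may be that something already acknowledged the event.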

We started down the path of looking at Tlast and the interrupt signal. We assumed (perhaps incorrectly) that for every Tlast there should be a matching interrupt, and at first it looked as though, when the system hung, there was a Tlast but no interrupt following. After more investigation, we thought we saw that several Tlasts may precede an interrupt, and in some cases the interrupt may even precede Tlast.

So now we are quite confused. Do interrupts and Tlasts map one-to-one? Is it the case that only Tlast can trigger a completion interrupt (or more simply should an interrupt ALWAYS follow Tlast - or I suppose Tlasts)? 

Thanks for any clarifications

0 Kudos
10 Replies
longley
Xilinx Employee
1,430 Views
Registered: ‎04-15-2011

neil@formidableengineeringconsultants.com 

I ran some tests on Tlast versus the Idle bit for AXI DMA in SG mode a while ago. It turns out that the Idle bit of the MM2S status register asserts once the memory-mapped interface is done, even if the stream interface is still running.

So, back to your case: I think the interrupt behaves like the Idle bit. That would explain why you sometimes see the interrupt precede Tlast.

Thanks,

Longley


0 Kudos
1,407 Views
Registered: ‎09-30-2011

That is a little helpful. Basically, what we are seeing is that the transmit side (MM2S) hangs after running "for a while". It looks like the transaction completes (judging by the registers) but the interrupt is not delivered. Or maybe there is some race condition and the interrupt is missed or blocked because the software driver is hanging on some other semaphore or spin lock. It would be helpful to understand, or get a pointer to some document that describes, how we need to manage the AXI signals to make the IP work. The AXI DMA datasheet indicates that we need to manage the signals but doesn't state how, or which signals.

0 Kudos
1,401 Views
Registered: ‎07-23-2019

neil@formidableengineeringconsultants.com 

My assumption is there must be one and only one interrupt for Tlast. Otherwise, they signal different things.

You mention semaphores, so I assume you have an RTOS and you play around with interrupts: disabling, enabling, and clearing them. Well, that's a good environment for getting lost. I'd suggest you review your interrupt handling for things like a slow process running with interrupts disabled, or code somewhere that clears all (or more than one) of them.

Whether it is an extra interrupt or a missing one, I don't think it would hang your code; it would produce some anomaly and the code would keep running. Software usually hangs because of bad jumps caused by corrupted pointers. Stack and heap overflows are often the last thing one thinks about when running into problems, so check that!

 

0 Kudos
1,392 Views
Registered: ‎09-30-2011

We are running Linux. What happens on the software side is that the transfer call hangs. When we examine the IP registers, they say the transfer is complete. We believe that the transfer call is waiting for the completion interrupt or is being held in some spin lock owing to a race condition between the AXI signals and the driver. The code does not crash, the stack and the heap are not corrupt, and there is no stack trace or kernel panic. It just hangs. The AXI signal management is left as an exercise for the end user (according to the datasheet), so I am trying to get some clarity on the exact nature of this management. Which signals should be managed: Tidle? Tlast? And how should they be managed?

0 Kudos
1,371 Views
Registered: ‎07-23-2019

neil@formidableengineeringconsultants.com 

Man... among other reasons, interrupts exist so that you don't have to wait for events. What do you mean by 'waiting for the completion interrupt'? Checking the flag? That's not how you (best) use interrupts. An interrupt jumps to its ISR, and there you do what has to be done. Is there an ISR and you are also checking the interrupt flag? In that case, bear in mind that the flag is cleared after the ISR, so if the ISR runs before you check, you miss it. Could that be the case?
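To make the race I am describing concrete, here is a toy, single-threaded Python model. It is only a sketch of the idea, not the Linux driver's code, and every class and function name in it is made up for illustration. The first model keeps nothing but a flag that the ISR clears; the second has the ISR record each completion before clearing the flag.

```python
class FlagOnly:
    """Completion is visible only as a flag, and the ISR clears that flag."""

    def __init__(self):
        self.flag = False

    def hardware_done(self):
        self.flag = True

    def isr(self):
        self.flag = False  # acknowledging the interrupt erases the evidence

    def waiter_sees_completion(self):
        return self.flag


class Counted:
    """The ISR records each completion before clearing the flag."""

    def __init__(self):
        self.flag = False
        self.completions = 0

    def hardware_done(self):
        self.flag = True

    def isr(self):
        self.completions += 1
        self.flag = False

    def waiter_sees_completion(self):
        if self.completions > 0:
            self.completions -= 1
            return True
        return False


def run(model):
    """Worst-case ordering: the ISR runs before the waiter gets to check."""
    model.hardware_done()
    model.isr()
    return model.waiter_sees_completion()


print("flag only:", run(FlagOnly()))  # waiter misses the completion
print("counted:  ", run(Counted()))   # waiter still observes it
```

With the flag-only model the waiter would block forever even though the hardware finished, which would look a lot like a transfer call that hangs while the registers say "complete".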

 

 

0 Kudos
1,357 Views
Registered: ‎09-30-2011

We are using the Linux-supplied driver. We did not write anything new. I know the driver gets an interrupt on transmit completion, I know there is some code in the kernel that traps it, and I know there are some spin locks that synchronize activity on the DMA. I know our call to transmit hangs. So I am speculating that it is hung on a spin lock and that maybe the spin lock is waiting for an interrupt. This is all kernel magic that I am loath to wade into, especially since my feeling is that the problem is on the FPGA implementation side and specifically related to AXI signal management. That is the motivation for my question.

0 Kudos
1,334 Views
Registered: ‎07-23-2019

Mmm,

It would be good if you could share specific details. What Linux (source and version), what platform, what application, etc. I used the DMA example code from Xilinx without a problem, but it was a bare metal app.

0 Kudos
1,324 Views
Registered: ‎09-30-2011

My sense is that the software is fine and that our FPGA implementation is not; specifically, that the AXI interface is messed up. That is why I am seeking details about the signaling.

We are running on a Zynq, using Linux generated by Yocto based on Xilinx's recipes. Again, I am not suspicious of the software.
0 Kudos
1,308 Views
Registered: ‎07-23-2019

My suspicion is always on the software, as a general rule. Software is far more complex than hardware, so the chances of a bug there are greater. Simple statistics.

Philosophy aside, what you want is to sort this out. Ways that come to my mind:

- Try the DMA with bare-metal software. Does it work? If so, that could imply that something in the Linux OS is messing things up.

- Use the ILA to watch the interrupt, and maybe other signals, to check whether it is really missing.
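For the ILA suggestion, a small script over an exported capture can do the counting for you. Below is a minimal sketch, assuming the capture has been exported as per-sample 0/1 lists for TVALID, TREADY, TLAST and the MM2S interrupt output. The function name and the sample waveforms are hypothetical. It relies on the AXI4-Stream rule that a beat is transferred only when TVALID and TREADY are both high in the same cycle, so TLAST only counts on such a beat.

```python
def count_events(tvalid, tready, tlast, irq):
    """Count accepted TLAST beats and interrupt rising edges in sampled waveforms."""
    n = len(tvalid)
    if not (len(tready) == len(tlast) == len(irq) == n):
        raise ValueError("all captures must have the same number of samples")

    # A packet ends on a cycle where TVALID, TREADY and TLAST are all high.
    tlast_beats = sum(1 for v, r, l in zip(tvalid, tready, tlast) if v and r and l)

    # Rising edge: the interrupt is high now and was low on the previous sample.
    # The sample before the capture starts is assumed to be low.
    previous = [0] + list(irq[:-1])
    irq_edges = sum(1 for p, c in zip(previous, irq) if c and not p)

    return tlast_beats, irq_edges


# Hypothetical capture: two packets end, but the interrupt line rises only once
# and stays high until something acknowledges it.
tvalid = [0, 1, 1, 1, 1, 0, 1, 1, 0, 0]
tready = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
tlast  = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
irq    = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

print(count_events(tvalid, tready, tlast, irq))  # (2, 1)
```

If the interrupt output is level-sensitive and stays asserted until software clears it (which is my understanding, but check the product guide), then fewer rising edges than TLAST beats would not by itself prove that an interrupt was lost. The more telling capture would be one where the line never rises at all after the final TLAST.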

 

0 Kudos
1,296 Views
Registered: ‎09-30-2011

The problem is that we don't know what we are looking for. We have used ILAs, and sometimes we see Tlast and the interrupt match up and sometimes we don't. But there are also Tready and Tidle. They play a role, but we don't know what role, and we don't know the expected sequencing. The only thing we know for sure is that MM2S hangs. We can see all the signals but have no idea what constitutes correct signalling.

0 Kudos