01-16-2019 02:15 AM - edited 01-16-2019 03:14 AM
****** Summary *******
In some cases, the RX channel (stream to memory-mapped) of the AXI DMA (SG + Multichannel modes) randomly hangs. This doesn't always occur. Using ILA cores, I can observe that when the problem occurs, the DMA updates the status fields of the last completed descriptor and accepts 4 data beats of the new incoming AXI-Stream packet (which is the normal behaviour), but does not make any AXI read request to fetch the next descriptor, as it should.
*** End of summary ***
I am using the AXI DMA (v7.1) with Scatter-Gather and Multichannel modes enabled, with 2 channels on both the Read and Write sides. I use the standalone driver provided with Vivado in a standalone application. The board is a Trenz Electronic UltraSOM TE0808-ES1 with a "xczu9eg-ffvc900-1-i-es1" Zynq UltraScale+ MPSoC. I use Vivado 2017.4 (64-bit) under Ubuntu 16.04 LTS.
In my design, I sometimes see the RX channel of the DMA hang, i.e. it doesn't accept any more data. In fact, it does not make any read request to fetch the next RX descriptor, even though descriptors are ready and the "tail descriptor" pointer has not been reached. This issue occurs "randomly", i.e. it does not always happen, and never at the same point in time. I've been stuck on this issue for 4 weeks now. I could not find anything wrong with the descriptors or with the signals of the S2MM AXI-Stream, so I think this could be an issue with the AXI DMA core that occurs in a particular situation.
I could reproduce the same issue with a minimal demonstrator project (GitHub link below). The design of the demonstrator is as follows:
The AXI DMA is configured with the Scatter Gather Engine enabled and Multi Channel support enabled. Both Read and Write channels are enabled, with 2 channels each, and 32-bit memory-mapped and stream data widths.
The S2MM AXI-Stream Interconnect has property "Arbitrate on TLAST transfer" at "Yes", "Arbitrate on maximum number of transfers" at 0, and "Arbitrate on number of LOW TVALID cycles" at 0.
Beyond the AXI-Stream Interconnects, I instantiated two instances of the custom IP "dma_killer", developed only for this demonstrator. The dma_killer custom IP receives jobs on its slave AXI-Stream interface. Each job consists of generating an AXI-Stream packet on the master interface, followed by a desired pause before the next packet. A packet received by the AXIS slave interface can contain any number of jobs, and must contain a multiple of 3 32-bit words. Each group of 3 successive 32-bit words forms a job:
- The first word is the length of the packet to generate, in 32-bit words. Maximum 1023 words.
- The second word is the pause between the current packet and the next one, in clock cycles. Maximum 1023 clock cycles.
- The third word is the TID and TDEST value of the packet to generate. In this example design, it must be 0 or 1.
The dma_killer block includes an internal FIFO that can store 1024 jobs.
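To make the job format above concrete, here is a minimal sketch of how a job list could be packed into the 3-word groups before being sent as a TX packet. The struct, the function name and the constants are hypothetical and are not taken from the demonstrator sources; only the word order and the limits (1023 words, 1023 cycles, TID/TDEST of 0 or 1) come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define JOB_WORDS       3u    /* one job = three 32-bit words */
#define JOB_MAX_LENGTH  1023u /* packet length limit, in 32-bit words */
#define JOB_MAX_PAUSE   1023u /* pause limit, in clock cycles */
#define JOB_MAX_DEST    1u    /* TID/TDEST is 0 or 1 in this design */

/* Hypothetical job description; the field names are illustrative. */
struct killer_job {
    uint32_t length_words; /* word 0: length of the packet to generate */
    uint32_t pause_cycles; /* word 1: pause before the next packet */
    uint32_t dest;         /* word 2: value used for both TID and TDEST */
};

/*
 * Pack n jobs into buf as consecutive 3-word groups.
 * Returns the number of 32-bit words written, or 0 if a field is out of
 * range, a pointer is NULL, or buf is too small.
 */
size_t killer_encode_jobs(const struct killer_job *jobs, size_t n,
                          uint32_t *buf, size_t buf_words)
{
    if (jobs == NULL || buf == NULL || buf_words / JOB_WORDS < n) {
        return 0;
    }
    /* Validate everything first so buf is never left half-written. */
    for (size_t i = 0; i < n; i++) {
        if (jobs[i].length_words > JOB_MAX_LENGTH ||
            jobs[i].pause_cycles > JOB_MAX_PAUSE ||
            jobs[i].dest > JOB_MAX_DEST) {
            return 0;
        }
    }
    for (size_t i = 0; i < n; i++) {
        buf[i * JOB_WORDS + 0] = jobs[i].length_words;
        buf[i * JOB_WORDS + 1] = jobs[i].pause_cycles;
        buf[i * JOB_WORDS + 2] = jobs[i].dest;
    }
    return n * JOB_WORDS;
}
```

The returned word count is always a multiple of 3, which matches the requirement on the packets received by the dma_killer slave interface.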
In the example application, only the dma_killer_0 is used (TDEST = 0 for TX packets).
3 ILA cores with Advanced Trigger are also instantiated in order to capture the moment when the problem occurs. Whether the issue occurs depends on the duration of the pause between packets. The example program gives a working example (commented out) and a non-working example (uncommented).
The example application "dma_issue_demonstrator" provided with the project performs the following tasks:
- Mark the buffers containing the descriptors and the data to transfer as "uncacheable"
- Configure the DMA, and prepare the buffer descriptors rings for all channels.
- Configure the interrupts.
- Call the test function dma_transfers() (two different calls with different parameters can be chosen, one working, the other not). This function tries to perform the following steps 100 times:
- Prepare all descriptors of the RX channels
- Prepare jobs for the dma_killer block, and send them as TX packets.
- Wait until all prepared descriptors are completed or until the timeout is reached.
- Dump the registers of the DMA
- Dump the last and the current descriptor of each channel
With a packet length of 1000 words and a pause of 100 clock cycles between packets, here is an example of the register states when the transfer is hanging:
------- DMA issue demonstrator --------
--- Entering main() ---
DMA transfers in progress...
Try no 0 successful
Try no 1 successful
Try no 2 successful
Transfers failed at try no 3
TxDone=103/1000, RxDone=2/10000
******* Dump registers of the DMA: *******
Channel TX 0.0: Dump registers A0000000:
    Control REG: 64017003
    Status REG: 00010008
    Cur BD REG: 430307C0
    Tail BD REG: 4303E7C0
Channel RX 0.0: Dump registers A0000030:
    Control REG: 64017003
    Status REG: 0001000A
    Cur BD REG: 410EA640
    Tail BD REG: 41138800
Channel RX 0.1: Dump registers A0000030:
    Control REG: 64017003
    Status REG: 0001000A
    Cur BD REG: 420EA640
    Tail BD REG: 42138800
*********** Dump descriptors ***********
Channel TX 0.0
Dump BD 43030780:
    Next Bd Ptr: 430307C0
    Buff addr: 6105AE10
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: C000078
    Status: 80000078
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 6105AE10
    StsCtrl: 0
    DRE: 4
Dump BD 430307C0:
    Next Bd Ptr: 43030800
    Buff addr: 6105AE88
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: C000078
    Status: 0
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 6105AE88
    StsCtrl: 0
    DRE: 4
Channel RX 0.0
Dump BD 410EA600:
    Next Bd Ptr: 410EA640
    Buff addr: 7826EEC0
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: FA0
    Status: 8C000000
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 7826EEC0
    StsCtrl: 0
    DRE: 4
Dump BD 410EA640:
    Next Bd Ptr: 410EA680
    Buff addr: 78270E00
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: FA0
    Status: 0
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 78270E00
    StsCtrl: 0
    DRE: 4
Channel RX 0.1
Dump BD 420EA600:
    Next Bd Ptr: 420EA640
    Buff addr: 7826FE60
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: FA0
    Status: 8C000101
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 7826FE60
    StsCtrl: 0
    DRE: 4
Dump BD 420EA640:
    Next Bd Ptr: 420EA680
    Buff addr: 78271DA0
    MCDMA Fields: 3000000
    VSIZE_STRIDE: 80001
    Contrl len: FA0
    Status: 0
    APP 0: 0
    APP 1: 0
    APP 2: 0
    APP 3: 0
    APP 4: 0
    SW ID: 78271DA0
    StsCtrl: 0
    DRE: 4
Transfers failed at try no 3
--- Exiting main() ---
As we can see, no error bit is set in either the RX or the TX status register, nor in the descriptors.
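For anyone who wants to double-check the status values in the dump, here is a minimal decoding sketch. The bit masks are assumptions based on my reading of the AXI DMA product guide and of the XAXIDMA_* masks in xaxidma_hw.h (Halted = bit 0, Idle = bit 1, SGIncld = bit 3, error bits = 0x770 in the channel status register; Cmplt = bit 31, error bits = 30..28, RXSOF/RXEOF = bits 27/26 in the descriptor status word). Please verify them against your IP and driver version. The macro and function names are mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Channel status register bits (assumed layout, see lead-in). */
#define SR_HALTED        0x00000001u
#define SR_IDLE          0x00000002u
#define SR_SG_INCLD      0x00000008u
#define SR_ERR_ALL       0x00000770u /* DMA + SG internal/slave/decode errors */

/* Descriptor status word bits (assumed layout, see lead-in). */
#define BD_STS_COMPLETE  0x80000000u
#define BD_STS_ERR_ALL   0x70000000u /* decode, slave and internal errors */
#define BD_STS_RXSOF     0x08000000u
#define BD_STS_RXEOF     0x04000000u

/* True if any error bit is set in a channel status register value. */
bool sr_has_error(uint32_t sr)  { return (sr & SR_ERR_ALL) != 0u; }

/* True if the channel reports Halted. */
bool sr_is_halted(uint32_t sr)  { return (sr & SR_HALTED) != 0u; }

/* True if the channel reports Idle. */
bool sr_is_idle(uint32_t sr)    { return (sr & SR_IDLE) != 0u; }

/* True if the descriptor has been marked as completed by the DMA. */
bool bd_is_complete(uint32_t s) { return (s & BD_STS_COMPLETE) != 0u; }

/* True if any error bit is set in a descriptor status word. */
bool bd_has_error(uint32_t s)   { return (s & BD_STS_ERR_ALL) != 0u; }

/* True if the descriptor holds both the start and the end of a packet. */
bool bd_is_whole_packet(uint32_t s)
{
    return (s & (BD_STS_RXSOF | BD_STS_RXEOF)) ==
           (BD_STS_RXSOF | BD_STS_RXEOF);
}
```

Under these assumptions, the RX status value 0001000A from the dump decodes to Idle and SGIncld set with no error bits, the TX status value 00010008 decodes to SGIncld only, and the RX descriptor status values 8C000000 and 8C000101 both decode to completed, error-free, with RXSOF and RXEOF set.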
Here is the capture of signals on the S2MM AXI-Stream (between AXIS Interconnect and DMA):
As we can see, several packets are accepted by the DMA, and then it drives TREADY low. Here is a zoom on the last TLAST signal:
Everything seems to be OK. Then here is a zoom on the time when a new packet is available:
As we can see, the AXI DMA accepts 4 data beats of the new incoming packet (which is the normal behaviour), and then drives TREADY low. At this moment, the DMA should fetch the next descriptor (here for channel 0), but it does not issue any read request for this descriptor. Here is a capture of the Read Address channel taken by the ILA placed at the AXI SG interface of the DMA (the time scale is the same, because it is triggered at the same time):
The addresses beginning with 0x41... refer to descriptors of RX channel 0, the addresses beginning with 0x42... refer to descriptors of RX channel 1, and the addresses beginning with 0x43... refer to descriptors of the TX channel.
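When going through exported ILA samples, the address ranges above can be turned into a small classifier. This is only a sketch: the enum and function names are hypothetical, and the mapping relies solely on the top address byte as described above for this demonstrator.

```c
#include <stdint.h>

/* Which descriptor ring an SG address belongs to in this demonstrator. */
enum bd_owner { BD_RX_CH0, BD_RX_CH1, BD_TX, BD_UNKNOWN };

/*
 * Classify an address seen on the AXI SG interface by its top byte:
 * 0x41... = RX channel 0, 0x42... = RX channel 1, 0x43... = TX channel.
 * Anything else (including addresses above 32 bits) is reported as unknown.
 */
enum bd_owner classify_sg_addr(uint64_t addr)
{
    switch (addr >> 24) {
    case 0x41: return BD_RX_CH0;
    case 0x42: return BD_RX_CH1;
    case 0x43: return BD_TX;
    default:   return BD_UNKNOWN;
    }
}
```

For example, the "Cur BD" values from the register dump, 410EA640, 420EA640 and 430307C0, map to RX channel 0, RX channel 1 and TX respectively, while a data buffer address such as 7826EEC0 is reported as unknown.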
The two read requests marked in yellow are the only ones referring to RX channels. All other read requests refer to the TX channel. The last RX descriptor fetched has address 0x420000c0. This is actually the descriptor right before the "current descriptor pointer" seen in the register dump.
The following capture shows the Write Address channel at the same time:
We can see that the status words (descriptor address + 0x1C) of the last two completed RX descriptors are written by the AXI DMA. All other write requests are related to the TX channel.
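To spot status-word writes in a Write Address capture, the "+ 0x1C" rule can be expressed as a small helper. This is a sketch under two assumptions: descriptors are spaced 0x40 bytes apart (as the "Next Bd Ptr" values in the dump suggest) and each write address is the start address of the burst. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BD_SPACING        0x40u /* assumed distance between descriptors */
#define BD_STATUS_OFFSET  0x1Cu /* status word offset inside a descriptor */

/* Base address of the descriptor that contains addr. */
uint64_t bd_base(uint64_t addr)
{
    return addr & ~(uint64_t)(BD_SPACING - 1u);
}

/* True if addr points at the status word of a descriptor. */
bool is_status_write(uint64_t addr)
{
    return (addr & (BD_SPACING - 1u)) == BD_STATUS_OFFSET;
}
```

For instance, the status word of the RX descriptor at 410EA600 would be written at 410EA61C, and bd_base() maps that address back to 410EA600.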
So here is my question: do you have an idea why the RX channel of the DMA could stop fetching descriptors while the tail descriptor has not been reached? I could not find any information about such a problem in forums and known issues.
The project can be accessed here: https://github.com/mattzimm91/dma_issue_demonstrator . Measurements (.ila files, displayed messages and screenshots) are attached to this post.
Please tell me if I forgot any useful information.
Thanks in advance for your help!
IMPORTANT NOTE: In the application, only 1 GB of the DDR must be "visible" from the linker script. In the second GB of the DDR (addresses 0x40000000 to 0x7FFFFFFF), buffers (RX_BD_SPACE_BASE, TX_BD_SPACE_BASE, RX_BUFFER_BASE, TX_BUFFER_BASE in file dma_management.h) are manually allocated as constants and marked as "uncacheable" (at the beginning of the main function). If necessary, please adapt this.