Visitor matt.zimm91
Registered: 03-08-2018

AXI DMA in SG + Multichannel mode : RX channel randomly hanging (not fetching next descriptor)

Hi everybody,

****** Summary *******
In some cases, the RX channel (stream to memory-mapped) of the AXI DMA (SG + Multichannel modes) randomly hangs. This does not always occur. Using ILA cores, I can observe that when the problem occurs, the DMA updates the status fields of the last completed descriptor and accepts 4 data beats of the new incoming AXI-Stream packet (which is the normal behaviour), but does not make any AXI read request to fetch the next descriptor, although it should.
*** End of summary ***

I am using the AXI DMA (v7.1) with Scatter-Gather and Multichannel modes enabled, with 2 channels on both the Read and Write paths. I use the standalone driver provided with Vivado in a standalone application. The board is a Trenz Electronic UltraSOM TE0808-ES1 with a "xczu9eg-ffvc900-1-i-es1" Zynq UltraScale+ MPSoC. I use Vivado 2017.4 (64-bit) on Ubuntu 16.04 LTS.

In my design, the RX channel of the DMA sometimes hangs, i.e. it stops accepting data. More precisely, it does not make any read request to fetch the next RX descriptor, even though descriptors are ready and the "tail descriptor" pointer has not been reached. The issue occurs "randomly": it does not always happen, and never at the same time. I have been stuck on this issue for 4 weeks now. I could not find anything wrong with the descriptors or with the signals of the S2MM AXI-Stream, so I suspect this could be an issue with the AXI DMA core that occurs in a particular situation.

I could reproduce the same issue with a minimal demonstrator project (GitHub link below). The design of the demonstrator is as follows:

[Image: block_design.jpg, Block design of the demonstrator project]
The AXI DMA is configured with the Scatter Gather Engine enabled and Multi Channel support enabled. Both Read and Write channels are enabled, with 2 channels each and 32-bit memory-mapped and stream data widths.
The S2MM AXI-Stream Interconnect has the property "Arbitrate on TLAST transfer" set to "Yes", "Arbitrate on maximum number of transfers" set to 0, and "Arbitrate on number of LOW TVALID cycles" set to 0.
Beyond the AXI-Stream Interconnects, I instantiated two instances of the custom IP "dma_killer", developed only for this demonstrator. The dma_killer custom IP receives jobs on its slave AXI-Stream interface. Each job consists of generating an AXI-Stream packet on the master interface, followed by a specified pause before the next packet. A packet received on the AXIS slave interface can contain any number of jobs, and must contain a multiple of three 32-bit words. Each group of three successive 32-bit words forms a job:
- The first word is the length of the packet to generate, in 32-bit words (maximum 1023 words).
- The second word is the pause between the current packet and the next one, in clock cycles (maximum 1023 clock cycles).
- The third word is the TID and TDEST value of the packet to generate. In this example design, it must be 0 or 1.
The dma_killer block includes an internal FIFO that can store up to 1024 jobs.
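
The job format above can be expressed as a small packing helper. This is a minimal C sketch, assuming the TX packet is built in an array of 32-bit words that is later handed to the MM2S channel; `dma_killer_add_job` and the macro names are hypothetical and do not come from the project sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limits taken from the dma_killer job description. */
#define DMA_KILLER_MAX_LEN_WORDS    1023u
#define DMA_KILLER_MAX_PAUSE_CYCLES 1023u
#define DMA_KILLER_WORDS_PER_JOB    3u

/* Hypothetical helper: appends one job (length, pause, TID/TDEST) to a TX
 * buffer of 32-bit words. Returns false if a field is out of range or the
 * buffer has no room; *used_words is only advanced on success.
 * Assumes *used_words <= capacity_words on entry. */
static bool dma_killer_add_job(uint32_t *buf, size_t capacity_words,
                               size_t *used_words, uint32_t len_words,
                               uint32_t pause_cycles, uint32_t tid_tdest)
{
    if (len_words > DMA_KILLER_MAX_LEN_WORDS)
        return false;
    if (pause_cycles > DMA_KILLER_MAX_PAUSE_CYCLES)
        return false;
    if (tid_tdest > 1u) /* only channels 0 and 1 exist in this design */
        return false;
    if (capacity_words - *used_words < DMA_KILLER_WORDS_PER_JOB)
        return false;

    buf[*used_words + 0] = len_words;    /* word 1: packet length, in words */
    buf[*used_words + 1] = pause_cycles; /* word 2: pause after the packet, in cycles */
    buf[*used_words + 2] = tid_tdest;    /* word 3: TID and TDEST of the packet */
    *used_words += DMA_KILLER_WORDS_PER_JOB;
    return true;
}
```

For example, `dma_killer_add_job(buf, n, &used, 1000, 100, 0)` encodes the failing case described in this post (1000-word packets, 100-cycle pause, TDEST 0). Because every job is exactly three words, the packet always satisfies the "multiple of three words" rule.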

In the example application, only the dma_killer_0 is used (TDEST = 0 for TX packets).
Three ILA cores with Advanced Trigger are also instantiated to capture the moment when the problem occurs. Whether the issue occurs depends on the duration of the pause between packets. The example program contains a working example (commented out) and a non-working example (active).

The example application "dma_issue_demonstrator" provided with the project performs the following tasks:
- Mark the buffers containing the descriptors and the data to transfer as "uncacheable".
- Configure the DMA, and prepare the buffer descriptor rings for all channels.
- Configure the interrupts.
- Call the test function dma_transfers() (two calls with different parameters are available, one working, the other not). This function repeats the following steps 100 times:
    - Prepare all descriptors of the RX channels
    - Prepare jobs for the dma_killer block, and send them as TX packets.
    - Wait until all prepared descriptors are completed or until the timeout is reached.
- Dump the registers of the DMA
- Dump the last and the current descriptor of each channel
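
The "wait until all prepared descriptors are completed or until the timeout is reached" step can be sketched as a bounded poll on completion counters. This is a minimal sketch under assumptions: `tx_done` and `rx_done` are hypothetical names for counters incremented by the interrupt handlers (similar to the TxDone/RxDone values printed in the log), and the timeout is counted in polls rather than in time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true once both counters have reached their expected values,
 * false if max_polls iterations elapse first (the "timeout" case).
 * `volatile` forces a fresh read of each counter on every poll, since the
 * counters are assumed to be updated from interrupt context. */
static bool wait_for_completion(const volatile uint32_t *tx_done, uint32_t tx_expected,
                                const volatile uint32_t *rx_done, uint32_t rx_expected,
                                uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (*tx_done >= tx_expected && *rx_done >= rx_expected)
            return true;  /* every prepared descriptor has completed */
    }
    return false;         /* timeout: report the failing try, dump registers/BDs */
}
```

A real implementation would more likely bound the wait with a hardware timer than with a poll count; the poll count just keeps the sketch self-contained.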

With a packet length of 1000 words and a pause of 100 clock cycles between packets, here is an example of the register states when the transfer hangs:

 

------- DMA issue demonstrator --------                                         
--- Entering main() ---                                                         
DMA transfers in progress...                                                    
Try no 0 successful                                                             
Try no 1 successful                                                             
Try no 2 successful                                                             
Transfers failed at try no 3                                                    
TxDone=103/1000, RxDone=2/10000                                                 
                                                                                
******* Dump registers of the DMA: *******                                      
Channel TX 0.0:                                                                 
Dump registers A0000000:                                                        
Control REG: 64017003                                                           
Status REG: 00010008                                                            
Cur BD REG: 430307C0                                                            
Tail BD REG: 4303E7C0                                                           
                                                                                
Channel RX 0.0:                                                                 
Dump registers A0000030:                                                        
Control REG: 64017003                                                           
Status REG: 0001000A                                                            
Cur BD REG: 410EA640                                                            
Tail BD REG: 41138800                                                           
                                                                                
Channel RX 0.1:                                                                 
Dump registers A0000030:                                                        
Control REG: 64017003                                                           
Status REG: 0001000A                                                            
Cur BD REG: 420EA640                                                            
Tail BD REG: 42138800                                                           
                                                                                
*********** Dump descriptors ***********                                        
Channel TX 0.0                                                                  
Dump BD 43030780:                                                               
        Next Bd Ptr: 430307C0                                                   
        Buff addr: 6105AE10                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: C000078                                                     
        Status: 80000078                                                        
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 6105AE10                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
Dump BD 430307C0:                                                               
        Next Bd Ptr: 43030800                                                   
        Buff addr: 6105AE88                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: C000078                                                     
        Status: 0                                                               
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 6105AE88                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
                                                                                
Channel RX 0.0                                                                  
Dump BD 410EA600:                                                               
        Next Bd Ptr: 410EA640                                                   
        Buff addr: 7826EEC0                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: FA0                                                         
        Status: 8C000000                                                        
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 7826EEC0                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
Dump BD 410EA640:                                                               
        Next Bd Ptr: 410EA680                                                   
        Buff addr: 78270E00                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: FA0                                                         
        Status: 0                                                               
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 78270E00                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
Channel RX 0.1                                                                  
Dump BD 420EA600:                                                               
        Next Bd Ptr: 420EA640                                                   
        Buff addr: 7826FE60                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: FA0                                                         
        Status: 8C000101                                                        
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 7826FE60                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
Dump BD 420EA640:                                                               
        Next Bd Ptr: 420EA680                                                   
        Buff addr: 78271DA0                                                     
        MCDMA Fields: 3000000                                                   
        VSIZE_STRIDE: 80001                                                     
        Contrl len: FA0                                                         
        Status: 0                                                               
        APP 0: 0                                                                
        APP 1: 0                                                                
        APP 2: 0                                                                
        APP 3: 0                                                                
        APP 4: 0                                                                
        SW ID: 78271DA0                                                         
        StsCtrl: 0                                                              
        DRE: 4                                                                  
                                                                                
Transfers failed at try no 3                                                    
--- Exiting main() ---  
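
The descriptor control words in this dump can be cross-checked against the test parameters. A minimal sketch, assuming the SG-mode control word layout from the AXI DMA product guide (bit 27 = SOF, bit 26 = EOF, low bits = buffer length in bytes); the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control word fields (assumed SG-mode layout). The 26-bit mask is the widest
 * possible length field; the width actually used depends on the
 * "Width of Buffer Length Register" setting of the IP. */
#define BD_CTRL_SOF      (1u << 27)
#define BD_CTRL_EOF      (1u << 26)
#define BD_CTRL_LEN_MASK 0x03FFFFFFu

typedef struct {
    bool sof;           /* start of frame flag */
    bool eof;           /* end of frame flag */
    uint32_t len_bytes; /* buffer length in bytes */
} bd_ctrl_t;

static bd_ctrl_t decode_bd_ctrl(uint32_t ctrl)
{
    bd_ctrl_t d = {
        .sof = (ctrl & BD_CTRL_SOF) != 0,
        .eof = (ctrl & BD_CTRL_EOF) != 0,
        .len_bytes = ctrl & BD_CTRL_LEN_MASK,
    };
    return d;
}
```

Under this layout, the TX control word 0x0C000078 has SOF and EOF set with a length of 120 bytes, i.e. 30 words or 10 three-word jobs, and the RX length 0xFA0 is 4000 bytes, i.e. the 1000-word packets of this test. Ten jobs per TX packet would also be consistent with the 1000 TX versus 10000 RX descriptors in the TxDone/RxDone line.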

As we can see, no error bit is set in either the RX or the TX status registers, nor in the descriptors.
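
The "no error bit" check can be made explicit by masking the error fields. This sketch assumes the DMASR and descriptor status bit positions documented for SG mode in the AXI DMA product guide, and assumes they also apply in multichannel mode; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register (DMASR) bits, assumed SG-mode layout. */
#define DMASR_IDLE      (1u << 1)
#define DMASR_SG_INCLD  (1u << 3)
#define DMASR_ERR_BITS  0x00000770u /* DMAIntErr/SlvErr/DecErr (4-6), SGIntErr/SlvErr/DecErr (8-10) */
#define DMASR_ERR_IRQ   (1u << 14)
#define DMASR_IRQ_THRESHOLD_STS(sr) (((sr) >> 16) & 0xFFu)

/* Descriptor status word bits, same assumption. */
#define BD_STS_CMPLT    (1u << 31)
#define BD_STS_ERR_BITS 0x70000000u /* DMAIntErr, DMASlvErr, DMADecErr */

static bool dmasr_has_error(uint32_t sr)
{
    return (sr & (DMASR_ERR_BITS | DMASR_ERR_IRQ)) != 0;
}

static bool bd_status_has_error(uint32_t sts)
{
    return (sts & BD_STS_ERR_BITS) != 0;
}

static bool bd_is_complete(uint32_t sts)
{
    return (sts & BD_STS_CMPLT) != 0;
}
```

With this layout, neither 0x00010008 (TX) nor 0x0001000A (RX) has an error bit set, and the completed descriptors (0x80000078, 0x8C000000, 0x8C000101) carry the Cmplt bit without error bits. The RX value additionally has the Idle bit (bit 1) set, which the TX value does not.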

Here is the capture of the signals on the S2MM AXI-Stream (between the AXIS Interconnect and the DMA):
[Image: axis_rx_0.jpg, AXI-Stream RX]
As we can see, several packets are accepted by the DMA, and then it drives TREADY low. Here is a zoom on the last TLAST signal:
[Image: axis_rx_2.jpg, AXI-Stream RX]
Everything seems to be OK. Then here is a zoom on the moment when a new packet becomes available:
[Image: axis_rx_3.jpg, AXI-Stream RX]
As we can see, the AXI DMA accepts 4 data beats of the new incoming packet (which is the normal behaviour) and then drives TREADY low. At this point, the DMA should fetch the next descriptor (here of channel 0), but it does not issue any read request for it. Here is a capture of the Read Address channel taken by the ILA placed at the AXI SG interface of the DMA (the time scale is the same, because both ILAs are triggered at the same time):
[Image: axi_sg_ar (annotated).jpg, AXI SG interface of the DMA, Read Address channel]
The addresses beginning with 0x41... refer to descriptors of RX channel 0, those beginning with 0x42... to descriptors of RX channel 1, and those beginning with 0x43... to descriptors of the TX channel.
The two read requests marked in yellow are the only ones referring to RX channels; all other read requests refer to the TX channel. The last RX descriptor fetched has address 0x420000c0. This is actually the descriptor right before the "current descriptor pointer" seen in the register dump.
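
The address-to-ring mapping used to read this capture can be written down as a small classifier. This sketch hard-codes the prefixes of this particular demonstrator (0x41, 0x42 and 0x43 in the top byte); `classify_sg_addr` and the enum names are hypothetical.

```c
#include <stdint.h>

typedef enum { SG_RX0, SG_RX1, SG_TX, SG_OTHER } sg_region_t;

/* Maps an address seen on the AXI SG interface to the descriptor ring it
 * belongs to, based on the top byte of the address (demonstrator-specific). */
static sg_region_t classify_sg_addr(uint32_t addr)
{
    switch (addr >> 24) {
    case 0x41: return SG_RX0; /* RX channel 0 descriptors */
    case 0x42: return SG_RX1; /* RX channel 1 descriptors */
    case 0x43: return SG_TX;  /* TX channel descriptors */
    default:   return SG_OTHER;
    }
}
```

Applied to exported ILA samples, this makes it easy to count how many AR requests target each ring before the hang.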

The following capture shows the Write Address channel at the same time:
[Image: axi_sg_aw (annotated).jpg, AXI SG interface of the DMA, Write Address channel]
We can see that the AXI DMA writes the status word (descriptor address + 0x1C) of the last two completed RX descriptors. All other write requests are related to the TX channel.
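
Telling status-word updates apart from other SG writes follows from the descriptor layout. A sketch, assuming descriptors are 0x40-aligned and spaced 0x40 apart (as the addresses in the dumps are) and that the status word sits at offset 0x1C; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BD_STRIDE     0x40u /* spacing between consecutive descriptors in the dumps */
#define BD_STATUS_OFF 0x1Cu /* offset of the status word inside a descriptor */

/* True if a write address on the SG interface targets a descriptor status word. */
static bool is_bd_status_write(uint32_t awaddr)
{
    return (awaddr % BD_STRIDE) == BD_STATUS_OFF;
}

/* Base address of the descriptor containing awaddr. */
static uint32_t bd_base_of(uint32_t awaddr)
{
    return awaddr - (awaddr % BD_STRIDE);
}
```

For instance, a write to 0x410EA61C would be the status update of descriptor 0x410EA600, the last completed RX channel 0 descriptor in the dump.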

So here is my question: do you have any idea why the RX channel of the DMA could stop fetching descriptors while the tail descriptor has not been reached? I could not find any information about such a problem in the forums or in the known issues.

The project can be accessed here: https://github.com/mattzimm91/dma_issue_demonstrator . The measurements (.ila files, console messages and screenshots) are attached to this post.

Please let me know if I have left out any useful information.


Thanks in advance for your help!

IMPORTANT NOTE: In the application, only 1 GB of the DDR must be "visible" to the linker script. In the second GB of the DDR (addresses 0x40000000 to 0x7FFFFFFF), the buffers (RX_BD_SPACE_BASE, TX_BD_SPACE_BASE, RX_BUFFER_BASE, TX_BUFFER_BASE in file dma_management.h) are allocated manually as constants and marked as "uncacheable" at the beginning of the main function. Please adapt this if necessary.
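
When adapting these constants, a quick sanity check that every manually placed region stays inside the second GB and that no two regions overlap can help. A minimal sketch; the region sizes and any base values passed to it are hypothetical, since the actual values live in dma_management.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define DDR_SECOND_GB_START 0x40000000ull
#define DDR_SECOND_GB_END   0x80000000ull /* exclusive: last byte is 0x7FFFFFFF */

typedef struct {
    const char *name; /* e.g. "RX_BD_SPACE" (illustrative label only) */
    uint64_t base;
    uint64_t size;
} region_t;

static bool region_in_second_gb(const region_t *r)
{
    return r->size != 0 && r->base >= DDR_SECOND_GB_START &&
           r->base + r->size <= DDR_SECOND_GB_END;
}

static bool regions_overlap(const region_t *a, const region_t *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* True if every region lies in the second GB and no two regions overlap. */
static bool check_layout(const region_t *regs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!region_in_second_gb(&regs[i]))
            return false;
        for (int j = i + 1; j < n; j++)
            if (regions_overlap(&regs[i], &regs[j]))
                return false;
    }
    return true;
}
```

Such a check only covers placement; the regions still have to be marked uncacheable separately, as the application already does in main().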
