malstew
Visitor

VDMA locking AXI bus...


I have a system with three VDMA blocks driving a mixer. When I enable the first two VDMAs the system is fine; however, when I enable the third one the AXI bus locks up because the third VDMA never receives TREADY from the mixer, and the back-pressure cascades all the way to the VDMA's memory-mapped read port. I see TUSER=1 and TVALID=1, but the mixer isn't accepting data.

Below is the configuration of the mixer.  

[Image: Video Mixer configuration]

The first two VDMAs are configured as follows:

[Images: configuration of the first two VDMAs]

The third VDMA is configured as follows: 

[Images: configuration of the third VDMA]

Here is a capture of the AXI read-side bus locked up:

[Image: ILA capture of the locked-up AXI read bus]

...and at the stream side of the VDMA:

[Image: ILA capture at the VDMA stream interface]

So I gather that the VDMA has no internal throttling mechanism for when its internal line buffer is full, other than de-asserting RREADY?

Suggested fix?  

Malcolm
4 Replies
malstew
Visitor

Can I use the mixer layer0 signals TUSER[0] & TVALID & TREADY to create an FSYNC signal that the other VDMAs could use to synchronize their MM2S paths?

 
florentw
Moderator

Hi @malstew,

First, to me this looks more like a limitation of the Video Mixer than of the AXI VDMA. It is strange that it accepts only one pixel.

I would think a better behaviour would be for it to wait until it is actually ready before asserting TREADY. Which version are you using?

Then, for the AXI VDMA, to avoid blocking the AXI4-Stream interface, you can change the configuration of the AXI-MM data width, line buffer depth, and line buffer size to make sure it can buffer a full burst. This will have an impact on performance in a working system (if you reduce the burst size) or on resources (if you increase the buffer size), but it should avoid locking up the AXI4-Stream interface.
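To put rough numbers on that sizing (using the burst/buffer settings that come up later in this thread, purely as an illustration):

    bursts the line buffer can absorb = buffer depth / burst length
    e.g.  2048 beats / 64 beats per burst = 32 bursts
           512 beats / 16 beats per burst = 32 bursts

Once the buffer no longer has room for a full burst, any further read data backs up onto the AXI-MM fabric.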


Florent
Product Application Engineer - Xilinx Technical Support EMEA
malstew
Visitor

Thanks for the reply...

Tool: Vivado v2020.2 (64-bit)

Versions: 

  1. AXI Video Direct Memory Access (6.3)
  2. Video Mixer (5.1)

The issue, I believe, is not the blocking of the AXI-Stream interface but the blocking of the AXI-MM interface. The waveform shows that 21 AXI-MM burst requests were issued and 16 bursts (32 beats) have come back to fill the buffer. That leaves 5 outstanding requests, plus another one on the output port that hasn't been accepted yet. It seems the VDMA (MM2S) has no throttling mechanism on the read side and will keep making requests even when its line buffer doesn't have room to accept them.

If I disable layer1/layer2 and just run layer3, it works fine. As soon as I enable layer1/layer2 together with layer3, things lock up. All three VDMAs (layer1/layer2/layer3) are free-running... no fsync.

Here is what I think is happening: when I enable the layer3 mixer input, its start-of-frame (TUSER[0]) does not line up with the layer0 start of frame (maybe it is halfway through a frame). So the mixer does not accept the input (TREADY -> 0) and the VDMA starts buffering. The VDMA appears to have no throttling mechanism, so it continues to make requests until its internal buffer is full and the AXI-MM path is backed up all the way to the memory. At that point the AXI-MM fabric stops servicing requests from layer1/layer2 until layer3 data gets pulled out. Because layer1/layer2 are starved of data, the mixer locks up waiting for them.

From your experience, does that make sense? I think increasing the buffer will only work if it gets to be almost the size of a frame (1920x1200 at 4 bytes per pixel).
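For scale, the arithmetic on that frame size versus the VDMA's line buffer:

    one frame:   1920 x 1200 pixels x 4 bytes/pixel = 9,216,000 bytes (~8.8 MiB)
    line buffer: 2048 beats x 128 bits = 2048 x 16 bytes = 32 KiB

so the line buffer is roughly 280x too small to absorb a misaligned frame inside the VDMA.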

[Image: ILA capture of the AXI-MM read transactions]
malstew
Visitor
(Accepted Solution)

I figured out a fix for it...  

Design: there are 3 VDMAs going into a mixer: layer3 (ch3_* signals in the pictures), layer2 (ch2_*), and layer1 (ch1_*). The layer2 and layer1 VDMAs are identical and are now configured (changed from the first post) with a 128-bit memory-mapped interface, 16-beat read burst size, 32-bit stream data width, and a buffer size of 512. The layer3 VDMA is configured with a 128-bit interface, 64-beat read burst size, 128-bit stream data width, and a buffer size of 2048.

The waveforms below show what was happening. The first picture (ila1) captures the memory-mapped AXI (AXI-MM) transactions and is triggered on (ch3_rready == 0 && ch3_rvalid == 1). The second picture (ila2) uses the trigger output of the first ILA to capture all of the mixer AXI-Stream signals (except TDATA).

What I saw is that the layer3 VDMA (ch3_*) did not stop requesting data even when its internal buffer filled up. The buffer was 2k x 128 bits and the burst size was 64 beats (also 128 bits wide), so it could buffer 32 AXI burst transactions. Layer1/layer2 did not behave the same way -- maybe the VDMA throttling gets disabled depending on the data width configuration (the ratio of AXI-MM data size to AXI-S data size)? (Just a guess.)
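Spelled out, the capacity arithmetic for the configurations above (assuming the layer1/layer2 buffer depth is counted in memory-side beats, as it is for layer3):

    layer3:        2048 beats / 64 beats per burst = 32 bursts  (2048 x 16 B = 32 KiB)
    layer1/layer2:  512 beats / 16 beats per burst = 32 bursts

All three buffers hold the same number of bursts, so the burst headroom is identical; what differs is the MM-to-stream width ratio (128:128 for layer3 versus 128:32 for layer1/layer2), which is why that ratio looks like the differentiating factor.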

My first attempt to make the design work was to line things up better. I did this by enabling the Frame Sync (fsync) input on the MM2S side of the VDMAs. According to the VDMA spec, fsync is triggered on the falling edge of the input, so I could just use the TUSER[0] signal from layer0. To be extra safe I ANDed TUSER with TVALID and TREADY (overkill); a sketch of that gating is below. This worked a little better, but things still locked up. It was a step forward, however, because it lined everything up and made it easier to see that the ch3 VDMA was not being throttled.
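A minimal Verilog sketch of that fsync gating, assuming aclk is the common stream clock and ch0_* are the layer0 stream signals at the mixer input (the names here are placeholders, not the actual signal names from the block design):

    module fsync_gen (
        input  wire aclk,
        input  wire ch0_tuser0,   // TUSER[0] of the layer0 stream at the mixer
        input  wire ch0_tvalid,
        input  wire ch0_tready,
        output reg  mm2s_fsync    // drives the fsync pin of the layer1/2/3 VDMAs
    );
        // Start-of-frame beat: TUSER[0] qualified by the TVALID/TREADY
        // handshake, so it is high only on the cycle the SOF beat transfers.
        wire layer0_sof_beat = ch0_tuser0 & ch0_tvalid & ch0_tready;

        // Register the pulse so the fsync inputs see a clean signal; the
        // falling edge (which the VDMA triggers on) lands one cycle after
        // the layer0 frame actually starts.
        always @(posedge aclk)
            mm2s_fsync <= layer0_sof_beat;
    endmodule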

I could not find a mechanism in the VDMA to throttle ch3 -- I tried different line buffer sizes and burst sizes but couldn't find a combination that worked. I did not play with the AXI-MM size because the AXI width to DRAM was 128 bits, and on the AXI-Stream side I needed 128 bits (4 ARGB pixels) per cycle to support 4K video output. Maybe someone can explain sometime why the VDMA would not throttle itself in this configuration?

The "fix" that I found was in the AXI-interconnect.  I was able to turn on the 512-deep data FIFO in packet mode.  This FIFO is used to make sure that full/empty stalls in the middle of bursts don't happen.  Packet mode causes a delay in the issuing of the read transaction on the AR channel until the data FIFO has enough room to store the entire burst.  

Hopefully this helps someone else out!

Regards,
Malcolm

[Image: ila1 -- AXI-MM transactions]

[Image: ila2 -- mixer AXI-Stream signals]

 
