Observer daleye
5,437 Views
Registered: ‎11-13-2014

VDMA is choppy

Having issues with triple frame buffer causing choppy video. This is barely noticeable with 1 VDMA, but we are using a scaler and require VDMAs on both input/output of the scaler which compounds the issue.

 

Pipe as follows:

 

Video-in-to-AXIS -> VDMA -> Scaler -> VDMA -> AXIS-to-video-out (Master)

 

 

VDMA config:

 

S2MM: Dynamic Master

MM2S: Dynamic Slave

 

S2MM fsync on TUSER

 

 

We've tried pulling the scaler out and the choppy video is still present. Scaler is currently passing through, same res and frame rate on both in/output.

 

Clocks:

 

148.5 MHz AXIS clock

200 MHz scaler core clock

50 MHz AXI-Lite clock

 

Does http://www.xilinx.com/support/answers/54878.html still apply? Does running the AXI-Lite control interface at 50 MHz actually degrade throughput?

 

We've also tried running the frame buffers in Master/Slave mode with the frame delay set properly (i.e. frame_dly = 1 or more), with no change.

 

I feel like I've combed through every post, tip, answer record etc without any resolution to this issue.

 

Can someone help?

6 Replies
Observer daleye
5,395 Views
Registered: ‎11-13-2014

Re: VDMA is choppy

There is a little hidden note in the VDMA product guide. Before I spend time building, can anyone confirm the following?

 

Stride = Bytes per pixel * line length

 

HOWEVER

 

XAPP1205 notes adjusting this to hit a boundary of 8KB for 1080p content.

 

There is also the note on pg 58 of 

http://www.xilinx.com/support/documentation/ip_documentation/axi_vdma/v6_2/pg020_axi_vdma.pdf

 

It mentions that, for an aligned transfer, the stride needs to be a multiple of the data width in bytes (0x4, 0x8, 0xC, etc. for 32-bit wide data).

 

So I believe, stride is actually supposed to be:

 

Stride = line_length * (memory map data width/8)

Stride = 1920 * (32/8)

Stride = 7680 

 

ELSE

 

the "allow unaligned transfers" box has to be checked and

Stride = line_length * bytes per pixel

 

For whatever reason XAPP1205 creates 16MB frames. Is it advised to set the stride to some power of 2, i.e. 2^13 for 1080p content?
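To make the stride arithmetic above concrete, here is a minimal sketch. It assumes 4 bytes per pixel and a 32-bit memory-map data width, as in the numbers above; the function names are made up for illustration and are not part of any Xilinx driver.

```python
def line_bytes(pixels_per_line: int, bytes_per_pixel: int) -> int:
    """HSIZE in bytes: the payload of one video line."""
    return pixels_per_line * bytes_per_pixel


def is_aligned(value: int, data_width_bits: int) -> bool:
    """True if value is a whole multiple of the data width in bytes."""
    return value % (data_width_bits // 8) == 0


def next_power_of_two(value: int) -> int:
    """Smallest power of two that is >= value (value must be positive)."""
    return 1 << (value - 1).bit_length()


hsize = line_bytes(1920, 4)               # 7680 bytes per 1080p line (assumed 4 B/pixel)
aligned = is_aligned(hsize, 32)           # 7680 is a multiple of 4 bytes
padded_stride = next_power_of_two(hsize)  # 8192 = 2**13
print(hsize, aligned, padded_stride)
```

With these assumptions a stride of 7680 is already aligned, and 8192 is simply the next power of two above it.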

 

Just trying to decipher the little nuances of this core that are resulting in stable but choppy video. We expect only an increase in latency, not any degradation of the apparent frame rate or of the order of the frames.

 

Thanks! 

Xilinx Employee
5,376 Views
Registered: ‎08-02-2011

Re: VDMA is choppy

Hello,

 

First, I'll go through your specific questions.

 

Does http://www.xilinx.com/support/answers/54878.html apply still? Does running the axi lite control at 50 MHz actually degrade throughput?

Yes, it does still apply, and yes, setting up your clocks incorrectly will cause the VDMA efficiency to suffer. Ultimately, this AR is basically just reiterating the clocking section of the doc, which says:

 

In asynchronous mode, s_axi_lite_aclk clock must have a lower frequency than both
m_axi_mm2s_aclk and m_axi_s2mm_aclk clocks.

...

IMPORTANT: Make sure the memory map side clock frequency is equal to or greater than the
streaming side clock frequency to achieve required performance.

So just make sure that your m_axi_*_aclks are the fastest clocks (or equal).
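As a rough illustration, the two quoted rules can be written as a small check. This is only a sketch: the function name is hypothetical, and the m_axi clock values in the example are assumptions, since the thread does not state them.

```python
def check_vdma_clocks(axi_lite_mhz, m_axi_mm2s_mhz, m_axi_s2mm_mhz,
                      axis_mm2s_mhz, axis_s2mm_mhz):
    """Return a list of rule violations (empty list if none)."""
    problems = []
    # Rule 1 (asynchronous mode): AXI-Lite clock lower than both memory-map clocks.
    for name, mm in (("m_axi_mm2s_aclk", m_axi_mm2s_mhz),
                     ("m_axi_s2mm_aclk", m_axi_s2mm_mhz)):
        if axi_lite_mhz >= mm:
            problems.append(f"s_axi_lite_aclk is not lower than {name}")
    # Rule 2: memory-map clock equal to or greater than the streaming clock.
    if m_axi_mm2s_mhz < axis_mm2s_mhz:
        problems.append("m_axi_mm2s_aclk is slower than the MM2S stream clock")
    if m_axi_s2mm_mhz < axis_s2mm_mhz:
        problems.append("m_axi_s2mm_aclk is slower than the S2MM stream clock")
    return problems


# 50 MHz AXI-Lite and 148.5 MHz stream clocks as in the original post;
# the memory-map clock values are assumed for the example.
ok = check_vdma_clocks(50, 148.5, 148.5, 148.5, 148.5)
bad = check_vdma_clocks(50, 100, 100, 148.5, 148.5)
print(ok)
print(bad)
```

The first call reports no violations; the second flags both memory-map clocks as slower than their stream clocks.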

 

Regarding the stride, this is likely not your issue. The stride defines where in memory the VDMA stores (and looks for) each consecutive line. You can set stride == hsize to pack the lines directly adjacent to each other in memory, or stride > hsize to space them out in some other way. There are a few reasons you might want to do the latter. XAPP1205 sets it to a power of two to optimize memory access efficiency.

 

Anyway, your setup sounds reasonable to me. If it were me, I'd start by debugging the 1 VDMA setup and then add the second VDMA with scaler. I assume by your description that you've seen this AR. You should also set the frame delay for the genlock slave >0. One last thing to make sure of: register writes are only committed once you write the VSIZE register. So make sure that when you make any changes that you also re-write VSIZE as the very last thing.
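The "VSIZE last" rule can be sketched as an ordered list of register writes. The offsets below are what I recall from the MM2S register map in PG020 and should be treated as assumptions to verify against your VDMA version; the control value is a placeholder and the function is hypothetical, not a Xilinx driver call.

```python
# Assumed MM2S register offsets; verify against the product guide.
MM2S_VDMACR = 0x00
MM2S_VSIZE = 0x50
MM2S_HSIZE = 0x54
MM2S_FRMDLY_STRIDE = 0x58
MM2S_START_ADDRESS = 0x5C  # first of the consecutive start address registers

CONTROL_PLACEHOLDER = 0x1  # placeholder; take the real control bits from the guide


def mm2s_write_sequence(frame_addrs, hsize_bytes, stride_bytes, vsize_lines):
    """Build (offset, value) writes with VSIZE as the very last one."""
    writes = [(MM2S_VDMACR, CONTROL_PLACEHOLDER)]
    for i, addr in enumerate(frame_addrs):
        writes.append((MM2S_START_ADDRESS + 4 * i, addr))
    # The frame delay field shares this register; it is left at 0 in this sketch.
    writes.append((MM2S_FRMDLY_STRIDE, stride_bytes))
    writes.append((MM2S_HSIZE, hsize_bytes))
    writes.append((MM2S_VSIZE, vsize_lines))  # commits the settings
    return writes


# Hypothetical frame buffer addresses for a triple buffer.
seq = mm2s_write_sequence([0x10000000, 0x11000000, 0x12000000], 7680, 7680, 1080)
for offset, value in seq:
    print(f"0x{offset:02X} <= 0x{value:08X}")
```

Any later change to stride, hsize or the addresses would end with the same final VSIZE write.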

 

What version of the VDMA are you using? This AR may be relevant.

 

Lastly, you should look at using the new Video Processing Subsystem. It can accomplish scaling (both up and down) with only 1 VDMA instance and completely abstracts all these setup details from you. It really is much easier to use.

Observer daleye
5,360 Views
Registered: ‎11-13-2014

Re: VDMA is choppy

Thank you for the reply. Today was a very good day, but it did not come without its own set of new issues.

 

Similar to your suggestion, we culled our design down to the very basics:

 

VDMA -> VDMA -> AXI-to-Video-Out (master) 

 

What we learned was the following:

 

VDMA0 -

Write Throughput 254MBps 

Read Throughput 148MBps

 

VDMA1 -

Write Throughput 148MBps

Read Throughput 124MBps

 

We expected:

 

124 MBps everywhere except the initial input. This does not happen because VDMA0's read side is free running without any blanking intervals, so it outputs data at an equivalent frame rate of 35.8 fps rather than 30 fps.
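The 35.8 figure follows from scaling the nominal frame rate by the ratio of measured to expected throughput. A quick sketch, with a made-up helper name:

```python
def equivalent_fps(nominal_fps, measured_mbps, expected_mbps):
    """Frame rate implied by a measured throughput, relative to the expected one."""
    return nominal_fps * measured_mbps / expected_mbps


fps = equivalent_fps(30, 148, 124)
print(round(fps, 1))  # 35.8
```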

 

As a result, with VDMA1 in Master/Slave or dynamic genlock mode, there are cases where the write side completes ahead of the read side. This causes frames to come out of order, which makes the video appear choppy.
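A toy model of the rate mismatch, under loud assumptions: it is not the VDMA genlock logic, it only tracks which frame was most recently completed by a 35.8 fps writer each time a 30 fps reader starts a new frame. In this simplified model the mismatch shows up as skipped frames.

```python
from fractions import Fraction


def frames_seen_by_reader(write_fps, read_fps, n_reads):
    """Newest fully written frame index at the start of each read."""
    ratio = Fraction(write_fps) / Fraction(read_fps)
    return [int(k * ratio) for k in range(1, n_reads + 1)]


def skipped_frames(sequence):
    """Count frames the reader never displays."""
    return sum(b - a - 1 for a, b in zip(sequence, sequence[1:]))


free_running = frames_seen_by_reader("35.8", "30", 30)
locked = frames_seen_by_reader("30", "30", 30)
print(free_running)
print(skipped_frames(free_running), skipped_frames(locked))
```

With the writer free running, the reader periodically jumps over a frame; with both sides locked to 30 fps, every frame is shown once.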

 

First resolution:

 

VDMA0 MM2S Fsync = external

VDMA1 S2MM Fsync = external

 

Tied these pins to VTC FSync out

 

All throughputs are now essentially locked to 30 FPS and frame buffers all rotate with 1 frame of delay. 

 

PERFECT...

 

Let's apply this with the scaler sitting between the two VDMAs. Fail.

 

We're making progress, but it's interesting that we need to basically gate our VDMAs with the external fsync, as almost all recommendations appear to have it free run.

Xilinx Employee
5,347 Views
Registered: ‎08-02-2011

Re: VDMA is choppy

Hello,

 

Oh, great! Glad you were able to make some progress.

 

Yeah, most of the recommendations are for the common case where you have 1 VDMA as a triple buffer just for decoupling input from output. The philosophy for putting MM2S in free-run mode in this case is that the VTC+AXIS2Vid cores work together to replicate the proper timing and throttle back on the VDMA as appropriate. In this scenario, adding an additional fsync to the VDMA can cause race conditions.

 

In the 2 VDMA/scaler case, the situation is a little different because you have the output VDMA's S2MM throttling back on the input VDMA's MM2S. In this case there's no video timing master... it's all just AXI Stream and things happen as fast (or slow) as dictated by each core, so genlock will cause the framebuffers to do more jumping around than is necessary.

Observer daleye
5,336 Views
Registered: ‎11-13-2014

Re: VDMA is choppy

Thanks for the input.

 

I'm not sure whether it makes sense to close this thread, since that issue is solved, or to keep it going for the next issue. I'll let a mod decide. Anyway, here it goes:

 

Scaler!!

 

So I appreciate the response that the new core can possibly resolve our issues, and we have requested an eval license, just waiting on a response :)

 

But, while that response is being generated, been an entire business day BTW...

 

The scaler appears to be unable to keep up with a very very simple operation, zoom.

 

When the scaler is in passthrough, throughput is as we expect. However, when we increase the zoom factor to 2x, we see a reduction in throughput as great as 75%, i.e. 125 MBps becomes 31 MBps. The size of this reduction depends on whether the VDMAs have the above fsync solution in place or not.

 

If the VDMAs are free running at 148.5MBps, the throughput drops to 100MBps with a zoom factor of 2x with core clock at 200MHz.

 

We've had some luck, oddly enough, reducing the core clock from 200MHz down to 148.5MHz, which got us a 50% reduction in throughput with our fsync solution. We are going to try 74.25MHz to see if this gets us closer to a 25-30% reduction in throughput. Not that that is acceptable, but it will at least identify a trend.
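For reference, the reductions quoted above work out as follows. This is just the arithmetic on the measured numbers; the helper name is made up.

```python
def reduction_pct(before_mbps, after_mbps):
    """Percentage drop in throughput from before to after."""
    return 100.0 * (before_mbps - after_mbps) / before_mbps


with_fsync = reduction_pct(125, 31)       # fsync solution, 2x zoom, 200 MHz core
free_running = reduction_pct(148.5, 100)  # free-running VDMAs, 2x zoom, 200 MHz core
print(round(with_fsync, 1), round(free_running, 1))
```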

 

Everything in the data sheet notes that a higher clock shouldn't hurt anything, but this appears to be incorrect. 

 

We've gone through the calculations assuming this is essentially the unity case, but perhaps zoom augments those basic formulas a bit and requires a higher frequency clock in/out/core?

 

bwiec, 

 

You are the primary reason I come to these forums. Your posts across the Xilinx community have been incredibly useful for us in our troubleshooting efforts.

 

 

Xilinx Employee
5,318 Views
Registered: ‎08-02-2011

Re: VDMA is choppy

Thanks for the kind words! Happy to help.

 

Let me know if you don't get a response on the VPSS eval and I can follow up.

 

Yes, using the crop feature in the scaler actually changes the clocking requirements and is more difficult for the scaler to deal with. The reason is that the scaler needs to wait until it gets to the zoomed portion of the image before it can start computation, and it has a relatively small amount of time to finish processing before the end of the line. Usually this results in higher clocks being needed, but getting the ratios right can be tricky.

 

Instead, I usually recommend doing the cropping in your front VDMA and leaving the scaler at the full (smaller) frame size with no cropping. So if you have 1080p coming in and you want to crop out a 512x512 section of the image and scale it back up to 1080p to achieve zoom, I would have the input VDMA write side at 1080p and the read side at 512x512. Then the scaler input resolution is 512x512, and you set it to upscale to 1080p. The output VDMA is then as normal at 1080p. This should help improve the memory bandwidth situation too, since the input read stream now only needs to read 512x512 instead of 1080p.
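A sketch of how the read-side settings for such a crop could be derived: offset the start address to the top-left corner of the window, shrink hsize and vsize to the window, and keep the stride of the full frame. The base address, the 4 bytes per pixel, the centered window position and the function name are all assumptions for illustration.

```python
def crop_read_settings(base_addr, stride_bytes, bytes_per_pixel,
                       x, y, width, height):
    """Read-side settings for a sub-window of a full frame in memory."""
    return {
        "start_addr": base_addr + y * stride_bytes + x * bytes_per_pixel,
        "hsize": width * bytes_per_pixel,  # bytes per cropped line
        "vsize": height,                   # lines in the cropped window
        "stride": stride_bytes,            # unchanged: still steps by full lines
    }


BASE = 0x10000000                    # hypothetical frame buffer address
x0 = (1920 - 512) // 2               # 704, centered horizontally
y0 = (1080 - 512) // 2               # 284, centered vertically
crop = crop_read_settings(BASE, 7680, 4, x0, y0, 512, 512)
pixel_ratio = (512 * 512) / (1920 * 1080)
print(hex(crop["start_addr"]), crop["hsize"], crop["vsize"], round(pixel_ratio, 3))
```

Under these assumptions the read stream fetches roughly 13% of the pixels of a full 1080p frame.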

 

I'll also point out that the VPSS can do this crop/zoom operation with only a single input VDMA and it's a simple API function call to do it. The scaler in the VPSS is totally re-designed from scratch.
