so-lli1
Adventurer
1,300 Views
Registered: ‎11-26-2016

VCU encoding framerate very low


I use the omxh264enc GStreamer plugin to interface with the VCU. Encoding works; however, the framerate after encoding is very low even though the system bandwidth is high, the input video stream has a low resolution, and the target bitrate is very low.

The following pipeline without the encoder runs fine at 60 fps with no dropped frames. CPU utilization is around 0.x%. This confirms that the video source is fine and frames are delivered as expected.

gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! fpsdisplaysink text-overlay=false

With the encoder, the framerate drops to 14 fps. No frames are dropped, and CPU utilization is around 25% (measured with busybox top; it looks like one core is at 100%).

gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false

According to PG252 Page 311 I already checked:

  • "prefetch-buffer=true" -> No performance impact
  • "target-bitrate" very low -> No performance impact
  • "b-frames=0" -> No performance impact
  • "queue" -> No performance impact
  • SMMU is disabled in Device-Tree
  • Increased CMA Size from 1000MB to 1200MB

Since the omx module seems to make use of the dmaproxy module, I checked that it is loaded and can be used. It looks fine to me.
When I unload it (rmmod dmaproxy), the pipeline issues the warning "DMA channel is not available, CPU move will be performed", but the framerate is still 14 fps with a CPU usage of 25%. So it does not seem to have any effect.

However, it seems that the limiting factor is the CPU. The current frequency for the ACPU is set to 750 MHz. The core certainly allows higher frequencies, but I doubt that such a high frequency is required for a low-resolution image, so I suspect the problem lies somewhere else.
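To put the CPU suspicion into numbers, here is a rough back-of-the-envelope sketch. It assumes that the one saturated core is the bottleneck and that all of its cycles go into per-frame work, which the thread does not confirm. The figures (750 MHz, 640x512, 14 fps observed, 60 fps target) are taken from the post; the function name is only illustrative.

```python
# Rough CPU budget per pixel if a single fully loaded core limits the framerate.
# Assumption: every cycle of that core is spent on per-frame work.

def cycles_per_pixel(cpu_hz: float, fps: float, width: int, height: int) -> float:
    """CPU cycles available per pixel on one core at the given framerate."""
    cycles_per_frame = cpu_hz / fps
    return cycles_per_frame / (width * height)

# What the core apparently spends today (14 fps at 750 MHz).
observed = cycles_per_pixel(750e6, 14, 640, 512)
# What it could afford to spend at most to sustain 60 fps at the same clock.
budget_60fps = cycles_per_pixel(750e6, 60, 640, 512)

print(f"observed at 14 fps: {observed:.1f} cycles/pixel")
print(f"budget for 60 fps:  {budget_60fps:.1f} cycles/pixel")
```

Well over a hundred cycles per 8-bit pixel is a lot for a hardware-accelerated path, which supports the suspicion that something is being done in software on the CPU rather than the clock simply being too low.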

I would appreciate any help resolving the issue.

Thanks,
so-lli1

 

1 Solution

Accepted Solutions
aoifem
Moderator
559 Views
Registered: ‎11-21-2018

Hi @so-lli1 

Thank you for the information. 

Although videotestsrc has proper GRAY8 support, omxh264/5enc does not yet have proper GRAY8 support. For example this pipeline will fail because of omxh264/5enc: 

root@vcu_trd:~# gst-launch-1.0 videotestsrc ! video/x-raw, width=1280, height=720, format=GRAY8, framerate=30/1 ! omxh264enc ! fpsdisplaysink name=fpssink text-overlay=false 'video-sink=fakesink' sync=true -v
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink/GstFakeSink:fakesink0: sync = true
ERROR: from element /GstPipeline:pipeline0/GstVideoTestSrc:videotestsrc0: Internal data stream error.
Additional debug info:
../../../../git/libs/gst/base/gstbasesrc.c(3072): gst_base_src_loop (): /GstPipeline:pipeline0/GstVideoTestSrc:videotestsrc0:
streaming stopped, reason not-negotiated (-4)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

 

This is the reason for the low framerate. I am attaching a 'hack' which should fix this issue; however, please be aware that this is not an official patch and has not been fully tested. Development is looking at adding support for GRAY8 in a future release.

PG252 page 203 seems to imply that GRAY8 is supported. I have flagged this with development as a mistake, and they will fix it in a future release. Some GStreamer elements (like videotestsrc) already support GRAY8, but the encoder does not.
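As a small worked example of what a missing GRAY8 path can mean for the data the CPU has to touch, the sketch below compares frame sizes. It assumes that videoconvert in the original pipeline converts each GRAY8 frame to NV12 in software before the encoder; the thread does not state which format is actually negotiated, so NV12 is used purely for illustration.

```python
# Frame size and data rate for 640x512 at 60 fps.
# Assumption: the encoder input is NV12 (4:2:0, 12 bits per pixel); this is
# not confirmed in the thread and only serves as an example target format.

BYTES_PER_PIXEL = {"GRAY8": 1.0, "NV12": 1.5}

def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Size of one tightly packed frame in bytes (no stride padding)."""
    return int(width * height * BYTES_PER_PIXEL[fmt])

def data_rate(width: int, height: int, fmt: str, fps: int) -> int:
    """Bytes per second for a stream of such frames."""
    return frame_bytes(width, height, fmt) * fps

gray8 = frame_bytes(640, 512, "GRAY8")
nv12 = frame_bytes(640, 512, "NV12")
print(f"GRAY8 frame: {gray8} bytes, {data_rate(640, 512, 'GRAY8', 60) / 1e6:.2f} MB/s at 60 fps")
print(f"NV12 frame:  {nv12} bytes, {data_rate(640, 512, 'NV12', 60) / 1e6:.2f} MB/s at 60 fps")
```

The GRAY8 figure matches the 327680-byte buffer size that yavta reports for this device. The data rates themselves are small, so under this assumption the cost would come from the per-pixel conversion work on a single core rather than from memory bandwidth.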

Aoife
Product Application Engineer - Xilinx Technical Support EMEA



15 Replies
watari
Teacher
1,254 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Did you check the io-mode setting of v4l2src and set proper QoS parameters for the CCI-400?

 

Best regards,

so-lli1
Adventurer
1,209 Views
Registered: ‎11-26-2016

Hi @watari ,

Thanks for pointing out the io-mode setting of the v4l2src plugin.
The default setting is "auto"; however, using explicit settings like "dmabuf" (also used in some pipelines in PG252) did not change anything.

Regarding the CCI-400, please elaborate. As far as I can tell, HP0-3 are not connected to the CCI, so it should not matter.
However, another look at the datasheet revealed that QoS can also be set for HP0-3. I tried this as well, but it had no effect on the framerate.

Edit:

PG252 also states that all interrupts are served by CPU0. I checked /proc/interrupts and moved the framebuffer and al5e interrupts to different CPUs using smp_affinity, also without success.
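For readers repeating the smp_affinity experiment: the value written to /proc/irq/<irq>/smp_affinity is a hexadecimal bitmask with one bit per CPU. The helper below is a minimal sketch that only computes the mask and prints the resulting command. The IRQ number 54 is a hypothetical placeholder; the real numbers have to be looked up in /proc/interrupts.

```python
# Build the hex CPU bitmask expected by /proc/irq/<irq>/smp_affinity.
# The IRQ number used below is a made-up example, not taken from the thread.

def affinity_mask(cpus) -> str:
    """Return the hex bitmask (without a 0x prefix) selecting the given CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

example_irq = 54  # hypothetical; look up the real IRQ in /proc/interrupts
for target_cpus in ([1], [2], [3], [1, 3]):
    mask = affinity_mask(target_cpus)
    print(f"CPUs {target_cpus}: echo {mask} > /proc/irq/{example_irq}/smp_affinity")
```

Moving an interrupt only relocates the interrupt handler. If the load comes from a user-space thread of the pipeline, the saturated core will stay saturated, which would be consistent with the result described above.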

Furthermore I set the DDR_QOS_CTRL Register of Port3-5 to Best Effort (BE). No effect on the framerate.

 

Quite frustrating. Any other ideas?

Thanks,
so-lli1

watari
Teacher
1,174 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Did you check the pipeline diagram with a dot file and the CPU usage via gst-shark?

If not, I suggest trying them to investigate the root cause.

 

https://developer.ridgerun.com/wiki/index.php?title=GstShark

 

Best regards,

so-lli1
Adventurer
1,143 Views
Registered: ‎11-26-2016

Hi @watari ,

first of all, thank you for your support!

As you suggested I did a measurement of CPU usage with gst-shark. This reflects what top already presented - a single CPU has a very high load and seems to reduce the framerate:

0:00:02.132731370  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)0.000000;
0:00:02.132859080  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)2.000000;
0:00:02.132893020  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
0:00:02.132927750  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)100.000000;

Same pipeline but tracing the fps (note that the framerate is higher than mentioned in the initial post since I increased CPU frequency for testing):

0:00:04.359835630  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)23;
0:00:04.359958670  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)23;
0:00:04.359990370  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)23;
0:00:04.360019900  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)23;
0:00:04.360048340  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)23;
0:00:04.360077350  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)23;
0:00:04.360105310  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)23;
0:00:04.360133110  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)omxh264enc_omxh264enc0_src, fps=(uint)23;
0:00:04.360162810  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)24;

This is the corresponding pipeline that I used for the measurement. The queue elements were added to make sure that multiple threads are created. I would have expected the CPU load to be shared evenly between the cores, but this is not the case.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! omxh264enc ! fpsdisplaysink text-overlay=false
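The cpuusage trace from the omxh264enc run above can be summarised with a few lines of Python. This is only a sketch: the regular expression is written against the trace lines as quoted in this post, and other GstShark versions may format them differently.

```python
import re

# Trace lines copied from the omxh264enc run above (GstShark "cpuusage" tracer).
TRACE = """\
0:00:02.132731370  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)0.000000;
0:00:02.132859080  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)2.000000;
0:00:02.132893020  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
0:00:02.132927750  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)100.000000;
"""

CPU_RE = re.compile(r"cpuusage, number=\(uint\)(\d+), load=\(double\)([\d.]+);")

def parse_cpuusage(text: str) -> dict:
    """Return {core: load_percent}; a later sample for a core replaces an earlier one."""
    return {int(core): float(load) for core, load in CPU_RE.findall(text)}

loads = parse_cpuusage(TRACE)
busiest_core = max(loads, key=loads.get)
average_load = sum(loads.values()) / len(loads)

print(f"per-core load: {loads}")
print(f"busiest core: CPU{busiest_core} at {loads[busiest_core]:.0f}%")
print(f"average over all cores: {average_load:.1f}%")
```

The average over the four cores comes out at about 25%, which is the figure busybox top showed, while one core carries essentially the whole load.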

Even more interesting, when I exchange the omxh264enc plugin with the software codec x264enc the framerate is 60fps and the load is spread:

0:00:03.124405800  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)32.989689;
0:00:03.124558260  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)47.959183;
0:00:03.124595280  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)61.000000;
0:00:03.124627780  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)52.999996;

Again the same pipeline but tracing the fps:

0:00:02.859839130  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)59;
0:00:02.859978350  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)59;
0:00:02.860010820  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)59;
0:00:02.860040470  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)59;
0:00:02.860069990  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)59;
0:00:02.860099890  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)59;
0:00:02.860129810  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)59;
0:00:02.860158550  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)59;
0:00:02.860187670  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)x264enc0_src, fps=(uint)59;

The corresponding pipeline:

GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! x264enc ! fpsdisplaysink text-overlay=false

 

In my case the software codec solution is working at a higher FPS than using the VCU.
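The two framerate traces can be compared the same way. The sketch below uses two lines from each run as quoted above and assumes the trace format shown in this post; the helper names are illustrative only.

```python
import re

# Two lines from each framerate trace above: the source pad and the encoder pad.
OMX_TRACE = """\
0:00:04.360133110  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)omxh264enc_omxh264enc0_src, fps=(uint)23;
0:00:04.360162810  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)24;
"""
X264_TRACE = """\
0:00:02.859839130  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)59;
0:00:02.860187670  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)x264enc0_src, fps=(uint)59;
"""

FPS_RE = re.compile(r"framerate, pad=\(string\)(\w+), fps=\(uint\)(\d+);")

def parse_framerate(text: str) -> dict:
    """Return {pad_name: fps} for every framerate line in the trace text."""
    return {pad: int(fps) for pad, fps in FPS_RE.findall(text)}

def fps_range(text: str) -> tuple:
    """Return (lowest, highest) fps seen across all pads."""
    rates = parse_framerate(text)
    return min(rates.values()), max(rates.values())

print("omxh264enc run:", fps_range(OMX_TRACE))
print("x264enc run:   ", fps_range(X264_TRACE))
```

In both runs every pad reports practically the same rate, including the v4l2src source pad. That is what one would expect if the slowest element throttles the whole pipeline through back-pressure, so the framerate tracer alone does not show which element is the slow one.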

Hope this helps and you have some idea where to look.

 

Regards,
so-lli1

watari
Teacher
1,107 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Thank you for sharing the results; they are very interesting to me.

This looks like a load-balancing/CPU-scheduling issue on SMP Linux.

I'm not an expert in these areas, but if the relevant parameters can be tuned, it should run faster than before.

Thank you again for sharing the details.

Best regards,

so-lli1
Adventurer
1,064 Views
Registered: ‎11-26-2016

Hi @watari 

I am not sure if this is really a balancing issue on SMP Linux, or if the CPU load is simply too high because something else is not designed as required.

Therefore I have also attached my block design (BD). Maybe you or somebody else can spot a problem.

[Attached image: block design, so-lli1_0-1596604944420.png]

Thank you very much for your support.

Regards,
so-lli1

abstract
Observer
959 Views
Registered: ‎06-12-2017

Hi @so-lli1 ,

I have a similar problem:

https://forums.xilinx.com/t5/Video-and-Audio/H-265-frame-rate-is-reduced-when-resolution-is-lowered/td-p/1140105

The image resolution has a huge impact on CPU usage. Attached is the GstShark log of an interlatency analysis. The log seems to show that the delay between v4l2src and the source pad of omxh265enc is too long (nearly 425 ms). Thus, I currently suspect a bug in frame buffer management, especially for non-standard frame resolutions (like 640x512).

Regards,

 

so-lli1
Adventurer
953 Views
Registered: ‎11-26-2016

Hi @abstract ,

Your problem really matches what I observed, but I never thought that the video resolution might be the issue here. Thanks for letting me know.

I still hope somebody with more insight might be able to figure out what the problem is exactly. If you do, please keep me up to date.

Regards,

so-lli1
Adventurer
822 Views
Registered: ‎11-26-2016

I also tried to use io-mode=5 (dmabuf-import), however I received "ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Failed to allocate required memory."

gst-launch-1.0 -v v4l2src io-mode=5 device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false

According to /proc/meminfo, there should be enough memory available:

CmaTotal: 1024000 kB
CmaFree: 990252 kB

Using more CMA memory (set via the kernel command line in U-Boot) did not change the behaviour.
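A quick sanity check of the /proc/meminfo figures above, as a sketch: it parses the two CMA lines quoted in this post and estimates how many 640x512 GRAY8 frames would fit into the free CMA area. It deliberately ignores alignment, fragmentation and the fact that each buffer must be physically contiguous, so it is an upper bound rather than a precise count.

```python
# CMA figures copied from the /proc/meminfo excerpt above.
MEMINFO = """\
CmaTotal: 1024000 kB
CmaFree: 990252 kB
"""

def parse_meminfo_kb(text: str) -> dict:
    """Parse 'Key: value kB' lines into {key: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values

info = parse_meminfo_kb(MEMINFO)
cma_used_kb = info["CmaTotal"] - info["CmaFree"]

# One tightly packed 640x512 GRAY8 frame (1 byte per pixel), in kB.
frame_kb = (640 * 512) // 1024
frames_that_fit = info["CmaFree"] // frame_kb

print(f"CMA in use: {cma_used_kb} kB")
print(f"one frame: {frame_kb} kB, upper bound of frames in free CMA: {frames_that_fit}")
```

With room for thousands of frames, the total CMA size is unlikely to be what triggers "Failed to allocate required memory"; this fits the observation that increasing the CMA size made no difference.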

 

 

watari
Teacher
791 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

When you hit the "Failed to allocate required memory." error, I suggest checking whether the media graph makes sense.

Could you verify it with the media-ctl command?

 

Best regards,

so-lli1
Adventurer
777 Views
Registered: ‎11-26-2016

Hi @watari,

Thank you again for constantly trying to help me out here; however, I do not understand your last post.
Could you please elaborate?

Regards,
so-lli1

kvasantr
Moderator
721 Views
Registered: ‎04-12-2017

Hello @so-lli1 

1. Can you confirm which software version you are using for your VCU application?

2. Can you also tell us your complete video pipeline, from capture -> encode -> display?

3. Which IPs are part of your pipeline? Are they all Xilinx IPs, and are you using the corresponding Xilinx IP drivers for them?

4. Have you tried running the yavta capture tool? If you are using the Xilinx Framebuffer Write IP, you can refer to the following link to try a yavta capture into memory:

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842236/Video+Framebuffer+Write

The above information will be important for further debugging.

With regards

Kunal

 

so-lli1
Adventurer
711 Views
Registered: ‎11-26-2016

Hello @kvasantr,

I am using PetaLinux 2019.2 with all the default tools and drivers.

The device tree is auto-generated by PetaLinux; however, I had to apply two patches in order to fix some inconsistencies between the TPG IP and PetaLinux 2019.2 as well as a patch to allow VCU usage with encoder/decoder only. Both patches are already used by Xilinx for newer PetaLinux versions, so I think they are fine (referring to 306d604d323f99b27a5da643d8db8afe84112eb5 and fd4674c03f0df73d7a7ffd80978ddf0b2aef3093).

The pipeline consists of Xilinx IPs only (please take a look at the post describing the BD). The pipeline on the software side is best described by the GStreamer pipeline and looks as follows:

v4l2src -> videoconvert -> omxh264enc -> fpsdisplaysink

I was able to run yavta using /dev/video0 and the framerate is as expected. That's why I think the v4l2src should be fine:

 

root@xilinx-zcu104-2019_2:~# yavta --size 640x512 --format Y8 --capture /dev/video0 --capture=20
Device /dev/video0 opened.
Device `vcap_tp0 output 0' on `platform:vcap_tp0:0' is a video output (without mplanes) device.
Video format set: Y8 (59455247) 640x512 field none, 1 planes: 
 * Stride 640, buffer size 327680
Video format: Y8 (59455247) 640x512 field none, 1 planes: 
 * Stride 640, buffer size 327680
8 buffers requested.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 0/0 mapped at address 0x7fb0238000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 1/0 mapped at address 0x7fb01e8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 2/0 mapped at address 0x7fb0198000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 3/0 mapped at address 0x7fb0148000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 4/0 mapped at address 0x7fb00f8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 5/0 mapped at address 0x7fb00a8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 6/0 mapped at address 0x7fb0058000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 7/0 mapped at address 0x7fb0008000.
0 (0) [-] none 0 0 B 1123.802673 1123.802766 29.442 fps ts mono/EoF
1 (1) [-] none 1 0 B 1123.819601 1123.819690 59.074 fps ts mono/EoF
2 (2) [-] none 2 0 B 1123.836476 1123.836565 59.259 fps ts mono/EoF
3 (3) [-] none 3 0 B 1123.853417 1123.853505 59.028 fps ts mono/EoF
4 (4) [-] none 4 0 B 1123.870358 1123.870446 59.028 fps ts mono/EoF
5 (5) [-] none 5 0 B 1123.887299 1123.887387 59.028 fps ts mono/EoF
6 (6) [-] none 6 0 B 1123.904240 1123.904328 59.028 fps ts mono/EoF
7 (7) [-] none 7 0 B 1123.921181 1123.921187 59.028 fps ts mono/EoF
8 (0) [-] none 8 0 B 1123.938122 1123.938211 59.028 fps ts mono/EoF
9 (1) [-] none 9 0 B 1123.955063 1123.955151 59.028 fps ts mono/EoF
10 (2) [-] none 10 0 B 1123.972007 1123.972095 59.018 fps ts mono/EoF
11 (3) [-] none 11 0 B 1123.988944 1123.989033 59.042 fps ts mono/EoF
12 (4) [-] none 12 0 B 1124.005886 1124.005974 59.025 fps ts mono/EoF
13 (5) [-] none 13 0 B 1124.022905 1124.022994 58.758 fps ts mono/EoF
14 (6) [-] none 14 0 B 1124.039845 1124.039934 59.032 fps ts mono/EoF
15 (7) [-] none 15 0 B 1124.056793 1124.056895 59.004 fps ts mono/EoF
16 (0) [-] none 16 0 B 1124.073649 1124.073737 59.326 fps ts mono/EoF
17 (1) [-] none 17 0 B 1124.090668 1124.090756 58.758 fps ts mono/EoF
18 (2) [-] none 18 0 B 1124.107609 1124.107697 59.028 fps ts mono/EoF
19 (3) [-] none 19 0 B 1124.124549 1124.124638 59.032 fps ts mono/EoF
Captured 20 frames in 0.355929 seconds (56.190939 fps, 0.000000 B/s).
8 buffers released.
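The steady-state capture rate can be cross-checked from the buffer timestamps in the yavta output above. The sketch below parses the first two and last two frame lines, assuming the line layout shown in this post (other yavta versions may print different columns), and divides the sequence-number difference by the timestamp difference.

```python
import re

# First and last frame lines copied from the yavta output above.
YAVTA_LINES = """\
0 (0) [-] none 0 0 B 1123.802673 1123.802766 29.442 fps ts mono/EoF
1 (1) [-] none 1 0 B 1123.819601 1123.819690 59.074 fps ts mono/EoF
18 (2) [-] none 18 0 B 1124.107609 1124.107697 59.028 fps ts mono/EoF
19 (3) [-] none 19 0 B 1124.124549 1124.124638 59.032 fps ts mono/EoF
"""

# index (buffer) [flags] field sequence bytesused B timestamp ...
LINE_RE = re.compile(
    r"^\d+ \(\d+\) \[\S\] \S+ (?P<seq>\d+) \d+ B (?P<ts>\d+\.\d+) ",
    re.MULTILINE,
)

def parse_frames(text: str) -> list:
    """Return a list of (sequence_number, buffer_timestamp_seconds) tuples."""
    return [(int(m["seq"]), float(m["ts"])) for m in LINE_RE.finditer(text)]

frames = parse_frames(YAVTA_LINES)
(first_seq, first_ts), (last_seq, last_ts) = frames[0], frames[-1]

# Frames delivered between the first and last buffer, divided by elapsed time.
steady_fps = (last_seq - first_seq) / (last_ts - first_ts)

# yavta's own summary divides the frame count by the total elapsed time.
summary_fps = 20 / 0.355929

print(f"steady-state rate from timestamps: {steady_fps:.2f} fps")
print(f"yavta summary (20 frames / 0.355929 s): {summary_fps:.2f} fps")
```

The timestamp-based figure is about 59 fps, in line with the per-frame values yavta prints, while the 56.19 fps in the summary line is lower because it is computed over the whole 0.355929 s run.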

 

Thank you,
so-lli1
