Adventurer
570 Views
Registered: ‎11-26-2016

VCU encoding framerate very low

I use the omxh264enc GStreamer plugin to interface with the VCU. Encoding works, but the frame rate after encoding is very low, even though the system bandwidth is high, the input video stream has a low resolution, and the target bitrate is very low.

The following pipeline without the encoder runs fine at 60 fps with no dropped frames. CPU utilization is around 0.x%. This confirms that the video source is fine and frames are delivered as expected.

gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! fpsdisplaysink text-overlay=false

With the encoder, the frame rate drops to 14 fps. No frames are dropped, and CPU utilization is around 25% (according to busybox top; it looks like one core is used at 100%).

gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false
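For scale, the raw input load implied by the caps above can be worked out with plain shell arithmetic. This is only a back-of-the-envelope sketch: it assumes GRAY8 means exactly one byte per pixel with no row padding, and it says nothing about what videoconvert or the encoder do with the data afterwards.

```shell
#!/bin/sh
# Stream parameters taken from the caps in the pipelines above.
width=640; height=512; fps=60

# GRAY8: one byte per pixel (assumption: no stride padding).
bytes_per_frame=$((width * height))
bytes_per_sec=$((bytes_per_frame * fps))

# Per-frame time in microseconds at the requested rate (60 fps)
# and at the rate observed with the encoder (14 fps).
budget_us=$((1000000 / fps))
observed_us=$((1000000 / 14))

echo "bytes_per_frame=$bytes_per_frame bytes_per_sec=$bytes_per_sec"
echo "budget_us=$budget_us observed_us=$observed_us"
```

That is roughly 19.7 MB/s of raw input, and 14 fps corresponds to about 71 ms per frame against a budget of about 16.7 ms.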

According to PG252 Page 311 I already checked:

  • "prefetch-buffer=true" -> No performance impact
  • "target-bitrate" very low -> No performance impact
  • "b-frames=0" -> No performance impact
  • "queue" -> No performance impact
  • SMMU is disabled in Device-Tree
  • Increased CMA Size from 1000MB to 1200MB

Since the omx module seems to make use of the dmaproxy module, I checked that it is loaded and can be used. It looks fine to me.
When I unload it (rmmod dmaproxy), the pipeline issues the warning "MA channel is not available, CPU move will be performed", but the frame rate stays at 14 fps with a CPU usage of 25%. So it does not seem to have any effect.

However, it seems that the limiting factor is the CPU. The ACPU frequency is currently set to 750 MHz. The cores do allow higher frequencies, but I doubt that such a high frequency is required for a low-resolution image, and I suspect the problem lies somewhere else.
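The clock the cores are actually running at can be read back from the standard Linux cpufreq sysfs files, assuming the cpufreq driver is enabled in the kernel. The helper name cpu_mhz and its optional second argument (an alternative sysfs root, handy for trying the function off-target) are made up for this sketch.

```shell
#!/bin/sh
# Print the current clock of one CPU core in MHz.
# $1: CPU index, $2: optional sysfs root (defaults to the real one).
# scaling_cur_freq reports the frequency in kHz.
cpu_mhz() {
    root="${2:-/sys/devices/system/cpu}"
    file="$root/cpu$1/cpufreq/scaling_cur_freq"
    [ -r "$file" ] || { echo "cpufreq not available for cpu$1" >&2; return 1; }
    echo $(( $(cat "$file") / 1000 ))
}

# On the target, for example:
#   for i in 0 1 2 3; do cpu_mhz "$i"; done
```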

I would appreciate any help resolving the issue.

Thanks,
so-lli1

 

12 Replies
Teacher
525 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Did you check the io-mode setting of v4l2src and set proper QoS parameters for the CCI-400?

 

Best regards,

Adventurer
480 Views
Registered: ‎11-26-2016

Hi @watari ,

Thanks for pointing out the io-mode setting of the v4l2src plugin.
The default setting is "auto"; however, using explicit settings like "dmabuf" (also used in some pipelines in PG252) did not change anything.

Regarding the CCI-400, please elaborate. As far as I can tell, HP0-3 are not connected to the CCI, so it should not matter.
However, another look at the datasheet revealed that QoS can also be set for HP0-3. I tried this as well, but it had no effect on the frame rate.

Edit:

PG252 also states that all interrupts are served by CPU0. I checked /proc/interrupts and moved the framebuffer and al5e interrupts to different CPUs using smp_affinity, also without success.

Furthermore, I set the DDR_QOS_CTRL register of Port3-5 to Best Effort (BE). No effect on the frame rate.
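For reference, the interrupt move mentioned above boils down to writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. A minimal sketch of the two helper steps; the function names and the IRQ number 54 in the comment are invented for illustration and are not taken from my system.

```shell
#!/bin/sh
# Convert a zero-based CPU index into the hexadecimal bitmask format
# used by /proc/irq/<N>/smp_affinity (bit i set = CPU i allowed).
cpu_to_mask() {
    printf '%x\n' $((1 << $1))
}

# Print the IRQ numbers whose line contains the given name.
# Expects /proc/interrupts-formatted text on stdin.
irqs_for() {
    awk -v name="$1" 'index($0, name) { sub(":", "", $1); print $1 }'
}

# On the target (as root), something like:
#   irqs_for al5e < /proc/interrupts
#   cpu_to_mask 2 > /proc/irq/54/smp_affinity   # 54 is a made-up IRQ number
```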

 

Quite frustrating. Any other ideas?

Thanks,
so-lli1

Teacher
445 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Did you check the pipeline diagram with a dot file and the CPU usage with gst-shark?

If not, I suggest trying both to investigate the root cause.

 

https://developer.ridgerun.com/wiki/index.php?title=GstShark
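A sketch of what this could look like on the target: gst-launch-1.0 writes pipeline graphs as .dot files on state changes when GST_DEBUG_DUMP_DOT_DIR is set, and GstShark tracers are selected through GST_TRACERS. The output directory /tmp/gst-dots, the num-buffers limit and the fakesink are choices made for this example only, and rendering the graph needs Graphviz.

```shell
#!/bin/sh
# Directory for the .dot pipeline graphs (arbitrary choice for this sketch).
DOT_DIR="${DOT_DIR:-/tmp/gst-dots}"
mkdir -p "$DOT_DIR"
export GST_DEBUG_DUMP_DOT_DIR="$DOT_DIR"

# Only run on a system that has GStreamer and the capture device.
if command -v gst-launch-1.0 >/dev/null 2>&1 && [ -e /dev/video0 ]; then
    # num-buffers stops the source after 300 frames so the run terminates.
    GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage;framerate" \
        gst-launch-1.0 v4l2src device=/dev/video0 num-buffers=300 \
        ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 \
        ! videoconvert ! omxh264enc ! fakesink \
        || echo "pipeline failed" >&2
    ls "$DOT_DIR"
fi

# Render one of the graphs on a machine with Graphviz installed:
#   dot -Tpng <file>.dot -o pipeline.png
```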

 

Best regards,

Adventurer
414 Views
Registered: ‎11-26-2016

Hi @watari ,

first of all, thank you for your support!

As you suggested I did a measurement of CPU usage with gst-shark. This reflects what top already presented - a single CPU has a very high load and seems to reduce the framerate:

0:00:02.132731370  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)0.000000;
0:00:02.132859080  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)2.000000;
0:00:02.132893020  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
0:00:02.132927750  4958   0x556d915b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)100.000000;

Same pipeline but tracing the fps (note that the framerate is higher than mentioned in the initial post since I increased CPU frequency for testing):

0:00:04.359835630  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)23;
0:00:04.359958670  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)23;
0:00:04.359990370  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)23;
0:00:04.360019900  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)23;
0:00:04.360048340  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)23;
0:00:04.360077350  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)23;
0:00:04.360105310  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)23;
0:00:04.360133110  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)omxh264enc_omxh264enc0_src, fps=(uint)23;
0:00:04.360162810  5116   0x558b25ab20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)24;

This is the corresponding pipeline that I used for the measurement. The queue elements were added to make sure that multiple threads are created. I would have expected the CPU load to be shared evenly between the cores, but this is not the case.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! omxh264enc ! fpsdisplaysink text-overlay=false
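To pick the most loaded core out of a longer trace, the cpuusage lines can be filtered with awk. A minimal sketch, assuming the log format shown above and that the debug output (which GStreamer writes to stderr by default) was captured to a file; the function name busiest_core and the file name trace.log are made up here.

```shell
#!/bin/sh
# Read GstShark tracer output on stdin and report the core with the
# highest load value seen in any "cpuusage" record.
busiest_core() {
    awk 'BEGIN { max = -1 }
    /cpuusage,/ {
        n = $0; sub(/.*number=\(uint\)/, "", n); sub(/,.*/, "", n)
        l = $0; sub(/.*load=\(double\)/, "", l); sub(/;.*/, "", l)
        if (l + 0 > max) { max = l + 0; core = n }
    }
    END { if (max >= 0) printf "core=%s load=%.1f\n", core, max }'
}

# Usage, after running the pipeline with "2> trace.log":
#   busiest_core < trace.log
```

For the four cpuusage lines of the omxh264enc run above, this prints core=3 load=100.0.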

Even more interesting, when I exchange the omxh264enc plugin with the software codec x264enc the framerate is 60fps and the load is spread:

0:00:03.124405800  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)32.989689;
0:00:03.124558260  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)47.959183;
0:00:03.124595280  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)61.000000;
0:00:03.124627780  5062   0x55845f0b20 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)52.999996;

Again the same pipeline but tracing the fps:

0:00:02.859839130  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)59;
0:00:02.859978350  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)59;
0:00:02.860010820  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)59;
0:00:02.860040470  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)59;
0:00:02.860069990  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)59;
0:00:02.860099890  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)59;
0:00:02.860129810  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)59;
0:00:02.860158550  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)59;
0:00:02.860187670  5098   0x558d8afb20 TRACE             GST_TRACER :0:: framerate, pad=(string)x264enc0_src, fps=(uint)59;

The corresponding pipeline:

GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! x264enc ! fpsdisplaysink text-overlay=false
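To compare the two runs pad by pad, the framerate tracer lines can be reduced to "pad fps" pairs with awk. This is a sketch that assumes the log format shown above; fps_by_pad and the log file name are hypothetical.

```shell
#!/bin/sh
# Read GstShark tracer output on stdin and print "<pad> <fps>" for
# every "framerate" record, in the order the records appear.
fps_by_pad() {
    awk '/framerate,/ {
        p = $0; sub(/.*pad=\(string\)/, "", p); sub(/,.*/, "", p)
        f = $0; sub(/.*fps=\(uint\)/, "", f); sub(/;.*/, "", f)
        print p, f
    }'
}

# Usage: show the slowest pad of a captured run.
#   fps_by_pad < omx.log | sort -k2,2n | head -n 1
```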

 

In my case the software codec solution is working at a higher FPS than using the VCU.

Hope this helps and you have some idea where to look.

 

Regards,
so-lli1

Teacher
378 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

Thank you for sharing the result. It is very interesting to me.

It looks like a load balancing / CPU scheduling issue on SMP Linux.

I'm not an expert in these areas, but if the relevant parameters can be tuned, it may run faster than before.

Thank you again for sharing the details.

Best regards,

Adventurer
335 Views
Registered: ‎11-26-2016

Hi @watari 

I am not sure if this is really a balancing issue on SMP Linux, or if the CPU load is simply too high because something else is not designed as required.

Therefore I have also attached my block design (BD). Maybe you or somebody else can spot a problem.

so-lli1_0-1596604944420.png

Thank you very much for your support.

Regards,
so-lli1

Observer
230 Views
Registered: ‎06-12-2017

Hi @so-lli1 ,

I have a similar problem:

https://forums.xilinx.com/t5/Video-and-Audio/H-265-frame-rate-is-reduced-when-resolution-is-lowered/td-p/1140105

The image resolution has a huge impact on CPU usage. Attached is the GstShark log of an interlatency analysis. The log seems to show that the delay between v4l2src and the source pad of omxh265enc is too long (nearly 425 ms). Thus, I currently suspect there is a bug in frame buffer management, especially for non-standard frame resolutions (like 640x512).

Regards,

 

Adventurer
224 Views
Registered: ‎11-26-2016

Hi @abstract ,

Your problem really reflects what I observed, but I never thought that the video resolution might be the issue here. Thanks for letting me know.

I still hope somebody with more insight might be able to figure out what the problem is exactly. If you do, please keep me up to date.

Regards,

Adventurer
94 Views
Registered: ‎11-26-2016

I also tried to use io-mode=5 (dmabuf-import), however I received "ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Failed to allocate required memory."

gst-launch-1.0 -v v4l2src io-mode=5 device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false

According to /proc/meminfo, there should be enough memory available:

CmaTotal: 1024000 kB
CmaFree: 990252 kB

Using more CMA memory (set via the kernel command line in U-Boot) did not change the behaviour.
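The CMA figures above can be turned into a "used" number with a one-line awk filter. A small sketch; the function name cma_used_kb is made up, and it only assumes the CmaTotal/CmaFree lines in the /proc/meminfo format quoted above.

```shell
#!/bin/sh
# Read /proc/meminfo-formatted text on stdin and print the amount of
# CMA memory currently in use, in kB (CmaTotal minus CmaFree).
cma_used_kb() {
    awk '/^CmaTotal:/ { total = $2 }
         /^CmaFree:/  { free = $2 }
         END { print total - free }'
}

# On the target:
#   cma_used_kb < /proc/meminfo
```

With the two values quoted above this gives 33748 kB in use, so the pool is almost entirely free when the error occurs.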

 

 

Teacher
63 Views
Registered: ‎06-16-2013

Hi @so-lli1 

 

When you encounter the "Failed to allocate required memory." error, I suggest checking whether the media graph configuration makes sense.

Could you check it with the media-ctl command?
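One possible way to do this, sketched for the target: media-ctl -p prints the media graph (entities, pads, links and the formats set on them), and v4l2-ctl lists the formats the capture node offers. The device node /dev/media0 is an assumption (only /dev/video0 is named in this thread), and the function name is made up.

```shell
#!/bin/sh
# Print the media graph and the capture formats, if the tools exist.
# $1: media device (assumed /dev/media0), $2: video device.
show_media_graph() {
    media="${1:-/dev/media0}"
    video="${2:-/dev/video0}"
    [ -e "$media" ] || { echo "$media not found" >&2; return 2; }
    command -v media-ctl >/dev/null 2>&1 \
        || { echo "media-ctl not installed" >&2; return 3; }
    # -p prints the topology of the media device selected with -d.
    media-ctl -d "$media" -p
    if command -v v4l2-ctl >/dev/null 2>&1; then
        v4l2-ctl -d "$video" --list-formats-ext
    fi
}

# On the target:
#   show_media_graph /dev/media0 /dev/video0
```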

 

Best regards,

Adventurer
49 Views
Registered: ‎11-26-2016

Hi @watari,

thank you again for constantly trying to help me out here, however, I do not understand your last post.
Can you elaborate please.

Regards,
so-lli1
