07-28-2020 03:20 AM
I use the omxh264enc GStreamer plugin to interface with the VCU. Encoding works; however, the framerate after encoding is very low, even though the system bandwidth is high, the input video stream has a low resolution, and the target bitrate is very low.
The following pipeline without the encoder runs fine at 60fps with no dropped frames. CPU utilization is around 0.x%. This confirms that the video source is fine: frames are delivered as expected.
gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! fpsdisplaysink text-overlay=false
With the encoder, the framerate drops to 14fps. No frames are dropped, and CPU utilization is around 25% (according to busybox top; it looks like one core is at 100%).
gst-launch-1.0 -v v4l2src device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false
According to PG252 Page 311 I already checked:
Since the omx module seems to use the dmaproxy module, I checked that it is loaded and usable. It looks fine to me.
When I unload it (rmmod dmaproxy), the pipeline issues the warning "MA channel is not available, CPU move will be performed", but the framerate is still 14fps with a CPU usage of 25%. So it does not seem to have any effect.
However, it seems that the limiting factor is the CPU. The current frequency for ACPU is set to 750MHz. Sure the core allows higher frequencies, but I doubt that such a high frequency is required for a low resolution image and suspect the problem somewhere else.
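To put numbers on this, here is a rough back-of-the-envelope sketch in Python. It uses only figures from this post (640x512 GRAY8 at 60fps, 14fps with the encoder, 750MHz ACPU) and assumes that the one busy core spends all of its time on this pipeline, which is an assumption rather than something measured.

```python
# Rough estimate only; all inputs are taken from the post above.
WIDTH, HEIGHT = 640, 512      # frame size from the caps filter
BYTES_PER_PIXEL = 1           # GRAY8 stores one byte per pixel
SOURCE_FPS = 60               # framerate delivered by v4l2src
ENCODED_FPS = 14              # framerate observed with omxh264enc
CPU_HZ = 750_000_000          # ACPU clock quoted in the post


def raw_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate of the video stream."""
    return width * height * bytes_per_pixel * fps


def cpu_cost_per_pixel(width, height, fps, cpu_hz, core_load=1.0):
    """Return (seconds, cycles) one core spends per pixel.

    Assumes a single core is busy for `core_load` of the time and that
    all of that time goes into processing frames at `fps`.
    """
    seconds_per_frame = core_load / fps
    seconds_per_pixel = seconds_per_frame / (width * height)
    return seconds_per_pixel, seconds_per_pixel * cpu_hz


rate = raw_bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, SOURCE_FPS)
seconds, cycles = cpu_cost_per_pixel(WIDTH, HEIGHT, ENCODED_FPS, CPU_HZ)
print(f"raw input rate: {rate / 1e6:.2f} MB/s")
print(f"CPU time per pixel: {seconds * 1e9:.0f} ns ({cycles:.0f} cycles)")
```

The raw stream is just under 20 MB/s, which is small for this platform, while the estimate comes out at well over a hundred CPU cycles per pixel. That would be consistent with some per-pixel work being done in software rather than with a bandwidth limit.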
I would appreciate any help resolving the issue.
Thanks,
so-lli1
07-28-2020 02:37 PM
Hi @so-lli1
Did you check the io-mode setting of v4l2src and set the proper QoS parameters for the CCI-400?
Best regards,
07-29-2020 05:27 AM - edited 07-29-2020 07:30 AM
Hi @watari ,
Thanks for pointing out the io-mode setting of the v4l2src plugin.
The default setting is "auto"; explicitly selecting "dmabuf" (also used in some pipelines in PG252) did not change anything.
Regarding CCI-400, please elaborate. As far as I can tell HP0-3 are not connected to CCI and therefore it should not matter.
However, another look in the datasheet revealed that the QoS can also be set for HP0-3. Tried this also, but had no effect on the framerate.
Edit:
PG252 also states that all interrupts are served by CPU0. I checked /proc/interrupts and moved the framebuffer and al5e interrupts to different CPUs using smp_affinity. Also without success.
Furthermore I set the DDR_QOS_CTRL Register of Port3-5 to Best Effort (BE). No effect on the framerate.
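For anyone repeating the smp_affinity step: the value written to /proc/irq/<n>/smp_affinity is a hexadecimal CPU bitmask. A minimal Python sketch of how that mask can be built is below. The IRQ number in the usage comment is hypothetical and has to be looked up in /proc/interrupts; writing the file requires root.

```python
def smp_affinity_mask(cpus):
    """Return the hex bitmask string that selects the given CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def write_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Pin one IRQ to the given CPUs (needs root on the target)."""
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as handle:
        handle.write(smp_affinity_mask(cpus))


# Example (hypothetical IRQ number 52, moved to CPU2):
#   write_irq_affinity(52, [2])
print(smp_affinity_mask([2]))
```

CPU0 alone corresponds to mask 1, CPU2 alone to mask 4, and all four cores to mask f.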
Quite frustrating. Any other ideas?
Thanks,
so-lli1
07-29-2020 02:56 PM
Hi @so-lli1
Did you inspect the pipeline graph with a dot file and measure the CPU usage with gst-shark?
If not, I suggest trying both to investigate the root cause.
https://developer.ridgerun.com/wiki/index.php?title=GstShark
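A minimal sketch of how both can be enabled from a Python wrapper is shown here. GST_DEBUG_DUMP_DOT_DIR is the standard GStreamer variable for dumping pipeline graphs, and GST_TRACERS selects the GstShark tracers. The output directory and the helper name are only examples, and the sketch assumes GstShark is installed on the target.

```python
import os


def tracing_env(dot_dir, tracers=("cpuusage", "framerate"), base_env=None):
    """Build an environment that enables dot dumps and GstShark tracers.

    `dot_dir` must already exist; gst-launch-1.0 writes one .dot file per
    state change into it. Multiple tracers are separated by semicolons.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GST_DEBUG_DUMP_DOT_DIR"] = dot_dir
    env["GST_DEBUG"] = "GST_TRACER:7"
    env["GST_TRACERS"] = ";".join(tracers)
    return env


env = tracing_env("/tmp/gst-dots")  # example directory, not from the thread
print(env["GST_TRACERS"])

# On the target this environment would be passed to the pipeline, e.g.:
#   import subprocess
#   subprocess.run(["gst-launch-1.0", "-v", "v4l2src", "device=/dev/video0",
#                   "!", "fakesink"], env=env)
```

The resulting .dot files can be rendered with Graphviz (for example `dot -Tpng`) to see which caps were negotiated on each link.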
Best regards,
07-29-2020 09:22 PM - edited 07-29-2020 09:23 PM
Hi @so-lli1
FYI, in case you have not looked at UG1449 yet:
https://www.xilinx.com/support/documentation/user_guides/ug1449-multimedia.pdf#page=78
https://www.xilinx.com/support/documentation/user_guides/ug1449-multimedia.pdf#page=79
Best regards
07-29-2020 11:09 PM
Hi @watari ,
first of all, thank you for your support!
As you suggested I did a measurement of CPU usage with gst-shark. This reflects what top already presented - a single CPU has a very high load and seems to reduce the framerate:
0:00:02.132731370 4958 0x556d915b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)0.000000;
0:00:02.132859080 4958 0x556d915b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)2.000000;
0:00:02.132893020 4958 0x556d915b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
0:00:02.132927750 4958 0x556d915b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)100.000000;
Same pipeline but tracing the fps (note that the framerate is higher than mentioned in the initial post since I increased CPU frequency for testing):
0:00:04.359835630 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)23;
0:00:04.359958670 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)23;
0:00:04.359990370 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)23;
0:00:04.360019900 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)23;
0:00:04.360048340 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)23;
0:00:04.360077350 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)23;
0:00:04.360105310 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)23;
0:00:04.360133110 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)omxh264enc_omxh264enc0_src, fps=(uint)23;
0:00:04.360162810 5116 0x558b25ab20 TRACE GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)24;
This is the corresponding pipeline that I used for the measurement. The queue elements were added to make sure that multiple threads are created. I would have expected the CPU load to be shared evenly between the cores, but this is not the case.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! omxh264enc ! fpsdisplaysink text-overlay=false
Even more interesting, when I exchange the omxh264enc plugin with the software codec x264enc the framerate is 60fps and the load is spread:
0:00:03.124405800 5062 0x55845f0b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)32.989689;
0:00:03.124558260 5062 0x55845f0b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)47.959183;
0:00:03.124595280 5062 0x55845f0b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)61.000000;
0:00:03.124627780 5062 0x55845f0b20 TRACE GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)52.999996;
Again the same pipeline but tracing the fps:
0:00:02.859839130 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)59;
0:00:02.859978350 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)59;
0:00:02.860010820 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)queue0_src, fps=(uint)59;
0:00:02.860040470 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)sink_proxypad0, fps=(uint)59;
0:00:02.860069990 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)queue1_src, fps=(uint)59;
0:00:02.860099890 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)videoconvert0_src, fps=(uint)59;
0:00:02.860129810 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)queue2_src, fps=(uint)59;
0:00:02.860158550 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)sink_proxypad1, fps=(uint)59;
0:00:02.860187670 5098 0x558d8afb20 TRACE GST_TRACER :0:: framerate, pad=(string)x264enc0_src, fps=(uint)59;
The corresponding pipeline:
GST_DEBUG="GST_TRACER:7" GST_TRACERS="cpuusage" gst-launch-1.0 -v v4l2src device=/dev/video0 ! queue ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! queue ! videoconvert ! queue ! x264enc ! fpsdisplaysink text-overlay=false
In my case the software codec solution is working at a higher FPS than using the VCU.
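To compare such traces without reading them by eye, a small Python sketch like the following can summarize them. The regular expressions are written against the trace lines quoted in this post; other GstShark versions may format the lines differently, so treat the patterns as an assumption.

```python
import re

# Patterns matched against the GstShark lines quoted in this post.
CPU_RE = re.compile(r"cpuusage, number=\(uint\)(\d+), load=\(double\)([0-9.]+);")
FPS_RE = re.compile(r"framerate, pad=\(string\)(\w+), fps=\(uint\)(\d+);")


def latest_cpu_loads(trace_text):
    """Return {core index: most recent load in percent}."""
    loads = {}
    for core, load in CPU_RE.findall(trace_text):
        loads[int(core)] = float(load)
    return loads


def latest_framerates(trace_text):
    """Return {pad name: most recent fps value}."""
    return {pad: int(fps) for pad, fps in FPS_RE.findall(trace_text)}


def busiest_core_share(loads):
    """Fraction of the summed load that sits on the busiest core."""
    total = sum(loads.values())
    return max(loads.values()) / total if total else 0.0


OMX_TRACE = """
TRACE GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)0.000000;
TRACE GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)2.000000;
TRACE GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
TRACE GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)100.000000;
"""
X264_TRACE = """
TRACE GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)32.989689;
TRACE GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)47.959183;
TRACE GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)61.000000;
TRACE GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)52.999996;
"""

for name, trace in (("omxh264enc", OMX_TRACE), ("x264enc", X264_TRACE)):
    loads = latest_cpu_loads(trace)
    print(name, loads, f"busiest core share: {busiest_core_share(loads):.2f}")
```

With the omxh264enc trace almost the entire load sits on one core, whereas with x264enc no single core carries more than about a third of it.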
Hope this helps and you have some idea where to look.
Regards,
so-lli1
08-02-2020 10:04 PM
Hi @so-lli1
Thank you for sharing the result.
It is a very interesting result to me.
It looks like a load-balancing/CPU-scheduling issue on SMP Linux.
I am not an expert in those areas.
But if the relevant parameters can be tuned, the pipeline should run faster than before.
Thank you again for sharing the details.
Best regards,
08-04-2020 10:23 PM
Hi @watari
I am not sure if this is really a balancing issue on SMP Linux, or if the CPU load is simply too high because something else is not designed as required.
Therefore I also attached my BD. Maybe you, or somebody else can spot a problem.
Thank you very much for your support.
Regards,
so-lli1
08-17-2020 10:31 PM - edited 08-17-2020 10:50 PM
Hi @so-lli1 ,
I have a similar problem:
The image resolution has a huge impact on CPU usage. Attached is the GstShark interlatency log. It seems to show that the delay between v4l2src and the source pad of omxh265enc is far too long (nearly 425 ms). I therefore suspect a bug in frame buffer management, especially for non-standard frame resolutions (like 640x512).
Regards,
08-17-2020 10:47 PM
Hi @abstract ,
Your problem really reflects what I observed, but I never thought that the video resolution might be the issue here. Thanks for letting me know.
I still hope somebody with more insight might be able to figure out what the problem is exactly. If you do, please keep me up to date.
Regards,
09-16-2020 03:20 AM
I also tried to use io-mode=5 (dmabuf-import), however I received "ERROR: from element /GstPipeline:pipeline0/GstV4l2Src:v4l2src0: Failed to allocate required memory."
gst-launch-1.0 -v v4l2src io-mode=5 device=/dev/video0 ! video/x-raw,format=GRAY8,width=640,height=512,framerate=60/1 ! videoconvert ! omxh264enc ! fpsdisplaysink text-overlay=false
According to /proc/meminfo, there should be enough memory available:
CmaTotal: 1024000 kB
CmaFree: 990252 kB
Using more Cma Memory (set using u-boot kernel commandline) did not change the behaviour.
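As a sanity check, the CMA figures can be compared with what the capture buffers would need. The Python sketch below parses /proc/meminfo-style text and estimates the buffer demand for 640x512 GRAY8 frames. The buffer count of 8 is an assumption for illustration, not a value reported by the failing pipeline.

```python
def cma_kib(meminfo_text):
    """Extract CmaTotal and CmaFree (in kB) from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("CmaTotal", "CmaFree"):
            values[key] = int(rest.split()[0])
    return values


def capture_buffers_kib(width, height, bytes_per_pixel, count):
    """Memory needed for `count` tightly packed frames, in KiB."""
    return width * height * bytes_per_pixel * count // 1024


SAMPLE = "CmaTotal: 1024000 kB\nCmaFree: 990252 kB\n"  # values from this post

cma = cma_kib(SAMPLE)
needed = capture_buffers_kib(640, 512, 1, 8)  # 8 buffers is an assumption
print(f"CMA free: {cma['CmaFree']} kB, capture buffers need about {needed} KiB")

# On the target, read the real file instead of SAMPLE:
#   with open("/proc/meminfo") as handle:
#       cma = cma_kib(handle.read())
```

The estimated demand is a few megabytes against almost a gigabyte of free CMA, which fits the observation that adding more CMA made no difference.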
09-16-2020 03:01 PM
Hi @so-lli1
When you hit the "Failed to allocate required memory." error, I suggest checking whether the media graph makes sense.
Could you verify it with the media-ctl command?
Best regards,
09-16-2020 10:42 PM
Hi @watari,
thank you again for constantly trying to help me out here, however, I do not understand your last post.
Can you elaborate please.
Regards,
so-lli1
09-23-2020 06:35 AM
Hello @so-lli1
1. Can you confirm which software version you are using for your VCU application?
2. Can you also describe your complete video pipeline, from capture -> encode -> display?
3. Which IPs are part of your pipeline? Are they all Xilinx IPs, and are you using the corresponding Xilinx IP drivers?
4. Have you tried running the yavta capture tool? If you are using the Xilinx Framebuffer Write IP, you can refer to the following link for capturing into memory with yavta:
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842236/Video+Framebuffer+Write
The information above will be important for further debugging.
With regards
Kunal
09-23-2020 07:16 AM
Hello @kvasantr,
I am using Petalinux 2019.2 with all the default tools and drivers.
The device-tree is auto generated by petalinux, however, I had to apply two patches in order to fix some inconsistencies between the TPG IP and petalinux 2019.2 as well as a patch to allow VCU usage with encoder/decoder only. Both patches are already used by Xilinx for newer Petalinux versions, so I think they are fine (referring to 306d604d323f99b27a5da643d8db8afe84112eb5 and fd4674c03f0df73d7a7ffd80978ddf0b2aef3093).
The pipeline consists of Xilinx IPs only (please take a look at the post describing the BD). The software side is best described by the GStreamer pipeline, which looks as follows:
v4l2src -> videoconvert -> omxh264enc -> fpsdisplaysink
I was able to run yavta using /dev/video0 and the framerate is as expected. That's why I think the v4l2src should be fine:
root@xilinx-zcu104-2019_2:~# yavta --size 640x512 --format Y8 --capture /dev/video0 --capture=20
Device /dev/video0 opened.
Device `vcap_tp0 output 0' on `platform:vcap_tp0:0' is a video output (without mplanes) device.
Video format set: Y8 (59455247) 640x512 field none, 1 planes:
* Stride 640, buffer size 327680
Video format: Y8 (59455247) 640x512 field none, 1 planes:
* Stride 640, buffer size 327680
8 buffers requested.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 0/0 mapped at address 0x7fb0238000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 1/0 mapped at address 0x7fb01e8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 2/0 mapped at address 0x7fb0198000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 3/0 mapped at address 0x7fb0148000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 4/0 mapped at address 0x7fb00f8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 5/0 mapped at address 0x7fb00a8000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 6/0 mapped at address 0x7fb0058000.
length: 1 offset: 3744146864 timestamp type/source: mono/EoF
Buffer 7/0 mapped at address 0x7fb0008000.
0 (0) [-] none 0 0 B 1123.802673 1123.802766 29.442 fps ts mono/EoF
1 (1) [-] none 1 0 B 1123.819601 1123.819690 59.074 fps ts mono/EoF
2 (2) [-] none 2 0 B 1123.836476 1123.836565 59.259 fps ts mono/EoF
3 (3) [-] none 3 0 B 1123.853417 1123.853505 59.028 fps ts mono/EoF
4 (4) [-] none 4 0 B 1123.870358 1123.870446 59.028 fps ts mono/EoF
5 (5) [-] none 5 0 B 1123.887299 1123.887387 59.028 fps ts mono/EoF
6 (6) [-] none 6 0 B 1123.904240 1123.904328 59.028 fps ts mono/EoF
7 (7) [-] none 7 0 B 1123.921181 1123.921187 59.028 fps ts mono/EoF
8 (0) [-] none 8 0 B 1123.938122 1123.938211 59.028 fps ts mono/EoF
9 (1) [-] none 9 0 B 1123.955063 1123.955151 59.028 fps ts mono/EoF
10 (2) [-] none 10 0 B 1123.972007 1123.972095 59.018 fps ts mono/EoF
11 (3) [-] none 11 0 B 1123.988944 1123.989033 59.042 fps ts mono/EoF
12 (4) [-] none 12 0 B 1124.005886 1124.005974 59.025 fps ts mono/EoF
13 (5) [-] none 13 0 B 1124.022905 1124.022994 58.758 fps ts mono/EoF
14 (6) [-] none 14 0 B 1124.039845 1124.039934 59.032 fps ts mono/EoF
15 (7) [-] none 15 0 B 1124.056793 1124.056895 59.004 fps ts mono/EoF
16 (0) [-] none 16 0 B 1124.073649 1124.073737 59.326 fps ts mono/EoF
17 (1) [-] none 17 0 B 1124.090668 1124.090756 58.758 fps ts mono/EoF
18 (2) [-] none 18 0 B 1124.107609 1124.107697 59.028 fps ts mono/EoF
19 (3) [-] none 19 0 B 1124.124549 1124.124638 59.032 fps ts mono/EoF
Captured 20 frames in 0.355929 seconds (56.190939 fps, 0.000000 B/s).
8 buffers released.
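The per-frame fps values in this output follow from the difference between consecutive buffer timestamps (the first of the two timestamp columns). A short Python sketch to recompute them from a saved log is below. The field positions are taken from the yavta output in this post and may differ for other yavta versions.

```python
def yavta_timestamps(output_text):
    """Collect the buffer timestamp (first timestamp column) of each frame."""
    stamps = []
    for line in output_text.splitlines():
        fields = line.split()
        # Frame lines look like:
        # "3 (3) [-] none 3 0 B 1123.853417 1123.853505 59.028 fps ts mono/EoF"
        if len(fields) >= 11 and fields[6] == "B" and fields[10] == "fps":
            stamps.append(float(fields[7]))
    return stamps


def instantaneous_fps(stamps):
    """Frame rate derived from each pair of consecutive timestamps."""
    return [1.0 / (b - a) for a, b in zip(stamps, stamps[1:])]


def mean_fps(stamps):
    """Average frame rate between the first and the last timestamp."""
    return (len(stamps) - 1) / (stamps[-1] - stamps[0])


SAMPLE = """
2 (2) [-] none 2 0 B 1123.836476 1123.836565 59.259 fps ts mono/EoF
3 (3) [-] none 3 0 B 1123.853417 1123.853505 59.028 fps ts mono/EoF
4 (4) [-] none 4 0 B 1123.870358 1123.870446 59.028 fps ts mono/EoF
"""

stamps = yavta_timestamps(SAMPLE)
print([round(value, 3) for value in instantaneous_fps(stamps)])
print(round(mean_fps(stamps), 3))
```

The summary line reports 56.19 fps because it divides the 20 frames by the total elapsed time of 0.355929 seconds, which also includes the start-up delay before the first frame. The interval between the first and last buffer timestamps (19 intervals in 0.321876 s) corresponds to about 59 fps.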
Thank you,
so-lli1
10-21-2020 03:08 AM
Hi @so-lli1
Thank you for the information.
Although videotestsrc has proper GRAY8 support, omxh264/5enc does not yet have proper GRAY8 support. For example this pipeline will fail because of omxh264/5enc:
root@vcu_trd:~# gst-launch-1.0 videotestsrc ! video/x-raw, width=1280, height=720, format=GRAY8, framerate=30/1 ! omxh264enc ! fpsdisplaysink name=fpssink text-overlay=false 'video-sink=fakesink' sync=true -v
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpssink/GstFakeSink:fakesink0: sync = true
ERROR: from element /GstPipeline:pipeline0/GstVideoTestSrc:videotestsrc0: Internal data stream error.
Additional debug info:
../../../../git/libs/gst/base/gstbasesrc.c(3072): gst_base_src_loop (): /GstPipeline:pipeline0/GstVideoTestSrc:videotestsrc0:
streaming stopped, reason not-negotiated (-4)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...
This is the reason for the low framerate. I am attaching a 'hack' which should fix this issue, however please be aware this is not an official patch and has not been fully tested. Development are looking at adding support for GRAY8 in a future release.
PG252 page 203 seems to imply that GRAY8 is supported. I have flagged this with development as a mistake, and they will fix it in a future release. Some GStreamer elements (like videotestsrc) already support GRAY8, but the encoder does not.