shahan.a (Participant)

VCU frame corruption with main profile

Hi,
I am trying to run H.264/H.265 encoding on the VCU, currently at 1080p@60 fps, using a GStreamer pipeline that sources frames from 8 cyclic buffers in PS DDR, which are filled from the SDI RX via the VDMA.
 
appsrc->queue->omxh264enc->queue->filesink
 
This is an overview of the pipeline we are running, with omxh264enc configured with the following settings for better image quality (a gst-launch-style expansion is shown after the list).
target-bitrate = 60000
num-slices = 8
gop-mode = basic
b-frames = 0
gop-length = 16
profile = main
cpb-size = 500
initial-delay = 500
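
For reference, a gst-launch-style expansion of the above might look as follows (illustrative only: the appsrc caps and the filesink location are placeholders, the property names follow the 2019.1 VCU OMX encoder plugin, and the profile is negotiated through caps rather than set as a property):

appsrc ! video/x-raw,format=NV12,width=1920,height=1080,framerate=60/1 ! queue ! omxh264enc target-bitrate=60000 num-slices=8 gop-mode=basic b-frames=0 gop-length=16 cpb-size=500 initial-delay=500 ! video/x-h264,profile=main ! queue ! filesink location=output.h264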
 
However, we are seeing random kernel crashes after running for roughly one minute, and frames are also coming out of order: the encoder output occasionally shows earlier frames again.
We have tried several experiments to rule out any issue with the VDMA or the pipeline. The kernel crash and frame corruption (older frames reappearing at multiple points) occur when running with the main or high profile.
 
With the baseline profile, we rarely got the kernel crash and no frame corruption was observed, apart from the poor image quality.
 
Query:
Should we enable the "prefetch-buffer" option (i.e. use the encoder buffer in the VCU design) in order to run the pipeline with the main profile? The VCU documentation shows pipelines where the prefetch buffer is enabled.
 
What encoder parameters should be used for the main or high profile at 1080p 60 fps in NV12 format?
 
Attaching the kernel crash log:
[   88.188616] Configured vdma with YUV frame addresses
[  152.819701] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[  152.833423] Mem abort info:
[  152.836200]   ESR = 0x96000005
[  152.839238]   Exception class = DABT (current EL), IL = 32 bits
[  152.845140]   SET = 0, FnV = 0
[  152.848178]   EA = 0, S1PTW = 0
[  152.851303] Data abort info:
[  152.854167]   ISV = 0, ISS = 0x00000005
[  152.857986]   CM = 0, WnR = 0
[  152.860941] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000003f6595a2
[  152.867543] [0000000000000018] pgd=0000000000000000, pud=0000000000000000
[  152.874325] Internal error: Oops: 96000005 [#1] SMP
[  152.879184] Modules linked in: dlnx(O) al5d(O) al5e(O) allegro(O) xlnx_vcu_clk xlnx_vcu xlnx_vcu_core mali(O) uio_pdrv_genirq [las]
[  152.892398] CPU: 0 PID: 5979 Comm: vcu-app Tainted: G           O      4.19.0-xilinx-v2019.1 #1
[  152.901083] Hardware name: xlnx,zynqmp (DT)
[  152.905252] pstate: 00000085 (nzcv daIf -PAN -UAO)
[  152.910030] pc : idr_find+0x8/0x20
[  152.913421] lr : find_vpid+0x44/0x50
[  152.916985] sp : ffffff8008003d90
[  152.920284] x29: ffffff8008003d90 x28: 0000000000000001 
[  152.925587] x27: ffffff8008d66928 x26: ffffff800921ac10 
[  152.930891] x25: ffffffc87bba1600 x24: ffffff8009198648 
[  152.936194] x23: 0000000000000038 x22: ffffff8008003f04 
[  152.941497] x21: 0000000000000000 x20: ffffff8009198648 
[  152.946801] x19: ffffff8000c24588 x18: ffffff80091a92c8 
[  152.952105] x17: 0000000000000000 x16: 0000000000000000 
[  152.957408] x15: 0000000000000000 x14: ffffffc875c43100 
[  152.962712] x13: ffffffc875c43000 x12: ffffffc875c43028 
[  152.968015] x11: ffffffc875c43101 x10: 0000000000000040 
[  152.973319] x9 : ffffff80091aafc8 x8 : ffffffc87b400268 
[  152.978622] x7 : 0000000000000000 x6 : ffffffc87b400240 
[  152.983926] x5 : ffffffc87b400428 x4 : 000000000000002c 
[  152.989230] x3 : 00000000ffffffff x2 : 0000000000000000 
[  152.994533] x1 : 000000000000082f x0 : 0000000000000008 
[  152.999838] Process vcu-app (pid: 5979, stack limit = 0x000000007e9b0f63)
[  153.006607] Call trace:
[  153.009040]  idr_find+0x8/0x20
[  153.012077]  find_vpid+0x44/0x50
[  153.015292]  irq_handler+0x70/0xd8 [dlnx]
[  153.019293]  __handle_irq_event_percpu+0x6c/0x168
[  153.023987]  handle_irq_event_percpu+0x34/0x88
[  153.028414]  handle_irq_event+0x40/0x98
[  153.032233]  handle_fasteoi_irq+0xc0/0x198
[  153.036313]  generic_handle_irq+0x24/0x38
[  153.040305]  __handle_domain_irq+0x60/0xb8
[  153.044385]  gic_handle_irq+0x5c/0xb8
[  153.048030]  el1_irq+0xb0/0x140
[  153.051156]  release_task.part.3+0x34c/0x478
[  153.055417]  do_exit+0x61c/0x980
[  153.058629]  __arm64_sys_exit+0x14/0x18
[  153.062450]  el0_svc_common+0x84/0xd8
[  153.066103]  el0_svc_handler+0x68/0x80
[  153.069834]  el0_svc+0x8/0xc
[  153.072701] Code: a8c17bfd d65f03c0 a9bf7bfd 910003fd (b9401002) 
[  153.078784] ---[ end trace 3817f32f6d49de58 ]---
[  153.083383] Kernel panic - not syncing: Fatal exception in interrupt
[  153.089722] SMP: stopping secondary CPUs
[  153.093636] Kernel Offset: disabled
[  153.097107] CPU features: 0x0,20802004
[  153.100838] Memory Limit: none
[  153.103879] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

 

 

 

watari (Professor)

Hi @shahan.a 

 

>Should we enable the "prefetch-buffer" option (i.e. use the encoder buffer in the VCU design) in order to run the pipeline with the main profile? The VCU documentation shows pipelines where the prefetch buffer is enabled.

 

I guess yes.

 

BTW, did you set the QoS parameters before launching your encoding?

At least, it seems that the frame corruption is related to the QoS parameters.

 

Best regards,

shahan.a (Participant)

Hi Watari,

1. I tried setting the QoS for the AXI HP ports connected to the encoder.

These are the values written to the AFIFM registers for the encoder-connected AXI HP0 and HP1 ports (per UG1087 the relevant offsets should be 0x8 = RDQOS, 0x1C = WRQOS, 0x4 = RDISSUE, 0x18 = WRISSUE):

# QoS for S_AXI_HP0_FPD (AFIFM at 0xFD380000) and S_AXI_HP1_FPD (0xFD390000), read and write channels
devmem 0xFD380008 w 0x3
devmem 0xFD38001C w 0x3
devmem 0xFD390008 w 0x3
devmem 0xFD39001C w 0x3
# read/write issuing capability for the same two ports
devmem 0xFD380004 w 0xF
devmem 0xFD380018 w 0xF
devmem 0xFD390004 w 0xF
devmem 0xFD390018 w 0xF

Reference: https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
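
The values can also be read back to confirm the writes took effect (devmem with only an address argument performs a read), e.g.:

devmem 0xFD380008
devmem 0xFD38001C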

2. Enabled the optional encoder buffer in the VCU design and tried with prefetch-buffer=true.

With both of these changes we are again seeing frame tearing / older frames reappearing after some time in the encoding pipeline when using the main profile.

FYI 

Our frames are sourced from the appsrc GStreamer element. A custom module programs the VDMA and maps the 8 cyclic buffers to user space, and a handshaking mechanism based on the interrupt received from the VDMA reads one frame behind the buffer currently being written; the VDMA writes the 8 cyclic buffers periodically at a 16.667 ms interval.

With this implementation the issue does not show up with the baseline profile, only with the main profile.

This is the pipeline we are trying out:

appsrc ! rawvideoparse width=1920 height=1080 format=NV12 framerate=60/1 ! queue ! omxh264enc prefetch-buffer=true b-frames=0 target-bitrate=60000 gop-mode=basic control-rate=constant num-slices=8 initial-delay=250 cpb-size=500 ! video/x-h264,profile=main,alignment=au ! queue ! h264parse ! matroskamux ! filesink

We have verified that there is no issue with the handshaking mechanism; frames are read correctly from the VDMA buffers in the appsrc.

Query: With both the QoS port settings and the prefetch-buffer enabled as per your suggestion, we are still getting the frame corruption. Can you suggest what change might be required, or whether any parameter needs to be changed?

 

watari (Professor)

Hi @shahan.a 

 

>Query: With both the QoS port settings and the prefetch-buffer enabled as per your suggestion, we are still getting the frame corruption. Can you suggest what change might be required, or whether any parameter needs to be changed?

 

From your first post, I suspected a transaction issue on the internal bus.

So my suggestion was to improve the transaction rate via the QoS settings and the prefetch buffer.

However, you are still facing the frame corruption issue.

I guess this is caused by the handshake procedure, especially the performance of appsrc.

Would you try a performance analysis to investigate the root cause on the AXI4 bus and the internal crossbar switch?

 

Best regards,

shahan.a (Participant)

Hi Watari,

Can you please go through the procedure we are following (described below) and share any suggestions based on it?

BTW, we are very thankful for your help with this issue.

>Would you try a performance analysis to investigate the root cause on the AXI4 bus and the internal crossbar switch?

We will try to figure this out; could you give some more context on this? I am fairly new to these areas.

> I guess this is caused by the handshake procedure, especially the performance of appsrc.

We also don't think it is a transaction issue; we are a bit confused based on the experiments we have tried.

The procedure that we are following is:

We allocate 8 buffers for 1080p NV12 in PS DDR and configure the VDMA with the start addresses of these buffers to run a cyclic DMA transfer into them. On every frame completion the VDMA raises an interrupt, which the custom driver handles and uses to notify the appsrc so that frames are pushed one by one to the encoder plugin. We also read a VDMA register that tells us which frame it has finished writing. Based on these inputs, the appsrc reads one frame behind the VDMA write index to prevent any overlap; we have verified that the appsrc stays one frame behind the VDMA write pointer while traversing the cyclic buffers.

NOTE: With this implementation we did not see the issue when trying the baseline profile; the encoding pipeline ran for more than 5 minutes with the same handshaking mechanism in place. With the main profile, however, the frame corruption/ordering issue shows up after barely a minute, or even less.

If it were a basic handshaking issue, shouldn't we see the same problem with the baseline profile as well?

The artifact we are seeing: after some time, the output jumps backwards to earlier frames and then proceeds again from that point, and this keeps repeating, as if old frame data is being stored somewhere. We also observe that after a while the output returns to normal (at the instant it recovers, it jumps directly to the latest frame position; we verified this by filming a stopwatch in the scene).

FYI: from 0 to 1 min the output is okay; at 1 min 5 s the issue starts, where the video jumps 8 frames back and then proceeds, and this fallback repeats every 300 ms. This continues until 1 min 33 s, after which the output suddenly jumps to 1 min 35 s, as if some frames are dropped to get back to the current position.

From appsrc, we printed the write index from the VDMA and the read index passed from appsrc to the encoder, and they stay in sync. We are running at 60 fps with no frame index overlapping or missing.

 

watari (Professor)

Hi @shahan.a 

 

>>Would you try a performance analysis to investigate the root cause on the AXI4 bus and the internal crossbar switch?

>We will try to figure this out; could you give some more context on this? I am fairly new to these areas.

>> I guess this is caused by the handshake procedure, especially the performance of appsrc.

>We also don't think it is a transaction issue; we are a bit confused based on the experiments we have tried.

 

I suggest you first observe the system transactions with a System ILA IP.

That way you can confirm the performance of the internal bus.

 

>FYI: from 0 to 1 min the output is okay; at 1 min 5 s the issue starts, where the video jumps 8 frames back and then proceeds, and this fallback repeats every 300 ms. This continues until 1 min 33 s, after which the output suddenly jumps to 1 min 35 s, as if some frames are dropped to get back to the current position.

 

Also, I suggest you analyze this phenomenon with gst-shark, too.
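
For example, an illustrative invocation (assuming gst-shark is installed on the target; substitute your own pipeline for the videotestsrc test pipeline):

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime;interlatency;framerate" gst-launch-1.0 videotestsrc num-buffers=600 ! video/x-raw,format=NV12,width=1920,height=1080,framerate=60/1 ! omxh264enc ! fakesink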

 

Would you try these?

 

Best regards,

shahan.a (Participant)

> I suggest you first observe the system transactions with a System ILA IP. That way you can confirm the performance of the internal bus.

Sure, I will check on this.

> Also, I suggest you analyze this phenomenon with gst-shark, too.

Yes, we have yet to analyze with gst-shark. In our latest test we increased the AXI memory-mapped clock from 227 MHz to 250 MHz.

The current observation is that with sync=true, appsrc pushes at 60 fps, but after about a 40 s run, frames are dropped over a 4-5 s stretch; it looks like a bottleneck somewhere in the pipeline. The 2K (1920x1080) NV12 frame is about 3.1 MB (1920 x 1080 x 1.5 bytes), so copying it into the OMX encoder plugin with the CPU at 60 fps amounts to roughly 186 MB/s; that shouldn't be a bottleneck, right?

We will do the performance analysis with gst-shark as well as the System ILA to check the performance of the internal bus.

Thanks

 
