03-21-2018 03:07 AM
I have some problems with passing GPU output to the PL and HDMI output on the ZCU102 board.
What I need to do is run an OpenGL application on the Mali and transfer the output video stream into the PL.
I have access to the HDMI FrameBuffer Example Design, and I also looked at the Zynq UltraScale MPSoC Base TRD and the Zynq UltraScale+ MPSoC: Embedded Design Tutorial.
My first try was to add the X11 libraries into the petalinux system from the HDMI FrameBuffer Example Design and run the tricube application from the EDT on this design. This fails with "can't open display".
So I have done some research and I think the solution is to use DRM to initialize EGL. But every tutorial I found uses the GBM library, and it looks like GBM can't be added to the rootfs through petalinux-config.
Could someone give me some advice, please?
03-27-2018 02:52 PM
Take a look at the TRD design: http://www.wiki.xilinx.com/Zynq+UltraScale+MPSoC+Base+TRD+2017.4
This example has an HDMI Tx path.
04-12-2018 12:09 AM
Thanks for your answer. I know about the TRD; I mentioned it in the original post. But I was unable to find a solution there either. The design is a bit complicated, and if I understand it correctly, the GPU is used only through Qt. I need a way to make it work with a pure OpenGL application.
I'm taking a different route right now: the DP live video output interface. With this I'm able to get the video stream into the PL. But it needs a monitor connected to the DP connector, so I guess I need to edit some drivers, or fake the EDID on the DP's DDC lines on my final PCB.
So I have a (partial) solution, but I would prefer the approach shown in the TRD.
04-13-2018 03:48 PM
In theory, what you are trying to do should be possible.
Maybe you need to use modetest to query the HDMI connector and then set the proper $DISPLAY variable.
A few comments: the Zynq UltraScale+ MPSoC ZCU102 design might look complicated, but you can break it down into the essential parts you need. You are using HDMI Tx, so look at what is feeding it. The input is the Video Mixer, which gets its input from 3 possible places, but these are all memory interfaces, and one of them comes from the Zynq UltraScale+ MPSoC block. That should give you access to the output of the GPU, which is being written to DDR memory. So you should be able to prototype the software side using the ZCU102 TRD, and once you get it working you could build a simpler hardware setup that keeps the key IP in the video data path.
Another idea is to take a look at the HDMI FrameBuffer Example Design. This is a somewhat simpler design, but again you could review the data path, do some software development there to see if you can accomplish what you want, and then pare the design down to something closer to what your end application will need.
Last, I'd also like to point out a few things about the DisplayPort live interface:
04-16-2018 08:33 AM
@chrisar, what needs to happen to escalate this use case to where Xilinx will dedicate some engineering time to creating a true GPU-accelerated PL framebuffer solution? I think there are enough requests in the forum to justify it at this point.
05-13-2020 09:28 AM
Wow.. how nice and helpful would that be.. but they can't even respond to this post after 2 full years. I really don't get it.. their hardware guys must have worked VERY hard to get that GPU working.. probably for some key customers.. but startups.. they get NO support. Somehow someone seems to not realize that startups are 50% of their potential future business... I guess I'm just going to use a plain FPGA and a dedicated ARM.
05-13-2020 09:30 AM
All this thread tells me is that IT CAN NOT BE DONE. It is saying STAY AWAY from GPU programming on the UltraScale+. Don't build products using this part... since it means a world of hurt.
I mean, even the people helping are Not Sure...
06-23-2020 11:29 AM
I got a version of this working by defining a fake display with a custom EDID (we have a very special display) and by modifying the GPU driver to DMA directly into the PL. This really has a limited use case, because you can't do offscreen rendering or even run multiple apps: all the GPU output is sent directly to the PL. Also, the GPU does scattered writes across the "DMA buffer", and we discovered the DMA transfer gets slower the more triangles we try to draw.
Ultimately I don't think having the GPU output passed straight to the PL will work very well; more likely, modifying the display driver to DMA to the PL at the point where it normally hands off to the display controller (like the DP) might work.
06-23-2020 02:02 PM
Thanks for your feedback. That is what I was thinking.. replace the DP logic. Is the DisplayPort logic driven by the GPU (fed row by row), or does it pull from the buffer at its own pace and with its own addressing? Sorry, I'm new to this and trying to find out how best to invest my time to find a solution.. if at all. I'm working on a custom, non-linear / non-traditional display and need GPU buffer access. I would set this up as a traditional double-buffered display.. one buffer written to by the GPU (next frame) and one being read by the PL (current frame).
06-25-2020 10:33 PM
I haven't done the work yet, so I can't tell you for sure. However, the memory the GPU DMAs into is passed in as buffers (or addresses, I should say), and as far as I could tell it's plain memory, not a memory-mapped address. My guess is that when the GPU has written all its data to a buffer, it notifies another layer/driver/whatever, and that piece of software deals with getting the data to the final destination. For example, the DP pipeline provides a buffer, and when the GPU is done, the DP pipeline handles the final DMA to the DP controller in hardware.
Like I said, I haven't done the work, so this is just based on my experience hacking the GPU driver to DMA directly into a PL BRAM. That is good enough for where we are right now, but long term it will have to change, especially because of the latency of the DMA into the PL.
07-08-2020 10:10 AM
We used the live output of the DP to get graphics into the PL. But it's not plain graphics; it's already mixed with the video from the DP live input.
We would like to have separate, independent graphics. Is there a way to do this?
07-16-2020 07:10 AM
I had posted this idea elsewhere.. but as I get up to speed with zynq dev I had the following idea that may work:
In OpenGL the render target is hidden behind a handle; the only way to get buffer access is via glReadPixels. That will always work as a way to reach the final render target of the GPU.. BUT making that call stalls the pipeline and triggers a CPU memcpy.. very very slow. Some have tried this and dropped to 15 fps from 50-60 fps, or worse.
I'm new to this, but I believe AXI memory access is "trusted" and can reach any of the DDR space when initiated from the PL. Why not upload a specific pattern after allocation and then search for that pattern? Once you find the pattern, you have the physical address of the OpenGL render target. Keep in mind that this target will be in GPU format (probably just byte-aligned in some way, and RGBA may not be in the same order).. but it should be accessible without the GPU / CPU having a clue. Moreover, it will probably be at the same address every time, or in a similar area, so finding it in the future would be very fast. It is clearly a hack.. but one that should hold up.
There should be no performance hit other than the extra read overhead at the DDR controller. It would be very similar to the hit you get from attaching a DP display.
Still.. wouldn't it be nice if Xilinx posted a solution.. since probably half of all UltraScale+ users need this? Who does not want to use the PL as part of the graphics pipeline?
08-07-2020 05:19 AM
There is a Xilinx supported way to pass GPU output to the PL and most of it is straight from the TRD as the original reply mentioned.
The key thing you need is a Linux DRM display device/pipeline that represents and controls your display PL. In the case of the TRD, they are using the HDMI Tx in the PL as a display output; this would need to be replaced by your custom PL. If you are using completely custom IP, Xilinx actually has a base Linux DRM device driver called "pl-disp" that implements a DRM CRTC and plane. You may need to implement the additional DRM pieces (encoder/connector) to get the system to fully work, but once your device appears as a display, the Mali can render to it using any of the regularly supported methods (fbdev, X11, Wayland, etc).
In addition, the TRD also mentions that DMABUF is supported by the GPU:
"The libMali user-space library implements the OpenGLES 2.0 API which is used by the Qt toolkit for hardware-accelerated graphics rendering. The Mali driver also supports DMABUF which provides a mechanism for sharing buffers between devices and frameworks through file descriptors without expensive memory copies (0-copy sharing)."
Since it's supported by both the GPU and the DRM framework, this could be another, lower-level method of achieving the same thing (and some of the display servers/compositors mentioned earlier might already be doing this under the hood).
08-07-2020 07:13 AM
08-07-2020 08:03 AM
Yes unfortunately that's probably a true statement. Luckily though there are lots of good examples and documentation about the DRM system and Xilinx has done a lot of the more difficult parts by creating most of the display pipeline for you. Here are some good resources on the subject: