ZynqMP+ Mali-400 rendering speed limit


I am trying to find the practical limit on triangles per frame that the Mali-400 can render while sustaining 60 FPS on a 1024x600 display under Wayland.

With the program and hardware setup described below, I could reach around 32 000 triangles per frame before performance dropped below 60 FPS. This is lower than I expected given the "0.11 Mtriangles/sec/MHz" figure reported in the ZynqMP+ datasheet (page 2). What steps could I take to render more triangles per frame?
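To make the gap concrete, here is the arithmetic, assuming the datasheet figure is a peak rate that scales linearly with the 600 MHz clock mentioned below. 0.11 Mtriangles/s/MHz × 600 MHz gives 66 Mtriangles/s, or 1.1 M triangles per frame at 60 FPS, whereas 32 000 triangles × 60 FPS is only 1.92 Mtriangles/s. The function and constant names are purely illustrative.

```c
/* Inputs taken from the question; the datasheet number is assumed to be a
 * peak rate that scales linearly with the GPU clock. */
static const double TRI_PER_SEC_PER_MHZ = 0.11e6;  /* 0.11 Mtriangles/s/MHz */
static const double GPU_CLOCK_MHZ       = 600.0;   /* Mali clock on the board */
static const double TARGET_FPS          = 60.0;
static const double OBSERVED_TRI_FRAME  = 32000.0; /* measured limit */

/* Peak triangles per second implied by the datasheet figure. */
static double peak_tri_per_sec(void)
{
    return TRI_PER_SEC_PER_MHZ * GPU_CLOCK_MHZ;
}

/* Peak triangles per frame at the target frame rate. */
static double peak_tri_per_frame(void)
{
    return peak_tri_per_sec() / TARGET_FPS;
}

/* Triangles per second actually sustained in the experiment. */
static double observed_tri_per_sec(void)
{
    return OBSERVED_TRI_FRAME * TARGET_FPS;
}

/* Time available per frame at the target rate, in milliseconds. */
static double frame_budget_ms(void)
{
    return 1000.0 / TARGET_FPS;
}
```

By this reckoning the measured rate is roughly 3% of the quoted peak, and the per-frame budget at 60 FPS is about 16.7 ms.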

To render as many triangles as possible, I reused the sample program "weston-simple-egl" from the Weston (Wayland compositor) project. I changed the rendering to draw a fullscreen window (1024x600) containing a single `GL_TRIANGLE_STRIP` that covers around 95% of the screen. I tested the program at 32 bits per pixel (bpp) and 16 bpp, but saw no significant gain. The Mali GPU on the system is clocked at 600 MHz. The vertex and fragment shaders simply pass the vertices and fragments through unchanged.
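The post doesn't show how the strip is laid out. For reference, here is one plausible way to build a single `GL_TRIANGLE_STRIP` of N triangles covering most of the screen. It is a hypothetical sketch: the zig-zag layout, the helper name `make_strip`, and the 95%-per-axis extent are all assumptions, not the original code. A strip of N triangles always needs N + 2 vertices.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: builds one GL_TRIANGLE_STRIP of n_tris triangles as a
 * zig-zag between y = -extent and y = +extent, sweeping x from -extent to
 * +extent in normalized device coordinates. A strip of n triangles needs
 * n + 2 vertices. Returns a malloc'd array of (x, y) float pairs, or NULL on
 * error, and stores the vertex count in *n_verts. The caller frees it. */
static float *make_strip(size_t n_tris, float extent, size_t *n_verts)
{
    if (n_tris == 0 || n_verts == NULL)
        return NULL;

    size_t nv = n_tris + 2;
    size_t last_col = (nv - 1) / 2; /* >= 1 because nv >= 3 */
    float *v = malloc(nv * 2 * sizeof *v);
    if (v == NULL)
        return NULL;

    for (size_t i = 0; i < nv; i++) {
        size_t col = i / 2; /* two vertices (bottom, top) per column */
        v[2 * i]     = -extent + 2.0f * extent * (float)col / (float)last_col;
        v[2 * i + 1] = (i % 2 == 0) ? -extent : extent;
    }
    *n_verts = nv;
    return v;
}
```

The array would be uploaded with `glBufferData` and drawn with `glDrawArrays(GL_TRIANGLE_STRIP, 0, (GLsizei)n_verts)`. In this particular layout, 32 000 triangles produce slivers only about 0.06 px wide at 1024 px, so the real strip's geometry may affect the result as much as the raw triangle count.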

The bottleneck seems to be the `eglSwapBuffers` call. Its duration grows with the number of triangles. With 32 000 triangles it can take up to 18 ms (!), which explains the FPS drop, since a 60 FPS frame leaves a budget of only about 16.7 ms. Unfortunately, `eglSwapBuffers` is implemented in the closed-source library libmali, so I couldn't dig deeper. I assume the call returns when the GPU raises an IRQ indicating that the queued jobs are done.
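One way to test that assumption without libmali's source is to call `glFinish()` immediately before `eglSwapBuffers()` and time the two separately. `glFinish()` blocks until all previously issued GL commands have completed, so its duration approximates the GPU time for the frame. Whatever remains in `eglSwapBuffers()` is buffer handoff and waiting on the compositor. Below is a minimal timer sketch assuming POSIX `clock_gettime(CLOCK_MONOTONIC)`. The callback names and the weston-simple-egl field names in the comment (`window->display->egl.dpy`, `window->egl_surface`) are assumptions.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
static double ms_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Runs fn(ctx) once and returns its wall-clock duration in milliseconds. */
static double time_call_ms(void (*fn)(void *), void *ctx)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ms_between(&t0, &t1);
}

/* Possible use in the redraw path (field names assumed, diagnosis only):
 *
 *   static void finish_cb(void *ctx) { (void)ctx; glFinish(); }
 *   static void swap_cb(void *ctx) {
 *       struct window *w = ctx;
 *       eglSwapBuffers(w->display->egl.dpy, w->egl_surface);
 *   }
 *   ...
 *   double gpu_ms  = time_call_ms(finish_cb, window);
 *   double swap_ms = time_call_ms(swap_cb, window);
 */
```

If `gpu_ms` accounts for most of the 18 ms, the GPU itself is the limit. If `swap_ms` stays large even after `glFinish()`, the time is going elsewhere, for example waiting on the compositor. The extra `glFinish()` stalls the pipeline, so it should be removed after measuring.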

So, in summary, am I really hitting a hardware limit at 32 000 triangles per frame under Wayland, or is there something I could do to improve performance?

