Registered: 07-09-2014

Zynq Baremetal PS DMA (PL330): working example here!

Hi all,


I want to share some knowledge, or rather experience, about the Zynq PS DMA (a.k.a. ARM's PL330 IP), along with an example that works for me and communicates with a custom IP in the PL. Over time, I want to improve the usefulness of the core and the program for different applications.


I would also welcome comments and additions from experienced users, and hope others will share their experiences.


The reason for this post: I needed to send data from the PL side of the Zynq to DDR RAM. The data source is a Maxim IC (MAX2769) that outputs raw GPS data; at maximum resolution and sampling rate, its output is a 4-bit ADC sample at a 16.384 MHz clock. The architecture of the design is shown in the picture below.



My first approach was to create a custom IP core in Vivado with an AXI4-Lite slave interface. It buffers 16 bytes of data and then raises an interrupt to the PS. In C, I wrote an ISR for that interrupt that uses simple Xil_In functions to copy the data to DDR RAM. However, when I tested the system by making the last byte of the buffer an incrementing counter, I saw on the PS side that samples were being missed. Each time I grab the 16 bytes, the last byte should increase by 1, but sometimes it increased by 2 or even 3. This made me think that the data transfer overhead was taking too long.
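A minimal sketch of the counter check described above, in plain C. `reg_read32()` is a hypothetical stand-in for `Xil_In32()` (a volatile 32-bit load from a memory-mapped address); `read_buffer()`, `buffer_counter()` and `counter_step()` are illustrative names, not taken from the attached code. It assumes the IP exposes the 16-byte buffer as four consecutive 32-bit registers, with the counter in the top byte of the last word (little-endian packing).

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 4u /* 16-byte PL buffer seen as four 32-bit registers (assumption) */

/* Hypothetical stand-in for Xil_In32(): a volatile load, so the compiler
   re-reads the register every time instead of caching the value. */
static inline uint32_t reg_read32(const volatile uint32_t *addr) {
    return *addr;
}

/* Copy one 16-byte buffer out of the IP's registers word by word, as the
   ISR does with Xil_In. On hardware, 'base' would be the custom IP's
   AXI4-Lite base address. */
static void read_buffer(const volatile uint32_t *base, uint32_t out[BUF_WORDS]) {
    for (size_t i = 0; i < BUF_WORDS; i++) {
        out[i] = reg_read32(base + i);
    }
}

/* Test counter = byte 15 of the buffer, i.e. the top byte of word 3
   under the assumed little-endian packing. */
static uint8_t buffer_counter(const uint32_t buf[BUF_WORDS]) {
    return (uint8_t)(buf[BUF_WORDS - 1u] >> 24);
}

/* Counter advance between two consecutive buffers, modulo 256.
   1 is expected; 2 means one buffer was missed, 3 means two, etc.
   0 would mean the same buffer was read twice. */
static uint8_t counter_step(uint8_t prev, uint8_t curr) {
    return (uint8_t)(curr - prev);
}
```

In the ISR, any `counter_step()` result other than 1 between consecutive buffers flags lost (or re-read) data, which is exactly the symptom described above.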


Next, I decided to try ARM's PS DMA IP. Again I created a custom IP with an AXI4-Lite slave interface that buffers 16 bytes of data and raises an interrupt to the PS. In C, I wrote an ISR that starts a 16-byte DMA transfer. When I checked the counter values, there was some improvement, but samples were still being missed. I was considering using AXI_DMA and the HP ports of the Zynq with a custom AXI4-Stream IP, but then I realized the bandwidth I need is not that large! It is only 64 Mbps, and the bandwidth of the PS DMA is well beyond that. I realized I was not using anywhere near the full performance of the IP.
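The bandwidth figure above can be checked with a few lines of integer arithmetic. This sketch (function and macro names are illustrative) computes the raw stream rate of 4-bit samples at 16.384 MHz and how many "buffer full" interrupts per second the PS has to service for a given buffer size.

```c
#include <stdint.h>

/* MAX2769 output as described in the post: 4-bit samples at 16.384 MHz. */
#define SAMPLE_CLOCK_HZ 16384000u
#define BITS_PER_SAMPLE 4u

/* Raw stream rate in bits per second: 4 x 16,384,000 = 65,536,000. */
static uint32_t stream_bits_per_s(void) {
    return SAMPLE_CLOCK_HZ * BITS_PER_SAMPLE;
}

/* Raw stream rate in bytes per second. */
static uint32_t stream_bytes_per_s(void) {
    return stream_bits_per_s() / 8u;
}

/* Interrupts per second the PS must service for a given PL buffer size.
   Assumes buffer_bytes divides the byte rate evenly (16 and 128 do). */
static uint32_t interrupts_per_s(uint32_t buffer_bytes) {
    return stream_bytes_per_s() / buffer_bytes;
}
```

The raw rate is 65.536 Mbit/s, the figure behind the roughly 64 Mbps quoted above. With 16-byte buffers the PS must handle 512,000 interrupts per second, about 1.95 µs per buffer, no matter how fast the DMA engine itself is; with 128-byte buffers that drops to 64,000 per second, about 15.6 µs per buffer.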


So I tried buffering 128 bytes of data instead of 16 bytes, and guess what: it worked perfectly! My problem is solved, but I became curious about PS-PL communication, bandwidth requirements, and how to implement the PL and PS sides.
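One way to see why the larger buffer helped is a simple cost model: each buffer incurs a fixed per-transfer overhead (interrupt entry, DMA setup, ISR exit) plus the transfer time itself, and the design keeps up only if that total fits within the time the PL takes to fill the next buffer. This is a model, not a measurement: `overhead_us` and `dma_bytes_per_us` are placeholders to be measured on real hardware, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Raw stream rate: 4 bits x 16.384 MHz = 65.536 Mbit/s
   = 8.192 bytes per microsecond. */
#define STREAM_BYTES_PER_US 8.192

/* Time the PL takes to fill one buffer, i.e. the budget per interrupt. */
static double buffer_period_us(unsigned buffer_bytes) {
    return buffer_bytes / STREAM_BYTES_PER_US;
}

/* Fixed overhead per buffer plus transfer time at the DMA's throughput.
   overhead_us and dma_bytes_per_us are assumptions to be measured on the
   real system, not known PL330 figures. Returns true if the per-buffer
   cost fits within one buffer period. */
static bool keeps_up(unsigned buffer_bytes, double overhead_us,
                     double dma_bytes_per_us) {
    double cost_us = overhead_us + buffer_bytes / dma_bytes_per_us;
    return cost_us < buffer_period_us(buffer_bytes);
}
```

For example, with a made-up 3 µs overhead and 100 bytes/µs of DMA throughput, `keeps_up(16, 3.0, 100.0)` is false (3.16 µs of work in a 1.95 µs window) while `keeps_up(128, 3.0, 100.0)` is true (4.28 µs in 15.6 µs). Growing the buffer amortizes the fixed cost over more bytes, which matches the behavior described here.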


There is a great white paper from Xilinx, "Leveraging Data-Mover IPs for Data Movement in Zynq-7000 AP SoC Systems" (WP453), which summarizes various methods of transferring data between the PS and PL. For the PS DMA there is also a TRM from ARM, "PrimeCell DMA Controller (PL330)", and the Zynq TRM gives information about the PS DMA as well. Even though all these documents are very well prepared, people inexperienced with DMA, like me, feel the lack of varied examples. Thankfully, Xilinx provides example code for both polling-based and interrupt-based PS DMA usage, but that is the only example I found. There is also a Linux driver, but I think it is not well suited to baremetal applications. Understanding the concepts from the Linux drivers and other forum threads was not very easy for me.

A good discussion can also be found here: *


I also want to mention Mr. Adam Taylor's MicroZed Chronicles. They are a great source, maybe the greatest, of information and working examples for the Zynq, and they helped me a lot.


I have a working PS DMA example that handles 64 Mbps from a custom IP of mine, and I have attached the code for both the PL and PS sides. My next aims are to find out:


- What is the performance limit of the PS DMA? Adam Taylor's chronicles state that the GP port of the Zynq has a theoretical bandwidth of 600 Mbps for read/write, but only when DMA is used; otherwise the bandwidth is limited to 25 Mbps. I will try to find the optimum burst size, data length, and other settings to achieve this.
- Which is the better option from a programming point of view: the PS DMA IP or the PL AXI_DMA IPs? What are the pros and cons of each?
- Creating working examples for the PS DMA, with data transfers from PS to PL and from PL to PS, using alternative coding methods such as DMA peripheral requests.
- Creating working examples for the PL AXI_DMA IPs, utilizing the HP and ACP ports.
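For the burst size and data length experiments in the first item, it helps to see how a transfer splits into bursts. This is a hypothetical helper in plain C, not part of the Xilinx `XDmaPs` driver: it takes a beat size in bytes and a burst length of 1 to 16 beats (the PL330 maximum), and reports how many full bursts a transfer needs and how many bytes remain for a shorter final burst. The maximum usable beat size depends on the DMAC's bus width and is treated here as an input.

```c
#include <stdint.h>

/* One channel's burst settings (illustrative, not a driver structure). */
typedef struct {
    uint32_t beat_bytes;  /* bytes per beat, a nonzero power of two */
    uint32_t burst_beats; /* beats per burst, 1..16 on the PL330 */
} burst_cfg;

typedef struct {
    uint32_t full_bursts;    /* bursts of beat_bytes * burst_beats bytes */
    uint32_t leftover_bytes; /* bytes that need a shorter final burst */
} burst_plan;

/* Split a transfer of 'len' bytes into full bursts plus a remainder.
   Assumes cfg fields are nonzero. */
static burst_plan plan_transfer(uint32_t len, burst_cfg cfg) {
    uint32_t bytes_per_burst = cfg.beat_bytes * cfg.burst_beats;
    burst_plan p = { len / bytes_per_burst, len % bytes_per_burst };
    return p;
}
```

For instance, a 128-byte transfer with 4-byte beats and 16-beat bursts is exactly 2 bursts with nothing left over, whereas 100 bytes with 8-byte beats and 4-beat bursts is 3 bursts plus 4 leftover bytes. Fewer, longer bursts generally reduce per-burst bus overhead, which is one of the things worth measuring.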


Any help is appreciated.





Registered: 03-22-2016

@bbinb You should look at 

There is an entire wiki on performance analysis.


But pay special attention to these documents; they were especially helpful to me:

XAPP1219 System Performance Analysis of an All Programmable SoC

XAPP792 Designing High-Performance Video Systems with the Zynq-7000 All Programmable SoC