rbriegel
Contributor
856 Views
Registered: ‎09-11-2018

How to make PL-DDR accessible for Vitis (Vision) Kernels on ZCU104

Hi,

I am running into a memory bandwidth problem on my ZCU104 board, so I am planning to install PL-DDR and configure my Vivado design accordingly. The first goal is to be able to read and write the PL-DDR from the PS via devmem (I am currently waiting for the correct SODIMM to be delivered).
The next step would be to make this memory accessible to XRT/zocl so that my Vitis kernels use the PL-DDR instead of the PS-DDR. I have already exposed the AXI slave ports connected to the MIG in Vivado so that they are visible to Vitis via:

set_property PFM.AXI_PORT {S01_AXI {memport "MIG" sptag "PL0" memory ""} S02_AXI {memport "MIG" sptag "PL1" memory ""} S03_AXI {memport "MIG" sptag "PL2" memory ""} S04_AXI {memport "MIG" sptag "PL3" memory ""}} [get_bd_cells /axi_interconnect_4]


My understanding is that in Vitis I can then, for example, pass the following options to the v++ linker to connect my kernel to the PL-DDR:

--connectivity.sp myKernel.input_1:PL0
--connectivity.sp myKernel.output_1:PL1
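
(If I understand the docs correctly, the same mapping could also go into a v++ config file passed with --config instead of individual command-line switches; the kernel and tag names below are just carried over from the flags above:)

[connectivity]
sp=myKernel.input_1:PL0
sp=myKernel.output_1:PL1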


Now my questions are the following:


1. Do I need to configure my device tree so that my PS sees the PL-DDR as memory/reserved-memory?

I only want to use the PL-DDR for my kernels, so I don't want Linux to use/map anything into that memory. On the other hand, I need to access my kernel's input/output data as CL buffers from user space via clEnqueueMapBuffer().

The zocl GitHub page states the following:
"zocl also supports memory management of PL-DDRs and PL-BRAMs. PL-DDR is reserved by zocl driver via device tree. Both PS Linux and PL logic can access PL-DDRs"

I can't find anything on how to modify the device tree entry for zocl. Currently it looks like this:

&amba {
	axi_intc_0: axi-interrupt-ctrl {
		#interrupt-cells = <2>;
		compatible = "xlnx,xps-intc-1.00.a";
		interrupt-controller;
		reg = <0x0 0xa9000000 0x0 0x10000>;
		xlnx,kind-of-intr = <0x0>;
		xlnx,num-intr-inputs = <0x20>;
		interrupt-parent = <&gic>;
		interrupts = <0 89 1>;
	};
        zyxclmm_drm {
                compatible = "xlnx,zocl";
                status = "okay";
                reg = <0x0 0xA0000000 0x0 0x800000>;
		interrupt-parent = <&axi_intc_0>;
		interrupts = <0  4>, <1  4>, <2  4>, <3  4>,
			     <4  4>, <5  4>, <6  4>, <7  4>,
			     <8  4>, <9  4>, <10 4>, <11 4>,
			     <12 4>, <13 4>, <14 4>, <15 4>,
			     <16 4>, <17 4>, <18 4>, <19 4>,
			     <20 4>, <21 4>, <22 4>, <23 4>,
			     <24 4>, <25 4>, <26 4>, <27 4>,
			     <28 4>, <29 4>, <30 4>, <31 4>;
        };
....

};
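
(For what it's worth, the standard Linux way to keep a RAM range away from the kernel is a reserved-memory node like the sketch below, placed under the root node rather than under &amba. The base address and size here are only placeholders for the PL-DDR, and I have not verified whether or how zocl actually picks such a node up, e.g. via a memory-region phandle:)

reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	pl_ddr_reserved: buffer@500000000 {
		compatible = "shared-dma-pool";
		no-map;
		reg = <0x5 0x0 0x1 0x0>;	/* 4 GB at 0x5_0000_0000, size is an assumption */
	};
};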

 

2. What else do I need to do to make this work?


 Thanks in Advance!

0 Kudos
12 Replies
stephenm
Moderator
804 Views
Registered: ‎09-12-2007

The DTG will handle the memory node for you (see system-top.dts), so you don't need to do anything here.

0 Kudos
rbriegel
Contributor
792 Views
Registered: ‎09-11-2018

Hi stephen,

Thanks for replying!
Unfortunately the system-top.dts did not change:

 

/*
 * CAUTION: This file is automatically generated by Xilinx.
 * Version:
 * Today is: Thu Oct  1 11:37:31 2020
 */


/dts-v1/;
#include "zynqmp.dtsi"
#include "zynqmp-clk-ccf.dtsi"
#include "zcu104-revc.dtsi"
#include "pl.dtsi"
#include "pcw.dtsi"
/ {
	chosen {
		bootargs = "earlycon clk_ignore_unused";
		stdout-path = "serial0:115200n8";
	};
	aliases {
		ethernet0 = &gem3;
		i2c0 = &i2c1;
		serial0 = &uart0;
		serial1 = &uart1;
		spi0 = &qspi;
	};
	memory {
		device_type = "memory";
		reg = <0x0 0x0 0x0 0x7ff00000>;
	};
};
#include "system-user.dtsi"

 

In the pl.dtsi an entry is generated:

 

ddr4_0: ddr4@500000000 {
	compatible = "xlnx,ddr4-2.2";
	reg = <0x00000005 0x00000000 0x00000001 0x00000000>;
};

 


My RAM came in the mail today, and I could verify that the hardware part is working by writing to and reading from the PL-RAM addresses via devmem.
Linux does not see the RAM as memory, which makes sense when I look at the generated device trees. This is OK for me as long as I can still use the PL-RAM for my Vitis accelerators.
1. Is this correct as it is, or do I have to modify the device trees?
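
(For reference, the devmem check was along these lines, with the base address taken from the pl.dtsi entry above:)

# write a test word to the start of the PL-DDR, then read it back
devmem 0x500000000 32 0xDEADBEEF
devmem 0x500000000 32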

In Vitis I can see and link to the exposed PL interfaces (see initial post), but when v++ tries to link with them I get the following errors:

 

CFGEN 83-2230: No interfaces defined for sp {PL2}, specifically PL2
[SYSTEM_LINK 82-36] [15:22:10] cfgen failed
SYSTEM_LINK 82-79: Unable to create system connectivity graph
SYSTEM_LINK 82-96: Error applying explicit connections to the system connectivity graph
v++ 60-626: Kernel link failed to complete
v++ 60-661: v++ link run 'run_link' failed
v++ 60-703: Failed to finish linking
makefile (line 122): recipe for target 'vitis_accel.xclbin' failed
make: *** [vitis_accel.xclbin] Error 1
make: *** [sd_card] Error 1

 

Regarding this, I have a few more questions:

2. Is this maybe because I did not specify the "memory" portion in this Vivado command?

set_property PFM.AXI_PORT {S01_AXI {memport "M_AXI_GP" sptag "PL0" memory ""} S02_AXI {memport "MIG" sptag "PL1" memory ""} S03_AXI {memport "MIG" sptag "PL2" memory ""} S04_AXI {memport "MIG" sptag "PL3" memory ""}} [get_bd_cells /axi_interconnect_4]

3. What would be the right string to put in there? Is it "zynq_ultrascale_ps_e_0 C0_DDR4_S_AXI", "zynq_ultrascale_ps_e_0 C0_DDR4_ADDRESS_BLOCK", "zynq_ultrascale_ps_e_0 SEG_ddr4_0_C0_DDR4_ADDRESS_BLOCK", "ddr4_0 C0_DDR4_ADDRESS_BLOCK", or something else?

4. When exposing master AXI interfaces in Vivado to Vitis via e.g. the following command, do those master ports have to be on the same interconnect that exposes the slave AXI interfaces and connects to the PL-DDR?

set_property PFM.AXI_PORT {M10_AXI {memport "M_AXI_GP" sptag "GP"}} [get_bd_cells /axi_interconnect]



Thanks!

0 Kudos
stephenm
Moderator
782 Views
Registered: ‎09-12-2007

The Linux side and the v++ side are two different things.

If you want Linux to be able to use your PL-DDR, then you would need to update the memory node in the DTS. You can do this in the system-user.dtsi.

Reference:

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842412/Accessing+BRAM+In+Linux
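
For illustration, extending the memory node via system-user.dtsi could look like the sketch below (the PL-DDR base matches the pl.dtsi entry you posted; the 4 GB size is just an assumption for your SODIMM):

/ {
	memory {
		device_type = "memory";
		/* PS DDR range, followed by the PL-DDR range (4 GB at 0x5_0000_0000 assumed) */
		reg = <0x0 0x0 0x0 0x7ff00000>, <0x5 0x0 0x1 0x0>;
	};
};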

 

You need to update the PFM memory properties.

 

Can you share the XSA file?

0 Kudos
rbriegel
Contributor
775 Views
Registered: ‎09-11-2018

Hi Stephen,

thanks for the prompt reply!

I understand that those are different. I just wanted to confirm that XRT/zocl/Vitis accelerators do not need Linux to see the PL memory at runtime in order to address the buffers correctly.

I understand that I need to change the memory settings in the PFM, I just don't know how exactly (questions 2 and 3 are my main concern; maybe you could have another look).

I could upload my XSA at the beginning of next week. Do you have a cloud link where I can upload it for your eyes only, or should I PM you a download link?

Thanks

0 Kudos
doonny
Explorer
530 Views
Registered: ‎07-28-2013

Hi @rbriegel, have you solved this problem? I am also working on this; would you like to share your results?

0 Kudos
rbriegel
Contributor
518 Views
Registered: ‎09-11-2018

Hi @doonny ,

Sorry I forgot to come back to this post after I solved this!

The S-AXI interface you want your kernels to attach to has to be on an AXI Interconnect where another S-AXI port is connected to a Zynq M-AXI port, and the M-AXI of the interconnect is connected to your DDR4 controller. The trick then is to give the sptag a name containing the word "MIG". XRT looks for this string in the sptag and not (as any sane person would imagine) in the memport variable, although that has to be "MIG" as well. XRT will fail to create the buffers in the PL-DDR at runtime if the sptag name does not contain "MIG". The memory variable in my case had to be "ddr4_0 C0_DDR4_ADDRESS_BLOCK" (you can get this from the Address Editor in Vivado). See the sketch below.
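
(Roughly, the platform property and link options ended up looking like the sketch below; the interconnect path and kernel/port names are carried over from my earlier posts, and the exact sptag names are just what I chose:)

# Vivado: expose the interconnect slaves to Vitis, with "MIG" in the sptag
# and the memory segment taken from the Address Editor
set_property PFM.AXI_PORT {S01_AXI {memport "MIG" sptag "MIG0" memory "ddr4_0 C0_DDR4_ADDRESS_BLOCK"} S02_AXI {memport "MIG" sptag "MIG1" memory "ddr4_0 C0_DDR4_ADDRESS_BLOCK"}} [get_bd_cells /axi_interconnect_4]

# v++ link step: map the kernel ports to those sptags
--connectivity.sp myKernel.input_1:MIG0
--connectivity.sp myKernel.output_1:MIG1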

Hope this helps!

0 Kudos
doonny
Explorer
500 Views
Registered: ‎07-28-2013

@rbriegel thx for the reply. Could you share your experience on how to create buffers on the PL-DDR side and transfer data between the host memory and the PL buffers? Will the MigrateMemObjects API still work on buffers created on the PL-DDR side?

0 Kudos
rbriegel
Contributor
464 Views
Registered: ‎09-11-2018

Everything is handled by OpenCL and the underlying XRT, so nothing changes code-wise in your application. Be sure to follow the steps above though, otherwise XRT will handle your buffers incorrectly. The host code looks the same as usual (see the sketch below).
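
(Just to illustrate that nothing changes: a minimal host-side sketch with the standard OpenCL calls; kernel and buffer names are made up, and error handling is omitted for brevity:)

#include <stdlib.h>
#include <CL/cl.h>

/* context, queue and kernel are assumed to be set up the usual way
 * (clCreateContext, clCreateCommandQueue, clCreateProgramWithBinary
 * on the xclbin, clCreateKernel) */
void run_once(cl_context ctx, cl_command_queue q, cl_kernel krnl, size_t bytes)
{
	cl_int err;
	void *in_host  = malloc(bytes);
	void *out_host = malloc(bytes);

	/* plain CL buffers; XRT places them in the bank chosen via --connectivity.sp */
	cl_mem in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY  | CL_MEM_USE_HOST_PTR, bytes, in_host,  &err);
	cl_mem out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, bytes, out_host, &err);

	clSetKernelArg(krnl, 0, sizeof(cl_mem), &in);
	clSetKernelArg(krnl, 1, sizeof(cl_mem), &out);

	/* host -> PL-DDR */
	clEnqueueMigrateMemObjects(q, 1, &in, 0, 0, NULL, NULL);

	clEnqueueTask(q, krnl, 0, NULL, NULL);

	/* PL-DDR -> host */
	clEnqueueMigrateMemObjects(q, 1, &out, CL_MIGRATE_MEM_OBJECT_HOST, 0, NULL, NULL);
	clFinish(q);

	clReleaseMemObject(in);
	clReleaseMemObject(out);
	free(in_host);
	free(out_host);
}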

0 Kudos
doonny
Explorer
442 Views
Registered: ‎07-28-2013

@rbriegel thx, I can run the program with no problem now. Have you tested the efficiency of the host-to-PL-DDR data transfer? My results show that copying data from host memory to PL-DDR through the MIG interface is extremely slow (only around 200 MB/s) compared to migrating buffers in the PS memory (up to 2000 MB/s):

 

[Attachment: pl-ddr.png — measured transfer speeds for PL-DDR vs. PS-DDR buffers]

Hi @stephenm, is there any way to improve the host-to-PL-DDR data transfer speed? Or am I not using the MIG interface correctly?

0 Kudos
rbriegel
Contributor
420 Views
Registered: ‎09-11-2018

Can you increase your AXI bus widths? One thing we did to increase speed further was to add an AXI cache in front of the PL-DDR.

0 Kudos
doonny
Explorer
410 Views
Registered: ‎07-28-2013

The width of the MIG AXI bus is restricted to 128 bit on the ZCU102 board. By AXI cache, do you mean the System Cache IP?

 

[Attachment: bus.PNG — MIG AXI data width setting]

0 Kudos
rbriegel
Contributor
405 Views
Registered: ‎09-11-2018

Oh, is it really? I just checked; on the ZCU104 it is 512 bit. Do you have an interconnect between the Zynq and the DDR? Maybe it's the Zynq's 128-bit data width that is keeping you from using the full 512 bit?

Yep, the System Cache IP is what I mean. It's a very buggy IP; not every combination of line count and cache size will compile.

0 Kudos