01-29-2020 08:30 AM
How can I get the RPU to be I/O coherent with the APU on a Zynq UltraScale+ MPSoC device?
My goal is to use the I/O coherency of the RPU to see updated DDR locations that reside in the APU's L2 cache but have not yet been written back to DDR. I want caches enabled for both the RPU and APU, and I want the shared DDR locations of interest to be cacheable. We currently use cache flushes and invalidates for this, but would instead like to use the built-in CCI features of the Zynq UltraScale+ MPSoC device.
I was able to get the example design from AR69446 running just fine after porting it to Vivado/SDK v2018.2. I run the xaxicdma_example_simple_poll.c code on A53_0 as a baremetal application. I added xil_printf statements to the xaxicdma_example_simple_poll.c code and see that SrcBuffer is located in DDR at 0x20380 and DstBuffer is located in DDR at 0x420380. I also enabled caches on A53_0. When the CDMA transfers complete, I use XSCT to connect to A53_3 (which does not have an application running on it) and look at the locations starting at DstBuffer ("mrd 0x420380 10" shows the first 10 locations). A53_3 sees that the DDR locations starting at 0x420380 have indeed been modified correctly by the CDMA, whereas the locations starting at 0x20380 have not been modified in DDR since
What I want to do next is to somehow get the RPU (R5_1 in my case) to see the SrcBuffer (0x20380) locations that have been modified by A53_0. The TRM (UG1085) indicates that the RPUs can be made I/O coherent with the APUs, and I interpret that to mean that there is a way for the RPUs to snoop the L2 cache of the APUs. How do I do that?
After looking at the "Zynq UltraScale MPSoC Cache Coherency" Xilinx Wiki page, I added the following at the start of the xaxicdma_example_simple_poll.c code
I use an XSCT Tcl script to run my applications on the ZCU104 board. In that script, I set the lpd_apu_LPD_SLCR register to enable broadcasting of inner and outer shareable transactions ("mwr 0xff41a040 7"). In the same script, I also set the Snoop_Control_Register_S3 (CCI400) register ("mwr 0xFD6E4000 0x3") to enable issuing of snoop requests on port S3 of the CCI (port S3 connects the APU L2 to the CCI). I also have the "Enable RPU Coherency" option checked in the Advanced_Configuration->CCI_Enablement configuration tool in Vivado.
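For readers who would rather do the two register writes from baremetal code than from XSCT, here is a minimal sketch in C. The addresses and values are the ones quoted in the "mwr" commands above; the macro names, the function name enable_apu_snoop, and the choice to pass the registers in as pointers are assumptions made for illustration and are not taken from any Xilinx header. The sketch does not address when in the boot flow these writes need to happen.

```c
#include <stdint.h>

/* Addresses as quoted in the post's "mwr" commands; macro names are hypothetical. */
#define LPD_SLCR_LPD_APU_ADDR       0xFF41A040u
#define CCI400_SNOOP_CTRL_S3_ADDR   0xFD6E4000u

/* Values as quoted in the post. */
#define LPD_APU_BROADCAST_VALUE     0x7u  /* broadcast inner/outer shareable transactions */
#define SNOOP_CTRL_S3_ENABLE_VALUE  0x3u  /* enable snoop requests on CCI port S3 */

/* Single 32-bit register write through a volatile pointer. */
void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/*
 * Perform the same two writes as the XSCT script.
 * The registers are passed as pointers so the routine can be
 * exercised on a host against ordinary variables.
 */
void enable_apu_snoop(volatile uint32_t *lpd_apu,
                      volatile uint32_t *snoop_ctrl_s3)
{
    reg_write32(lpd_apu, LPD_APU_BROADCAST_VALUE);
    reg_write32(snoop_ctrl_s3, SNOOP_CTRL_S3_ENABLE_VALUE);
}

/*
 * On the target this would be called as:
 *   enable_apu_snoop((volatile uint32_t *)LPD_SLCR_LPD_APU_ADDR,
 *                    (volatile uint32_t *)CCI400_SNOOP_CTRL_S3_ADDR);
 */
```

Passing the register pointers in, rather than hard-coding the casts inside the function, is only a convenience for testing off-target; the effect on hardware is the same as the two "mwr" commands.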
I also run the following baremetal code on R5_1:
I run xaxicdma_example_simple_poll.c on A53_0 and the above code on R5_1, then use XSCT to connect to R5_1 and examine DDR locations 0x20380 (SrcBuffer) and 0x420380 (DstBuffer). I see that the DDR locations starting at 0x420380 have indeed been modified by the CDMA transfer, but R5_1 does not see the modified locations at 0x20380 (SrcBuffer). What do I need to do for R5_1 to see the locations in the L2 cache that have been modified by A53_0 but have not yet been written back to DDR?
02-12-2020 04:06 PM
Hi @pfuchsstratasys ,
You need to modify a register to send R5 transactions to the MPSoC CCI rather than directly to DDR.
Please set the COHERENT bit in the register that corresponds to the RPU instance you want to make coherent with the A53.
Please keep in mind this only allows the R5 to snoop the A53 caches. It is not possible for the A53 to snoop the R5 caches.
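Setting a single bit such as COHERENT is normally done as a read-modify-write so that the other fields in the same register are left alone. The sketch below shows that pattern in C under stated assumptions: the helper names reg_set_bits32 and reg_bits_are_set are hypothetical, and the register address and bit mask are deliberately left as parameters because this thread does not give them; take both from the register reference for the RPU instance in question.

```c
#include <stdint.h>

/*
 * Set every bit in `mask` in a 32-bit register without
 * disturbing the other fields (read-modify-write).
 */
void reg_set_bits32(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value = *reg;  /* read the current configuration */
    value |= mask;          /* set only the requested bits */
    *reg = value;           /* write the result back */
}

/* Return nonzero when every bit in `mask` is currently set. */
int reg_bits_are_set(const volatile uint32_t *reg, uint32_t mask)
{
    return (*reg & mask) == mask;
}

/*
 * On the target (placeholder names, values not supplied here):
 *   reg_set_bits32((volatile uint32_t *)RPU_CFG_ADDR, RPU_CFG_COHERENT_MASK);
 * where RPU_CFG_ADDR and RPU_CFG_COHERENT_MASK are hypothetical macros
 * to be filled in from the device's register reference.
 */
```

Reading the register back with reg_bits_are_set after the write is a cheap way to confirm that the write landed before running the R5 application.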
If you haven't found it, we also have this Wiki. It outlines how to do some of the steps for enabling coherency when the A53 is running Linux. Linux doesn't allow the user to just modify MMU settings.