Observer pstootman
Observer
8,498 Views
Registered: ‎03-29-2015

boot.S for OpenAMP mmu/cache config (Zynq A9, SDK2015.4)

I'm using a Zynq 7Z020 (dual Cortex-A9), with the BSP for PS1 generated by SDK 2015.4 (USE_AMP=1 will be defined).

I'd be grateful for advice on the part of boot.S (in the 'standalone' part of the BSP) shown below.

Specifically, why is the virtual=>physical non-cacheable mapping being done?

 

#ifdef SHAREABLE_DDR
	/* Mark the entire DDR memory as shareable */
	ldr	r3, =0x3ff			/* 1024 entries to cover 1G DDR */
	ldr	r0, =TblBase			/* MMU Table address in memory */
	ldr	r2, =0x15de6			/* S=b1 TEX=b101 AP=b11, Domain=b1111, C=b0, B=b1 */
shareable_loop:
	str	r2, [r0]			/* write the entry to MMU table */
	add	r0, r0, #0x4			/* next entry in the table */
	add	r2, r2, #0x100000		/* next section */
	subs	r3, r3, #1
	bge	shareable_loop			/* loop till 1G is covered */
#endif

	/* In case of AMP, map virtual address 0x20000000 to 0x00000000  and mark it as non-cacheable */
#if USE_AMP==1
	ldr	r3, =0x1ff			/* 512 entries to cover 512MB DDR */
	ldr	r0, =TblBase			/* MMU Table address in memory */
	add	r0, r0, #0x800			/* Address of entry in MMU table, for 0x20000000 */
	ldr	r2, =0x0c02			/* S=b0 TEX=b000 AP=b11, Domain=b0, C=b0, B=b0 */
mmu_loop:
	str	r2, [r0]			/* write the entry to MMU table */
	add	r0, r0, #0x4			/* next entry in the table */
	add	r2, r2, #0x100000		/* next section */
	subs	r3, r3, #1
	bge	mmu_loop			/* loop till 512MB is covered */
#endif

 

Background: I have 1GB of DDR. I'd like to use the bottom 256MB for PS1 (FreeRTOS), some PL peripherals, and the VRING buffers shared between PS0 and PS1, with the top 768MB dedicated to PS0 (Linux). In my Linux device tree, the memory reserved for the remote is the bottom 256MB, like this:

 

&amba {
        remoteproc0: remoteproc@0
        {
                compatible = "xlnx,zynq_remoteproc";
                reg = < 0x00000000 0x10000000 >; /* memory reserved for the remote's
                                                    firmware and shared memory
                                                    between the two cores */
                firmware = "firmware";
/*
                interrupts = < 0 37 4 0 38 4 >;
                interrupt-parent = <&intc>;
 		ipino = <8>;
*/
                vring0 = <15>; /* the soft interrupt ID for the master core */
                vring1 = <14>; /* the soft interrupt ID for the remote core */
        };
};

 

So the 512MB/512MB split in the 2015.4 boot.S doesn't match my intended 256MB/768MB split between PS1 and PS0. Is that important?

What is the idea behind the SHAREABLE_DDR option, and should it be on or off for AMP?

 

I followed this older post:

https://forums.xilinx.com/t5/OpenAMP/AMP-hangs/m-p/652410#M15

The boot.S used for AMP with SDK 2014.4 looks more like this:

 

 

;	; In case of AMP, map virtual address 0x20000000 to 0x00000000  and mark it as non-cacheable
;#if USE_AMP==1
;	ldr	r3, =0x1ff			; 512 entries to cover 512MB DDR
;	ldr	r0, =TblBase			; MMU Table address in memory
;	add	r0, r0, #0x800			; Address of entry in MMU table, for 0x20000000
;	ldr	r2, =0x0c02			; S=b0 TEX=b000 AP=b11, Domain=b0, C=b0, B=b0
;mmu_loop:
;	str	r2, [r0]			; write the entry to MMU table
;	add	r0, r0, #0x4			; next entry in the table
;	add	r2, r2, #0x100000		; next section
;	subs	r3, r3, #1
;	bge	mmu_loop			; loop till 512MB is covered
;#endif

	; In case of AMP, mark address 0x00000000 - 0x2fffffff DDR as unassigned/reserved
	; and address 0x30000000 - 0x3fffffff DDR as non-shared inner cached only 
#if USE_AMP==1
	ldr	r3, =0x2ff			  ; 768 entries to cover 768MB DDR 
	ldr	r0, =TblBase			; MMU Table address in memory 
	ldr	r2, =0x0000			  ; S=b0 TEX=b000 AP=b00, Domain=b0, C=b0, B=b0 
mmu_loop:
	str	r2, [r0]			    ; write the entry to MMU table 
	add	r0, r0, #0x4			; next entry in the table 
	add	r2, r2, #0x100000	; next section 
	subs	r3, r3, #1
	bge	mmu_loop			    ; loop till 768MB is covered 

	movw r2, #0x4de6			; S=b0 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b1
	movt r2, #0x3000      ; S=b0, Section start for address 0x30000000
mmu_loop1:              
	str	r2, [r0]			    ; write the entry to MMU table 
	add	r0, r0, #0x4			; next entry in the table 
	add	r2, r2, #0x100000	; next section 
	subs	r3, r3, #1
	bge	mmu_loop1			    ; loop till 128MB is covered 
#endif

 

So here the virtual mapping has been deliberately commented out and replaced. I'd be grateful for advice on this second boot.S as well. When is 'non-shared, inner-cached only' the preferred setup?

 

I'm diving into all this because I'm seeing AMP hangs in which PS0 and PS1 lock up together. JTAG 'attach to running target' also fails after the chip has stalled, with a DAP access error, so I'm not sure where to start debugging. The hang can happen after PS1 and the PL are asked to become very active: the PL contains datamover IP that acts as a master, moving data through the HPx ports between the PL and the bottom part of the DDR (PS1 and the PL need to operate on the same data). If PS1 and the PL idle (running but not doing very much), everything seems fine. If I build my bare-metal PS1 code to run on PS0 in non-AMP mode instead (temporarily cutting Linux out of the system), then PS0 (bare-metal) and the PL can work very actively together just fine. So at this stage I think the lockup is AMP-related.

 

I'm also wondering whether the higher layers (the 2015.4 BSP code in xil_cache.c/h, and above that the remoteproc/rpmsg initialisation functions) need to match what is done in boot.S.

 

I see that the xil_cache.c functions avoid the L2 cache, per this comment Xilinx put in xil_cache.c:

 

* 5.03	 pkp 10/07/15 L2 Cache functionalities are avoided for the OpenAMP slave
*					  application(when USE_AMP flag is defined for BSP) as master CPU
*					  would be utilizing L2 cache for its operation

It would be great if I could set up both PS0 and PS1 to use the L2 cache, for better performance of the PS1 code, if that is possible. Why is the L2 cache disabled in this way for PS1 in AMP? My thinking is that PS0/Linux and PS1/FreeRTOS will never access DDR within the same 32-byte cache line, so why should PS1 be deprived of the benefit of L2? Is it just to avoid the possibility of thrashing? I can see that thrashing the L1 cache could be a big performance issue, but that won't happen (due to AMP at least) because each CPU has its own L1. So the L2 cache might get 'a bit of a workout' alternately helping populate the L1 of PS0 and PS1, but it would not be 'completely thrashed'. In any case, the same 'workout' of the L2 cache could happen with SMP, so why is it deemed a problem only for AMP?

 

Sorry for so many questions, but this whole MMU/cache/memory-split topic for AMP is not clear to me, and I couldn't find a good document on how to progress from running the echo_test.c example to running a customised, full-blown PS0/PS1/PL system that does something more substantial.

 

Many thanks.

 
