Adventurer
Registered: 05-01-2012

Reset one of the Zynq APUs

I am running an AMP system with bare-metal apps on both cores. I want one core to supervise the other: say cpu0 watches cpu1 and resets it if needed.

I currently have the reset functionality working, per the note in section 3.7 of the TRM. However, after the reset the core seems to be stuck at 0x00000004 (according to the debugger) rather than executing code at 0xFFFFFE00 and waiting for an event. I have cpu0 issue the SEV, but nothing changes. So, what's the proper way to get cpu1 running again?

I followed the updated instructions from XAPP1079 and I'm using version 14.7 of the tools.

Accepted Solution
Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

I've been updating xapp1079 for 2014.1, so I tried resetting and capturing cpu1 instead of interacting with the wfe loop at ~0xFFFFFF00.

So I place the FSBL at 0x00000000 (and will use its reset vector to capture cpu1), app_cpu0 at 0x01000000 and app_cpu1 at 0x02000000. To 'capture' cpu1 after a reset, I modified the standalone BSP. I use asm_vectors.S to provide a word at a fixed location whose contents can be changed. Before cpu1 is reset, this word is written with cpu1's starting address. When cpu1 resets, it jumps to the reset vector at 0, runs some code that determines which cpu it is running on, and then jumps to the address stored at the end of the vector table. Address 0x00000020 holds the address that cpu0 jumps to, and address 0x00000024 holds the address that cpu1 jumps to. By default, the FSBL BSP places the address of 'OKToRun' at 0x00000020 and the address of 'EndlessLoop0' at 0x00000024. Both labels can be found in boot.S.
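
A minimal sketch of the layout described above, written as a C struct overlaid on the bottom of memory. The struct, field and function names are hypothetical; only the offsets come from this post: eight 4-byte ARM exception vectors (8 * 4 = 0x20), then the cpu0 catch word at 0x20 and the cpu1 catch word at 0x24.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the FSBL vector table at 0x00000000.
 * Each ARM exception vector is one 4-byte instruction slot. */
typedef struct {
    uint32_t reset;          /* 0x00 */
    uint32_t undefined;      /* 0x04 */
    uint32_t svc;            /* 0x08 */
    uint32_t prefetch_abort; /* 0x0C */
    uint32_t data_abort;     /* 0x10 */
    uint32_t reserved;       /* 0x14 */
    uint32_t irq;            /* 0x18 */
    uint32_t fiq;            /* 0x1C */
    uint32_t cpu0_catch;     /* 0x20: address cpu0 jumps to after reset */
    uint32_t cpu1_catch;     /* 0x24: address cpu1 jumps to after reset */
} vector_table_t;

/* The catch words sit right after the eight vectors. */
_Static_assert(offsetof(vector_table_t, cpu0_catch) == 0x20, "cpu0 catch at 0x20");
_Static_assert(offsetof(vector_table_t, cpu1_catch) == 0x24, "cpu1 catch at 0x24");

/* Address of the catch word for cpu 0 or 1,
 * assuming the table is based at 0x00000000. */
static uint32_t catch_address(unsigned cpu)
{
    return cpu == 0 ? (uint32_t)offsetof(vector_table_t, cpu0_catch)
                    : (uint32_t)offsetof(vector_table_t, cpu1_catch);
}
```

Because the vector table always starts at 0x00000000, these two words have fixed addresses that cpu0 can write before it resets either core.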

 

The first code the cpu runs is in the BSP boot.S file. I modified this code to load and jump to the address stored at 0x00000020 or 0x00000024, depending on which cpu is running it. Here's a snippet:

 

_boot:
/* Test which processor is running and jump to the catch address */
    mrc    p15, 0, r1, c0, c0, 5    /* read MPIDR */
    and    r1, r1, #0xf             /* keep the low CPU ID bits */
    cmp    r1, #0
    bne    NotCpu0
    ldr    r0, =_cpu0_catch         /* cpu0: catch word at 0x20 */
    b      cpuxCont
NotCpu0:
    cmp    r1, #1
    bne    EndlessLoop0
    ldr    r0, =_cpu1_catch         /* cpu1: catch word at 0x24 */
    b      cpuxCont

EndlessLoop0:
    wfe
    b      EndlessLoop0

/* Jump to address pointed to by cpux_catch */
cpuxCont:
    ldr    lr, [r0]
    bx     lr

OKToRun:
    mrc    p15, 0, r0, c0, c0, 0    /* Get the revision */
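
To make the control flow of the modified _boot easier to follow, here is a host-side C model of the same decision. It is only an illustration, not code that runs on the target: a word array stands in for memory starting at 0x00000000, and a return value of 0 stands in for the CPU being parked in EndlessLoop0. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define CPU0_CATCH_OFFSET 0x20u  /* _cpu0_catch */
#define CPU1_CATCH_OFFSET 0x24u  /* _cpu1_catch */

/* Model of _boot. mpidr is the value read by "mrc p15,0,r1,c0,c0,5";
 * mem is a word array standing in for memory at 0x00000000.
 * Returns the address the CPU would branch to with "bx lr",
 * or 0 if it would be parked in the EndlessLoop0 wfe loop. */
static uint32_t boot_dispatch(uint32_t mpidr, const uint32_t *mem)
{
    uint32_t cpu = mpidr & 0xfu;           /* and r1, r1, #0xf */
    uint32_t catch_offset;

    if (cpu == 0)
        catch_offset = CPU0_CATCH_OFFSET;  /* ldr r0, =_cpu0_catch */
    else if (cpu == 1)
        catch_offset = CPU1_CATCH_OFFSET;  /* ldr r0, =_cpu1_catch */
    else
        return 0;                          /* bne EndlessLoop0 */

    return mem[catch_offset / 4];          /* ldr lr, [r0] */
}
```

With the default FSBL contents, cpu1 would load the address of EndlessLoop0 from 0x24; after cpu0 overwrites that word, the same reset lands cpu1 in its application instead.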

 

So the FSBL now contains code that captures cpu1 and jumps to cpu1's application. Note that to write to 0x00000024, you need to either disable the cache for that region or flush the cache after the write. The main() function for cpu0 will contain something like the following to restart cpu1:

 

 

#include "xil_io.h"
#include "xil_mmu.h"

int main(void)
{
    ...

    /* Disable cache on the FSBL vector table location (the 1 MB section at 0) */
    Xil_SetTlbAttributes(0x00000000, 0x14de2);  /* S=b1 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b0 */

    ...

    /* Tell the FSBL boot code where cpu1 should jump after its reset */
    Xil_Out32(CPU1_CATCH, APP_CPU1_ADDR);

    /* Unlock the SLCR so that A9_CPU_RST_CTRL can be written */
    Xil_Out32(XSLCR_UNLOCK_ADDR, XSLCR_UNLOCK_CODE);

    /* Assert and deassert cpu1 reset and clkstop:
     * assert reset, stop the clock, release reset, restart the clock */
    RegVal = Xil_In32(A9_CPU_RST_CTRL);
    RegVal |= A9_RST1_MASK;
    Xil_Out32(A9_CPU_RST_CTRL, RegVal);
    RegVal |= A9_CLKSTOP1_MASK;
    Xil_Out32(A9_CPU_RST_CTRL, RegVal);
    RegVal &= ~A9_RST1_MASK;
    Xil_Out32(A9_CPU_RST_CTRL, RegVal);
    RegVal &= ~A9_CLKSTOP1_MASK;
    Xil_Out32(A9_CPU_RST_CTRL, RegVal);

    /* Lock the SLCR register access */
    Xil_Out32(XSLCR_LOCK_ADDR, XSLCR_LOCK_CODE);

    ...
}
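
A host-side model of the restart sequence, useful for checking the order of register writes without hardware. The addresses, keys and bit positions below are my reading of the Zynq-7000 TRM (SLCR_LOCK at 0xF8000004 with key 0x767B, SLCR_UNLOCK at 0xF8000008 with key 0xDF0D, A9_CPU_RST_CTRL at 0xF8000244 with A9_RST<n> at bit n and A9_CLKSTOP<n> at bit n+4); treat them as assumptions and check them against the TRM. The recording bus and all names are hypothetical stand-ins for Xil_Out32/Xil_In32, and the sequence is generalized to either cpu.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, see the TRM */
#define SLCR_LOCK_ADDR       0xF8000004u
#define SLCR_UNLOCK_ADDR     0xF8000008u
#define SLCR_LOCK_KEY        0x0000767Bu
#define SLCR_UNLOCK_KEY      0x0000DF0Du
#define A9_CPU_RST_CTRL_ADDR 0xF8000244u
#define CATCH_BASE           0x00000020u  /* cpu0 catch; cpu1 is +4 */

/* Hypothetical recording bus standing in for Xil_Out32/Xil_In32. */
typedef struct { uint32_t addr, val; } write_t;
typedef struct {
    write_t log[16];
    size_t n;
    uint32_t rst_ctrl;  /* current A9_CPU_RST_CTRL value */
} bus_t;

static void bus_write(bus_t *b, uint32_t addr, uint32_t val)
{
    if (b->n < 16)
        b->log[b->n++] = (write_t){ addr, val };
    if (addr == A9_CPU_RST_CTRL_ADDR)
        b->rst_ctrl = val;
}

/* Point the catch word at entry, then pulse reset on cpu (0 or 1):
 * assert reset, stop clock, release reset, restart clock. */
static void restart_cpu(bus_t *b, unsigned cpu, uint32_t entry)
{
    uint32_t rst = 1u << cpu;            /* A9_RST<cpu> */
    uint32_t clkstop = 1u << (cpu + 4);  /* A9_CLKSTOP<cpu> */
    uint32_t v;

    bus_write(b, CATCH_BASE + 4u * cpu, entry);
    bus_write(b, SLCR_UNLOCK_ADDR, SLCR_UNLOCK_KEY);

    v = b->rst_ctrl;                     /* like Xil_In32 */
    v |= rst;      bus_write(b, A9_CPU_RST_CTRL_ADDR, v);
    v |= clkstop;  bus_write(b, A9_CPU_RST_CTRL_ADDR, v);
    v &= ~rst;     bus_write(b, A9_CPU_RST_CTRL_ADDR, v);
    v &= ~clkstop; bus_write(b, A9_CPU_RST_CTRL_ADDR, v);

    bus_write(b, SLCR_LOCK_ADDR, SLCR_LOCK_KEY);
}
```

For cpu1 the reset register goes 0x02, 0x22, 0x20, 0x00. On the real target, the catch-word write must also reach memory before the reset, which is why the post makes that region non-cacheable.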

 

As you mentioned, TCF has problems when the clock is stopped on cpu1. You can still debug the code: set a breakpoint on cpu0 after the point where cpu1 is released from reset, and run to that breakpoint. You can also pause the app on cpu1 (probably in the wfe loop) and set a breakpoint at 0 to debug the boot.S code.

I've attached the completed cpu0 app and a customized standalone BSP that can be used for the FSBL and for standalone apps on cpu0 and cpu1.
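
The cpu0 code in this post calls Xil_SetTlbAttributes(0x00000000, 0x14de2) with the comment S=b1 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b0. As a cross-check, the sketch below decodes that value using the ARMv7-A short-descriptor format for a 1 MB section (bit positions from the ARM Architecture Reference Manual; verify against your copy). The struct and function names are hypothetical. Assuming TEX remap is disabled (SCTLR.TRE = 0), TEX=b100 with C=0 and B=0 selects Normal memory that is non-cacheable for both the inner and outer caches.

```c
#include <stdint.h>

/* Fields of an ARMv7-A short-descriptor section entry (1 MB section). */
typedef struct {
    unsigned type;   /* bits [1:0], 0b10 = section */
    unsigned b;      /* bit 2 */
    unsigned c;      /* bit 3 */
    unsigned xn;     /* bit 4 */
    unsigned domain; /* bits [8:5] */
    unsigned ap;     /* AP[2] (bit 15) : AP[1:0] (bits [11:10]) */
    unsigned tex;    /* bits [14:12] */
    unsigned s;      /* bit 16 */
} section_attr_t;

static section_attr_t decode_section(uint32_t d)
{
    section_attr_t a;
    a.type   = d & 0x3u;
    a.b      = (d >> 2) & 0x1u;
    a.c      = (d >> 3) & 0x1u;
    a.xn     = (d >> 4) & 0x1u;
    a.domain = (d >> 5) & 0xfu;
    a.ap     = (((d >> 15) & 0x1u) << 2) | ((d >> 10) & 0x3u);
    a.tex    = (d >> 12) & 0x7u;
    a.s      = (d >> 16) & 0x1u;
    return a;
}
```

Decoding 0x14de2 gives a section with S=1, TEX=0b100, AP=0b011, Domain=0xF, C=0, B=0, matching the comment in the cpu0 code.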

 

 

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

As soon as you correctly reset either cpu (stop/start the clock and hit the reset register), the cpu jumps to 0x00000000, which is the reset vector. You need to have some sort of code there, even if it's as simple as a jump to the wfe loop.

I doubt the processor is stuck at 0x00000004. That location is the undefined instruction entry of the vector table. I believe the cpu hits an undefined instruction at address 0x00000000 after reset, so it jumps to 0x00000004. There is another undefined instruction at 0x00000004, so it ends up looping forever.
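
For reference, the ARM exception vector offsets (relative to the vector base, which is 0x00000000 after a reset) can be written as a small lookup. It shows why a PC sitting at 0x00000004 points at the undefined-instruction vector. The function name is hypothetical.

```c
#include <stdint.h>

/* Name of the ARM exception vector at a given offset from the vector base. */
static const char *vector_name(uint32_t offset)
{
    switch (offset) {
    case 0x00: return "reset";
    case 0x04: return "undefined instruction";
    case 0x08: return "supervisor call (SVC)";
    case 0x0C: return "prefetch abort";
    case 0x10: return "data abort";
    case 0x14: return "reserved";
    case 0x18: return "IRQ";
    case 0x1C: return "FIQ";
    default:   return "not a vector slot";
    }
}
```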

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

But the code should already be there, since that should be the same code that ran from the power-on reset, right?  Why would this APU reset behave any differently?

 

I added the manual reset to the XAPP1079 code and when I debug that, cpu1 is executing at 0x110 which just jumps back to the wfe at 0x10c, and it just loops there.  That should all be handled by the compiler and linker, so how would I get this to run back to main()?

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

Not necessarily. The bootrom uses 0x00000000 at power-up, but then the bootrom loads the FSBL and jumps to it. Normally, the FSBL places its vectors at 0x00000000 (depending on the linker script).

When the standalone BSP starts to run for your app, it exists somewhere else in memory (depending on the linker script). boot.S detects where the application is loaded and configures VBAR to point to the 'vector_base' address defined in the linker script. So from this point on, your application's vector table may be somewhere other than 0x00000000. But a reset sets VBAR back to 0x00000000.
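
A minimal model of the VBAR behaviour just described, assuming high vectors (SCTLR.V) are not in use: an exception is taken at VBAR plus the vector offset, the application's boot.S moves VBAR to its own table, and a cpu reset puts VBAR back to 0. The struct and function names are hypothetical; on hardware, VBAR is written with mcr p15, 0, <Rt>, c12, c0, 0.

```c
#include <stdint.h>

typedef struct {
    uint32_t vbar;  /* Vector Base Address Register */
} cpu_model_t;

/* Reset clears VBAR, so every vector is fetched relative to 0x00000000 again. */
static void cpu_reset(cpu_model_t *cpu) { cpu->vbar = 0; }

/* What the standalone boot.S does: point VBAR at the app's vector_base.
 * VBAR bits [4:0] are reserved, so the table must be 32-byte aligned. */
static void set_vbar(cpu_model_t *cpu, uint32_t vector_base)
{
    cpu->vbar = vector_base & ~0x1Fu;
}

/* Address fetched when the exception at 'offset' is taken. */
static uint32_t vector_address(const cpu_model_t *cpu, uint32_t offset)
{
    return cpu->vbar + offset;
}
```

This is why a reset always lands in whatever sits at 0x00000000 (the FSBL's table), not in the application's relocated table.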

 

It sounds like the wfe you are referring to is in boot.S, where the standalone BSP checks whether it was built for the correct cpu. Either you created the BSP for cpu0 but are running it on cpu1, or you may be hitting an issue I recently saw mentioned on this forum: there is an outstanding CR when using gdb, and the suggestion is to use the system debugger.

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

When I reset cpu0 from cpu1, it certainly does re-run the fsbl, but resetting cpu1 from cpu0 doesn't.  It seems to me like the fsbl should only run on a system reset, not an APU reset, but that's what I've observed.

 

So, is the issue that on power-up or system reset, the interrupt vectors at 0x00000000 are empty and so everything runs normally, but at an APU reset, those vectors are set to something so we get vectored off to something that now isn't initialized?  So that cpu1 is vectoring off to something in cpu0's memory space because its memory mapping has been undone by the reset?

 

It just seems like this should be an issue that Xilinx has solved already, how to configure things so that a watchdog reset on cpu1 will have it properly come back up.  I assume that it would have to do with having the watchdog reset interrupt handle something, but what does it need to do?  This shouldn't be application specific.

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

The reason cpu1 doesn't rerun the FSBL after a reset is that the FSBL BSP (specifically boot.S) tests whether cpu0 is running this code, and if not, it throws the cpu into a wfe loop forever.

If you issue a SEV, cpu1 will just continue the wfe loop in boot.S.

From power-up or system reset, the bootrom does roughly the equivalent of boot.S, except that it sends cpu1 to the wfe loop at the top of memory. You could mimic the same behaviour by modifying the FSBL BSP boot.S file to send cpu1 back to the wfe loop at the top of memory.

Handling a cpu1 reset is application dependent. Linux will either keep cpu1 within SMP, or it will 'catch' cpu1 and send it to another memory location using remoteproc. Standalone/bare-metal currently only runs on cpu0 or cpu1.

I don't recall whether you were running standalone or Linux on cpu0. If you are running Linux, take a look at remoteproc. If you are running standalone, I suggest modifying boot.S where the cpu number is tested.

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

I think I understand what you're saying. On power-on, cpu1 executes bootrom code that does almost nothing except sit in a WFE near 0xFFFFFFF0 until cpu0 writes the jump address and then sends SEV. But on a cpu reset, it starts at 0x00000000, which is cpu0's entry point, and the cpu0 BSP has code that checks that the right cpu is running; since it's not, cpu1 sits in a WFE loop forever.

So, you're saying that I can just change the FSBL boot.S to branch to 0xFFFFFFF0 after the wfe instead of branching to EndlessLoop0? And shouldn't 0xFFFFFFF0 already contain the proper address, since app_0 had to write it after power-on to start cpu1 initially?

 

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

Ok, I've tried doing this in the SDK with JTAG, using the system debugger and not GDB.  I can see the WFE instruction at 0xFFFFFF24 and the B -16 instruction at 0xFFFFFF28, so what I don't understand is how writing the address at 0xFFFFFF20 gets it to actually go there.  I don't see any branch that would use that value.  So, since I want it to branch to 0x00200000 in order to start, isn't that the instruction I should write to 0xFFFFFF20?  But it seems like it should have this problem on power-up too, so what am I missing?

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

The other problem is debugging this. When I step through the reset sequence on cpu0, the program counter jumps to another place in the code right after cpu0 turns off the clock to cpu1. I have no idea whether it's executing the instructions between those two points or not. Is there something special that needs to be done to debug this over JTAG?

Adventurer
Registered: 05-01-2012

Re: Reset one of the Zynq APUs

I really appreciate this. I got this implemented, and it works fine. However, while debugging it, I found that I had code overwriting 0x00000000, which may have caused the problems with how it was working before, so this particular change may not have been necessary. Still, with this code it is a lot easier to understand what's going on and to debug it.

Registered: 10-12-2009

Re: Reset one of the Zynq APUs

Hi guys,

I am also running an AMP system with bare-metal apps on both cores and want cpu0 to be the supervisor for cpu1. I am using a ZedBoard and Xilinx 14.6.

I used your proposal of resetting and capturing cpu1 instead of interacting with the wfe loop at ~0xFFFFFF00.

The FSBL is at 0x00000000; app_cpu0 is at 0x01000000 with LENGTH = 0x1FC00000; app_cpu1 is at 0x1FC00000 with LENGTH = 0x00300000.

Then I modified the standalone BSP for the FSBL according to your proposal (cpu0 and cpu1 use the standard BSP) and executed the cpu1 reset.

Cpu1 resets properly and works fine.

In my system, cpu0 runs a USB device application based on xusbps_intr_example. After the cpu1 reset, cpu0 breaks in a strange way: the app is still running (the printf in the while loop in main still executes), but USB traffic is broken. I believe I have an issue with interrupts and the reset vector.

I checked the vector table of:

   the FSBL at address 0x00000000: only location 0x00000024 has changed, from 0xCC to 0x1FC00000,

   cpu0 at address 0x01000000: unchanged,

   cpu1 at address 0x1FC00000: unchanged.

As I understand the Zynq TRM, a reset through A9_CPU_RST_CTRL resets only the cpu; it does not affect the peripherals.

q1: After the cpu1 reset is executed, is cpu0 unaffected?

q2: As you mentioned in a previous post:

 ".. When cpu1 resets, it jumps to reset vector at 0, runs some code that determines which CPU the code is running on, and then jumps to the address stored at the end of the vector table.."

 Does it jump to the FSBL reset vector at 0? What about cpu1's own reset vector?

What could cause this behaviour on cpu0?

 

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Thank you for this explanation, John. To get the downloaded xapp1079 artifacts running on the ZedBoard with Vivado 2014.2, I had to modify boot.S to comment out the part where the virtual address 0x20000000 is mapped to 0. You had these lines commented out in the ISE versions of the tutorial, but for some reason the latest update for 2014.2 did not comment them out. I've been documenting my experience with xapp1079 in my Zynq blog. I am trying to understand the low-level code thoroughly, and I got very confused in app_cpu0.c about why you are writing APP_CPU1_ADDR to 0x24. This thread at least confirmed what I suspected: 0x24 contains the address of EndlessLoop0 (I don't yet understand the .word assembler syntax, so I am inferring this from what you wrote in this thread), and you want to change it. But because I haven't dug extensively into a BSP before, things are still not crystal clear. Can you please help me understand:

Firstly, the Zynq TRM (ug585-Zynq-7000-TRM.pdf) section 3.7.2, "APU State After Reset", says that CPU1 is kept in a WFE state while executing code located at address 0xFFFFFE00 to 0xFFFFFFF0. And TRM section 6.3 says: “When CPU1 receives a system event, it immediately reads the contents of address 0xFFFFFFF0 and jumps to that address. The steps for CPU0 to start an application on CPU1 are as follows:

  1. Write the address of the application for CPU1 to 0xFFFFFFF0.

  2. Execute the SEV instruction to cause CPU1 to wake up and jump to the application.”
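
The two TRM steps quoted above amount to a mailbox handshake. Here is a host-side sketch of it, not the bootrom's actual code: C11 atomics stand in for the word at 0xFFFFFFF0, and a flag stands in for the event register that SEV sets and WFE waits on. On the target, cpu0 would do the 32-bit store (for example with Xil_Out32), a barrier, and then execute sev. All names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t start_addr;  /* stands in for the word at 0xFFFFFFF0 */
    _Atomic int event;            /* stands in for the event set by SEV */
} mailbox_t;

/* cpu0 side: step 1 (write the address), then step 2 (SEV). */
static void cpu0_start_cpu1(mailbox_t *mb, uint32_t app_addr)
{
    atomic_store_explicit(&mb->start_addr, app_addr, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);                  /* like dmb before sev */
    atomic_store_explicit(&mb->event, 1, memory_order_relaxed); /* "sev" */
}

/* cpu1 side: one pass of the wait loop. Returns the address it would
 * jump to, or 0 if no event arrived and it would go back to WFE. */
static uint32_t cpu1_poll_once(mailbox_t *mb)
{
    if (!atomic_exchange_explicit(&mb->event, 0, memory_order_relaxed))
        return 0;                                               /* still in WFE */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&mb->start_addr, memory_order_relaxed);
}
```

The ordering matters: the address must be visible before cpu1 wakes, which is what the barrier between the store and the SEV provides.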

Is there a reason you did it the way you did? I know that the example works, and you want to get CPU1 to jump to the OKToRun label. But if the TRM is correct, CPU1 is actually stuck in a WFE loop around 0xFFFFFE00. Why would CPU1 even start executing at address 0 (_vector_table) on its own, when those contents are not even copied to RAM until the FSBL copies the app0 image? If CPU1 started running on its own, I would expect it to find garbage at address 0, because the FSBL on CPU0 is just starting to initialize at that time.

Like I said, I saw the example running on my ZedBoard; I just don't understand HOW it works. Thanks for any tip!

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

Quite a few people had questions on how to reset cpu1 and control it, so instead of relying on the wfe loop at the top of OCM, I used the cpu1 reset method. The wfe loop works fine right after the bootrom has placed cpu1 in it, but if you have told cpu1 to start running code and then, for some reason, want to restart it, there is no guarantee that cpu1 is still running the wfe loop. Resetting cpu1 is documented somewhere on ARM's website.

So after the bootrom runs, it creates the wfe loop and sends cpu1 to it. The bootrom then loads the FSBL from the boot device. Note: the FSBL lives at the bottom of memory, and it is actually the FSBL's BSP that is responsible for populating the vector table at 0x00000000. Even if the FSBL source changes, one thing is certain: the vector table will always have a known base address of 0x00000000. So the best place to provide an address used to redirect either cpu0 or cpu1 is the end of the vector table. That is why the magic address is 0x00000024.

At this point, cpu1 is still parked in the wfe loop at the top of memory and cpu0 is running the FSBL. The FSBL loads the cpu0 and cpu1 applications into memory and then jumps to the address of the first application loaded. This is why it is important that cpu0's application comes next in BOOT.BIN.

Once the FSBL finishes and cpu0 starts running its application, cpu0 toggles cpu1's reset. A reset always forces the cpu to jump to 0.

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

The reset vector will always end up at 0x00000000 after a reset. The base address of the vector table can be relocated, but a reset sets it back to 0.

a1: After the cpu1 reset, cpu0 should continue running as expected. While porting xapp1078, I did find that the BSP for cpu1 would cause the global timer to be reset. Could the USB interrupt example be using the global timer?

a2: When either cpu0 or cpu1 is reset, it always jumps to 0. The FSBL contains the vector table located at 0. If you look at the first few lines of code run by the BSP (boot.S), you will see code that checks which cpu is executing it. This check is then used to send the cpu to the appropriate address, as configured at 0x00000020 for cpu0 and 0x00000024 for cpu1.

What you may want to do is use the SDK system debugger to stop cpu1 at the beginning of its application (0x1fc00000). Once cpu0 issues the reset, SDK will hit the breakpoint for cpu1, but cpu0 should continue working. If the USB app is still working, start stepping through cpu1's code. This would help determine whether cpu1's BSP is re-initializing something that cpu0 is using.

Registered: 10-12-2009

Re: Reset one of the Zynq APUs

Thanks John,

I solved my problem with interrupts.

It was an issue with the interrupt distributor (ICD). I configured the ICD to route the interrupts to CPU1 instead of CPU0 (INTR ID 91 to 89) through the ICDIPTR22 register (0xF8F01858 in the Zynq TRM) by writing 0x02020202UL instead of 0x01010101UL.
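
The register and value in this fix follow from the GIC distributor layout: each ICDIPTR register holds four 8-bit CPU-target fields, one per interrupt ID, where bit 0 targets CPU0 and bit 1 targets CPU1. A small sketch of that arithmetic, assuming the Zynq distributor base 0xF8F01000 with ICDIPTR at offset 0x800 (which reproduces the 0xF8F01858 address quoted above, covering IDs 88 to 91); the helper names are hypothetical.

```c
#include <stdint.h>

#define ICD_BASE        0xF8F01000u  /* assumed Zynq GIC distributor base */
#define ICDIPTR_OFFSET  0x800u

/* Address of the ICDIPTR register that holds the target field for int_id. */
static uint32_t icdiptr_addr(unsigned int_id)
{
    return ICD_BASE + ICDIPTR_OFFSET + 4u * (int_id / 4u);
}

/* Bit position of int_id's 8-bit target field within that register. */
static unsigned icdiptr_shift(unsigned int_id)
{
    return 8u * (int_id % 4u);
}

/* Read-modify-write: route int_id to the CPUs in cpu_mask
 * (bit 0 = CPU0, bit 1 = CPU1), leaving the other three fields alone. */
static uint32_t icdiptr_set_target(uint32_t reg, unsigned int_id, uint8_t cpu_mask)
{
    unsigned sh = icdiptr_shift(int_id);
    reg &= ~(0xFFu << sh);
    reg |= (uint32_t)cpu_mask << sh;
    return reg;
}
```

Writing 0x02020202 moves all four IDs in ICDIPTR22 to CPU1 at once; the read-modify-write form touches only the interrupt you mean to move.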

 

Now I am trying to control (reset) CPU0 and CPU1 from a MicroBlaze. I successfully ported the USB app to the MicroBlaze, so both CPUs are now available for general-purpose use. I used your proposal for the CPU1 reset and it works, as I mentioned earlier. But when I reset CPU0, the system crashes, and even the PL loses its configuration. I suspect the FSBL is executed again and does not complete successfully.

My goal is to rerun only the CPU0 app, not the FSBL. Is it possible to configure the system that way?

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

There should be no reason why you couldn't reset just cpu0.

One thing to keep in mind is that you want to shut the cpu down cleanly. Otherwise, if bus activity is occurring and you kill the cpu in the middle of an access, you can run into problems, since AXI doesn't have a timeout mechanism.

A simple test that lets you rule out bus activity is to connect to cpu0 using XMD and issue a 'stop'. Then try your cpu0 reset.

Registered: 10-12-2009

Re: Reset one of the Zynq APUs

Using the same method as for cpu1, with

Xil_Out32(CPU0_CATCH, APP_CPU0_ADDR);

and then executing the reset, I properly reset cpu0.

I will keep that in mind.

But I have another issue.

If both CPUs' code and data segments are placed in DDR, only one CPU runs.

If one is in DDR and the other is in OCM, it works fine.

What could the issue be?

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Hi John, I am reading through xapp1079 again.  I understand that OCM is 256 KB starting at 0xFFFC0000.  Is there a reason why you use a smaller range: 0xFFFF0000 ~ 0xFFFFFE00?

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Here is a partial explanation if you are running Linux: the Linux kernel suspend (part of the PM subsystem) runs the last stage of suspend from OCM (after powering off the DDR?). In the ADI kernel's arch/arm/mach-zynq/pm.c, zynq_pm_suspend_init() copies zynq_sys_suspend_sz bytes into the OCM base. zynq_sys_suspend_sz is calculated in <kernel>/arch/arm/mach-zynq/suspend.S:

ENTRY(zynq_sys_suspend_sz)
.word . - zynq_sys_suspend

which means zynq_sys_suspend_sz is the size of the assembly function that starts at ENTRY(zynq_sys_suspend) in the same file (line 50). Counting the lines from that point to the .word label above (line 182), and subtracting empty and comment lines, I'd say it's about 100 lines of assembly, so I'd ballpark the suspend code at ~400 bytes (assuming this code is ARM; I don't see anything indicating that it is Thumb).

I would guess it's good practice to avoid the first page of the OCM. Therefore, I will try to constrain my usage of OCM to start at 0xFFFC400.

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Hi John, I am using your changes to boot.S for USE_AMP=1. But I still do not understand why you have to turn off L2 cache for CPU1's memory. Since CPU1 and CPU0 do NOT overlap in RAM, can't the MMU HW (which has already been initialized by Linux on CPU0) also service CPU1's memory address? I cannot find anything in TRM that explains why this won't work. Please excuse if this should be obvious...
Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

L2 cache is a shared resource. If cpu1 decided to flush L2, it could flush L2 lines that contain cpu0 code. There is also a very specific order in which the L1/L2 caches need to be flushed, and this order could be broken if cpu0 and cpu1 both tried to flush L2 at the same time. There is a document somewhere on ARM's website that describes the ordering of L2 operations.

You are correct that the cpu0 and cpu1 applications do not overlap in RAM. But take the example where cpu1 is not running and cpu0 has been running for a long time. The L2 cache could be full of cpu0 code. Now cpu1 starts to run and tries to use L2 cache for some of its code. With coherency enabled, the SCU will correctly manage L2. The real problem is if cpu1 tries to control L2.

I have never tried it, but you can lock down the L2 cache so that half is used only by cpu0 and the other half only by cpu1.

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

The OCM can be mapped to high or low memory in 64KB chunks. Take a look at the 'Address Map' section in the TRM.

 

By default, the bootrom leaves the first three 64KB of OCM at low memory (0x00000000-0x0002FFFF) and maps the fourth 64KB to 0xFFFF0000-0xFFFFFFFF. The FSBL continues to use this mapping and runs from the low 3x64KB and places its stack in the high 64KB. Also, the bootrom will send cpu1 to a wfe loop up around 0xFFFFFF20.
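
The mapping just described can be modelled as four 64 KB chunks, each either low (chunk i at i * 64 KB) or high (chunk i at 0xFFFC0000 + i * 64 KB). This is a sketch: the bitmask encoding (bit i set means chunk i is mapped high) is an assumption for illustration, with the bootrom default being only chunk 3 high.

```c
#include <stdint.h>

#define OCM_CHUNK_SIZE 0x10000u     /* 64 KB */
#define OCM_HIGH_BASE  0xFFFC0000u  /* 4 x 64 KB below 4 GB */

/* Base address of OCM chunk i (0..3), given a mask of chunks mapped high. */
static uint32_t ocm_chunk_base(unsigned i, unsigned high_mask)
{
    if (high_mask & (1u << i))
        return OCM_HIGH_BASE + i * OCM_CHUNK_SIZE;
    return i * OCM_CHUNK_SIZE;
}

/* OCM is contiguous from 0xFFFC0000 only when all four chunks are high. */
static int ocm_contiguous_high(unsigned high_mask)
{
    return (high_mask & 0xFu) == 0xFu;
}
```

With the default mask of 0x8, chunks 0 to 2 sit at 0x00000000 to 0x0002FFFF and chunk 3 at 0xFFFF0000, so nothing is mapped at 0xFFFC0000 until the low chunks are remapped.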

 

So, unless your BSP/OS/whatever remaps the low 3/4 of OCM to high memory, there is no guarantee that OCM will be available at 0xFFFC0000.

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

That is a hard question to answer without more details.

 

Make sure cache isn't getting in the way. If you are using the catch method, I presume you are running baremetal on both CPUs.

 

You also need to make sure the initial vector table, from the FSBL, is located at 0x00000000.

 

To debug, you can use SDK system debugger to connect to both cpus and then pause cpu1. After cpu0 runs the reset code, single step cpu1 and make sure it is running valid code from the reset vector.

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Thanks for the reply, John. I am running CPU1 with L2 disabled for now, but the application seems to run pretty slowly. My crude test is to toggle the MIO7 LED on the ZedBoard (the one right next to the OLED panel) after a 10M-iteration delay in a for loop. I see something like 3~4 blinks per second, which is 6~8 toggles, so CPU1 is running 6~8 times 10M delay iterations (60~80M iterations) per second. Naively, wouldn't that mean CPU1 is executing < 100M instructions/sec? I suppose the possibilities are:

  1. CPU1 is running slower than CPU0?
  2. The delay code doesn't fit in L1 cache (can't be!) so CPU1 is stalling?

Have you ever measured the AMP C1 performance?

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

I haven't measured C1 performance. I agree that the delay code should fit in L1 cache.

Something else worth looking into is the S (shareable) attribute in the MMU. If the bit is set, accesses to that address range will activate the coherency logic. Take a look at boot.S for xapp1078 for 2014.4; I made some changes to disable sharing and outer cache.

Or, the simplest method is to modify translation_table.S to disable sharing and outer cache.

I've seen mixed reports on whether to disable SMP, so it is currently left enabled. You can disable it by editing boot.S and commenting out this line:

    orr    r0, r0, #(0x01 << 6)        /* set SMP bit */

 

 

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Thank you for the reply John,

I believe I already picked up your latest mod through the wiki page.  I know that your boot.S was targeting 1 GB DRAM, so I just changed the numbers a little for Zedboard's 512 MB.  I am very sure that for the standalone_bsp_1's boot.S, I marked the 1st 510 MB (for Linux) as reserved, and left L2 cache for the remaining 2 MB as off.  But I'll check again and make sure that the MMU setting is non-shared for that last 2 MB.

 

I was wondering what that SMP bit is for; would you mind showing me the links that argue the benefit of turning it off?

Xilinx Employee
Registered: 02-01-2008

Re: Reset one of the Zynq APUs

It was a while ago when I dug around regarding the SMP bit. Try googling; there are forums that discuss it, and docs from ARM. I believe it was one of ARM's forums that mentioned the SMP bit can still be useful in an AMP environment, because it can help AMP cpus access a common peripheral.

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Thanks John,

Just now I successfully detached myself from the Xilinx BSP library and ran bare-metal C++ code on CPU1 to get a better feel for how many operations it does per second. It is still not as accurate as using a hardware timer and counting CCNT (which I started to do, but I have not set up a bare-metal timer interrupt yet), but it gave me some idea. I used a delay loop just like before, but since I had just written the GPIO bit-bang code, I knew roughly how many assembly instructions CPU1 was running through. Running the debug build (no inlining, -g, -O0), I counted around 63 assembly instructions per loop iteration. Since each 5-second blink period consisted of 0x2000000 delay iterations, I figured it's 5E6 usec / 0x2000000 loops = 0.149 usec per loop, and 63 instructions / 0.149 usec gives about 423 M instructions per second. Since the Zynq CPU is "800 MHz" (is that the clock?), I figured I am in the ballpark. With pipelining, I expected to get > 800 M instructions/second, but if I am off by a factor of 2, I don't think there is massive L1 cache starvation.
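
The estimate above, written out as a small calculation: total instructions is loop count times instructions per iteration, divided by the elapsed time. The 63 instructions per iteration is the hand count from the -O0 build, so the result is only a rough figure. The function name is hypothetical.

```c
/* Rough instructions-per-second estimate from a timed delay loop. */
static double instructions_per_second(double seconds, double iterations,
                                      double instr_per_iteration)
{
    return iterations * instr_per_iteration / seconds;
}
```

For example, instructions_per_second(5.0, 0x2000000, 63.0) is about 422.8e6, roughly half of one instruction per cycle at 800 MHz.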

 

I think the takeaway is that for a small application, the performance will be close to the theoretical maximum (everything runs out of L1). In case anyone wonders whether any practical application is THAT small, I am building the QP Dining Philosopher Problem example, and the code size is about 40 KB, even for the debug version, as you can see:

arm-xilinx-eabi-size cpu1app.elf | tee "cpu1app.elf.size"

   text    data     bss     dec     hex filename
  42480      32    9148   51660    c9cc cpu1app.elf

Adventurer
Registered: 08-05-2012

Re: Reset one of the Zynq APUs

Hi John, I am debugging the OCM communication. I find that sometimes CPU0 does not see what CPU1 wrote. I am reading your app note for xapp1079, and cannot find the code for where you "configures the MMU to disable cache for OCM access in the range of 0xFFFF0000 to 0xFFFFFFFFF...". In boot.S, you enable L1 only for 0x1FE00000 to 0x1FFFFFFF, but I don't see where you do anything with OCM address range...