Participant lar_fer
Participant
422 Views
Registered: ‎04-25-2017

R5 CPU clock frequency

Hello,

I'm facing an issue with the actual R5 clock frequency in our system. We have an MPSoC with a PS_REF_CLK of 50 MHz and the following values for IOPLL_CTRL and CPU_R5_CTRL.

For IOPLL_CTRL (0xFF5E0020) the value is 0x00013C00 which, as far as I understand from the register description, means that PS_REF_CLK is multiplied by 60 and divided by 2, for a total of 1500 MHz.

For CPU_R5_CTRL (0xFF5E0090) the value is 0x03000302. This means that the source clock is IOPLL and a division by 3 is applied, resulting in a 500 MHz frequency for the R5 core.
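
The arithmetic above can be double-checked with a small decoder. This is only a sketch: the field positions (FBDIV in bits 14:8 and DIV2 in bit 16 of IOPLL_CTRL, SRCSEL in bits 2:0 and DIVISOR0 in bits 13:8 of CPU_R5_CTRL) are assumptions based on my reading of the register description, and the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions below are assumptions taken from the register description. */
#define PLL_FBDIV(reg)     (((reg) >> 8) & 0x7Fu)   /* feedback multiplier, bits 14:8 */
#define PLL_DIV2(reg)      (((reg) >> 16) & 0x1u)   /* output divide-by-2, bit 16 */
#define CLK_SRCSEL(reg)    ((reg) & 0x7u)           /* clock source select, bits 2:0 */
#define CLK_DIVISOR0(reg)  (((reg) >> 8) & 0x3Fu)   /* divisor, bits 13:8 */

/* PLL output in Hz for a given reference clock and PLL control word. */
uint64_t pll_out_hz(uint64_t ref_hz, uint32_t pll_ctrl)
{
    uint64_t hz = ref_hz * PLL_FBDIV(pll_ctrl);
    return PLL_DIV2(pll_ctrl) ? hz / 2u : hz;
}

/* Clock generator output in Hz for a given source clock and control word. */
uint64_t clk_out_hz(uint64_t src_hz, uint32_t clk_ctrl)
{
    uint32_t divisor = CLK_DIVISOR0(clk_ctrl);
    assert(divisor != 0u);   /* a zero divisor is not handled in this sketch */
    return src_hz / divisor;
}
```

Feeding in 50 MHz with 0x00013C00, and then the result with 0x03000302, should reproduce the 1500 MHz and 500 MHz figures quoted above.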

The attached image, captured from the Vivado project, is consistent with the above.

With this configuration, I created a simple R5 bare-metal application that switches an LED, driven by a GPIO, on and off. Between the LED-on and LED-off statements I placed 100 "nop" instructions. No compiler optimization is applied, and I checked that the generated disassembly has 100 nops. When I run the code and measure the time between the on and off operations with an oscilloscope, the 100 nops take 2 us, which means that the actual R5 CPU clock is 50 MHz, not 500 MHz.
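
The 50 MHz figure is simply the instruction count divided by the elapsed time, assuming one nop retires per clock cycle. A minimal sketch of that calculation (the function name is hypothetical):

```c
#include <stdint.h>

/* Instructions retired per second, given an instruction count and
 * the elapsed time in nanoseconds. Returns 0 if no time elapsed. */
uint64_t instr_per_second(uint64_t instr_count, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0u) {
        return 0u;
    }
    return instr_count * 1000000000ull / elapsed_ns;
}
```

With 100 instructions in 2000 ns this gives 50,000,000 instructions per second. That only equals the core clock if every nop really takes a single cycle.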

I cannot figure out why I am getting that CPU speed given the configuration we have. Any help would be appreciated.

vivado_clock_output.png
11 Replies
Xilinx Employee
Xilinx Employee
340 Views
Registered: ‎09-01-2014

Re: R5 CPU clock frequency

The GPIO is written through the 100 MHz clock, so you should not use this method.
Using a timer to measure the loop in your C code will be more precise.
Participant lar_fer
Participant
295 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

Hello,

Thanks for your answer. As you suggested, I used a timer to measure how long 1000 nop instructions take in my system. I used a Xilinx AXI timer clocked at 50 MHz (measured with the oscilloscope) to count the time.

The code has 1000 nops and the counter reaches 1025 counts every time (25 of them are needed for the AXI read and write operations on the timer core registers).

This means that the actual frequency of the R5 is 50MHz.
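
One hedged way to read these numbers: if the core clock is in fact the configured 500 MHz, the same measurement can be expressed as CPU cycles spent per nop rather than as a lower clock frequency. The sketch below (the helper name is hypothetical) converts timer ticks into CPU cycles.

```c
#include <stdint.h>

/* CPU cycles that elapse while a timer running at timer_hz advances
 * by `ticks`, for a CPU clocked at cpu_hz. Returns 0 if timer_hz is 0. */
uint64_t cpu_cycles(uint64_t ticks, uint64_t timer_hz, uint64_t cpu_hz)
{
    if (timer_hz == 0u) {
        return 0u;
    }
    return ticks * cpu_hz / timer_hz;
}
```

With 1000 net ticks (1025 minus the 25 of overhead) of a 50 MHz timer and an assumed 500 MHz core, this gives 10,000 cycles, or about 10 cycles per nop.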

There is still some configuration I am missing and I don't know where. I want the R5 with a 500MHz clock. Any idea?

regards,

Alex.

Scholar drjohnsmith
Scholar
282 Views
Registered: ‎07-09-2009

Re: R5 CPU clock frequency

Jumping in as I have an interest in this in the near future. Sorry, no answer.

Interesting. Does it scale if you go from 1000 nops to 10,000 nops?

Participant lar_fer
Participant
272 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

Hello,

The result scales from 100 nops to 1000 nops. Those are the two tests I have done.

 

Xilinx Employee
Xilinx Employee
259 Views
Registered: ‎09-01-2014

Re: R5 CPU clock frequency

What’s the actual frequency of R5 showing in your PCW clock configuration? Is it 500MHz?

I think a nop does not always take the same number of cycles.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489h/Cjafcggi.html

You can use the clock monitor to measure the R5 frequency. See the "Clock Monitor Programming Example" in:
http://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
Participant lar_fer
Participant
200 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

Hello,

Thanks for the reply.

The PCW clock configuration shows a 500 MHz frequency for the R5. I attach an image captured from Vivado.

About the "nop" not always taking the same number of cycles: it's difficult for me to test this right now. The results scaled quite well from 100 nops to 1000 nops.

About the clock monitoring: thanks a lot for pointing me to this feature. I read the documentation at https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf and found it really helpful. I ran the example on page 1109, "Clock Monitor Programming Example", and it worked (the same data as in the example applies to our platform: PS_REF_CLK = 50 MHz and the APB LPD bus clock is 100 MHz).

I coded a very simple app in SDK 2018.3 from the baremetal helloworld template for the R5: 

#include <stdio.h>
#include "platform.h"
#include "xil_printf.h"

#include <bitops.h>

/* Macro bodies are parenthesized so they expand safely inside larger expressions. */
#define CLKA_MUX_CTRL				GENMASK(3,1)
#define LPD_LSBUS_CLK				((BIT(0) | BIT(1)) << 1)
#define RPU_CLK						(0 << 1)
#define PS_REF_CLK_SRC				BIT(5)
#define CLK_MON_ENABLE				BIT(0)
#define CLK_MON_START_SINGLE		BIT(8)

#define CLR_APB_BASEADDR			0xFF5E0000
#define CLKMON_ENABLE_OFFSET		0x148
#define CHKR0_CLKA_UPPER_OFFSET		0x160
#define CHKR0_CLKA_LOWER_OFFSET		0x164
#define CHKR0_CLKB_CNT_OFFSET		0x168
#define CHKR0_CTRL_OFFSET			0x16C


int main()
{
	/* volatile: every access must actually reach the memory-mapped register */
	volatile uint32_t *p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CTRL_OFFSET);

	init_platform();

	/* program the clock sources */
	*p_clr_apb = (LPD_LSBUS_CLK & CLKA_MUX_CTRL) & (~PS_REF_CLK_SRC);

	/* program the counter values */
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKB_CNT_OFFSET);
	*p_clr_apb = 0x4000000;
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKA_UPPER_OFFSET);
	*p_clr_apb = 0x8000010;
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKA_LOWER_OFFSET);
	*p_clr_apb = 0x7FFFFF0;

	/* prime the pump */
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CTRL_OFFSET);
	*p_clr_apb |= CLK_MON_ENABLE;
	/* start the clock monitor */
	*p_clr_apb |= CLK_MON_START_SINGLE;

	printf("Done\n\r");

	while (1);

	cleanup_platform();
	return 0;
}

I tested the APB LPD bus clock and it is OK. I load a value of 0x4000000 for clock B (50 MHz), so the count for the monitored clock (100 MHz) should be double that (between 0x7FFFFF0 and 0x8000010). Checking the 0xFF5E0140 register, I see that the count is OK: bit 0 remains 0.
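
The upper and lower limits used above follow from the ratio of the two clocks. As a sketch (the struct and function names are hypothetical, and the margin is just the tolerance I chose around the ideal count):

```c
#include <stdint.h>

struct clkmon_window {
    uint32_t lower;
    uint32_t upper;
};

/* Expected count of the monitored clock (CLKA) while the reference clock
 * (CLKB) counts clkb_count cycles, widened by +/- margin. */
struct clkmon_window clkmon_expected_window(uint32_t clkb_count, uint32_t clka_hz,
                                            uint32_t clkb_hz, uint32_t margin)
{
    struct clkmon_window w = { 0u, 0u };
    if (clkb_hz == 0u) {
        return w;   /* no reference clock: leave the window empty */
    }
    uint64_t expected = (uint64_t)clkb_count * clka_hz / clkb_hz;
    w.lower = (uint32_t)(expected > margin ? expected - margin : 0u);
    w.upper = (uint32_t)(expected + margin);
    return w;
}
```

For a count of 0x4000000 on the 50 MHz reference and a 100 MHz monitored clock with a margin of 0x10, this yields 0x7FFFFF0 and 0x8000010, the values written to the CLKA lower and upper registers.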

Then I changed the monitored clock to the RPU clock, which is supposed to be 500 MHz. I made minor changes to the app, like this:

#include <stdio.h>
#include "platform.h"
#include "xil_printf.h"

#include <bitops.h>

/* Macro bodies are parenthesized so they expand safely inside larger expressions. */
#define CLKA_MUX_CTRL				GENMASK(3,1)
#define LPD_LSBUS_CLK				((BIT(0) | BIT(1)) << 1)
#define RPU_CLK						(0 << 1)
#define PS_REF_CLK_SRC				BIT(5)
#define CLK_MON_ENABLE				BIT(0)
#define CLK_MON_START_SINGLE		BIT(8)

#define CLR_APB_BASEADDR			0xFF5E0000
#define CLKMON_ENABLE_OFFSET		0x148
#define CHKR0_CLKA_UPPER_OFFSET		0x160
#define CHKR0_CLKA_LOWER_OFFSET		0x164
#define CHKR0_CLKB_CNT_OFFSET		0x168
#define CHKR0_CTRL_OFFSET			0x16C


int main()
{
	/* volatile: every access must actually reach the memory-mapped register */
	volatile uint32_t *p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CTRL_OFFSET);

	init_platform();

	/* program the clock sources */
	*p_clr_apb = (RPU_CLK & CLKA_MUX_CTRL) & (~PS_REF_CLK_SRC);

	/* program the counter values */
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKB_CNT_OFFSET);
	*p_clr_apb = 0x4000000;
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKA_UPPER_OFFSET);
	*p_clr_apb = 0x28000100;
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CLKA_LOWER_OFFSET);
	*p_clr_apb = 0x27FFFF00;

	/* prime the pump */
	p_clr_apb = (uint32_t *)(CLR_APB_BASEADDR + CHKR0_CTRL_OFFSET);
	*p_clr_apb |= CLK_MON_ENABLE;
	/* start the clock monitor */
	*p_clr_apb |= CLK_MON_START_SINGLE;

	printf("Done\n\r");

	while (1);

	cleanup_platform();
	return 0;
}

If I load a count of 0x4000000 for the 50 MHz clock and the monitored clock is 500 MHz, I have to multiply by 10. That is why the upper and lower values now look like this. This example works, which would mean that the actual RPU clock is 500 MHz.

But still (sorry for my stubbornness) I feel (know) that the code running on the R5 is not executing at 500 MHz. If anyone knows what I'm missing, I would really appreciate it. I'm stuck.

Regards,

Alex.

Participant lar_fer
Participant
180 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

The effect I'm seeing is as if I had enabled the bypass for the PLLs, as shown on page 1098 of the MPSoC manual. I attach a picture.

But when I check whether the PLL sourcing the RPU clock is bypassed, the register read says that it is not bypassed.
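
For reference, the check amounts to testing one bit of the PLL control word. The sketch below assumes the BYPASS flag is bit 3 of IOPLL_CTRL, which is how I read the register description; the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the BYPASS flag in the PLL control register. */
#define PLL_CTRL_BYPASS  (1u << 3)

/* True if the PLL control word has the bypass flag set. */
bool pll_is_bypassed(uint32_t pll_ctrl)
{
    return (pll_ctrl & PLL_CTRL_BYPASS) != 0u;
}
```

Applied to the IOPLL_CTRL value 0x00013C00 from the first post, this reports that the PLL is not bypassed.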

 

plls.png
Xilinx Employee
Xilinx Employee
167 Views
Registered: ‎09-01-2014

Re: R5 CPU clock frequency

Since the clock monitor measures 500 MHz, the clock itself should not be the problem. Maybe the interconnect or the memory is running at a lower frequency.
Please check whether the actual frequency matches the requested frequency in the PCW clock configuration (Advance clocks -> Low Power Domain -> Interconnect and Switch clocks).
Participant lar_fer
Participant
135 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

Thanks for the reply. I attach a capture of Advance clocks -> Low Power Domain -> Interconnect and Switch clocks. They are all at the maximum frequency.

I ran some more tests with the nop test software and finally discovered that I only get the proper speed when I place the code in the TCMs. I just changed the linker script and it started running at the desired speed. Now it all makes sense.

The memory sections I have are:

MEMORY
{
   psu_ocm_ram_0_MEM_0 : ORIGIN = 0xFFFC0000, LENGTH = 0x40000
   psu_qspi_linear_0_MEM_0 : ORIGIN = 0xC0000000, LENGTH = 0x20000000
   psu_r5_0_atcm_MEM_0 : ORIGIN = 0x0, LENGTH = 0x10000
   psu_r5_0_btcm_MEM_0 : ORIGIN = 0x20000, LENGTH = 0x10000
   psu_r5_ddr_0_MEM_0 : ORIGIN = 0x100000, LENGTH = 0x7FE00000
   psu_r5_tcm_ram_0_MEM_0 : ORIGIN = 0x0, LENGTH = 0x40000
}

The .text section was in DDR, and it seems that running code from DDR on the R5 is extremely slow!

I also tried OCM, which is much faster than DDR but slower than TCM (as expected).

Now I'm facing the problem that I want a FreeRTOS app running in the TCMs, but even a simple hello-world program is bigger than the TCM.

mpsoc_lpd_clocks.png
Xilinx Employee
Xilinx Employee
72 Views
Registered: ‎09-01-2014

Re: R5 CPU clock frequency

DDR's latency is much higher than TCM's.
Here are the access latencies for your reference:
R5:TCM 1 Cortex-R5 cycle
R5:OCM 18 Cortex-R5 cycles
R5:DDR 45 Cortex-R5 cycles

Have you enabled the L1 cache?
Participant lar_fer
Participant
38 Views
Registered: ‎04-25-2017

Re: R5 CPU clock frequency

Hi,

Thanks for the reference, it is really helpful.

Yes, the startup code enables the L1 cache. If I understood correctly, the R5 has a dedicated 32 KB instruction cache
configured with an 8-word cache line.

Considering the nop section we are timing, I still don't understand why
TCM comes out 10x faster than DDR.

Here is my reasoning; please correct me if I am wrong.

With an 8-word cache line, the block size is 32 bytes. This means every instruction cache miss fetches a 32-byte
block from DDR memory. Each block holds 8 instructions, so we get one miss every eight instructions (1/8 = 0.125).

Let's say that:

- Cp: processor cycles
- Cw: stall cycles
- T: clock period
- IC: instruction count
- CPI: cycles per instruction (taken as 1)
- ni: number of memory accesses per instruction (1 for instruction fetch)
- m: miss rate (1/8 = 0.125)
- Pm: miss penalty (penalty cycles for DDR = 45)

With that, in the best case, which is running from TCM, we get:

Tcpu = (Cp + Cw) * T = (IC * CPI + 0) * T = IC * T

For DDR we incur miss penalties, so:

Cw = IC * ni * m * Pm = IC * 1 * 0.125 * 45 = 5.625 * IC

and the Tcpu should be:

Tcpu = (IC * 1 + IC * 5.625) * T = 6.625 * IC * T

So execution from TCM should be 6.625 times faster than from DDR, which is well below the 10x I am measuring. Even allowing for
the time needed to drive the GPIO etc., the result is really far from what is theoretically expected.
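
The model above can be written as two small functions: one for the effective cycles per instruction, and one that inverts it to see what miss penalty a given slowdown would imply. This is a sketch of the simple stall model only, with hypothetical function names, and it ignores everything except instruction fetch misses.

```c
/* Effective cycles per instruction: base CPI plus miss rate times miss penalty. */
double effective_cpi(double base_cpi, double miss_rate, double penalty_cycles)
{
    return base_cpi + miss_rate * penalty_cycles;
}

/* Miss penalty (in cycles) implied by an observed slowdown relative to
 * base_cpi. Returns 0 if the miss rate is not positive. */
double implied_penalty(double base_cpi, double miss_rate, double slowdown)
{
    if (miss_rate <= 0.0) {
        return 0.0;
    }
    return base_cpi * (slowdown - 1.0) / miss_rate;
}
```

With a base CPI of 1, a miss rate of 0.125 and a 45-cycle penalty, the first function gives 6.625. Inverting the same formula for a 10x slowdown gives a penalty of 72 cycles per line fill, which is only what this model implies and not a measured figure.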

Clearly there is something I am missing about why I am not getting the expected performance.

Thanks in advance for clarification.
