joe306
Scholar
800 Views
Registered: ‎12-07-2018

How Fast to Toggle a Pin


Hello, I have a Zynq UltraScale+ MPSoC FPGA. The PS_REF_CLK is 33.33 MHz. The ARM A53 cores are running at 1.2 GHz and the R5 cores are running at 533.33 MHz. I wanted to see how fast I can toggle a pin connected to the PS side and also a pin connected to the PL side. In both cases the ARM cores would toggle the pin.

My code looked like this:

while(1)
{
    set pin high
    usleep_A53(delay_time)
    set pin low
    usleep_A53(delay_time)
}

or

while(1)
{
    set pin high
    usleep_R5(delay_time)
    set pin low
    usleep_R5(delay_time)
}

In order to toggle the PL side pin I need to go through an AXI interface. The AXI interface is running at 200 MHz.

I used an oscilloscope to measure the high time and low time.

Here are my results:

DelayTime.jpg

I expected the PS side pin to do better because I am not going through any AXI interfaces and the pin is a GPIO pin on the PS Side.

In both cases the measured data is very close to the requested delay but always seems to be about 600 ns longer. I can't explain this. Also, when the delay is 200 us the pulse is not over 200 us but under, at 192 us. So I'm thinking that my usleep_A53() is not linear or I just don't understand how it works.

The purpose of this experiment is to help me understand data latency of the ARM cores.

Does anyone know why the data is showing the usleep() is taking longer?

Thank you

Joe

0 Kudos
15 Replies
joe306
Scholar
783 Views
Registered: ‎12-07-2018

Here is the sleep function that I am using:

Timer_Function.jpg

0 Kudos
joancab
Teacher
766 Views
Registered: ‎05-11-2015

A few things to bear in mind when doing these things with the PS:

- Each of your function calls (turn pin on, turn pin off, call the sleep function) implies an overhead. At 500 MHz / 1 GHz, I suppose it will be a handful of ns, but it is still there. The while loop also adds some overhead. That could explain at least part of the 100s of ns you observe.

- The sleep function with the argument in usec may not be accurate to the ns.

- From the software action to turn the pin on and off (writing a register) to the pin physically changing, I would expect another delay, presumably similar in the rise and fall cases. If not, that would distort the frequency and duty cycle.

- PS and PL GPIO drivers are frequency limited. I would expect them to respond up to 100 - 300 MHz, especially the PL ones, but they are definitely below the software frequencies.

- Routing an EMIO to PL IOs adds latency but shouldn't affect the frequency.

- If you run that with an OS (Linux, FreeRTOS) then there are interrupts every now and then that will add jitter to any timing done that way. The way to do things in a timely manner is to use a timer and its interrupt, although if the interrupt is triggered while the CPU is servicing another interrupt, it will be delayed as well.

For a precise toggling of a pin, I would use a PL timer, maybe loaded by the PS with AXI-Lite, etc. otherwise it will hoard all the precious capabilities of the mighty cores. 
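One way a microsecond sleep can miss its target is sketched below. This is a hypothetical model only, not the BSP's actual usleep_A53() or usleep_R5() code: it assumes the sleep converts microseconds into whole timer ticks using an integer ticks-per-microsecond constant. The names sleep_ticks and sleep_actual_ns and the 99.99 MHz figure in the note are illustrative assumptions, and the sketch is not claimed to explain the 192 us reading.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a counter-based busy-wait sleep: the requested
 * microseconds are converted to whole timer ticks before the wait starts.
 * Neither function is taken from the Xilinx BSP. */

/* Ticks the sleep waits for, using an integer ticks-per-microsecond
 * constant (the fractional part of timer_hz / 1e6 is truncated). */
uint64_t sleep_ticks(uint64_t usec, uint64_t timer_hz)
{
    assert(timer_hz >= 1000000u);
    uint64_t ticks_per_usec = timer_hz / 1000000u; /* integer division */
    return usec * ticks_per_usec;
}

/* Actual duration of that wait in nanoseconds, given the true tick rate. */
double sleep_actual_ns(uint64_t usec, uint64_t timer_hz)
{
    return (double)sleep_ticks(usec, timer_hz) * 1e9 / (double)timer_hz;
}
```

Under this model a timer at exactly 100 MHz gives the requested 200 us, while an assumed 99.99 MHz timer is treated as 99 ticks per microsecond and the wait comes out about 1% short. Call and loop overhead would then be added on top of whatever the tick count yields.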

 

joe306
Scholar
740 Views
Registered: ‎12-07-2018

Hello, and thank you for responding to my message. I will try to implement your suggestions.

Below is the code to toggle an AXI_GPIO pin.

[Image: PL Side Pin Control Code]

Below is the code to toggle a PS side GPIO pin.

[Image: PS Side Pin Control Code]

 

0 Kudos
joe306
Scholar
737 Views
Registered: ‎12-07-2018

Hello, are you aware of any documentation that can help me determine PS-PL Register read/write latency? That is my goal with this experiment. I also want to show how deterministic the ARM cores are.

Thank you

0 Kudos
joancab
Teacher
733 Views
Registered: ‎05-11-2015

I honestly think it's a waste of time. You want the cores for something other than toggling a pin, and while they do that something else they can't run your while loop to toggle a pin. So the loop monopolizes the core. Processors do many things "at once" by quickly switching between tasks. Maybe with FreeRTOS and a quick main interrupt you can reconcile fast toggling (a few us) with acceptable performance in the other tasks, but the faster you want the toggling, the more CPU time is wasted in switching tasks.

joancab
Teacher
731 Views
Registered: ‎05-11-2015

I would say registers are read/written in 1 clock cycle, but the register is not the physical pin. You write to a bunch of silicon that translates bits into transistor states, and that is something I've never seen or heard of. Why do you want that? I can understand wanting the delay between two electrical signals, but this delay is between software inside the core and the external world. What for? Trying to time software can make some sense for very simple applications on very simple microcontrollers, like the 8-bit types from the 80s, but not on a mighty A53 that you use for many things and with complex software (many execution paths), so the lapse between an input and its output is variable.

0 Kudos
joe306
Scholar
702 Views
Registered: ‎12-07-2018

Hello, thank you for responding to my message. It does not have to be a pin. It could be a register on the PL side that I want to read. I want to know how many cycles it would take to do one register read. I only used the output pin so I could measure something.

Do you know how long that would take and how would I determine or calculate that time?

Thank you

0 Kudos
dpaul24
Scholar
692 Views
Registered: ‎08-07-2014

@joe306 ,

"Do you know how long that would take and how would I determine or calculate that time?"

You can start a timer just before any register read or GPIO read and then stop the timer immediately after it. That timer value will give you the number of cycles the ARM core uses to fetch that value from the PL. From the datasheet you know at what frequency the ARM core is operating, so you can then calculate the delay.
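The conversion from a tick difference to time can be sketched as follows. The function name ticks_to_ns is an illustrative assumption, not a BSP call. The sketch assumes counter_hz is the rate at which the counter itself increments, which may differ from the core clock, so it is worth checking before converting.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the difference of two counter readings into nanoseconds.
 * counter_hz must be the rate at which the counter itself increments,
 * which is not necessarily the CPU core clock. */
double ticks_to_ns(uint64_t start, uint64_t end, double counter_hz)
{
    assert(counter_hz > 0.0);
    /* Unsigned subtraction still gives the right delta across one wrap. */
    uint64_t delta = end - start;
    return (double)delta * 1e9 / counter_hz;
}
```

For example, a difference of 12 ticks on a counter incrementing at 1.2 GHz corresponds to 10 ns, while the same 12 ticks on a counter incrementing at a hypothetical 100 MHz would correspond to 120 ns.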


0 Kudos
watari
Teacher
659 Views
Registered: ‎06-16-2013

Hi @joe306 

 

I suggest you verify the CPU cycle count by examining, for example, the assembly code, and the AXI transaction latency (time) with a System ILA.

That should help you understand it.

 

Best regards,

0 Kudos
joe306
Scholar
653 Views
Registered: ‎12-07-2018

Hello, thank you very much for responding to my post. How does this look:

Data_latency.jpg

Here I have a register at address XPAR_AXI_GPIO_COUNTER_BASEADDR, located in the axi_gpio_counter block in the diagram below.

AXI_READ.jpg

If I have things right, I used a Global Timer to capture the time before the read and after the read. The difference is 12. Now I multiply by 1/1.2 GHz, which gives 10 nanoseconds.

Does this sound right?

The ARM A53 cores are running at 1.2 GHz.

Thank you very much,

Joe

0 Kudos
joe306
Scholar
640 Views
Registered: ‎12-07-2018

Will do. Thank you

0 Kudos
dgisselq
Scholar
579 Views
Registered: ‎05-21-2015

@joe306 ,

I did a study very similar to this some time ago.  I started with the bus speeds reported for both MicroBlaze and ARM GPIO toggle speeds, but then dug in much deeper into how fast a CPU can toggle a GPIO pin using an open sourced ZipCPU--something where all of the latencies could be documented.  You might find reading that report valuable to understanding what's going on here.

Dan

joe306
Scholar
473 Views
Registered: ‎12-07-2018

Thank you. I'm also trying to determine how long it takes the ARM to read a register on the PL side. Above I show it took 10 ns, or 12 clock cycles. Does that sound right?

 

Thank you

0 Kudos
dgisselq
Scholar
459 Views
Registered: ‎05-21-2015

@joe306 ,

No, it doesn't sound right at all.  Xilinx's GPIO takes 4 cycles, and the interconnect will take a minimum of another 2 if not 3.  Since you said your AXI interface was running at 200 MHz, that's a minimum time of 30 ns in the programmable logic alone.
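The arithmetic behind that lower bound can be sketched as a small helper. The function name pl_min_latency_ns is an illustrative assumption; the cycle counts are the ones quoted above.

```c
#include <assert.h>

/* Lower bound on the time one access spends in the programmable logic:
 * cycles in the AXI GPIO plus cycles in the interconnect, both counted
 * at the AXI clock rate. */
double pl_min_latency_ns(unsigned gpio_cycles, unsigned interconnect_cycles,
                         double axi_clk_hz)
{
    assert(axi_clk_hz > 0.0);
    return (double)(gpio_cycles + interconnect_cycles) * 1e9 / axi_clk_hz;
}
```

With 4 GPIO cycles and 2 interconnect cycles at 200 MHz (5 ns per cycle) this gives 30 ns, and 35 ns if the interconnect takes 3 cycles.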

Within the ARM, you'll need a minimum of another two clock ticks for the hard interconnect, and another four (if not six) to cross clock domains--and this is all from the time the STOre instruction is issued.  These will be at the ARM's clock frequency, plus another two clock ticks (or three) to cross clock domains to the PL's clock frequency.

Worse, your time function is ... well, it's not going to do what you are hoping.  If you are lucky, you might run one instruction per ARM clock cycle, and 2-5 clock cycles for each branch.  Knowing how much that is will require you digging into the assembly.  There, you'll discover that your time checking while loop has a resolution of several clock cycles--perhaps about 10 at best.  Just to test this, try adjusting the wait time by the smallest possible increment, and see how many wait times end up averaging to the same time increment.  Similarly, the jump to the time-checking loop may cost you anywhere between 2-5 cycles, and another 2-5 cycles to return--assuming all of your logic fits within the cache.  (I'm guessing here from my own experience in processor design, based upon the article cited above and the lessons learned within it.  I have no knowledge of the internals of the ARM processor ...)
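The resolution test suggested above could be evaluated with something like the sketch below. It assumes the pulse width has been measured (and averaged) for a series of requested waits, each raised by the smallest possible increment. The function name smallest_step_ns is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Given pulse widths measured while the requested wait is raised by its
 * smallest increment each time, return the smallest non-zero step between
 * consecutive measurements. That step approximates the resolution of the
 * polling loop. Returns 0 if all measurements are identical. */
double smallest_step_ns(const double *measured_ns, size_t n)
{
    assert(measured_ns != NULL);
    double best = 0.0;
    for (size_t i = 1; i < n; i++) {
        double step = measured_ns[i] - measured_ns[i - 1];
        if (step < 0.0)
            step = -step;               /* use the magnitude of the step */
        if (step > 0.0 && (best == 0.0 || step < best))
            best = step;
    }
    return best;
}
```

If several consecutive requested waits produce the same measured width before the width jumps, the size of that jump is the effective resolution. Real oscilloscope readings carry noise, so the inputs would need to be averaged or rounded first.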

Personally, I'd remove the microsecond wait timer and just look at how fast I could toggle the pin in general.  The difference between the ON and OFF time of the timer should teach you about the number of cycles required by the loop, and the number of cycles required to go from one STOre instruction to the next.  Even better, I'd place a logic analyzer on the pin's output, and a bus analyzer on the bus outputs of the ARM, and I'd measure the time from one response to the next request, and from one request to its associated response.  Only after characterizing things without any delays would I consider adding them back in.
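As a rough sketch of how the ON/OFF difference could be turned into a cycle count, the helper below assumes a bare toggle loop in which the high time covers one store-to-store interval and the low time covers the same interval plus the branch back to the top of the loop. That model and the name loop_overhead_cycles are assumptions for illustration.

```c
#include <assert.h>

/* Assumed model for a bare toggle loop with no delay calls:
 *   high time = store-to-store time (set high -> set low)
 *   low time  = store-to-store time + loop-back branch (set low -> set high)
 * so the difference is attributed to the loop itself. */
double loop_overhead_cycles(double high_ns, double low_ns, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (low_ns - high_ns) * cpu_hz / 1e9;
}
```

For instance, a 5 ns difference between low and high time at a 1.2 GHz core clock would correspond to 6 core cycles attributed to the loop under this model.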

Basic engineering principles apply here: First, minimize the number of unknowns in your process.  Keep things simple.  Make your measurements as simple as possible.  Only once you understand the simple and the basic does it make sense to try to expand into the more complex--such as your timing loop.

Dan

joe306
Scholar
404 Views
Registered: ‎12-07-2018

Thank you for the suggestion. I think I've confused some people. I first wanted to toggle a pin, and now I'm only trying to read a register on the PL side. I used an AXI GPIO IP and set it for inputs. I then have a counter that I write to. Here is my code:

Data_latency.jpg

Here I'm only grabbing the Global Timer value, doing an AXI read, and grabbing the Global Timer value again. I then take the difference of the two Global Timer values, and it should be the number of CPU clocks.

When I get back in the office I can show some ILA plots. 

Respectfully,

Joe
