Visitor jan.h

Low overhead communication from A53 to peripheral on Zynq Ultrascale+

Hi,

I am developing an instrumentation technique where I want to send information, with very low overhead, from many places in an application to a peripheral that does some processing on the information. I am doing this on a Zynq Ultrascale+. I need to communicate two 64b values at a time to the peripheral, for which I use a stp (store pair) instruction.
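As a rough illustration, a user-space helper for such a store might look like the sketch below; the function name, the inline assembly, and the assumption that the peripheral's register window has already been mmap'ed into the process are illustrative rather than taken from the post.

```c
#include <stdint.h>

/* dst: base of the peripheral's mmap'ed register window (hypothetical).
 * Both 64-bit values leave the core as a single 128-bit store, which
 * matches the peripheral's 128-bit AXI port. */
static inline void send_pair(volatile void *dst, uint64_t lo, uint64_t hi)
{
    __asm__ volatile("stp %1, %2, [%0]"
                     : : "r"(dst), "r"(lo), "r"(hi) : "memory");
}
```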

The peripheral has a 128b wide AXI full interface and is connected via AXI SmartConnect
to the PS block. The peripheral, the A53 cores, and AXI SmartConnect all run at the same
clock frequency.

I measured that it takes about 10 cycles to execute a stp that writes two values to the
peripheral. This is a bit long for the purpose I want to use it for. The device driver
of the peripheral uses pgprot_noncached() to mmap the registers of the peripheral into
user address space. pgprot_noncached() uses the DEVICE_nGnRnE memory attributes of ARMv8-A.
I also tried other memory attribute settings: DEVICE_nGnRE gives the same 10 cycles,
DEVICE_nGRE gives the strange result of 61.3 cycles per stp, and with DEVICE_GRE
most of the data stored by stp does not arrive at the peripheral.
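For reference, a per-stp cycle figure of this kind is typically obtained with a loop along the following lines; this is only a sketch, not the measurement code from the post, and it assumes the kernel has enabled user-space access to the PMU cycle counter (PMUSERENR_EL0), otherwise the pmccntr_el0 read traps.

```c
#include <stdint.h>
#include <stdio.h>

/* Read the PMU cycle counter (requires EL0 access to be enabled). */
static inline uint64_t rd_cycles(void)
{
    uint64_t c;
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0" : "=r"(c));
    return c;
}

/* regs: pointer returned by mmap() on the peripheral's device node. */
static void measure_stp(volatile void *regs, uint64_t a, uint64_t b)
{
    enum { N = 1000 };
    uint64_t t0 = rd_cycles();
    for (int i = 0; i < N; i++) {
        /* one 128-bit store of the two 64-bit values */
        __asm__ volatile("stp %1, %2, [%0]"
                         : : "r"(regs), "r"(a), "r"(b) : "memory");
    }
    uint64_t t1 = rd_cycles();
    printf("%.1f cycles per stp\n", (double)(t1 - t0) / N);
}
```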

I was hoping that DEVICE_nGnRE would give better results than DEVICE_nGnRnE because
then the A53s should not wait on a write acknowledgement from the peripheral.

Does anybody have suggestions on how I can reduce the 10 cycles it takes to execute
a stp to the peripheral? Is a better configuration of the A53 cores, the PS system
or the interconnect possible? Or is the reason that DEVICE_nGnRE does not perform
better than DEVICE_nGnRnE a limitation of the A53, and do more advanced v8-A cores do
better?

Regards,
Jan.

4 Replies
Scholar drjohnsmith

Re: Low overhead communication from A53 to peripheral on Zynq Ultrascale+

It's probably because you're jumping buses, hence you traverse the clock domain crossing and the FIFOs.

 

There's no real way of getting around this if you are doing a read-modify-write type operation.

 

Cache coherency, DMA, etc. can hide things, but you're crossing from the clock of the GHz processor to the slower internal bus, so it's going to have latency.

 

 

Visitor jan.h

Re: Low overhead communication from A53 to peripheral on Zynq Ultrascale+

I am only doing stores to the peripheral. So no loads.

 

My expectation for DEVICE_nGnRE is that the A53 would send the data to be stored over the AXI interconnect and directly continue executing subsequent instructions, without waiting for an acknowledgement from the target. Apparently it still waits approx. 10 cycles.

 

Information from ARM:

 

"Early Write Acknowledgement (E or nE)

This determines whether an intermediate write buffer between the processor and the slave device being accessed is allowed to send an acknowledgement of a write completion. If the address is marked as non Early Write Acknowledgement (nE), then the write response must come from the peripheral. If the address is marked as Early Write Acknowledgement (E), then it is permissible for a buffer in the interconnect logic to signal write acceptance, in advance of the write actually being received by the end device. This is essentially a message to the external memory system."
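On arm64 Linux these attributes are chosen through the pgprot helpers used in the driver's mmap handler: pgprot_noncached() selects Device-nGnRnE, pgprot_device() selects Device-nGnRE, and pgprot_writecombine() selects Normal Non-Cacheable; there is no standard helper for the nGRE/GRE variants. A minimal sketch of such a handler (the struct and function names are hypothetical) could look like:

```c
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical per-device data: physical base and size of the
 * peripheral's register window on the AXI interconnect. */
struct my_periph {
    phys_addr_t phys;
    size_t      size;
};

static int my_periph_mmap(struct file *file, struct vm_area_struct *vma)
{
    struct my_periph *p = file->private_data;
    unsigned long len = vma->vm_end - vma->vm_start;

    if (len > p->size)
        return -EINVAL;

    /* Device-nGnRnE, as in the driver described above.  pgprot_device()
     * would select Device-nGnRE and pgprot_writecombine() Normal-NC;
     * the nGRE/GRE variants need a custom memory attribute index. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    return remap_pfn_range(vma, vma->vm_start,
                           p->phys >> PAGE_SHIFT,
                           len, vma->vm_page_prot);
}
```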

 

Scholar drjohnsmith

Re: Low overhead communication from A53 to peripheral on Zynq Ultrascale+

Uhm, beyond me I'm afraid.

I'm interested if you get a response, though.

 

 

Xilinx Employee

Re: Low overhead communication from A53 to peripheral on Zynq Ultrascale+

I haven't looked closely at the address space attributes on the A53, but I will try to throw a few comments at what might be happening.

 

  1. Since you are probably writing to the same address space, the MMU may be waiting for the previous access to complete. But you may see a second access start (AWVALID/AWREADY) before the first access completes with BRESP.
  2. Hopefully the slave is AXI4, in order to support 'outstanding' accesses.
  3. The AXI interconnect does have the capability to buffer accesses and bursts, and may provide an early BRESP.
  4. Data-width conversion will throw away the AXI ID and may become a limitation.
  5. Ordering may cause back-pressure if there are multiple outstanding AXI transactions.

Do you have a screen capture of the AXI signals that can be shared?
