Participant
11,856 Views
Registered: ‎11-06-2007

Latency on AXI Interconnect

Hi, 

I have created a system with one master and one slave on an AXI Interconnect. The slave is a block RAM connected to the interconnect through an AXI BRAM Controller.

With asynchronous clocks for the AXI master and the interconnect, write and read transactions have a latency of around 42 clock cycles. In this configuration the system runs on our validation platform (master on a Zynq FPGA, interconnect and slave on a Virtex-7 FPGA).

After changing the clock settings to synchronous, the latency in simulation dropped to 6 clock cycles, but the system hangs on the validation platform.

Are there any recommended AXI Interconnect settings for a system with synchronous clocks?

Note that I am using the PlanAhead/XPS 14.5 tools to generate the system.

 

Regards,

Bhargava

13 Replies
Xilinx Employee
11,850 Views
Registered: ‎08-01-2008

You may want to look at the clocking section of PG085:
http://www.xilinx.com/support/documentation/ip_documentation/axis_infrastructure_ip_suite/v1_0/pg085-axi4stream-infrastructure.pdf
Thanks and Regards
Balkrishan
--------------------------------------------------------------------------------------------
Please mark the post as an answer "Accept as solution" in case it helped resolve your query.
Give kudos to a post in case it guided you to the solution.
Teacher
11,835 Views
Registered: ‎07-09-2009

Are you using the AXI4-Stream interface, or the Lite / full version of the memory-mapped type?

I have a customer who has found reads and writes from the PS side of a Zynq to the PL to be very slow. It seems that setting up and executing a single PS-to-PL access is slow; only bursts give high speed, once they are set up.
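As a rough illustration of why bursts hide the setup cost, the sketch below models a transaction as a fixed setup latency followed by one cycle per data beat. The function name `cycles_per_beat_x100`, the one-cycle-per-beat figure, and the fixed-setup model itself are assumptions made for illustration, not measurements from this thread; the 42-cycle value is the latency reported in the first post.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (assumption): every transaction pays `setup_cycles`
 * once, then transfers one data beat per cycle.
 * Returns the average cycles per beat, scaled by 100 so the arithmetic
 * stays in integers (e.g. 362 means about 3.62 cycles per beat).
 */
uint32_t cycles_per_beat_x100(uint32_t setup_cycles, uint32_t burst_len)
{
    assert(burst_len > 0);
    return ((setup_cycles + burst_len) * 100u) / burst_len;
}
```

In this model, a 42-cycle setup makes a single-beat access cost 43 cycles per word, while a 16-beat burst costs about 3.6 cycles per word. Real interconnects add other effects, so treat the numbers as a trend only.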

 

I'm interested in your results.

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
Scholar
11,829 Views
Registered: ‎02-27-2008

Note that transfers to/from DDR memory will experience 'slow' times while the memory changes row or column addresses, or refreshes. So bursts are fast, and sequential access is generally fast, but random access can be very slow.

As streaming is generally sequential, it is generally fast, but there is still overhead for the DDR address changes and refresh cycles.

This all assumes perfect software, and that all settings are correct.

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
Participant
11,817 Views
Registered: ‎11-06-2007

I am using the AXI4 protocol. Do you have any idea why a single PS-to-PL access is so slow?

Participant
11,816 Views
Registered: ‎11-06-2007

Hi Austin,

 

I am not using DDR memory. The access from the Zynq PS is to a block RAM on the Virtex-7 FPGA. We still see a latency of nearly 42 clock cycles for a memory write or a memory read.

Teacher
11,808 Views
Registered: ‎07-09-2009

My client is seeing the same sort of numbers.

Their test is the hello world program running on the Zynq, with a single GPIO register on the PL side and nothing else in the design: no DDR, no OS running, just a pointer read and write to the PL GPIO from the ARM in the PS.
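For readers unfamiliar with this kind of test, a pointer read and write to a memory-mapped register usually looks something like the sketch below. The helper names `poke32` and `peek32` are hypothetical, and this is not the client's actual code; the `volatile` qualifier is what keeps the compiler from caching or removing the accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one 32-bit word at word offset `idx` from `base`.
 * `volatile` forces a real store on every call. */
void poke32(volatile uint32_t *base, size_t idx, uint32_t value)
{
    assert(base != NULL);
    base[idx] = value;
}

/* Read one 32-bit word at word offset `idx` from `base`.
 * `volatile` forces a real load on every call. */
uint32_t peek32(const volatile uint32_t *base, size_t idx)
{
    assert(base != NULL);
    return base[idx];
}
```

On the target, `base` would be the peripheral's address from the design's address map, cast to `volatile uint32_t *`; that address is design-specific and is not given in this thread. On a host machine the same functions can be exercised against an ordinary array.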

 

Interestingly, all the examples I have seen on the sites are for streaming between the PS and PL; there are very few examples of the speed of accessing single registers / memory on the PL side from the PS.

To put things in context, the Virtex PPC version of the design, with the old IBM interconnect, can access the GPIO faster than the 666 MHz ARM Cortex-A9 in the Zynq can.

 

 

 

Scholar
11,797 Views
Registered: ‎02-27-2008

Teacher
11,789 Views
Registered: ‎07-09-2009

Thanks, Austin.

The link points to a DDR system, or have I read that wrong?

Neither of the systems here uses DDR.

Hello world runs out of on-chip memory on the Zynq ARM, and the pointer access to the PL-side register is the only user code.

It's the access from the PS to the PL that seems to be taking forever.

 

Teacher
11,788 Views
Registered: ‎03-31-2012

Your comparison is not one-to-one. In Zynq, there are some GPIO bits on the PL/PS interface. Connect some of these to FPGA I/Os and measure how fast the Zynq is able to toggle them; that test is similar to the PPC test. If you have an AXI GPIO, you are also going through the Zynq AXI master and the AXI slave to get to the toggling pin, which adds quite a bit of latency. You can improve this a little by implementing the AXI GPIO slave better, but I am not sure by how much. More characterization is needed.
- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.
Teacher
4,960 Views
Registered: ‎07-09-2009

Hi muzaffer,

Happy new year to you.

Re the PowerPC / Zynq comparison:

The PLB bus on the PowerPC goes to a slower bus via a bridge, and then to the GPIO (I can't remember the name of the slower bus; was it OPB?).

The PLB bus, the bridge, the OPB and the GPIO are all in logic (the old EDK system). The code runs out of on-chip memory on the PLB bus.

On the Zynq, the ARM has its bus with the memory that the code runs from, a hardware bridge to the PL side, and a soft GPIO peripheral in the PL.

The two systems sound rather similar to me. If anything, the ARM, running faster and having a proper hardware / silicon-optimised bridge, should be faster than the old PPC at accessing GPIO, but from the tests we have seen, this is not the case.

At the end of the day, the question is not the PPC-to-Zynq comparison; it is how to access GPIO peripherals in the PL as fast as possible, and so far it is very slow.

There must be a way of improving the system.

 

 

 

 

     

Participant
4,955 Views
Registered: ‎11-06-2007

Austin,

My focus is just on latency, not throughput, for the time being.

I am using an AXI4 memory-mapped interface between the Zynq PS, which is the master, and the block RAM in the Virtex-7 FPGA, which is the slave.

Transfer size parameters: burst length 1, burst size 32 bits.

In software, I use a simple memcpy-style function that writes to all locations in the BRAM and then reads them back.
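A write-then-read-back test of this kind could be sketched as below. This is not the poster's code: the function name `bram_write_readback` and the address-derived test pattern are assumptions, and the loop uses explicit word accesses through a `volatile` pointer rather than `memcpy`, so that each location produces one 32-bit access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill `words` 32-bit locations starting at `mem` with an
 * address-derived pattern, then read every location back.
 * Returns the number of locations whose read-back value is wrong.
 */
size_t bram_write_readback(volatile uint32_t *mem, size_t words)
{
    assert(mem != NULL);
    size_t errors = 0;

    /* Write phase: one 32-bit store per location. */
    for (size_t i = 0; i < words; i++) {
        mem[i] = (uint32_t)i ^ 0xA5A5A5A5u;
    }

    /* Read-back phase: one 32-bit load per location. */
    for (size_t i = 0; i < words; i++) {
        if (mem[i] != ((uint32_t)i ^ 0xA5A5A5A5u)) {
            errors++;
        }
    }
    return errors;
}
```

On the target, `mem` would be the BRAM base address from the address map (not stated in this thread) and `words` its size in 32-bit words. A non-zero return value points at data corruption, whereas a hang, as described in this thread, would show up as the function never returning.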

 

With the asynchronous clock setting between the master and the AXI Interconnect, the system works fine with a latency of 42 clock cycles.

However, if I change it to the synchronous clock setting, the system hangs: writes and reads do not go through.

 

Regards,

Bhargava

Teacher
4,954 Views
Registered: ‎07-09-2009

My client has the same latency problem on PS-to-PL accesses to a memory location in the PL.

They are using pointers for peeking and poking, but the same sort of times are coming out.

I have seen this link, but my clients are not seeing anything like the speeds indicated there for a single read / write to the same address:

 

http://www.xilinx.com/support/answers/47266.htm

 

 

Voyager
451 Views
Registered: ‎08-02-2019

Hi @drjohnsmith,

I tried the advice in your link and it made my AXI-Lite slave 4X faster than before. I strongly recommend using it.

I added only 3 lines of code to the bare-metal CPU program, and the access time dropped from 20 clock cycles to 5 clock cycles.

MyXil_SetTlbAttributes(0x43C00000, 0xC06);
mtcp(XREG_CP15_INVAL_UTLB_UNLOCKED, 0);
dsb();
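To see what the value 0xC06 could mean, the sketch below decodes it as the low bits of an ARMv7-A short-descriptor section entry. It is an assumption that `MyXil_SetTlbAttributes` writes its second argument into such an entry; the thread does not show that function. The value 0xC02 appears in the usage note only as a contrasting example and is not taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-attribute fields of an ARMv7-A short-descriptor section entry. */
struct section_attr {
    unsigned b;    /* bit 2:       B (bufferable) */
    unsigned c;    /* bit 3:       C (cacheable)  */
    unsigned ap;   /* bits 11:10:  AP[1:0] access permissions */
    unsigned tex;  /* bits 14:12:  TEX memory type extension */
    unsigned s;    /* bit 16:      S (shareable, for Normal memory) */
};

/* Extract the attribute fields from the low bits of a section entry. */
struct section_attr decode_section_attr(uint32_t attr)
{
    struct section_attr a;
    assert((attr & 0x2u) != 0u);   /* bit 1 set marks a section entry */
    a.b   = (attr >> 2) & 0x1u;
    a.c   = (attr >> 3) & 0x1u;
    a.ap  = (attr >> 10) & 0x3u;
    a.tex = (attr >> 12) & 0x7u;
    a.s   = (attr >> 16) & 0x1u;
    return a;
}

/* TEX=000, C=0, B=1 encodes Shareable Device memory. */
int is_shareable_device(uint32_t attr)
{
    struct section_attr a = decode_section_attr(attr);
    return a.tex == 0u && a.c == 0u && a.b == 1u;
}

/* TEX=000, C=0, B=0 encodes Strongly-ordered memory. */
int is_strongly_ordered(uint32_t attr)
{
    struct section_attr a = decode_section_attr(attr);
    return a.tex == 0u && a.c == 0u && a.b == 0u;
}
```

Under that reading, 0xC06 selects Shareable Device memory with full access permissions, while 0xC02 would select Strongly-ordered memory. That matches the "Changing_Memory_As_Shareable" wording in the attached screenshot names, but it should be checked against the actual translation table setup.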

 

If you are interested in easily replacing the AXI-Lite master with a 16X faster AXI-Full master interface, you can look at my other post, which is marked as the solution.

Thanks a lot.

Saban

 

<------------------------------------------------------------------------------>

if(solves_problem) mark_as_solution <= 1 else if(helpful) Kudo <= Kudo + 1

<--- If reply is helpful, please feel free to give Kudos, and close if it answers your question --->
AXI-Lite_Slave_Write_Timing_5_Clock_Cycle_After_Changing_Memory_As_Shareable.png
AXI-Lite_Slave_Write_Timing_20_Clock_Cycle_Without_Changing_Memory_As_Shareable.png