01-02-2014 08:07 PM
I have created a system with one master and one slave on the AXI Interconnect. The slave here is a Block RAM connected to the AXI Interconnect through the AXI BRAM Controller.
With asynchronous AXI master and interconnect clocks, the write and read transaction latency is around 42 clocks. With this configuration the system runs on a validation platform we have (the master on a Zynq FPGA, the interconnect and slave on a Virtex-7 FPGA).
By changing the clock settings to synchronous, the latency in simulation reduced to 6 clocks, but the system hangs on the validation platform.
Are there any recommended settings for AXI interconnect for a synchronous clocks system?
Note that I am using the PlanAhead / XPS 14.5 tools to generate the system.
01-02-2014 08:51 PM
01-06-2014 07:30 AM
Are you using the AXI-Stream interface, or the Lite / Full version of the memory-mapped type?
I have a customer who has found very slow reads / writes to the PL from the PS side of a Zynq.
It seems that setting up and executing a single access from PS to PL is slow; it's only the bursts that give high speed, once they are set up.
I'm interested in your results.
01-06-2014 01:24 PM
Note that transfers to/from DDR memory will experience 'slow' times while the memory changes row or column addresses, or refreshes. So bursts are fast, and sequential access is generally fast, but random access can be very slow.
As streaming is generally sequential, it is generally fast, but there is still overhead for the DDR address changes and refresh cycles.
This all assumes perfect software, and all settings are correct.
01-06-2014 09:01 PM
I am not using DDR memory. Access from the Zynq PS is to a block RAM on the Virtex-7 FPGA. Still, we see nearly 42 clocks of latency for a memory write or read.
01-07-2014 12:16 AM
my client is seeing the same sort of numbers,
the hello world program running on the Zynq,
a single GPIO register on the PL side,
nothing else in the design,
no DDR, no OS running,
just a pointer read and write to the PL GPIO from the ARM in the PS.
Interestingly, all the examples I have seen on the sites are for streaming between the PS and PL;
very few examples cover the speed of accessing single registers / memory on the PL side from the PS.
to put things in context,
the Virtex PPC version of the design, with the old IBM interconnect, can access the GPIO faster than the 666 MHz ARM9 in the Zynq can...
01-07-2014 08:32 AM
01-07-2014 10:56 AM
the link points to a DDR system, or have I read that wrong?
Neither of the systems here uses DDR;
hello world runs out of on-chip memory on the Zynq ARM,
and the pointer access to the PL-side register is the only user code.
It's the access from the PS to the PL that seems to be taking forever.
01-07-2014 10:57 AM
01-07-2014 11:28 AM
happy new year to you,
re power pc / zynq comparison
The PLB bus on the PowerPC goes to a slower bus via a bridge, and then to the GPIO
(can't remember the name of the slower bus; was it OPB?).
Both the PLB bus, the bridge, the OPB and the GPIO are in logic (the old EDK system),
and the code runs out of on-chip memory on the PLB bus.
On the Zynq, the ARM has its own bus with the memory the code runs from, a hardware bridge to the PL side, and a GPIO peripheral implemented in PL logic.
The two systems sound rather similar to me; if anything the ARM, running faster and having a proper hardware / silicon-optimised bridge, should be faster than the old PPC at accessing GPIO, but from the tests we see, this is not the case.
At the end of the day the question is not the PPC-to-Zynq comparison; it's trying to access GPIO peripherals in the PL as fast as possible, and so far it's very slow.
There must be a way of improving the system.
01-08-2014 02:30 AM
My focus is on just latency, and not throughput for the time being.
I am using an AXI4 memory-mapped interface between the Zynq PS (the master) and Block RAM in the Virtex-7 FPGA (the slave).
Transfer size parameters: burst length 1, burst size 32 bits.
In software, I use a simple memcpy-style function in which I write into all locations in the BRAM and then read them back.
With the asynchronous clock setting between the master and the AXI Interconnect, the system works fine with 42 clocks latency.
However, if I change it to the synchronous clock setting, the system hangs; writes/reads don't go through.
01-08-2014 04:44 AM - edited 01-08-2014 05:55 AM
My client has the same latency problems on PS-to-PL access to a memory location in the PL.
They are using pointers for peeking and poking, but the same sort of times are coming out.
I have seen this link, but my clients are not seeing anything like the speeds indicated there for a single read / write to the same address.
06-05-2020 02:36 AM - edited 06-11-2020 04:38 AM
Hi @drjohnsmith ,
I tried the advice in your link and it made my AXI-Lite slave 4X faster than before. I strongly recommend using it.
I added only 3 lines of code on the bare-metal CPU, and the access went from 20 clock cycles down to 5 clock cycles:
MyXil_SetTlbAttributes(0x43C00000, 0xC06);
mtcp(XREG_CP15_INVAL_UTLB_UNLOCKED, 0);
dsb();
If you are interested in easily replacing the AXI-Lite master with a 16X faster AXI-Full master interface, you can look at my other post, which is marked as the solution.
Thanks a lot.
if(solves_problem) mark_as_solution <= 1 else if(helpful) Kudo <= Kudo + 1