02-11-2019 08:44 AM
Hi all,
I have a design with two AXI full master IPs for Zynq US+. One reads data from the DDR4 memory on the PS side and is connected through the HP0 port; the other writes to the same PS DDR4 memory and is connected through the HP3 port. I need high bandwidth and low latency for both read and write operations, which is why I used the HP ports. The burst length is 256 beats on both ports and the AXI data width is 128 bits. For writes, there is no gap between the time the write address is on the bus and the time the data transfer starts. For reads, however, there is a large latency (around 40 clock cycles) between the read address and the first beat appearing on the read data channel. This can be seen in the attached timing diagram, which was captured with the ILA.
May I know how I can decrease this latency? Or maybe it is quite normal?
Thank you very much in advance.
02-11-2019 09:08 AM
There are a number of things here to look at, and to be up front, I don't have the answer.
a) How are you getting the data to the PS? A DMA engine?
b) Check how many FIFOs there are in the path.
c) Look at the DDR4: does it not have to do row / bank / precharge operations, etc.?
d) Are you doing back-to-back reads, or reads and writes?
e) Which buses are blocked on reads?
Bottom line: what is the actual data rate you are getting for reads and writes? How does that compare with the expected rate for the DDR4 and the clocks you are using? Things like RAS and CAS timings have a great effect on speed.
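To make the data-rate comparison concrete, here is a rough sketch of the numbers on the AXI side. The 128-bit width and 256-beat burst come from the original post; the clock frequency is not given anywhere in this thread, so `ASSUMED_CLOCK_HZ` is only a placeholder and the function names are made up for illustration.

```python
# Values stated in the original post.
DATA_WIDTH_BITS = 128
BURST_LEN_BEATS = 256

# Assumption: the thread never states the PL clock. 300 MHz is a placeholder;
# substitute the real AXI clock of the design.
ASSUMED_CLOCK_HZ = 300e6


def bytes_per_burst(data_width_bits, burst_len_beats):
    """Payload carried by one AXI burst, in bytes."""
    return (data_width_bits // 8) * burst_len_beats


def peak_bandwidth_bytes_per_s(data_width_bits, clock_hz):
    """Upper bound for one port: one full-width beat on every clock cycle."""
    return (data_width_bits // 8) * clock_hz


burst_bytes = bytes_per_burst(DATA_WIDTH_BITS, BURST_LEN_BEATS)
peak = peak_bandwidth_bytes_per_s(DATA_WIDTH_BITS, ASSUMED_CLOCK_HZ)
print(f"bytes per burst: {burst_bytes}")
print(f"peak per port at assumed clock: {peak / 1e9:.2f} GB/s")
```

Comparing the measured rate against this per-port ceiling, and against what the DDR4 can theoretically deliver, should show whether the latency is actually costing throughput.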
02-11-2019 11:58 PM
Thank you for your reply.
As I wrote, I have two IP cores with AXI master interfaces for transferring data between the PL and the PS. One IP, connected to the HP0 port, ONLY reads data from the PS DDR memory and converts it to a special bus defined by our company. Once the first beat is available on the RDATA line, I read the 256 beats continuously in 256 clocks. The other IP is connected to the HP3 port and ONLY writes data to the PS DDR memory. For this IP, too, the whole burst completes in 256 clocks without any wait cycles in between. I developed both IPs myself, and there is no latency at the beginning of the bursts in my design.
So my problem is the long latency at the beginning of each burst in the reading IP, not anything during the burst, as can be seen in the figure.
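For a feel of how much that start-up latency costs, here is a small back-of-the-envelope sketch. It uses the 256-beat burst and the roughly 40-cycle latency from my first post, and it assumes (this is only an assumption for the estimate) that each read address is issued after the previous burst has finished, so the latency is paid in full on every burst. The function name is just for illustration.

```python
def burst_efficiency(burst_len_beats, latency_cycles):
    """Fraction of cycles that carry data when every burst pays the full latency.

    Assumption: bursts are not overlapped, i.e. the next read address is
    issued only after the last beat of the previous burst.
    """
    return burst_len_beats / (burst_len_beats + latency_cycles)


# 256 beats per burst and about 40 cycles of latency, as described in the thread.
eff = burst_efficiency(256, 40)
print(f"read efficiency: {eff:.1%}")  # prints "read efficiency: 86.5%"
```

Under that assumption the read port spends 40 of every 296 cycles waiting, so the read throughput is about 86.5% of the write throughput for the same clock.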