Adventurer
Registered: 10-12-2018

Read Latency of HP Ports

Hi all,

I have a design for Zynq UltraScale+ with two AXI4 (full) master IPs that I developed myself. The first reads data from the PS DDR4 memory through the HP0 port; the second writes to the PS DDR4 memory through the HP3 port. I need high bandwidth and low latency for both reads and writes, which is why I used the HP ports. The burst length is 256 beats on both ports and the AXI data width is 128 bits. For writes, there is no gap between the burst request (driving the write address) and WREADY going high for the first beat. There is also no gap between beats, so a complete burst takes 256 clocks.

For the other IP, which reads through HP0, there is a large latency (around 40 clock cycles) between the read request (driving the read address onto the bus) and the first data beat. Once the first beat arrives, there are no gaps between beats; the delay occurs only at the start of the burst. This can be seen in the attached timing diagram, captured with an ILA.

How can I decrease this latency? Or is it normal?

Thank you very much in advance.

Kind regards,

Amir

timing.PNG
Scholar jg_bds
Registered: 02-01-2013

Naturally, writes from the PL to the PSU will appear to be more impressive (performance-wise), since you're probably only watching the data disappear into the PSU through the S_AXI_HPn interface. You say to the interface: "Take this data, and store it at this address." All you care about is the one-way timing, and you're far less concerned with the delay until the B-channel response because, let's face it: it's coming back as "OK". Besides, you probably set the write transaction as "Bufferable", so a satisfying response was returned to you as soon as the command and data crossed into the PSU.

Reads, on the other hand, suffer from an entire round-trip delay. Let's take a look at a Read:

You wave good-bye to your Read command as it gets thrust into the PSU. The command gets accepted, but things are far from over--and you must wait. The Read command is going to take just as long to get to the DDR controller as the Write command did, but now that delay is relevant--because you haven't yet received your precious data.

So then there's the DDR Controller. (Stinkin' shared resource.) You have to wait for the DDR controller to get around to forwarding your command to the DDR memory. Maybe it even had to sneak in a good-ol' refresh command in front of your Read request. (Stupid dynamic RAM...) And then you have to wait for the DDR memory to look up the data in its memory banks and then send it to the DDR Controller. (Stupid CAS latency...) And then the data must wend its way back through the PSU, through all of the clock-domain crossings, to emerge finally at the AXI HP interface. Hooray!

So don't look at the Read taking ~40 more clocks than the Write. It just appears that way as you look at data pass a single point in the path. Consider, rather, the whole path.

To be honest, I can't say if your results are particularly bad or simply normal. I don't know the speeds of your clocks or your DDR. If you messed up something within the read request, perhaps you're waiting a couple more clocks than you need to. But overall, it seems right to me.

-Joe G.

Adventurer
Registered: 10-12-2018

Hi @jg_bds,

Thank you for the story-like comprehensive explanation ;).

I am using the Xilinx ZCU102 evaluation board. The PS-side DDR4 memory is 64 bits wide and runs at 2133 MHz. The PL-PS interface clock is 300 MHz.
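
As a rough sanity check on these figures, the sketch below converts the numbers quoted in this thread into bytes, nanoseconds and bandwidth. Only the 300 MHz clock, the 128-bit data width, the 256-beat burst and the roughly 40-cycle latency come from the posts; the function names are made up for illustration, and the efficiency figure assumes the master issues one read at a time and waits for it to finish, which may not match the actual IP.

```python
# Back-of-the-envelope figures from the numbers quoted in this thread.
PL_CLK_HZ = 300e6          # PL-PS interface clock (from the post)
AXI_DATA_BITS = 128        # AXI data width (from the post)
BURST_BEATS = 256          # beats per burst (from the post)
READ_LATENCY_CYCLES = 40   # approximate value read off the ILA (from the post)


def burst_bytes(beats=BURST_BEATS, data_bits=AXI_DATA_BITS):
    """Bytes moved by one burst."""
    return beats * data_bits // 8


def latency_ns(cycles=READ_LATENCY_CYCLES, clk_hz=PL_CLK_HZ):
    """Convert a latency in PL clock cycles to nanoseconds."""
    return cycles / clk_hz * 1e9


def peak_bandwidth_gb_s(clk_hz=PL_CLK_HZ, data_bits=AXI_DATA_BITS):
    """Peak port bandwidth in GB/s if one beat moves every clock."""
    return clk_hz * data_bits / 8 / 1e9


def read_efficiency(beats=BURST_BEATS, latency_cycles=READ_LATENCY_CYCLES):
    """Fraction of clocks that carry data.

    ASSUMPTION: only one read is outstanding at a time, so every burst
    pays the full start-up latency before its first beat.
    """
    return beats / (beats + latency_cycles)


if __name__ == "__main__":
    print(f"bytes per burst : {burst_bytes()}")
    print(f"read latency    : {latency_ns():.1f} ns")
    print(f"peak bandwidth  : {peak_bandwidth_gb_s():.2f} GB/s")
    print(f"read efficiency : {read_efficiency():.1%}")
```

Under these assumptions a burst is 4096 bytes, 40 cycles is about 133 ns, and a 256-beat burst spends roughly 86% of its clocks moving data. A 4096-byte burst is also the largest that AXI4 allows without crossing a 4 KB boundary, so each burst has to start on a 4 KB-aligned address.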

One question: how can Quality of Service (QoS) improve this delay?

Regards,

Amir

Scholar jg_bds
Registered: 02-01-2013

QoS will help prioritize a flow over others, but won't necessarily improve response time (on average) unless that flow is currently competing with other flows (that you can provide lower service to).

To see if it will help, measure the response time of a few hundred AXI reads*, and see what the variance is among those responses. At best, QoS will allow you to move the average response time down closer to the lowest measured response time.

-Joe G.

* To do this:

  • Create a free-running 32-bit (or larger) counter--a "debug timebase"--and add it to your ILA. Make sure your ILA has the "Capture Control" feature enabled.
  • Set your ILA to capture only events that matter in the response-time calculation: ARREADY && (rising edge of) ARVALID and RREADY && (rising edge of) RVALID.
  • Capture the debug timebase value at the time of each event.
  • Export the data from a capture to a CSV file, which you can import into a spreadsheet program to make calculations and perform statistical analysis.
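
If a spreadsheet is not handy, the same calculation can be sketched in Python. This is only an illustration: the column names `timebase`, `ar_event` and `r_event` are hypothetical, the sample rows are made-up values rather than measurements, and the pairing logic assumes one read is outstanding at a time, so AR and R events alternate. A real ILA export will have different headers and may contain extra rows that need to be skipped first.

```python
import csv
import io
import statistics

COUNTER_BITS = 32  # width of the debug timebase (assumption; match your design)


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def response_times(rows, counter_bits=COUNTER_BITS):
    """Pair each AR event with the next R event; return latencies in clocks.

    The subtraction is done modulo 2**counter_bits so that a single
    wrap of the free-running counter between the two events is handled.
    """
    modulus = 1 << counter_bits
    latencies = []
    ar_time = None
    for row in rows:
        t = int(row["timebase"])
        if int(row["ar_event"]):
            ar_time = t                      # read address accepted
        elif int(row["r_event"]) and ar_time is not None:
            latencies.append((t - ar_time) % modulus)
            ar_time = None                   # wait for the next AR event
    return latencies


def summarize(latencies):
    """Basic statistics; 'headroom' is how far the mean sits above the minimum."""
    mean = statistics.mean(latencies)
    return {
        "count": len(latencies),
        "min": min(latencies),
        "mean": mean,
        "max": max(latencies),
        "stdev": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "headroom": mean - min(latencies),
    }


# Made-up sample data; the last pair straddles a 32-bit counter wrap.
SAMPLE_CSV = """timebase,ar_event,r_event
100,1,0
141,0,1
500,1,0
539,0,1
4294967290,1,0
38,0,1
"""

if __name__ == "__main__":
    stats = summarize(response_times(load_rows(SAMPLE_CSV)))
    for name, value in stats.items():
        print(f"{name:>8}: {value}")
```

The `headroom` value corresponds to the point made above: it is roughly the most that QoS could take off the average response time, since the best case is pulling the mean down toward the minimum.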