04-05-2020 07:29 PM
I am trying to use the AXI Performance Monitor.
I am using it in Profile mode. I have connected one of the slots to the AXI VDMA MM2S channel.
I am observing that, for the read channel, the Read Latency count shows an unusually high number.
I am following the Programming Sequence section for Profile mode from the specs.
Any hints on how to correctly use the Read Latency Count metric?
04-05-2020 07:53 PM
What do you consider to be an "unusually high" number? And what is the data path, to include any and all clock rate conversions, data width conversions, and the slave at the end of the path?
04-09-2020 08:40 PM
Please find a snapshot of how the performance monitor is connected in the design.
Also, I did a simulation and I am observing that outstanding Read Transactions are happening on HP0 port to which Slot1 of the performance monitor is attached.
In the Performance Monitor, read latency is calculated as the time from the start of read address issuance to the end of the read data transaction. In the outstanding read transaction case, the addresses of two transactions are issued only a few cycles apart, so the read latency number is larger for the second transaction, because its address is issued well before its data can start. To get the real read latency, it seems I have to divide the read latency calculated by the Performance Monitor by 2.
In the AXI Performance Monitor specs (page 17), it is mentioned that:
"Supports a maximum outstanding transaction depth of 32. The write/read latency metrics are affected by the outstanding transactions"
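To make the effect concrete, here is a toy model of the reasoning above. It is only a sketch, not the Performance Monitor's actual implementation: it assumes latency is accumulated per transaction from address issue to the last data beat, and the cycle numbers are made up for illustration.

```python
def summed_read_latency(transactions):
    """Sum per-transaction latency in cycles.

    Each transaction is a (addr_issue_cycle, last_data_beat_cycle) pair.
    Assumption: latency = last data beat - address issue, per transaction.
    """
    return sum(last_beat - addr_issue for addr_issue, last_beat in transactions)


# Hypothetical cycle numbers, chosen only to illustrate the overlap.
# Case 1: the second address is issued after the first data phase ends.
back_to_back = [(0, 100), (100, 200)]
# Case 2: the second address is issued 4 cycles after the first
# (one outstanding transaction), but its data still ends at cycle 200.
overlapped = [(0, 100), (4, 200)]

print(summed_read_latency(back_to_back))  # 200
print(summed_read_latency(overlapped))    # 296
```

Both cases finish at cycle 200, but in the overlapped case the second transaction's wait behind the first is counted as latency, so the accumulated total is larger than the elapsed time.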
04-10-2020 04:29 AM
Yes, that latency is horrible.
My first question, though, would be: why isn't RREADY held high? That signal is produced within your logic, and it should be something that you can control (somewhat--the interconnect can impact it as well.)
My next question would be whether the AXI Performance Monitor was set up to collect statistics on all IDs, or whether only some of your IDs were monitored.
04-10-2020 06:43 AM
Thank you for looking into this issue.
I have the following logic in the design:
CUSTOM_BLOCK -> VDMA -> ZYNQ_HP0
The Performance Monitor is being used in Profile mode, and its Slot1 is connected to the HP0 port.
The custom block reads image data from DDR using the MM2S channel of the VDMA. The VDMA MM2S channel is configured for an HSIZE of 640 and a VSIZE of 480 to read the image data from DDR memory. Each pixel is 16 bits.
The custom block can load and process only one row at a time, so it de-asserts READY to the VDMA after one row of 640 pixels is read. So on the MM2S interface of the VDMA, READY goes low after each row is read and goes high again once the row is processed.
ID-based filtering is only available in Advanced mode. In my case the Performance Monitor is used in Profile mode, so it ignores the ID for metric calculation.
What I am observing in simulation is that there is one outstanding read transaction. I am using a PYNQ-based setup to validate my application on the board. In validation, I observe that if I take the read latency value I get for a frame, divide it by 2, and multiply it by my clock period, I get a number that seems reasonable for the time it takes to read a frame from DDR memory.
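The conversion described above can be sketched as follows. This is only an illustration of the arithmetic: the function name, the `overlap_factor` parameter, and the example count are hypothetical, and dividing by 2 is the empirical correction from this post, not something documented for the Performance Monitor. The 200 MHz clock is the figure given later in this thread.

```python
def latency_count_to_seconds(latency_count, clock_hz, overlap_factor=2):
    """Convert an accumulated latency count (in clock cycles) to seconds.

    overlap_factor is the empirical divisor used in this post to compensate
    for one outstanding read transaction; it is an assumption, not a
    documented correction.
    """
    corrected_cycles = latency_count / overlap_factor
    return corrected_cycles / clock_hz


# Hypothetical register value, for illustration only.
example_count = 200_000
seconds = latency_count_to_seconds(example_count, clock_hz=200_000_000)
print(f"{seconds * 1e6:.1f} us")  # 500.0 us
```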
04-10-2020 07:04 AM
How wide is the data bus of your custom block? Did you match the interconnect at 128-bits? Also, are you matching the clock rate of the PS? My guess is a no on the second, since video can be notoriously difficult to match sample rates with, but how about the data width?
04-10-2020 07:22 AM
The custom block's streaming interface to the VDMA is 64 bits wide. The VDMA AXI master interface is also 64 bits wide (so the MM2S channel is 64 bits).
The 64-bit to 128-bit conversion is done by the AXI interconnect through which the VDMA is connected to Zynq HP0.
The clock frequency is 200 MHz.
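As a sanity check on what "reasonable" means for a frame read, the numbers in this thread give a lower bound on the transfer time. The sketch below assumes one 64-bit data beat per clock cycle with no stalls, which is an idealization: the real figure will be higher because READY is de-asserted after every row.

```python
def min_frame_read_seconds(hsize_pixels, vsize_lines, bytes_per_pixel,
                           bus_bytes, clock_hz):
    """Lower bound on frame read time, assuming one data beat per cycle."""
    frame_bytes = hsize_pixels * vsize_lines * bytes_per_pixel
    beats = -(-frame_bytes // bus_bytes)  # ceiling division
    return frame_bytes, beats, beats / clock_hz


# Values from this thread: 640x480 frame, 16-bit pixels,
# 64-bit MM2S data bus, 200 MHz clock.
frame_bytes, beats, seconds = min_frame_read_seconds(
    hsize_pixels=640, vsize_lines=480, bytes_per_pixel=2,
    bus_bytes=8, clock_hz=200_000_000)

print(frame_bytes)                # 614400
print(beats)                      # 76800
print(f"{seconds * 1e6:.0f} us")  # 384 us
```

A measured per-frame time well below this bound would suggest the correction applied to the latency count is too aggressive.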