11-01-2019 10:56 AM
I am having an issue with a design using the AXI HP ports on a Zynq-7000, where the read latency varies widely on a time-critical workload.
The design fetches a large number of relatively small bursts of data over the HP AXI interface. The proprietary design I am porting to the platform runs off a fixed slot timer, which gives me about 1000 AXI memory clock cycles (at 250 MHz) to do the work in. There are 4 other AXI channels (5 in total: 3 read and 2 write), which have been mapped onto 3 HP ports (0, 1 and 2). The workload of the other AXI masters is very regular and much lighter.
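For context, a quick back-of-the-envelope on that budget (only the 1000-cycle and 250 MHz figures come from my setup; the rest is simple arithmetic):

```python
# Slot budget, using the figures above: ~1000 cycles of the
# 250 MHz AXI memory clock per work slot.
AXI_CLK_HZ = 250_000_000
SLOT_CYCLES = 1000

slot_us = SLOT_CYCLES / AXI_CLK_HZ * 1e6
print(f"slot length: {slot_us:.1f} us")            # 4.0 us

# A single stall where ARREADY is held low for 200+ cycles
# therefore eats over 20% of the whole slot on its own.
stall_fraction = 200 / SLOT_CYCLES
print(f"200-cycle stall = {stall_fraction:.0%} of the slot")  # 20%
```

So even one of these long ARREADY stalls per slot leaves very little margin for the rest of the fetches.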
At the start of the slot the block issues a number of read requests until the queue is full and ARREADY on the channel goes low, which is as expected. As data comes in and the other side of the design's FIFO drains, more requests are issued as soon as possible. In some cases, though, ARREADY stays de-asserted for more than 200 cycles. See capture below.
In the above capture the yellow marker is the start of the work slot. The red marker is the start of the next one, and also the trigger point, where I added some logic to detect the overrun when not all the data had been fetched in time. As you can see, there is barely any traffic on the rest of the HP ports, yet HP0 is seeing very long latencies on some of the bursts. Looking at the slot before the yellow marker, that data was all fetched in plenty of time. The amount of data to fetch in each slot is roughly the same, but the alignment of the accesses can be quite different.
I have tried to optimise the memory configuration as much as I can, but it is still causing these overruns about once a second, presumably when the requested accesses fall on boundaries between banks and rows, or are otherwise poorly aligned.
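To illustrate why access alignment can matter here, this is a minimal sketch of how a byte address splits into row/bank/column. The bit layout below is entirely hypothetical (2 byte-offset bits, 10 column bits, 3 bank bits, row above); the real Zynq DDRC address mapping depends on its configuration, so check the address-map settings in UG585 before drawing conclusions from this:

```python
# Illustration only: decode a byte address into (row, bank, column) under a
# *hypothetical* DDR3 mapping for a 32-bit bus. Not the actual Zynq DDRC
# mapping -- that depends on the controller configuration (see UG585).
COL_BITS, BANK_BITS = 10, 3

def decode(addr: int):
    addr >>= 2                          # 4-byte words on a 32-bit bus
    col = addr & ((1 << COL_BITS) - 1)  # column: lowest bits
    addr >>= COL_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    row = addr >> BANK_BITS
    return row, bank, col

# Under this mapping, two bursts 32 KiB apart land in the same bank but in
# different rows, forcing a precharge/activate between them:
print(decode(0x1000))  # (0, 1, 0)
print(decode(0x9000))  # (1, 1, 0)
```

A run of bursts that happens to ping-pong between rows of the same bank like this will see much longer service times than the same amount of data fetched from an open row, which would fit the once-a-second pattern I'm seeing.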
HP0/1 is set as High priority.
HP2/3 is set as Medium, and the other DDR ports for the processor are configured as Low priority.
In addition, I have enabled the HPR queue with a 16/16 split for HP0 to try to help the situation.
The design in question was previously used with an AXI MIG implementation on a Kintex-7 K160. While it was a challenging workload, it is stable and working in that implementation. In that case each of the 5 ports was given a fixed priority instead of an aging counter value, which is why I tried to use the HPR queue to make sure the HP0 reads were prioritised.
Are there any other tricks I am missing in the configuration of the system to improve the read latency?
Is there a way to work out whether the extra latency is being caused by long, low-priority processor accesses that nevertheless occupy the bus for a long time?
Any help or pointers would be much appreciated!
11-18-2019 07:38 AM
It would be interesting to see the traffic on HP3 as well. Also, it is not only the HP ports that have an impact; the processor might also be trying to access the memory at the same time.
You might want to look at the advanced Quality of Service (QoS) settings in the Zynq-7000 TRM (UG585).
11-19-2019 04:55 AM
Thanks for the reply.
I deliberately left out HP3 as it's totally disconnected in the design.
I have gone through the arbitration with a fine-tooth comb.
For the port that I'm having trouble with (high read latencies)...
ARQOS is set to "1000", whereas the read and write channels of all the other ports have a QoS setting of 0.
The read port for HP0 is set up to use the High-Priority queue in the DDR controller.
The HPR is enabled with a 24/8 split.
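As a sanity check on these two settings (assumptions flagged in the comments: the 4-bit width of ARQOS is standard AXI4, but the 32-entry read queue depth is inferred from the 16/16 and 24/8 splits in this thread and should be verified against UG585):

```python
# ARQOS is a 4-bit AXI4 signal, so "1000" (binary) is priority 8 of 0..15.
arqos = int("1000", 2)
print(arqos)  # 8

# HPR/LPR split check. ASSUMPTION: the two credit counts partition a
# 32-entry read queue (inferred from the 16/16 and 24/8 splits above;
# verify the actual queue depth in UG585).
READ_QUEUE_DEPTH = 32

def valid_split(hpr: int, lpr: int) -> bool:
    return hpr > 0 and lpr > 0 and hpr + lpr == READ_QUEUE_DEPTH

print(valid_split(16, 16))  # True  - first configuration tried
print(valid_split(24, 8))   # True  - the more aggressive split
print(valid_split(24, 16))  # False - would over-subscribe the queue
```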
I have an open SR which @debrajr was looking at for me. I sent a dump from the ILA with all the transactions leading up to the long latencies. I am waiting on a response to see if that helps narrow down whether it is the transaction address patterns that are causing the issue.
Is there any way to peek at the CPU side of the DDR controller and see if it is causing long bursts on its bus which might be delaying the HP ports?
11-19-2019 06:03 AM
Remember that there are two levels of QoS: one controls the priorities between the PL and the PS ports, and the other controls the ports of the DDR controller itself (so there the priority is compared against the CPU).
I am not sure if there is a way to check the PS load on the DDR controller, but I would suggest trying to avoid any access from the CPU, just as a test, to see if it helps on the PL-to-PS interface.
11-21-2019 08:56 AM
Could you point me in the right direction as to the prioritisation between the PS and PL? My understanding was that the HP0/1/2/3 ports are on DDRC ports 2 and 3, and the PS-section masters are on ports 0 and 1 (i.e. L2 cache, OCM, etc.).
I have already de-prioritised those. They are set to the "Low" setting. See pic below.
I did a test where I loaded the processor by writing a large amount of data to tmpfs using dd. This did have the effect of making the slowdown happen more often, but not by much. I am writing video data in and out of the memory, and while the test was running it caused a momentary glitch, but it wasn't completely killing things.
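For reference, the CPU-side load test was along these lines. tmpfs is backed by RAM, so streaming writes into it generate sustained CPU-port DDR traffic; the path and sizes below are illustrative (I am assuming /dev/shm is a tmpfs mount, as it is on most Linux systems):

```shell
# Generate CPU-side DDR traffic by streaming zeros into a tmpfs file.
# Size is illustrative -- scale it up to sustain the load for longer.
dd if=/dev/zero of=/dev/shm/ddr_stress bs=1M count=64
ls -l /dev/shm/ddr_stress
rm /dev/shm/ddr_stress
```

Running this in a loop while watching the ILA made the correlation with the HP0 stalls easier to see.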
That would suggest the CPU's cycles on the memory are occasionally getting in the way. Is there a way to disable the LRG algorithm and always prioritise the HP ports over the CPU, regardless of transaction age?
11-22-2019 08:06 AM
The prioritization between PS and PL looks correct on the memory controller.
I think this is the limit of my knowledge. @debrajr might be able to give you more details on whether there is a way to give even less priority to the PS. I assume there is a way of reducing the number of outstanding transactions, which could help.