10-06-2018 08:16 AM
I'm currently trying to implement an IP in Vivado HLS, and while testing it on a ZynqMP platform I have noticed that adjusting the interface latency can make a huge difference in overall throughput. The HLS User Guide (UG902) mentions interface latency only in a small paragraph, which states that it is used to initiate a bus request before a read/write is expected: with a latency value that is too low the design may stall waiting for the bus, while with a value that is too high the bus may stall waiting for the design. So my question is: is there a way to calculate the interface latency correctly in order to maximize throughput?
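For reference, here is a minimal sketch of where the latency option sits in the INTERFACE pragma, assuming an AXI4 master kernel (the function name, bundle name, and the value 64 are illustrative assumptions, not a recommendation):

```cpp
#include <cassert>

// Hypothetical HLS top function. The "latency" option tells the
// scheduler how many cycles to expect between issuing an AXI address
// and receiving data, so it can issue address requests that many
// cycles ahead of when the read result is needed.
void copy_kernel(const int *in, int *out, int n) {
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem latency=64
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem latency=64
#pragma HLS INTERFACE s_axilite port=n
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i];  // sequential access, so HLS can infer bursts
    }
}
```

In C simulation the pragmas are inert; their effect only shows up in co-simulation or on hardware, which is why sweeping the latency value on the board changes the measured throughput.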
10-08-2018 07:18 AM
I'm assuming you are talking about the AXI4 Master interface.
The overall latency depends on where you store and fetch the data and on how many interconnects are in between.
If you use the AXI4 interface to read/write data from/to a BRAM, you get a low latency of only a few clock cycles.
If you access DDR3 memory (randomly) over several interconnects, you get dozens of clock cycles of latency for every access.
In some cases it makes sense to work with an (HLS) buffer engine.
10-09-2018 06:55 AM
Thanks for your response. Yes I am talking about AXI4 Master.
I have written an IP with the DATAFLOW pragma enabled, following the tutorial workflow: one function reads the values from memory and stores them to BRAM, a second function does the calculation, and a third writes the BRAM values back.
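The load/compute/store split described above can be sketched like this (function names, the doubling computation, and the buffer size are assumptions, not the tutorial's exact code):

```cpp
#include <cassert>

#define N 256  // illustrative buffer depth

// Stage 1: burst-read from DDR over AXI into an on-chip buffer.
static void load(const int *src, int buf[N]) {
    for (int i = 0; i < N; ++i) buf[i] = src[i];
}

// Stage 2: compute purely on the on-chip buffer (placeholder math).
static void compute(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] * 2;
}

// Stage 3: burst-write the results back to DDR.
static void store(const int buf[N], int *dst) {
    for (int i = 0; i < N; ++i) dst[i] = buf[i];
}

void top(const int *src, int *dst) {
#pragma HLS INTERFACE m_axi port=src offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=dst offset=slave bundle=gmem
#pragma HLS DATAFLOW
    int a[N], b[N];  // mapped to BRAM; decouple the three stages
    load(src, a);
    compute(a, b);
    store(b, dst);
}
```

Under DATAFLOW the three stages run concurrently, so contention between the load and store stages on the shared memory path is one plausible source of the burst gaps you observed.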
Also, just by altering the latency value in the INTERFACE pragma I have measured different total execution times. I analyzed the design with an ILA core, and I see differences in the read patterns (split bursts, or big/small pauses between bursts), which I cannot explain, because the accelerator's RREADY is always 1 and the ARVALID pattern is similar between the different designs. So I assume it has something to do with sharing between the dataflow functions.
What I would like to know is whether there is a way to find the ideal latency value in order to minimize the execution time.
Also what is the (HLS-) buffer engine that you mentioned?
10-09-2018 07:13 AM
If you only work with (small) RAMs inside the FPGA, use the BRAM interface instead of AXI.
The BRAM interface has deterministic latency; AXI does not.
I think BRAM interfaces are also smaller than AXI.
What I mean by a buffer engine is splitting the job across two or three HLS cores:
one core does the calculation with values from a BRAM and stores the results to another BRAM, and one (or two) IP cores transfer the data between the BRAMs and DDR memory.
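A minimal sketch of one such data-mover core, assuming a fixed chunk size (the function name, bundle name, and CHUNK are illustrative):

```cpp
#include <cassert>

#define CHUNK 128  // illustrative chunk size

// Hypothetical "buffer engine" mover: streams a chunk from DDR
// (AXI4 master) into an on-chip BRAM that the compute core can then
// read with deterministic, single-digit-cycle latency.
void ddr_to_bram(const int *ddr, int bram[CHUNK]) {
#pragma HLS INTERFACE m_axi port=ddr offset=slave bundle=gmem
#pragma HLS INTERFACE bram port=bram
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < CHUNK; ++i) {
#pragma HLS PIPELINE II=1
        bram[i] = ddr[i];  // sequential reads let HLS infer a burst
    }
}
```

A matching bram-to-DDR core would do the reverse; the compute core itself then only ever sees BRAM ports, so its timing no longer depends on the AXI latency at all.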
10-11-2018 02:43 AM
I've chosen to work with AXI reading from DRAM because the dataset is pretty big. That's why I would like to know if there is a way to determine the average AXI read/write latency.
10-11-2018 04:32 AM