How to improve transfer efficiency when bandwidth utilization is already 100%
I have a simple design in which a DataLoad kernel fetches data from global memory and passes it as a stream to a DataStore kernel, which then stores it back to gmem. Both kernels use 512-bit wide gmem interfaces, and vitis_analyzer shows that bandwidth utilization is already 100%. However, the "top level kernel transfer" page shows that transfer efficiency only reaches 25%. What is the difference between these two numbers, and how can I further improve transfer efficiency when bandwidth utilization is already 100%?
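
For reference, here is a rough sketch of how a 25% figure could arise. It assumes that transfer efficiency is defined as the average bytes per transfer divided by the largest transfer AXI4 permits, min(4 KiB, port width in bytes x 256 beats). That definition is my reading of the report, not something confirmed above, and the 16-beat burst length is a hypothetical value chosen for illustration, as are the function names.

```cpp
#include <cstdint>

// ASSUMPTION: the largest single AXI4 transfer is limited both by the
// 256-beat burst cap and by the 4 KiB address boundary.
constexpr std::uint64_t max_bytes_per_transfer(std::uint64_t port_width_bits) {
    return (port_width_bits / 8) * 256 < 4096 ? (port_width_bits / 8) * 256
                                              : 4096;
}

// ASSUMPTION: transfer efficiency (%) = average bytes per transfer divided by
// the largest possible transfer. beats_per_burst is a hypothetical burst length.
constexpr double transfer_efficiency_pct(std::uint64_t port_width_bits,
                                         std::uint64_t beats_per_burst) {
    return 100.0 * static_cast<double>((port_width_bits / 8) * beats_per_burst) /
           static_cast<double>(max_bytes_per_transfer(port_width_bits));
}
```

Under these assumptions, a 512-bit port issuing 16-beat bursts moves 1024 bytes per transfer out of a possible 4096, which gives 25%, while 64-beat bursts would give 100%. If that reading is right, the two metrics measure different things: one is how busy the port is, the other is how long each burst is.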