03-25-2020 07:19 AM
I'm a beginner with HLS and FPGA acceleration, so I wanted to implement a simple loopback with HLS to test the maximum bandwidth.
I get satisfactory results for the host <=> global memory and global memory <=> kernel transfers.
My problem is that I measure around 1 GB/s of throughput with my host program.
My question: can I do better, or is this the maximum with a single compute unit working?
I'm using an Alveo U250 acceleration card with version 2018.3.
04-01-2020 08:14 AM
I think it might be easier for me to assist you if I knew where you are coming from.
Can you tell me exactly what you are measuring when you say you see 1 GB/s of throughput in your host program?
What are you asking to do better?
Can you describe your data flow and what you are comparing?
Have you looked at the Alveo / Vitis getting started example projects? They include host -> global and kernel -> global memory bandwidth tests.
05-03-2020 08:12 PM
First, sorry for the long silence; I didn't get any notification of the answer.
I'm measuring the time from the enqueue of the parameters until the end of the memory-to-host transfer. Dividing the size of the data sent by that time should give me the throughput.
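To make the measurement described above concrete, here is a minimal sketch of the arithmetic. The names `throughput_gb_per_s` and `time_seconds` are hypothetical helpers, not part of any Xilinx or OpenCL API, and the sketch assumes GB means 10^9 bytes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: converts a byte count and an elapsed time in seconds
// into GB/s, where 1 GB is taken to be 1e9 bytes (an assumption).
double throughput_gb_per_s(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Hypothetical helper: runs `work` once and returns the wall-clock time in
// seconds, measured with a monotonic clock.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In a host program, `work` would presumably wrap everything from the first enqueue up to a blocking wait such as `clFinish`, so that the timer stops only once the read-back has completed. As an illustration of the formula only, 4,000,000 bytes moved in 4 ms works out to about 1 GB/s. It is worth stating whether `bytes` counts the payload once or once per direction, since a loopback moves the data across the link twice.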
The flow is very simple:
- I send around 4Mb to the Alveo (as an array)
- the kernel (HLS) is written so that whatever is read from the input buffer (memory bank 0) is written back to the output buffer (memory bank 1)
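The flow above could be sketched as the following HLS kernel. This is an assumption about what such a loopback might look like, not the code from this thread: the name `loopback`, the 32-bit element type, and the bundle names `gmem0` and `gmem1` are all hypothetical. The two bundles would be mapped to separate memory banks at link time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loopback kernel: copies `size` words from `in` to `out`.
// The pragmas are Vivado HLS directives; a regular C++ compiler ignores them.
extern "C" void loopback(const unsigned int* in, unsigned int* out, int size) {
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=in     bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=size   bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < size; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i];  // one word read and one word written per iteration
    }
}

// Software-only check: runs the kernel function on the CPU and returns the
// number of words in `out` that differ from `in`.
int count_loopback_mismatches(int size) {
    assert(size >= 0);
    std::vector<unsigned int> in(size), out(size, 0u);
    for (int i = 0; i < size; ++i) {
        in[i] = static_cast<unsigned int>(i) * 2654435761u;  // arbitrary pattern
    }
    loopback(in.data(), out.data(), size);
    int mismatches = 0;
    for (int i = 0; i < size; ++i) {
        if (out[i] != in[i]) ++mismatches;
    }
    return mismatches;
}
```

Calling `count_loopback_mismatches(1024)` on the CPU should return 0, which only confirms the copy logic and says nothing about bandwidth on the card. The element width here is a guess; the width of the kernel's memory ports is one of the things the bandwidth examples and the profiling tools can help evaluate.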
So far I have only looked at the DMA test, but I will take a look at the example projects.
05-04-2020 10:08 AM
Make sure to check out the Vitis Analyzer and the profiling options for kernels during kernel build time.
We built these tools so that new users don't have to instrument their own counters.