Registered: ‎09-11-2019

Questions about HBM bandwidth

The HBM in the U280 supports about 8 GB/s of bandwidth, while kernel ports have a maximum width of 512 bits. I want to know how these two factors interact with each other.

For example, if I use an ap_int<1024> pointer to read 1024 bits from HBM in a loop at a frequency that does not reach the 8 GB/s bandwidth, how does it work? Will it affect the II of the loop? And what happens if the request exceeds 8 GB/s?

Accepted Solutions
Registered: ‎03-01-2020

The AXI width per channel for HBM is 256 bits. If you use 1024-bit accesses/vectors, the compiler inserts a width converter that splits each 1024-bit access into 256-bit accesses. The compile-time II of the loop is never affected by the amount of memory bandwidth your loop tries to use, since that is not something the compiler can reliably estimate. If the memory bandwidth required by your loop exceeds the available memory bandwidth, the pipeline stalls at run-time and waits for the data to arrive from memory before continuing to the next loop iteration; in effect, the loop runs with a higher II that can change dynamically at run-time.
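
As a rough back-of-the-envelope model of this behaviour (a sketch only, not how the tools compute anything), the width conversion and the run-time stall can be written as two small functions. The names `beats_per_access` and `effective_ii` are made up for illustration, and the model assumes that the only source of stalls is the memory bandwidth.

```cpp
#include <algorithm>

// Hypothetical helper: number of channel-width beats needed to serve one
// wide access after width conversion (ceiling division).
inline unsigned beats_per_access(unsigned access_bits, unsigned channel_bits) {
    return (access_bits + channel_bits - 1) / channel_bits;
}

// Hypothetical helper: average run-time II under a bandwidth-only model.
// requested_gbps is what the loop would consume at its compile-time II;
// available_gbps is what the memory can actually deliver.
// Assumption: stalls scale the II by requested/available and never shrink it.
inline double effective_ii(unsigned compile_ii,
                           double requested_gbps,
                           double available_gbps) {
    return compile_ii * std::max(1.0, requested_gbps / available_gbps);
}
```

Under this model, `beats_per_access(1024, 256)` is 4, and a loop scheduled at II=1 that requests twice the available bandwidth would, on average, complete one iteration every two cycles.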

Registered: ‎09-11-2019

Thanks for your help! I also want to know about another case: if the kernel runs at 350 MHz and I read 28 32-bit numbers from a single HBM through three 256-bit ports and one 128-bit port within one clock cycle, then the requested bandwidth should be at least 28 × 4 B × 350 MHz = 39.2 GB/s, which is higher than the 8 GB/s that one HBM actually supports. How does it work in this situation?
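
To make the arithmetic above explicit, here is a small sketch that sums the port widths and multiplies by the kernel clock. The function names are hypothetical, and it assumes every port transfers its full width on every clock cycle.

```cpp
#include <cstddef>

// Hypothetical helper: total number of bits requested per clock cycle
// across all ports of the kernel.
inline unsigned total_bits_per_clock(const unsigned* port_bits, std::size_t n) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += port_bits[i];
    }
    return sum;
}

// Hypothetical helper: requested bandwidth in GB/s (1 GB = 1e9 bytes),
// assuming one full-width transfer per clock cycle.
inline double requested_gbps(unsigned bits_per_clock, double clock_mhz) {
    return (bits_per_clock / 8.0) * clock_mhz / 1000.0;
}
```

With ports of 256, 256, 256 and 128 bits, the total is 896 bits (112 bytes, i.e. 28 32-bit words) per clock, and at 350 MHz that is 112 B × 350 MHz = 39.2 GB/s.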

Registered: ‎03-01-2020

I am not sure which HBM board you are using, but the Alveo U280 provides a total of 460 GB/s over 32 channels, which gives 14.375 GB/s per channel. Similarly, the Alveo U50 provides a total of 316 GB/s, or 9.875 GB/s per channel. However, the HBM stack consists of only 8 physical banks, each of which provides 1/8th of the total bandwidth, and I believe it should be possible to fully saturate the bandwidth of a bank through a single channel if you use wide enough accesses and your kernel runs at a high enough operating frequency.
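
The per-channel figures are just the totals divided by the channel count, and they can be compared against what a single port can move per second. The sketch below uses made-up helper names and assumes an ideal port that transfers one full-width beat every clock cycle.

```cpp
// Hypothetical helper: per-channel share of the aggregate HBM bandwidth.
inline double per_channel_gbps(double total_gbps, unsigned channels) {
    return total_gbps / channels;
}

// Hypothetical helper: peak bandwidth (GB/s) of one port, assuming one
// full-width beat per clock cycle with no stalls.
inline double port_peak_gbps(unsigned port_bits, double clock_mhz) {
    return (port_bits / 8.0) * clock_mhz / 1000.0;
}

// Hypothetical helper: minimum clock (MHz) at which a port of the given
// width reaches the target bandwidth under the same ideal assumption.
inline double min_clock_mhz(unsigned port_bits, double target_gbps) {
    return target_gbps * 1000.0 / (port_bits / 8.0);
}
```

Under these assumptions, a 256-bit port at 350 MHz peaks at 11.2 GB/s, which is below the 14.375 GB/s per-channel figure for the U280; the port would need to run at roughly 449 MHz, or be wider, to reach it.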

In the end, the number of accesses makes no difference to the compile-time II, as long as you connect each access to a different m_axi bundle (accesses connected to the same m_axi bundle can affect the loop II). At run-time, the memory controller decides which of the multiple accesses going to the same physical memory bank goes through; every other access is stalled, which stalls the pipeline too.
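
As a minimal sketch of what "a different m_axi bundle per access" looks like in HLS code, the kernel below gives each pointer argument its own bundle. The kernel name `vadd_sketch` and the bundle names `gmem0` to `gmem2` are arbitrary choices for illustration; mapping each bundle to a particular HBM channel is done separately at link time and is not shown here.

```cpp
#include <cstdint>

// Hypothetical kernel: each pointer argument is placed on its own m_axi
// bundle, so the two reads and the write do not share one AXI interface.
// A regular C++ compiler ignores the HLS pragmas, so the function can also
// be run on the host for functional checking.
extern "C" void vadd_sketch(const std::uint32_t* a,
                            const std::uint32_t* b,
                            std::uint32_t* out,
                            int n) {
#pragma HLS INTERFACE m_axi port=a   offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=b   offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem2
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = a[i] + b[i];
    }
}
```

If `a` and `b` shared a single bundle instead, both reads would go through the same AXI interface, which is the case where the compile-time II can be affected.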

I recommend checking this article for more information on the Xilinx HBM boards:

https://developer.xilinx.com/en/articles/maximizing-memory-bandwidth-with-vitis-and-xilinx-ultrascale-hbm-devices.html

Registered: ‎09-11-2019

[Attached image: mmexport1603770711494.png]

So the bandwidth given by xbutil dmatest is the actual bandwidth of HBM[0] and HBM[1] in my project rather than the maximum bandwidth HBM[0] and HBM[1] can support?

Is HBM[0] an example of one of the "channels" you mentioned?

Registered: ‎03-01-2020

No, the bandwidth reported by xbutil is independent of your design; however, the bandwidth reported by dmatest is limited by the PCI-E transfers, not by HBM. It essentially shows the maximum bandwidth of the PCI-E connection between your host machine and the FPGA board, which is much lower than the HBM bandwidth.

Registered: ‎09-11-2019

Ok, I got it. Thanks very much for your help!!!
