s.corda
Observer
Registered: ‎02-27-2019

HBM bandwidth example

Hi,

I am running the HBM bandwidth example on an Alveo U50 (https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/host/hbm_bandwidth).

This board should support 28 HBM channels. Indeed, when I try to generate the bitstream for 8 units (4 channels * 8 kernels = 32 channels), Vitis reports that it is not possible to map more than 30 channels.

I can generate the bitstream for 6 kernels (24 channels) and 7 kernels (28 channels), but when I run them I get "bus error (core dumped)". I do not have this issue when running 3 kernels. How can I run with 24 or 28 channels?

 

Thanks


Accepted Solutions
JohnFedakIV
Moderator
Registered: ‎09-04-2020

Hi @s.corda ,

The temperatures are within range. That said, I recommend running the card with the airflow specified in the datasheet. The limitation is the 10 W maximum power available from the 3.3 V PCIe rail. During the tests, watch the 3V3 PEX and 3V3 PEX CURR values (reported in mV and mA).
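As a rough illustration of how those two readings relate to the 10 W limit, the sketch below converts a voltage in mV and a current in mA into watts and reports the remaining headroom. The function names and the sample readings are hypothetical and only for illustration; only the 10 W figure and the mV/mA units come from this thread.

```python
# Minimal sketch: turn 3V3 PEX (mV) and 3V3 PEX CURR (mA) readings into watts.
# Function names and example readings are assumptions, not tool output.

LIMIT_W = 10.0  # 10 W maximum on the 3.3 V PCIe rail, as stated in this thread


def rail_power_w(millivolts: float, milliamps: float) -> float:
    """Power in watts from a voltage in mV and a current in mA (P = V * I)."""
    return (millivolts / 1000.0) * (milliamps / 1000.0)


def headroom_w(millivolts: float, milliamps: float, limit_w: float = LIMIT_W) -> float:
    """Remaining power budget in watts; a negative value means the limit is exceeded."""
    return limit_w - rail_power_w(millivolts, milliamps)


if __name__ == "__main__":
    # Made-up readings for illustration only.
    mv, ma = 3300, 2500
    print(f"{rail_power_w(mv, ma):.2f} W drawn, {headroom_w(mv, ma):.2f} W headroom")
```

If the headroom approaches zero or goes negative while the larger designs are running, that would be consistent with the rail limit described above.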

Regards,
~John

----------------------------------------------------------------------------------
* Please don't forget to reply, kudo and accept as a solution! *

3 Replies
JohnFedakIV
Moderator
Registered: ‎09-04-2020

Hi @s.corda ,

I've moved this post to the Alveo forum board because I believe you are running into a hardware limitation. The U50 HBM is powered by the 10 W PCIe 3.3 V rail, and converter efficiency must also be taken into account, so you may not get the full 10 W.

To check the power during the test, see the section on monitoring card power rails in the Alveo Card Debug Guide: https://xilinx.github.io/Alveo-Cards/master/debugging/docs/common-steps.html#monitor-card-power-and-temperature. Near the bottom of that section there is a script that can be run in a second terminal to monitor the power:
https://xilinx.github.io/Alveo-Cards/master/debugging/docs/scripts/loop_query.sh
Make sure to modify the script loop to cover the time required (this is determined by the loop count on line 17 and the delay in seconds between calls on line 19).
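
To make the relationship between loop count, delay, and monitoring time concrete, here is a minimal sketch of a polling loop. It is not the linked script: `poll`, `monitored_seconds`, and the `read_sample` callable are hypothetical names, and how a sample is actually obtained from the card is left to the caller.

```python
import time
from typing import Callable, List, Tuple


def monitored_seconds(count: int, delay_s: float) -> float:
    """Approximate wall-clock time covered by `count` samples spaced `delay_s` apart."""
    return count * delay_s


def poll(read_sample: Callable[[], float], count: int, delay_s: float) -> List[Tuple[float, float]]:
    """Call `read_sample` `count` times, sleeping `delay_s` seconds between calls.

    Returns (elapsed_seconds, value) pairs. `read_sample` is a placeholder for
    whatever produces one reading (an assumption, not a real tool API).
    """
    samples = []
    start = time.monotonic()
    for i in range(count):
        samples.append((time.monotonic() - start, read_sample()))
        if i < count - 1:
            time.sleep(delay_s)
    return samples


if __name__ == "__main__":
    # Dummy reader for illustration; replace with a real measurement source.
    readings = poll(lambda: 0.0, count=3, delay_s=0.1)
    print(len(readings), "samples;", monitored_seconds(60, 2), "s covered by 60 samples at 2 s")
```

The loop count times the delay should comfortably exceed the runtime of the bandwidth test, so that the peak draw is captured.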

Regards,
~John

s.corda
Observer
Registered: ‎02-27-2019

Hi @JohnFedakIV ,

I tried the script and created more tests with 3, 4, 5, 6, and 7 units.

I managed to run up to 4 units, mapping the first 16 HBM channels (SLR0), with a bandwidth of 210 GB/s (the nominal bandwidth in the U50 datasheet; the peak should be 310 GB/s).

While running 5 units, I did not notice any differences in the script output.

The FPGA temperature is at the limit (48/49 degrees), while the HBM is around 43 degrees. Currently, the board takes fresh air from one small fan. Could that be the problem?

Would a different channel mapping be beneficial, e.g. 3*4 channels in SLR0 and 3*4 in SLR1?

 

Thanks

 
 