Adventurer
Registered: 09-30-2019

512-bit or 1024-bit interfaces to HBM?

I am working on fairly complex C++ kernels that use 8 or 16 HBM ports.

All of my processing is done using ap_uint<1024>, and right now so are my interfaces to HBM banks.

Now I wonder: would there be any benefit (in either area or performance) to changing the interfaces to 512 bits?

Of course, the rest of the processing would remain at 1024 bits.

 

PS: I am working on an Alveo U280, on Linux, with Vitis.
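For illustration, here is a minimal sketch of what keeping 1024-bit processing behind a 512-bit port implies: each 1024-bit word is split into two 512-bit beats on the way to HBM and reassembled on the way back. It is plain C++ so it compiles anywhere; Word1024, Word512, split1024 and join512 are hypothetical stand-ins for ap_uint<1024>, ap_uint<512> and .range() slicing in a real Vitis HLS kernel, and sending the low half first is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for ap_uint<1024> and ap_uint<512>:
// little-endian arrays of 64-bit limbs (limb 0 holds bits 63..0).
using Word1024 = std::array<uint64_t, 16>;
using Word512  = std::array<uint64_t, 8>;

// Split one 1024-bit word into two 512-bit beats, low half first
// (in HLS this would roughly be w.range(511, 0) then w.range(1023, 512)).
std::pair<Word512, Word512> split1024(const Word1024& w) {
    Word512 lo{}, hi{};
    for (std::size_t i = 0; i < 8; ++i) {
        lo[i] = w[i];      // bits 511..0
        hi[i] = w[i + 8];  // bits 1023..512
    }
    return {lo, hi};
}

// Reassemble two 512-bit beats (low half first) into one 1024-bit word.
Word1024 join512(const Word512& lo, const Word512& hi) {
    Word1024 w{};
    for (std::size_t i = 0; i < 8; ++i) {
        w[i] = lo[i];
        w[i + 8] = hi[i];
    }
    return w;
}
```

Since every 1024-bit word now needs two beats, the 512-bit port has to move beats at twice the kernel's word rate to avoid stalling the 1024-bit datapath.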

 

Xilinx Employee
Registered: 10-19-2015

Hi @benedetto73

The HBM ports are 256-bit AXI3 interfaces that run at 450 MHz.

If you go with a 512-bit interface from your kernel, you might have a harder time closing timing; or, since v++ will scale the frequency down, you might close timing at a lower clock rate.
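A rough worked example of that trade-off: the peak bandwidth of a port is its width in bytes times its clock rate. The sketch assumes one beat per cycle with no AXI protocol overhead and takes the 256-bit, 450 MHz HBM figures quoted above; peakGBps and clockToMatchHbmMHz are hypothetical helper names.

```cpp
#include <cassert>

// Peak bandwidth in GB/s (1e9 bytes/s) of a port that moves one
// width_bits-wide beat per cycle at clock_mhz. Ignores AXI overhead.
double peakGBps(int width_bits, double clock_mhz) {
    assert(width_bits > 0 && clock_mhz > 0.0);
    return (width_bits / 8.0) * clock_mhz * 1e6 / 1e9;
}

// Kernel clock (MHz) at which a width_bits-wide kernel port matches the
// peak bandwidth of one HBM port (assumed: 256 bits at 450 MHz).
double clockToMatchHbmMHz(int width_bits) {
    const double hbm_bits = 256.0;
    const double hbm_mhz = 450.0;
    assert(width_bits > 0);
    return hbm_bits * hbm_mhz / width_bits;
}
```

Under those assumptions one HBM port peaks at about 14.4 GB/s; a 512-bit kernel port needs about 225 MHz to match it, while a 1024-bit port needs only 112.5 MHz.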

If you go with 1024 bits, you will use more logic, so depending on the density of your design, it might be harder to place.

So yes, I believe area and performance are the only considerations I'd see as well. You want the design to run as fast as possible, and I think the clock speed your kernels reach will largely follow from this choice.

Edit: I think we should first look at what frequency your 1024-bit kernel closes timing at, and then decide whether it would be worth trying to close timing at twice that frequency or more after downsizing to 512 bits. Or maybe your 1024-bit kernel produces more data than the HBM can consume, which could cause a bottleneck that downsizing that interface might relieve.
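The comparison suggested in the edit can be sketched by multiplying width by achieved clock on each side of the interface and seeing which side is smaller. This is a simplified model under an ideal one-beat-per-cycle assumption, with the HBM side fixed at 256 bits and 450 MHz; Bottleneck and limitingSide are hypothetical names, and real throughput also depends on burst length and access pattern.

```cpp
#include <cassert>

enum class Bottleneck { Kernel, Hbm, Balanced };

// Compare ideal throughput (width * clock) on both sides of one port.
// Kernel side: kernel_bits wide at kernel_mhz (the frequency v++ achieved).
// HBM side: assumed 256 bits at 450 MHz.
Bottleneck limitingSide(int kernel_bits, double kernel_mhz) {
    assert(kernel_bits > 0 && kernel_mhz > 0.0);
    const double kernel_rate = kernel_bits * kernel_mhz;  // Mbit/s
    const double hbm_rate = 256.0 * 450.0;                // Mbit/s
    if (kernel_rate < hbm_rate) return Bottleneck::Kernel;
    if (kernel_rate > hbm_rate) return Bottleneck::Hbm;
    return Bottleneck::Balanced;
}
```

With hypothetical numbers, a 1024-bit port that closes at 300 MHz returns Bottleneck::Hbm (the kernel could push more than one HBM port accepts), while a 512-bit port at 200 MHz returns Bottleneck::Kernel.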

Regards,

M

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Observer
Registered: 02-20-2008

Hi @benedetto73 

If you happen to implement both versions, could you share whether you saw any benefit in effective DRAM bandwidth using 1024 bits vs. 512 bits?

I couldn't get anything routed using 1024-bit interfaces on the U50, so I was curious.

Thanks.

Xilinx Employee
Registered: 10-19-2015

@mice101 - Completely agree. Please share, @benedetto73.

@mice101 - Have you tried routing adjacent kernels using different SLRs? Or were you unable to place even a single kernel using 1024 bits?

Regards,

M

Adventurer
Registered: 09-30-2019

@mice101, sure, I will share the results here. Keep in mind, though, that I am working with HBM only, not DDR.

@mcertosi I am able to place a single 1024-bit kernel; at the moment I am actually trying to place two such kernels.

Adventurer
Registered: 09-30-2019

I tried changing all interfaces to 128 bits (which is the length of my words).
Results:
- Area reduced by 30%.
- Performance in HW emulation improved by 30%.
- But... the kernel just hangs in HW (and I don't have time right now to investigate).
Observer
Registered: 02-20-2008

Thanks for sharing your results, @benedetto73. I suppose you reduced the number of compute units by half?

One suggestion for your problem: could you check whether the frequency stored in the bitstream matches the frequency actually programmed on the card?

On my U50, I found that the two were different, and the design hangs when the difference is too large. I think there is a bug there.

Xilinx Employee
Registered: 10-19-2015

Hi @benedetto73 

When you get to it, check whether the hang is reported as an AXI firewall trip in xbutil query or in dmesg. I bet it's not, in which case you are likely running into an ERT bug.

If you want a quick test, create a file called xrt.ini in the same directory as the executable that programs the xclbin onto the card, with the following contents:

[Runtime]
ert = false

To confirm that the change has taken effect, check dmesg for a message similar to the following:

[ 2571.861208] xocl 0000:d8:00.1: dev ffff92c28d5ee098, exec_cfg_cmd: scheduler config ert(0), dataflow(0), slots(16), cudma(0), cuisr(0), cdma(0), cus(2)

ert(0) is the key you are looking for.
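If you want to script that check, a sketch like the one below pulls the ert(N) field out of a scheduler line in the format shown above. It assumes the message layout matches this example, which may differ between XRT versions; ertFlag is a hypothetical helper, not part of XRT.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Extracts N from the " ert(N)" field of an xocl scheduler-config line such as
//   "... exec_cfg_cmd: scheduler config ert(0), dataflow(0), ..."
// Returns std::nullopt if the field is absent or N is not a plain number.
// Searching for " ert(" (with the leading space) avoids matching names like "cert(".
std::optional<int> ertFlag(const std::string& line) {
    const std::string key = " ert(";
    const std::size_t start = line.find(key);
    if (start == std::string::npos) return std::nullopt;
    const std::size_t begin = start + key.size();
    const std::size_t end = line.find(')', begin);
    if (end == std::string::npos || end == begin) return std::nullopt;
    int value = 0;
    for (std::size_t i = begin; i < end; ++i) {
        if (line[i] < '0' || line[i] > '9') return std::nullopt;
        value = value * 10 + (line[i] - '0');
    }
    return value;
}
```

A result of 0 corresponds to the ert(0) you are looking for, i.e. the setting from xrt.ini was picked up.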

This bug appears to be fixed in the upcoming release of XRT and the platforms.

Note: XRT reads the .ini file only when a new xclbin is programmed onto the card, so rerunning a program, or running a program that uses the same xclbin as the previous one, will not force the platform's scheduler to reconfigure itself.

-M
