1,347 Views
Registered: 06-16-2018

HBM actual bandwidth

Hi all,

I'm using the HBM on a VU37P FPGA. I'm wondering, is anyone else out there using it too? I built a bandwidth-testing block, and the results are not great:

Access pattern {row,bank,col}: 40% bandwidth

Access pattern {row,col,bank}: 60% bandwidth

Access pattern {row,col,bank}, read and write simultaneously: 25% bandwidth
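
For clarity, the pattern notation above is just the order in which my test block concatenates the address fields when generating addresses. A minimal Python sketch of the idea (the field widths here are made up for illustration, not the real HBM2 geometry):

    # Illustrative only: field widths are assumed, not the actual HBM2 geometry.
    ROW_BITS, BANK_BITS, COL_BITS = 14, 4, 5

    def addr_row_bank_col(row, bank, col):
        # {row,bank,col}: column bits change fastest, so a linear sweep
        # stays inside one row of one bank until the column field wraps.
        return (row << (BANK_BITS + COL_BITS)) | (bank << COL_BITS) | col

    def addr_row_col_bank(row, col, bank):
        # {row,col,bank}: bank bits change fastest, so a linear sweep
        # rotates across banks, which can hide activate/precharge latency.
        return (row << (COL_BITS + BANK_BITS)) | (col << BANK_BITS) | bank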

 

Is that a technology limitation, or is it my access pattern? I need advice.

 

Thanks

0 Kudos
8 Replies
Xilinx Employee
1,315 Views
Registered: 08-21-2007

Re: HBM actual bandwidth

The performance depends greatly on your data pattern. You can run the IP example design and compare your results against the provided traffic.
0 Kudos
1,302 Views
Registered: 06-16-2018

Re: HBM actual bandwidth

Hi @kren,

Did you mean the address pattern? I think the data itself doesn't play a role in this case.
0 Kudos
Xilinx Employee
1,293 Views
Registered: 08-21-2007

Re: HBM actual bandwidth

Yes, it should be the address pattern.
0 Kudos
Visitor jvl3
1,023 Views
Registered: 04-10-2018

Re: HBM actual bandwidth

Short version:

I also see low efficiency. What are some guidelines for achieving a reasonable efficiency?

 

Long version:

I am also seeing low efficiency. I'm wondering if I just don't understand what settings or tasking scheme are needed to get reasonable efficiency. Since it is essentially DRAM, I would have thought 70%-80% would be reasonable.

I tried the IP example design as suggested, with all default settings. I modified the testbench stimulus input CSV file to tell the testbench to write 2048 transfers and then read 2048 transfers on AXI interface 0 ("TG-0").

 

[Attachment: 0.png]

 

This gives a transfer size of 1 MB, which should take ~68 µs at 100% efficiency. Details:

  • The transfer size should be 2048 bursts × 16 words/burst × 256 bits ÷ 8 bits/byte = 1 MB of consecutive data transfers.
  • I would think the 100% theoretical transfer rate would be 460 GB/s ÷ 32 pseudo-channels (or 32 AXI interfaces) = 14.375 GB/s per interface.
  • The time to transfer 1 MB at 100% efficiency would be: 1 MB ÷ (14.375 × 1024 × 1024 × 1024 B/s) = 6.79e-5 seconds (67.9 µs).

 

What I see in the testbench is:

137 µs for 1 MB of consecutive writes, then 154 µs for 1 MB of consecutive reads.

So it's in the ballpark of 45%-50% efficient.
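
As a sanity check on the arithmetic, here is the same calculation in Python (the 460 GB/s aggregate figure and the 137/154 µs times are the numbers from this post):

    # Sanity check of the numbers above.
    bursts = 2048
    beats_per_burst = 16
    bits_per_beat = 256
    total_bytes = bursts * beats_per_burst * bits_per_beat // 8   # 1,048,576 B = 1 MB

    per_channel_bw = 460 * 1024**3 / 32        # 14.375 GB/s per pseudo-channel
    ideal_time = total_bytes / per_channel_bw
    print(f"ideal time: {ideal_time * 1e6:.1f} us")               # ~67.9 us

    for label, measured in (("write", 137e-6), ("read", 154e-6)):
        print(f"{label} efficiency: {ideal_time / measured:.0%}") # ~50% and ~44%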

 

[Attachment: 1.jpg]

 

Zooming in on the waveform, I can see roughly the first 45 write bursts going at full rate; then ALL subsequent bursts (~2000 of them) slow down to about half rate. The spacing between "wlast" pulses grows from ~37 ns to ~62 ns.

I also see prolonged periods of no activity but I assume that is due to refresh, bank switch, or similar. That only accounts for maybe 15% of the time.

 

[Attachment: 2.jpg]

 

I have confirmed that the addresses are sequential and increment by 0x200 for each AXI burst.
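
As a quick check that 0x200 is the right stride for back-to-back bursts:

    # One AXI burst = 16 beats x 256 bits = 512 bytes = 0x200.
    assert 16 * 256 // 8 == 0x200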

 

What I want to understand is whether this is the wrong way to access the memory. If so, what are some guidelines for getting up to a reasonable efficiency? Or did I just miscalculate or misunderstand something?

0 Kudos
Observer davidh1901
876 Views
Registered: 07-27-2008

Re: HBM actual bandwidth

I am also very interested in the actual bandwidth from the HBM for different burst sizes and access patterns. I'm trying to design for the VU37P part, but I don't yet have a board to test on.

Is there any information available? The document https://www.xilinx.com/support/documentation/ip_documentation/hbm/v1_0/pg276-axi-hbm.pdf references the performance and resource use web page, but it looks like that page doesn't exist.

0 Kudos
Moderator
860 Views
Registered: 02-11-2014

Re: HBM actual bandwidth

Hello @davidh1901,

 

I have a change request open to get this website created and linked properly into PG276 in a future revision.

 

Thanks,

Cory

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Advisor evgenis1
601 Views
Registered: 12-03-2007

Re: HBM actual bandwidth

Hi @haiphongpham1995

 

>> Is that a technology limitation, or is it my access pattern? I need advice.

 

I think it's both.

HBM is physically implemented as 8 memory controllers and 4 memory tiles per stack, as shown in the attached screenshots.

So if your design continuously writes to the same memory address from all 32 channels, you'd get 1/32th of the peak bandwidth.

The peak performance is achieved when accesses are spread out to non-overlapping addresses within the address range of each channel (no crossbar accesses), as in the sketch below.
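
As an illustration, a small Python sketch of keeping each master inside its own channel's range. I'm assuming an 8 GB device split evenly across 32 pseudo-channels here; check the address map in PG276 for your actual configuration:

    # Sketch: keep each of the 32 masters inside its own pseudo-channel range.
    # Assumes 8 GB split evenly across 32 pseudo-channels (256 MB each);
    # see the PG276 address map for the real device configuration.
    TOTAL_BYTES = 8 * 1024**3
    NUM_CHANNELS = 32
    CHANNEL_SIZE = TOTAL_BYTES // NUM_CHANNELS    # 256 MB per pseudo-channel

    def channel_base(ch):
        return ch * CHANNEL_SIZE

    def local_to_global(ch, offset):
        # Accesses avoid the crossbar only while each master stays
        # inside its own 256 MB window.
        assert 0 <= ch < NUM_CHANNELS and 0 <= offset < CHANNEL_SIZE
        return channel_base(ch) + offset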

 

Thanks,

Evgeni

[Attachments: hbm_layout2.jpg, hbm_layout1.jpg]
0 Kudos
Observer davidh1901
198 Views
Registered: 07-27-2008

Re: HBM actual bandwidth

The new version of

https://www.xilinx.com/support/documentation/ip_documentation/hbm/v1_0/pg276-axi-hbm.pdf

now has a section explaining that the crossbar switch is limited to about 50% of the maximum bandwidth. If you configure the memory so that all of it is accessed through a single port, then you are likely going through the crossbar; maybe this is what you are seeing.

Our simulations show 70% or better bandwidth.

0 Kudos