I have been running some speed tests on the HBM of a VU37P device and have a few questions. Note: the 8 GB of HBM is byte-addressed with 33-bit addresses (2^33 bytes = 8 GB).
1a) The AXI protocol requires that a burst must not cross a 4 KB address boundary; a transfer that would cross one must be split into multiple (shorter) bursts. However, I have tested writing and reading with bursts that cross the boundary (i.e. addr_start[32:12] != addr_stop[32:12]) and have never encountered errors. Is it safe to assume that the AXI interface of the HBM IP tolerates 4 KB boundary crossings?
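For reference, the 4 KB rule reduces to comparing the page bits of the first and last byte of a burst. The Python sketch below assumes INCR bursts and a 256-bit (32-byte) data bus per AXI port. `crosses_4k` and its parameters are hypothetical helper names, not part of the HBM IP or any Xilinx API.

```python
ADDR_BITS = 33        # 8 GB = 2**33 bytes, so byte addresses need 33 bits
PAGE_SHIFT = 12       # 4 KB = 2**12 bytes
BYTES_PER_BEAT = 32   # assumption: 256-bit AXI data bus

def crosses_4k(start_byte: int, beats: int, bytes_per_beat: int = BYTES_PER_BEAT) -> bool:
    """Return True if an INCR burst starting at start_byte crosses a 4 KB boundary."""
    assert 0 <= start_byte < 2**ADDR_BITS
    stop_byte = start_byte + beats * bytes_per_beat - 1  # last byte touched by the burst
    # Equivalent to comparing addr_start[32:12] with addr_stop[32:12]
    return (start_byte >> PAGE_SHIFT) != (stop_byte >> PAGE_SHIFT)

print(crosses_4k(0, 16))         # 16 * 32 = 512 B inside page 0 -> False
print(crosses_4k(4096 - 32, 2))  # last beat of page 0 + first beat of page 1 -> True
```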
1b) For a single AXI port, a sequential memory read sweeping from address 0 to 2^23-1 with a burst length of 15 takes a certain amount of time. If the logic checks for 4 KB boundary crossings and splits the affected bursts into multiple commands, this time increases. This reinforces what was observed in (1a).
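One simple reason the checked version could be slower is that it issues more commands. The sketch below splits bursts at 4 KB pages and counts the resulting commands. It assumes that the sweep addresses (0 to 2^23-1) are 256-bit beat indices, so that one 4 KB page holds 128 beats; `split_burst` and `count_commands` are hypothetical helpers for illustration.

```python
BEATS_PER_PAGE = 4096 // 32  # 128 beats per 4 KB page, assuming 32-byte beats

def split_burst(start_beat: int, beats: int) -> list[tuple[int, int]]:
    """Split one burst into (start_beat, beats) pieces that never cross a 4 KB page."""
    pieces = []
    while beats > 0:
        room = BEATS_PER_PAGE - (start_beat % BEATS_PER_PAGE)  # beats left in current page
        n = min(beats, room)
        pieces.append((start_beat, n))
        start_beat += n
        beats -= n
    return pieces

def count_commands(total_beats: int, burst: int, split: bool) -> int:
    """Number of read commands needed to sweep beats [0, total_beats)."""
    commands = 0
    for start in range(0, total_beats, burst):
        beats = min(burst, total_beats - start)  # the last burst may be shorter
        commands += len(split_burst(start, beats)) if split else 1
    return commands

print(split_burst(120, 15))  # [(120, 8), (128, 7)]
```

With a burst length of 15, every burst that straddles a multiple of 128 beats becomes two commands, whereas with 16 (which divides 128) an aligned sweep never needs splitting.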
2) For a single AXI port, a sequential memory read sweeping from address 0 to 2^23-1 with a burst length of 16 takes a certain amount of time. If the same test is run but sweeping from address 1 to 2^23-1, it takes slightly longer. This suggests that the HBM overhead (opening/closing pages, precharging, etc.) depends on where the crossings occur, which would contradict what was observed in (1b).
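The arithmetic behind (2) can be checked by counting how many bursts in each sweep straddle a 4 KB page. This is again a sketch assuming 32-byte beats and beat-indexed sweep addresses; `crossing_bursts` is a hypothetical helper. It models only address arithmetic and says nothing about how the HBM controller maps addresses onto banks and rows, so it cannot by itself explain the measured difference.

```python
BEATS_PER_PAGE = 128  # 4 KB / 32-byte beats (assumption: 256-bit data bus)

def crossing_bursts(first_beat: int, last_beat: int, burst: int) -> int:
    """Count bursts in a sequential sweep [first_beat, last_beat] that span two 4 KB pages."""
    count = 0
    for start in range(first_beat, last_beat + 1, burst):
        stop = min(start + burst - 1, last_beat)  # last beat of this burst
        if start // BEATS_PER_PAGE != stop // BEATS_PER_PAGE:
            count += 1
    return count

TOP = 2**23 - 1
print(crossing_bursts(0, TOP, 16))  # aligned sweep -> 0
print(crossing_bursts(1, TOP, 16))  # sweep shifted by one beat -> 65535
```

Under these assumptions, the aligned sweep never crosses a page, while shifting the start by one beat makes one burst in every eight straddle a 4 KB boundary.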
3) For a single AXI port, a sequential memory read sweeping from address 0 to 2^23-1 with burst lengths from 1 to 16, without boundary-crossing checks, gave throughput that was not monotonic in burst length. In other words, a burst length of 16 was slightly worse than 15, 14 was worse than 13, 12 was worse than 11, and so on. This again suggests some issue related to boundaries, since we would expect longer bursts to give higher throughput because fewer handshakes take place. The following table shows these results.