I am trying to simulate a custom VHDL AXI master connected to Virtex UltraScale HBM. I am using Vivado 2020.2 and Mentor Questa 2020.4 on Red Hat 7. The libraries compiled OK once I used the later version of gcc shipped with Questa (rather than the RHEL 7 default).
The AXI master performs read-add-write cycles with single data beat transactions and fairly random addresses. The custom AXI master has been simulated and tested in hardware with a DDR4 memory interface and largely works. I do get data errors at a rate of perhaps 1 in 10E9, which go away if I drop the memory clock rate from 1200 MHz to 1000 MHz. I intend to revisit this once the IP has been tested on other platforms.
Working from the example design, I made a VHDL top level which connects 2 instances of my custom master to AXI ports 0 and 1 of the HBM. At first this did not work, so I added 35 ps delays to all the signals between the VHDL master and the SystemVerilog HBM simulation model, to ensure data hold time even in the presence of delta-cycle delays, particularly in the clock route into the HBM core. I also set the following simulation options:
questa.simulate.vsim.more_options* -onfinish final +notimingchecks
With these changes the simulation seems to work on AXI busses 0 and 1.
I wanted to simulate the design using 1, 2, 4 and 8 AXI busses to see how the performance scales and how independent the interfaces are. I am tying address bits 31 downto 28 to a constant equal to the AXI port number, as my understanding is that this means the transactions will not need to be routed between ports. Have I got this right?
With 1 and 2 AXI ports in use the simulation works, and I see an approximate 10 % reduction in performance per port when I enable both. My understanding is that the 2 ports on 1 memory controller share resources (command and address busses) to the HBM stack, so some level of delay is to be expected, but only a little, as there will be many NOP cycles spent waiting for tRAS, column read latency, etc.
When I extend to 4 AXI ports enabled, the simulation runs OK with another 4 % slowing of performance per port. Given that I thought I was using a separate memory controller for AXI ports 2 and 3, should I expect this further reduction in performance?
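To make the addressing scheme concrete, here is a minimal sketch of what I mean, written in Python rather than VHDL. The names `port_address` and `port_of` are purely illustrative and are not part of my design or of any Xilinx API; the only thing taken from my setup is that bits 31 downto 28 carry the AXI port number.

```python
PORT_SHIFT = 28                      # port number sits in address bits 31..28
OFFSET_MASK = (1 << PORT_SHIFT) - 1  # bits 27..0 come from the master


def port_address(port: int, offset: int) -> int:
    """Build the address a master drives: port number in bits 31..28."""
    if not 0 <= port <= 0xF:
        raise ValueError("port must fit in 4 bits")
    # Any offset bits above bit 27 are discarded, as they would be when
    # the upper address bits are tied to a constant.
    return (port << PORT_SHIFT) | (offset & OFFSET_MASK)


def port_of(address: int) -> int:
    """Recover the port number from bits 31..28 of an address."""
    return (address >> PORT_SHIFT) & 0xF


if __name__ == "__main__":
    for port in (0, 1, 4, 7):
        print(port, hex(port_address(port, 0x0123_4560)))
```

So every address issued by the master on port N falls inside its own 256 MB window, and my assumption (which is what I am asking about) is that such an address never has to cross to another port.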
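For reference, this is how I am turning the per-port figures into aggregate numbers. It is only a rough sketch: `aggregate_throughput` is an illustrative helper, and the 4-port value assumes the extra 4 % is subtracted from the 2-port figure (0.90 - 0.04 = 0.86), which is my own reading of the results rather than a precise measurement.

```python
def aggregate_throughput(ports: int, per_port_relative: float) -> float:
    """Total throughput relative to a single port running alone."""
    return ports * per_port_relative


# Per-port throughput relative to the 1-port case, from my simulations.
# Assumption: the 4-port figure is 0.90 - 0.04.
observed = {1: 1.00, 2: 0.90, 4: 0.86}

if __name__ == "__main__":
    for ports, relative in observed.items():
        total = aggregate_throughput(ports, relative)
        print(f"{ports} port(s): {total:.2f}x the single-port throughput")
```

On these numbers the total still scales well; my question is only whether the small per-port loss when going from 2 to 4 ports is expected.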
I then modified the design to use AXI ports 0..7.
Testing with all 8 ports in use, streams 0..3 appear to work, but writes to streams 4 and 5, and then 6 and 7, lock up after a few tens of cycles.
With 8 ports connected but ports 0..3 not in use, the burst write operations to ports 4..7 all work. However, when the read-add-write cycles start, the HBM accepts the first 16 reads but never responds with RVALID.
I tried increasing my hold time delay from 35 ps to 105 ps and then 1005 ps, but it made no difference.