02-03-2021 07:34 AM - edited 02-04-2021 10:47 AM
I am having trouble getting a successful reconfiguration in a design using DFX managed with the DFX controller v1.0. The DFX controller receives the bitstream from DDR3 memory via an AXI interface on a MIG. The design is on a US+ Virtex and I am using Vivado 2020.2.
The specific issue is that partial reconfiguration is suspended at some point during the process, often with no error coming from the DFX controller. I have been troubleshooting this using a reconfigurable module that is a simple down counter, changing only the size of the Pblock. When the Pblock is fairly small, up to a few clock regions, partial reconfiguration works fine. However, when I expand the Pblock to around 20 clock regions, which increases the partial bitstream size, the fault occurs. Using the ILA I can capture the last bitstream address the DFX controller accessed before suspending reconfiguration. The address at which reconfiguration stops is consistent for any given build, but it differs between builds.
Digging further into the ILA captures, the common factor I am seeing is a somewhat odd pair of transactions on the bitstream AXI bus. Normally the DFX controller accesses the bitstream in the maximum burst size of 256 32-bit words. The ARLEN value for these transactions is 0xFF. When reconfiguration stops I instead see the burst split into a 255 word burst followed by a 1 word burst. For example, in one capture I took I saw this sequence of AXI reads:
This sequence, while unusual, shouldn't be a problem for the AXI interface to the MIG. Both are valid AXI transaction lengths and neither crosses the 4KB address boundary, which I know can cause problems with AXI. The MIG interface is responding correctly and I can see that the data coming from the bitstream AXI bus is being delivered to the ICAP. However, the DFX controller does not continue with reconfiguration after this 1 word burst.
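For anyone who wants to run the same sanity check on their own ILA captures, here is a minimal Python sketch of the burst length and 4KB boundary arithmetic. It assumes INCR bursts of 32-bit beats with a beat-aligned start address. The function names are hypothetical and the addresses in the usage lines are illustrative only, not taken from the capture above.

```python
def burst_bytes(arlen: int, beat_bytes: int = 4) -> int:
    """Bytes moved by one burst: AXI encodes (beats - 1) in ARLEN."""
    return (arlen + 1) * beat_bytes


def crosses_4kb(addr: int, arlen: int, beat_bytes: int = 4) -> bool:
    """True if the burst's first and last byte fall in different 4KB pages."""
    last_byte = addr + burst_bytes(arlen, beat_bytes) - 1
    return (addr >> 12) != (last_byte >> 12)


# Illustrative addresses only (assumed, not from the original capture).
print(burst_bytes(0xFF))               # full 256-beat burst of 32-bit words
print(crosses_4kb(0x100000, 0xFE))     # 255-beat burst inside one page
print(crosses_4kb(0x1003FC, 0x00))     # 1-beat burst that follows it
print(crosses_4kb(0x000F00, 0xFF))     # a burst that would cross a boundary
```

With these made-up addresses the 255 word and 1 word bursts both stay inside a single 4KB page, which matches the observation that the split by itself is legal AXI.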
I am not sure what is causing this split of the bitstream burst reads, so it is a challenge to replicate in simulation. The point at which the reconfiguration halts is always several thousand reads into the process. I have been able to replicate the issue in sim by setting the bitstream length register so that the DFX controller fetches a bitstream which will necessarily involve an AXI transfer of 1 word, for example by setting the bitstream size to 4 bytes. In doing so I see similar behavior: the DFX controller does not respond to subsequent partial reconfiguration triggers after requesting a 1 word burst. I don't observe the same behavior for other bitstream sizes. For example, triggering reconfiguration with a bitstream size of 5 bytes, which means reading out 2 32-bit words, has no problem being triggered again afterwards.
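The size arithmetic behind that simulation test can be sketched as below. This is only a simple model and not the DFX controller's documented behavior: it assumes the byte count is rounded up to whole 32-bit words and that reads are issued as bursts of at most 256 beats with the remainder in a final shorter burst. Both function names are hypothetical.

```python
def words_for_bytes(nbytes: int, word_bytes: int = 4) -> int:
    """Number of whole words needed to cover nbytes (ceiling division)."""
    return -(-nbytes // word_bytes)


def arlens_for_words(nwords: int, max_beats: int = 256) -> list[int]:
    """ARLEN value of each burst if the words are read in bursts of up to max_beats."""
    arlens = []
    while nwords > 0:
        beats = min(nwords, max_beats)
        arlens.append(beats - 1)   # ARLEN is beats minus one
        nwords -= beats
    return arlens


for size in (4, 5, 1028):
    words = words_for_bytes(size)
    print(size, words, [hex(a) for a in arlens_for_words(words)])
```

Under this model a 4 byte bitstream gives a single burst with ARLEN 0x00, a 5 byte bitstream gives a single burst with ARLEN 0x01, and any size that leaves one word over after the full bursts ends with an ARLEN 0x00 read.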
03-15-2021 04:34 AM
Did you manage to get a resolution to the above problem? I am also experiencing issues with the DFX controller and different file sizes of reconfigurable module. I have only just started to debug it and thought it may be related to your AXI burst issue.
03-15-2021 06:01 AM
I did find a resolution. The error wasn't with the DFX controller but with some logic which handles translating the AXI transactions from the DFX controller to the MIG. The DDR3 in our design has data organized in words of 128 bits and the DFX controller receives data in 32 bit words. A calculation is done to determine ARLEN for the DDR3 interface, and it includes a provision for rounding up when the number of 32 bit words doesn't align with a 128 bit word boundary. It's that rounding logic which was incorrect, and it had the unfortunate property that it behaved correctly for nearly all transfer sizes except the particular case of a burst with ARLEN 0xFE, which would halt the whole transaction. There was also a second problem caused by the 0x00 ARLEN transfer. The 32 bit data word address 0x7FFFFC on that transfer doesn't align with the 128 bit word boundary for the DDR3. Giving the MIG an address of 0x7FFFFC didn't work: it actually read back from 0x7FFFF0 instead, so the wrong 32 bit word was returned. This led to a CRC error reported on the ICAP way down the line.
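To illustrate the two calculations involved, here is a rough Python model of a 32 bit to 128 bit read translation. It is not the actual logic from our design, just a sketch of the arithmetic under the widths described above. The function name `widen_read` and the 0x1000 start address are made up for the example; 0x7FFFFC is the address from the failing transfer.

```python
NARROW_BYTES = 4    # 32 bit words on the DFX controller side
WIDE_BYTES = 16     # 128 bit words on the DDR3 side


def widen_read(addr: int, arlen32: int) -> tuple[int, int, int]:
    """Map a 32 bit read burst onto an aligned 128 bit read burst.

    Returns (aligned_addr, arlen128, first_lane), where first_lane is the
    index of the first wanted 32 bit word inside the first 128 bit beat.
    """
    nbytes = (arlen32 + 1) * NARROW_BYTES
    aligned_addr = addr & ~(WIDE_BYTES - 1)          # round address down to 16 bytes
    first_lane = (addr - aligned_addr) // NARROW_BYTES
    last_byte = addr + nbytes - 1
    # Count every 128 bit word touched, which rounds the length up as needed.
    wide_beats = last_byte // WIDE_BYTES - aligned_addr // WIDE_BYTES + 1
    return aligned_addr, wide_beats - 1, first_lane


print([hex(v) for v in widen_read(0x1000, 0xFF)])    # 256 words, aligned start
print([hex(v) for v in widen_read(0x1000, 0xFE)])    # 255 words, needs rounding up
print([hex(v) for v in widen_read(0x7FFFFC, 0x00)])  # 1 word, unaligned address
```

In this model the 255 word burst still needs 64 wide beats (ARLEN 0x3F), and the single word at 0x7FFFFC maps to a read of 0x7FFFF0 with the wanted data in lane 3, so the downstream logic has to pick the right lane rather than taking the first 32 bits.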