Observer
Registered: ‎07-09-2018

PR using MCAP: what is the expected speed?


hi all,

I implemented partial reconfiguration over MCAP (ICAP over PCIe).

I'm on a Virtex UltraScale+ VCU1525. I used the "PR over PCIe" option of the PCIe IP and the MCAP software from AR64761 (https://www.xilinx.com/support/answers/64761.html).

 

The problem is that it is quite slow.

For a very small bitstream (one clock region, ~4MB), it takes about 1s.

For a large bitstream (almost the whole device, ~70MB), it takes around 20s.

ICAP has a bandwidth of 3.2Gb/s (32 bits at 100MHz), so both should take on the order of tens to hundreds of milliseconds.

I think the bottleneck is the MCAP, since the software writes the bitstream 32 bits at a time to a PCIe capability register.
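
As a rough sanity check on these numbers, here is a small Python sketch. It assumes "MB" means 10^6 bytes and uses the approximate sizes and times quoted above; the function names are only illustrative.

```python
# Back-of-the-envelope check of the figures quoted in this post.
# Assumption: 1 MB = 10**6 bytes; sizes and times are the approximate ones above.

ICAP_WIDTH_BITS = 32
ICAP_CLOCK_HZ = 100e6


def icap_bytes_per_second(width_bits=ICAP_WIDTH_BITS, clock_hz=ICAP_CLOCK_HZ):
    """Peak ICAP bandwidth: one word of width_bits per clock cycle."""
    return width_bits / 8 * clock_hz


def ideal_icap_seconds(size_bytes):
    """Time to stream a bitstream at peak ICAP bandwidth."""
    return size_bytes / icap_bytes_per_second()


def observed_mb_per_second(size_bytes, seconds):
    """Effective throughput actually measured over MCAP."""
    return size_bytes / seconds / 1e6


for label, size, elapsed in [("small", 4e6, 1.0), ("large", 70e6, 20.0)]:
    ideal_ms = ideal_icap_seconds(size) * 1e3
    actual = observed_mb_per_second(size, elapsed)
    print(f"{label}: ideal ICAP {ideal_ms:.0f} ms, observed {actual:.1f} MB/s")
```

Under these assumptions the ideal ICAP times are about 10 ms and 175 ms, while the measured throughput works out to only about 4 MB/s and 3.5 MB/s.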

 

My question is: is this the best achievable speed using MCAP?

If so, I should try "transfer bitstream using DMA + PR using PR controller IP" for better performance...

which means more nights with no sleep :(

FPGA newbie http://github.com/csehydrogen
1 Solution

Accepted Solutions
Xilinx Employee
Registered: ‎11-17-2008

@csehydrogen,

 

Your observations are correct.  The best achievable speed when reconfiguring over MCAP will vary per system.  Partial bitstream delivery through the MCAP is limited to one-Dword configuration writes, which in a typical system give a bandwidth of 3 to 6 Mbytes/s.  Most systems send only one configuration write at a time, and because configuration writes are non-posted, a second configuration write will not be sent until the completion for the preceding write is received.
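
To illustrate why one outstanding non-posted write caps the throughput, here is a minimal Python model. It is only a sketch: it assumes exactly 4 bytes of payload per write and treats everything else (link latency, software overhead) as a single round-trip time, which is a simplification rather than a measured figure.

```python
# Simplified model: each configuration write carries one Dword (4 bytes) and
# the next write cannot be issued until the previous completion returns.
# The round-trip time is a hypothetical lumped parameter, not a measured value.

DWORD_BYTES = 4


def mcap_bytes_per_second(round_trip_s, outstanding=1):
    """Throughput when `outstanding` Dword writes are in flight per round trip."""
    return outstanding * DWORD_BYTES / round_trip_s


def implied_round_trip_s(bytes_per_second):
    """Round-trip time per write implied by an observed throughput."""
    return DWORD_BYTES / bytes_per_second


for mb_per_s in (3, 6):
    rt_us = implied_round_trip_s(mb_per_s * 1e6) * 1e6
    print(f"{mb_per_s} MB/s implies about {rt_us:.2f} us per configuration write")
```

In this model the 3 to 6 Mbytes/s range corresponds to roughly 1.33 us down to 0.67 us per write, so the bandwidth is set by per-write latency rather than by the link rate.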

 

This is very different from delivery over ICAP, which can be streamed and does not require the back-and-forth handshaking.  Maximum speed for an SSI device is actually 125MHz (x32 as you note), and up to 200MHz for monolithic devices, or for SSI if your PR region is on the master SLR.
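
For a sense of scale, the sketch below compares peak x32 ICAP bandwidth at those two clock rates with the 3 to 6 Mbytes/s MCAP range mentioned above. These are peak figures under the assumption of one 32-bit word per clock with no stalls; real designs will see less.

```python
# Peak x32 ICAP bandwidth versus the typical MCAP range quoted in this reply.
# Assumption: one 4-byte word per ICAP clock cycle, no stalls or overhead.

ICAP_WIDTH_BYTES = 4  # x32 interface


def icap_peak_mb_per_s(clock_mhz):
    """Bytes per cycle times Mcycles per second gives MB/s."""
    return ICAP_WIDTH_BYTES * clock_mhz


def speedup_range(icap_mb_s, mcap_low=3.0, mcap_high=6.0):
    """(min, max) speedup of streamed ICAP delivery over MCAP delivery."""
    return icap_mb_s / mcap_high, icap_mb_s / mcap_low


for clock_mhz in (125, 200):
    peak = icap_peak_mb_per_s(clock_mhz)
    low, high = speedup_range(peak)
    print(f"{clock_mhz} MHz: {peak} MB/s peak, about {low:.0f}x to {high:.0f}x MCAP")
```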

 

Yes, a DMA transfer approach would provide a faster solution, either managed by the PR Controller or by the PCIe host via drivers such as those described here: https://www.xilinx.com/support/answers/65444.html.  Xilinx is writing an application note that shows an example of this approach in UltraScale+, with publication expected this fall.  I don't have an XAPP number yet, but it will be linked from the Partial Reconfiguration documentation page here:  https://www.xilinx.com/products/design-tools/vivado/implementation/partial-reconfiguration.html#documentation

 

thanks,

david.
