Observer
295 Views
Registered: ‎02-12-2020

Anyone using TandemPCIe with Field Updates on Alveo Cards?

I'm attempting to get Tandem PCIe with Field Updates working on Alveo cards, and I have managed to get a very simple example working (or at least I think it's working). However, Xilinx has since told me that Tandem PCIe isn't supported on these cards at all.

I'm curious if anyone else has gotten this to work, and what they had to do to make it work, as I fear that I won't be getting any Xilinx support in this regard.

Here is the message I got from Xilinx about the compatibility:

Hi Justin,

Tandem PCIe is not supported on any of the Alveo Data Center Acceleration Platform U2x0 series.  There is a requirement on the location of sys_reset in the layout, as well as not having DDR co-located in the Tandem bank.  The Alveo layout does not meet these requirements, which is why it is not documented as supported in User Guide 1289.

Alveo is targeted to be a production Data Center card, and the servers we have tested against do not impose a 100ms boot time requirement.  Additionally, an OS-initiated bus rescan or warm reboot will enumerate a device that was missed.  I have seen one server where this was potentially a problem, but disabling "fast boot" in the BIOS alleviated the issue.

Please let me know if you have any other questions.

Sincerely,
Beth Price
Staff Product Applications Engineer 

Accepted Solutions
Xilinx Employee
211 Views
Registered: ‎11-17-2008

Re: Anyone using TandemPCIe with Field Updates on Alveo Cards?

@jrwagz,

If your goal is dynamic reconfiguration and not fast initial boot, then you should only consider DFX and not Tandem Configuration.  Tandem is specifically for the initial power-up of a device (or full reconfiguration) to meet 100ms enumeration; DFX is for on-the-fly reconfiguration of part of the device; Tandem with Field Updates is our pre-defined solution that combines these two functions. As Beth describes, bank 65 is a hot spot, where multiple aspects of these solutions overlap.  Rather than fight the overlap and come up short, we steer users to select the solution that meets their most critical needs.  In your case, let's focus on DFX as 100ms boot is not critical.

There are a few options, but one thing should remain consistent:  PCIe, bank 65, and the DDR MIG core that uses bank 65 will all stay in the static design. This avoids all conflict with this part of the device, ensures the PCIe end point remains up during dynamic reconfiguration, and potentially enables that PCIe end point as the configuration path for partial bitstreams.  You will determine both the design hierarchy (what is static and what is reconfigurable) as well as the physical layout (floorplan). We recommend keeping clocks and IO at the top unless you have a specific need to dynamically change these. All this information about the DFX solution is centered here: https://www.xilinx.com/products/design-tools/vivado/implementation/dynamic-function-exchange.html

Then in terms of reconfiguring over PCIe, let me point you to two paths.  First, you can use the PCIe end point directly as a path for configuration.  Select the "PR over PCIe" option, and this enables the MCAP within the PCIe block to accept partial bitstreams.  Details can be found in PG213 for the streaming core, but note that the XDMA core also supports this capability.  This document links to AR64761, which covers the register mapping needed to deliver bitstreams over PCIe.  Now, this solution is compact, as no other FPGA resources are needed to set up the link, but it is slow, limited to single-DWORD configuration writes.  For a higher-performance solution, have a look at XAPP1338.  This document describes how to set up a PCIe-to-ICAP path to continuously stream configuration data to the ICAP.  This can speed up your configuration rate from 3-6MB/s to 500MB/s.
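The MCAP path above boils down to splitting the partial bitstream into 32-bit words and pushing them one at a time through PCIe extended config-space writes. A minimal host-side sketch of that chunking is below; the `MCAP_DATA_REG` offset is a placeholder (the real offsets come from AR64761 and depend on the device and IP configuration), and the config-write callback stands in for whatever mechanism you use to touch config space (e.g. a wrapper around sysfs).

```python
import struct

# Hypothetical MCAP data-register offset in PCIe extended config space.
# The real value is documented in AR64761; do not use this number as-is.
MCAP_DATA_REG = 0x354

def bitstream_dwords(bitstream: bytes):
    """Split a partial bitstream into big-endian 32-bit words, zero-padding
    the tail, matching the single-DWORD write granularity of the MCAP path."""
    pad = (-len(bitstream)) % 4
    data = bitstream + b"\x00" * pad
    return [struct.unpack(">I", data[i:i + 4])[0]
            for i in range(0, len(data), 4)]

def deliver_over_mcap(bitstream: bytes, write_config_dword):
    """Push each DWORD through a caller-supplied config-space write callback.
    On Linux this might wrap /sys/bus/pci/devices/<bdf>/config."""
    for word in bitstream_dwords(bitstream):
        write_config_dword(MCAP_DATA_REG, word)
```

This one-word-at-a-time loop is exactly why the MCAP route tops out at a few MB/s; the XAPP1338 PCIe-to-ICAP bridge replaces it with a continuous DMA stream.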

So, hopefully this gets you pointed in the right direction.  There is more to understand about the DFX solution, as you have full control over how your design is set up, but the example designs for either the PCIe IP or the app note can be a starting point for you.

thanks,

david.


5 Replies
Observer
249 Views
Registered: ‎02-12-2020

Re: Anyone using TandemPCIe with Field Updates on Alveo Cards?

Jump to solution

I got more details from Xilinx regarding the compatibility of the Alveo cards with Tandem PCIe with Field Updates. The main compatibility issue has to do with the layout and placement of the DDR relative to the PCIe: while people with similar custom layouts do generally get Tandem PCIe working, it usually comes at the expense of the DDR controller not training correctly. So, I'm going to see whether just using DFX instead of Tandem PCIe with Field Updates will be sufficient for my use case.

Here is the info from Xilinx on this topic:

Hi Justin, 

Because the DDR I/O is shared with the configuration bank, you will run into calibration issues if you enable that bank.  I have seen cases on general-purpose FPGAs where people got the constraints set with the DDR inside the Tandem region, but it then didn't load fast enough.  If you don't really need to meet the 100ms boot time, then I would recommend a DFX-based design (formerly known as Partial Reconfiguration).  It is a bit more flexible than Tandem with Field Updates.

Basically - if you get it to work, that’s fine, but if something goes wrong we have not tested it and won’t be able to assist. 

Sincerely
Beth
Observer
228 Views
Registered: ‎02-12-2020

Re: Anyone using TandemPCIe with Field Updates on Alveo Cards?

Jump to solution

Upon further research, I have learned that I should steer clear of Tandem PCIe with Field Updates and instead just use DFX (formerly Partial Reconfiguration), as my objective/requirement is not the 100ms PCIe boot time, but rather the ability to dynamically change our design on the fly without locking up the PCIe bus.

Here are the details that pointed me in that direction:

-	Tandem is only there for the 100ms requirement.  If your focus is on DFX, then enabling Tandem is only going to make that more complex and difficult.  We only recommend using Tandem when there is a hard requirement with no other way around it.  Even then, we recommend finding an external solution (like tweaking the reset timing).  There are lots of rules around region requirements, and yet another interface tie-off to deal with when doing Tandem + Field Updates.  If you don't have an absolute requirement, go with DFX over PCIe.  It has a lot more testing and support behind it, because that is what we use in the Acceleration Flow.
-	The DDR calibration issue is that you end up with DDR banks split between the Tandem region and the non-Tandem region.  Calibration will fail because the DDR bank can't be split across Pblocks that way (half in, half out, so to speak).  If you don't instantiate those pins, then I believe you won't have a problem, but it is again not something we have tested (nor plan to, because we are not seeing that hard Tandem requirement).

Long story short, if you are forced to use Custom Flow, instead of Acceleration Flow (the whole Vitis world) for compatibility reasons and you want to dynamically change your design on the fly, go with DFX instead of Tandem PCIe with Field Updates.

It's still not abundantly clear to me how to make DFX work over PCIe, rather than over USB/JTAG; however, I'm taking it on faith at this point that it is indeed possible, and I just need to learn how it is accomplished.

Observer
178 Views
Registered: ‎02-12-2020

Re: Anyone using TandemPCIe with Field Updates on Alveo Cards?

@davidd ,

Thanks for the detailed response, this is very useful!

I have looked over the reference information you provided and have a clarifying question in regard to XAPP1338.  The conclusion of XAPP1338 says:

This application note shows one way to continuously stream configuration data over PCIe to saturate the ICAP. Ultimately, the highest performance solution is one that maximizes delivery bandwidth to the ICAP by sending configuration data as quickly as the silicon allows. This partial bitstream delivery must be coupled with a sequence of events before, during, and after partial reconfiguration.

Can you point me to the particular details around that last part, specifically "This partial bitstream delivery must be coupled with a sequence of events before, during, and after partial reconfiguration"?

It seems to reference UG909 for those details; however, I'm drawing a blank on what those steps would be.

Any help is greatly appreciated.

Regards,

Justin

Xilinx Employee
165 Views
Registered: ‎11-17-2008

Re: Anyone using TandemPCIe with Field Updates on Alveo Cards?

Jump to solution

@jrwagz,

What this references is simply the set of recommendations we have for building a DFX design.  This is the part of your design that must be created to deal with the unique requirements of dynamic exchange.  What you don't want to do is deliver a partial bitstream when the currently running design isn't ready for it -- you could lose data, hang the system, or experience other undesirable results.  So you'll want to consider these types of things:

* What is happening in the RP before reconfiguration?  Make sure it's ready to be removed.  Some sort of handshaking is important.  Do drivers need to be unloaded, or do other static-region actions need to occur?

* What sort of decoupling is needed around the RP?  Make sure unpredictable activity in that region cannot affect the static logic.  Activate decoupling before reconfiguration.

* How are the partial bitstreams delivered?  This is important, of course, and there are many ways to do this.  This is the "during" that I noted.

* After reconfiguration, how will decoupling be released and when?  Is a reset event needed beyond the dedicated GSR that occurs at the end of the partial bitstream?  Do new drivers need to be loaded?  What does the static logic need to understand about the new function that has been loaded?
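The before/during/after sequence in those bullets can be sketched as a small host-side orchestration routine. Everything here is a hypothetical placeholder for whatever your static design and driver stack actually expose (`drain_and_halt`, `decoupler`, the `deliver` callback, and so on); the point is only the ordering of the steps, not a real API.

```python
# Sketch of the before/during/after partial-reconfiguration sequence.
# All method names below are illustrative, not a real Xilinx API.

def reconfigure_partition(rp, partial_bitstream, deliver):
    # Before: quiesce the reconfigurable partition (RP) and isolate it.
    rp.drain_and_halt()        # handshake: let in-flight work finish
    rp.unload_driver()         # if the OS has a driver bound to this function
    rp.decoupler.enable()      # clamp RP outputs so static logic sees safe values

    # During: stream the partial bitstream (MCAP, or a PCIe-to-ICAP
    # bridge as in XAPP1338).
    deliver(partial_bitstream)

    # After: reset the new logic if needed, release isolation, rebind software.
    rp.reset()                 # only if the design needs more than the GSR
    rp.decoupler.disable()
    rp.load_driver()
```

Getting any of these steps out of order (e.g. delivering the bitstream before the decoupler is active) is exactly the failure mode David warns about above.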

Xilinx has IP that can help with some of these tasks, but you are certainly welcome to craft your own if they don't meet your needs.

https://www.xilinx.com/products/intellectual-property/nav-ip-utility/nav-partial-reconfiguration.html

thanks,

david.