Soft Error Effects Mitigation

by Xilinx Employee on ‎06-30-2011 01:21 PM

It has been a while since I last posted on my ‘favorite’ subject:  Soft Error Effects.  My excuse?  I have been busy.

 

Now that we are rapidly descending into the nanometer realm, the terrible threat of “direct ionization” has struck.  What is this terrible new problem?  Well, it is our old friend the proton.  It turns out that for every ion ripped from the silicon lattice when an atmospheric neutron strikes, there are more than a few protons that are also liberated in the nanometer-scale device.  In the past, the charge deposited by a proton was small enough, and the stored charge at circuit nodes large enough, that the protons did nothing at all.  As the stored charge shrinks with each technology node, that margin disappears, and a single proton can now deposit enough charge to upset a node.

 

That was the bad news.  So what is the good news?  Xilinx designs its FPGA devices to be immune to direct ionization effects.  It is getting tougher and tougher, but it does not look like we will have to worry until we descend below the 22 nanometer technology node.

 

So who is affected?  Well, start with ASIC devices, continue through ASSP devices, and then sprinkle in other FPGA vendors who have started shipping products with features smaller than 65 nanometers.

 

 

Now What Am I to Do?

 

I really cannot offer you any advice if you are using ASIC and ASSP devices; until those markets start designing in soft error effect mitigation features, you are pretty much “hung out to dry.”  No support, no data, no test results, and a large risk to reliability and availability are the result.

 

As I have already mentioned, Xilinx FPGA devices are designed for robustness in the atmospheric neutron environment; they are fabricated and assembled using ultra-low alpha materials so that they continue to have the lowest soft error failure in time (FIT) rate, bar none.

 

http://www.xilinx.com/support/documentation/user_guides/ug116.pdf

 

The above publication of our soft (and hard) failure rates remains the only such document publicly available today.  Makes you stop and think--what are they trying to hide?
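For readers less used to the unit: one FIT is one failure per billion (1e9) device-hours.  The short sketch below turns a FIT rate into a mean time between failures and an expected yearly failure count for a fleet.  The FIT value and fleet size are made-up numbers for illustration only; they are not taken from UG116.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def mtbf_hours(fit: float) -> float:
    """FIT is failures per 1e9 device-hours, so MTBF = 1e9 / FIT."""
    return 1e9 / fit

def expected_failures_per_year(fit: float, devices: int) -> float:
    """Expected number of failures in one year across a fleet of devices."""
    return fit * devices * HOURS_PER_YEAR / 1e9

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example_fit = 100.0
example_fleet = 1000

print(mtbf_hours(example_fit))                                  # hours per device
print(expected_failures_per_year(example_fit, example_fleet))   # failures per year
```

With these assumed inputs, each device averages ten million hours between failures, yet a fleet of a thousand still expects a little under one failure per year, which is why the FIT rate matters at system scale.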

 

So, the first rule of designing a robust, available, and reliable system is to use a Xilinx FPGA device.

 

 

New Feature:  Find & Fix Essential Bits

 

As I said above, we have been busy with mitigation methods.  The latest one to be released is the Soft Error Mitigation (SEM) Controller IP core.

 

http://www.xilinx.com/support/documentation/ip_documentation/sem/v1_3/ds796_sem.pdf

 

This is the new, fully supported IP core which includes error injection for testing (sort of a “beam in a box”), error identification, notification, and correction.  Along with this fully supported free IP comes the “essential bits” feature.  Essential bits are those bits in the configuration that cause a functional difference in the FPGA device.  Conversely, a flip on a non-essential bit has no effect on the device, which improves availability and mean time between failures because no action has to be taken when those flips occur.
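The essential-bits idea can be sketched as a simple lookup: when an upset is reported, check whether its address is in the set of essential bits for the design and only escalate if it is.  This is a minimal illustration, not the core's interface; the `BitAddress` type, the (frame, offset) addressing, and the function names are all invented here, and the real tools use their own data formats.

```python
from typing import Iterable, List, NamedTuple, Set

class BitAddress(NamedTuple):
    """Hypothetical configuration-bit address: frame number plus bit offset."""
    frame: int
    offset: int

def is_essential(upset: BitAddress, essential: Set[BitAddress]) -> bool:
    """An upset matters to the design only if it lands on an essential bit."""
    return upset in essential

def upsets_needing_action(upsets: Iterable[BitAddress],
                          essential: Set[BitAddress]) -> List[BitAddress]:
    """Keep only the upsets that could change what the design does."""
    return [u for u in upsets if is_essential(u, essential)]

# Hypothetical design with two essential bits and two reported upsets.
essential_bits = {BitAddress(3, 17), BitAddress(8, 250)}
reported = [BitAddress(3, 17), BitAddress(5, 40)]

print(upsets_needing_action(reported, essential_bits))
```

In this sketch only the first upset would be escalated to the system; the second lands on a bit the design does not use, so it can be corrected quietly.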

 

In addition, the new core is able to find and fix entire frames in the rare event of a multi-bit upset (more than three adjacent bits, which cannot be repaired by the built-in single-error-correct, double-error-detect hardware feature).
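To show what “single error correct, double error detect” means in practice, here is a toy version using an 8-bit extended Hamming code over 4 data bits.  This is only a sketch of the general technique; the code protecting a real configuration frame is much wider, and nothing below is taken from the device documentation.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # data bits sit at the non-power-of-two positions

def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword."""
    code = [0] * 8                      # index 0 holds the overall parity bit
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code

def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None when the word cannot be repaired."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1
    fixed = list(code)
    if odd_parity:
        fixed[syndrome] ^= 1            # single flip: syndrome names the bad bit
        status = "corrected"
    elif syndrome != 0:
        return "uncorrectable", None    # even parity, nonzero syndrome: two flips
    else:
        status = "ok"
    nibble = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble

word = encode(0b1011)
word[5] ^= 1                            # inject one bit flip
print(decode(word))                     # ('corrected', 11)
```

One flip is repaired and two flips are flagged, but three or more flips in the same word can be missed or miscorrected by a code like this, which is why a larger upset needs a different repair path.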

 

Note that physically adjacent 2-bit errors land in separate frames due to physical address interleaving, so all 2-bit multi-bit upsets (MBUs) are also corrected (as two separate single-bit errors).
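The effect of interleaving can be sketched with a made-up mapping in which neighbouring physical cells rotate through different frames.  The interleave factor and the modulo mapping below are assumptions for illustration only; the actual device layout is not described here.

```python
from collections import Counter
from typing import Dict, Iterable

FRAMES = 4  # hypothetical number of frames interleaved along one physical row

def frame_of(physical_index: int) -> int:
    """Interleaved mapping: adjacent physical cells belong to different frames."""
    return physical_index % FRAMES

def flips_per_frame(hit_cells: Iterable[int]) -> Dict[int, int]:
    """Count how many flipped bits each frame sees for one particle strike."""
    return dict(Counter(frame_of(cell) for cell in hit_cells))

def all_correctable(hit_cells: Iterable[int]) -> bool:
    """True if every affected frame sees exactly one flipped bit."""
    return all(count == 1 for count in flips_per_frame(hit_cells).values())

# One strike flips two physically adjacent cells.
print(flips_per_frame([10, 11]))   # {2: 1, 3: 1}
print(all_correctable([10, 11]))   # True
```

Each frame sees a single-bit error, so per-frame single-error correction handles both flips independently.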

 

Support for Virtex-5 FPGA devices comes from the older application note:

 

http://www.xilinx.com/support/documentation/application_notes/xapp864.pdf

 

The new Soft Error Mitigation IP core will soon be available for Spartan-6 and 7 series devices.

 

 

New Software

 

Mentor Graphics released its Precision Hi-Rel synthesis tool last year:

 

http://www.mentor.com/products/fpga/synthesis/precision-hi-rel/

 

This tool has now found commercial applications in civil aerospace, automotive, and safety-critical markets.  By creating robust state machines and applying selective triplication and voting, designers can vastly improve availability and reliability.
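Triplication and voting comes down to a bitwise majority function over three copies of a value.  The sketch below shows the voting logic only, in software; it is not what any synthesis tool emits, and the register values are invented for the example.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)

# Three copies of the same register; one copy suffers a single bit flip.
good = 0b10110010
upset = good ^ 0b00001000

print(bin(majority(good, upset, good)))  # the vote masks the flipped bit
```

The vote masks any upset confined to one copy.  If the same bit flips in two copies before the error is repaired, the voter follows the wrong majority, which is why triplication is normally paired with some form of correction.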

 

Typically, the dataflow part of a design does not need to be robust, as higher-level data checks (CRC or parity) detect bad data and systems ask for a resend.  The control flow section of the design, however, needs to be as robust as possible, and having a state machine that gets stuck is a disaster.
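One common way to keep a state machine from getting stuck is to treat every unknown encoding as an error and force a return to a known safe state.  The sketch below assumes a one-hot encoding, where any single bit flip leaves zero or two bits set and is therefore always detectable.  The state names and transitions are hypothetical, and this is not a description of what any particular tool generates.

```python
# One-hot state encodings (hypothetical four-state controller).
IDLE, LOAD, RUN, DONE = 0b0001, 0b0010, 0b0100, 0b1000

NEXT_STATE = {IDLE: LOAD, LOAD: RUN, RUN: DONE, DONE: IDLE}
SAFE_STATE = IDLE

def step(state_reg: int) -> int:
    """Advance one cycle; any illegal encoding recovers to the safe state."""
    if state_reg not in NEXT_STATE:
        return SAFE_STATE          # upset detected: recover instead of hanging
    return NEXT_STATE[state_reg]

corrupted = RUN ^ 0b0001           # a bit flip turns 0b0100 into 0b0101
print(step(RUN) == DONE)           # True: normal transition
print(step(corrupted) == IDLE)     # True: illegal state recovers to IDLE
```

Without the recovery branch, an illegal encoding would have no defined transition and the controller could sit there indefinitely.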

 

 

Designing the most Reliable and Available Systems

 

Currently, your only choice is to start designing your next system right now with Xilinx FPGA devices.  There really is no other alternative; ASIC and ASSP devices are massively ‘broken’ and cannot be fixed without a significant engineering effort--we know, because we have invested in that effort for ten years now.  It is true that competitors’ FPGA devices are ‘following’ us, but because they are not in the lead, why take the risk?

Comments
by hans@unitron.com.au on ‎07-26-2011 11:04 PM

I guess so

About the Author
  • Austin graduated from UC Berkeley in 1974 and 1975 with his BS EECS in Electromagnetic (E&M) Theory and MS EECS in Communications and Information Theory. He worked in the telecommunications field for 20 years designing optical, microwave, and copper-based transmission systems. Austin joined the IC Design department for the Virtex product line at Xilinx in 1998. For the last four years he has worked at Xilinx Research Labs, where he is looking beyond the present technology issues. Austin has 69 patents.