It has been a while since I last posted on my ‘favorite’ subject: Soft Error Effects. My excuse? I have been busy.
Now that we are rapidly descending into the nanometer realm, the terrible threat of “direct ionization” has struck. What is this terrible new problem? Well, it is our old friend the proton. It turns out that for every ion ripped from the silicon lattice when an atmospheric neutron strikes, more than a few protons are also liberated in the nanometer-scale device. In the past, the proton’s charge was small enough, and the stored charge at circuit nodes large enough, that the protons did nothing at all. As the stored charge shrinks with each technology node, that is no longer guaranteed.
That was the bad news. So what is the good news? Xilinx designs its FPGA devices to be immune to direct ionization effects. It is getting tougher and tougher, but it does not look like we will have to worry until we descend below the 22 nanometer technology node.
So who is affected? Well, start with ASIC devices, continue through ASSP devices, and then sprinkle in other FPGA vendors who have started shipping products with features smaller than 65 nanometers.
Now What Am I to Do?
I really cannot offer you any advice if you are using ASIC and ASSP devices; until those markets start designing in soft error effect mitigation features, you are pretty much “hung out to dry.” No support, no data, no test results, and a large risk to reliability and availability are the result.
As I have already mentioned, Xilinx FPGA devices are designed for robustness in the atmospheric neutron environment; they are fabricated and assembled using ultra-low alpha materials so that they continue to have the lowest soft error failure in time (FIT) rate, bar none.
The above publication of our soft (and hard) failure rates remains the only such document publicly available today. Makes you stop and think--what are they trying to hide?
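For readers who have not worked with FIT rates before, the arithmetic is simple: one FIT is one failure per billion (10^9) device-hours. The sketch below shows the conversion to mean time between failures and to expected failures across a fleet. The function names and the figures in the usage note are purely illustrative and are not taken from any published failure-rate data.

```python
# FIT (failures in time) is defined as failures per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9


def fit_to_mtbf_hours(fit: float) -> float:
    """Mean time between failures, in hours, for a single device."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_FIT_UNIT / fit


def expected_failures(fit: float, devices: int, hours: float) -> float:
    """Expected number of failures across a fleet over a time span."""
    return fit * devices * hours / HOURS_PER_FIT_UNIT


if __name__ == "__main__":
    # Illustrative numbers only: 100 FIT, 1000 devices, one year (8760 hours).
    print(fit_to_mtbf_hours(100.0))
    print(expected_failures(100.0, 1000, 8760.0))
```

With these made-up inputs, a 100 FIT device has an MTBF of ten million hours, yet a fleet of a thousand such devices should still expect a bit under one failure per year, which is why the fleet-level number is usually the one worth budgeting for.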
So, the first rule of designing a robust, available, and reliable system is to use a Xilinx FPGA device.
New Feature: Find & Fix Essential Bits
As I said above, we have been busy with mitigation methods. The latest one to be released is the Soft Error Monitor IP Core.
This is the new, fully supported IP core, which includes error injection for testing (sort of a “beam in a box”), error identification, notification, and correction. Along with this fully supported free IP comes the “essential bits” feature: essential bits are those bits in the configuration that cause a functional difference in the FPGA device. Conversely, a non-essential bit is one where a flip has no effect on the device, which improves availability and mean time between failures because no action needs to be taken when those flips occur.
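The essential-bits idea can be sketched in a few lines. This is a toy model only, not the core's implementation: the (frame, bit offset) addressing, the set-based lookup, and the function names are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical addressing: (frame index, bit offset within the frame).
BitAddress = Tuple[int, int]


def is_essential(addr: BitAddress, essential_bits: Set[BitAddress]) -> bool:
    """True if a flip at this address can change the design's function."""
    return addr in essential_bits


def upsets_needing_action(
    upsets: Iterable[BitAddress], essential_bits: Set[BitAddress]
) -> List[BitAddress]:
    """Keep only the upsets that landed on essential bits.

    Upsets on non-essential bits are dropped here: in this model they can
    still be corrected in the background, but the system does not need to
    react to them.
    """
    return [addr for addr in upsets if is_essential(addr, essential_bits)]


if __name__ == "__main__":
    essential = {(0, 5), (0, 6), (3, 17)}      # made-up essential bits
    observed = [(0, 5), (1, 9), (3, 17), (7, 2)]  # made-up upsets
    print(upsets_needing_action(observed, essential))
```

In this example only two of the four observed upsets would require a system-level response, which is the availability gain described above.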
In addition, the new core is able to find and fix entire frames in the rare case of a multi-bit upset (more than three adjacent bits, which cannot be repaired by the built-in single-error-correct, double-error-detect hardware feature).
Note that physically adjacent 2-bit errors fall in separate frames due to physical address interleaving, so all 2-bit multi-bit upsets (MBUs) are also corrected (as two separate single-bit errors).
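Why interleaving helps is easy to see in a toy model. The sketch below assumes a simple modulo mapping from physical bit position to frame, with an interleave factor of 4; both are assumptions chosen for illustration, and the real device mapping is not described here.

```python
from collections import Counter
from typing import Iterable, Tuple


def physical_to_logical(physical_bit: int, num_frames: int) -> Tuple[int, int]:
    """Toy interleaving: neighbouring physical bits go to different frames.

    Returns (frame index, bit offset within that frame).
    """
    return physical_bit % num_frames, physical_bit // num_frames


def errors_per_frame(flipped_bits: Iterable[int], num_frames: int) -> Counter:
    """Count how many flipped bits land in each frame."""
    return Counter(physical_to_logical(b, num_frames)[0] for b in flipped_bits)


def secded_outcome(error_count: int) -> str:
    """What a single-error-correct, double-error-detect code guarantees."""
    if error_count == 0:
        return "clean"
    if error_count == 1:
        return "corrected"
    if error_count == 2:
        return "detected, not corrected"
    return "beyond SECDED guarantees"


if __name__ == "__main__":
    # Two physically adjacent bits are hit by one particle strike.
    counts = errors_per_frame([10, 11], num_frames=4)
    for frame, n in sorted(counts.items()):
        print(frame, secded_outcome(n))
```

Each frame sees only one flipped bit, so each is corrected independently. In this model, two hits in the same frame (for example physical bits 10 and 14) would only be detected, which is where frame-level repair comes in.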
Support for Virtex-5 FPGA is from the older application note:
The new SEU Monitor IP core will soon be available for Spartan-6 and 7 series devices.
Mentor Graphics released their synthesis tool last year:
This tool has now found commercial applications in civil aerospace, automotive, and safety-critical markets. Through robust state machines, selective triplication, and voting, availability and reliability may be vastly improved.
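Triplication and voting boils down to a bitwise majority function. The sketch below is a software model of that voter, not output from any synthesis tool; the function name and the register values are illustrative.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    golden = 0b1011
    upset_copy = golden ^ 0b1000  # one copy suffers a single bit flip
    voted = majority_vote(golden, golden, upset_copy)
    print(voted == golden)
```

A single upset in any one copy is outvoted by the other two. Two copies upset in the same bit position would defeat the voter, which is one reason triplication is usually paired with the correction mechanisms described earlier.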
Typically, the dataflow part of a design does not need to be robust, as higher-level data checks (CRC or parity) detect bad data and systems ask for a resend. The control flow section of the design, however, needs to be as robust as possible, and having a state machine that gets stuck is a disaster.
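Both halves of that split can be modeled briefly. The sketch below assumes a CRC-32 appended to each packet on the dataflow side, and a one-hot state machine that falls back to a known state on any illegal encoding on the control side. The packet layout, state names, and transition table are hypothetical.

```python
import zlib

# --- Dataflow: detect bad data and let the system ask for a resend ---

def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def packet_is_valid(packet: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the received one."""
    if len(packet) < 4:
        return False
    payload, received = packet[:-4], int.from_bytes(packet[-4:], "big")
    return zlib.crc32(payload) == received


# --- Control flow: a state machine that cannot get stuck ---

IDLE, RUN, DONE = 0b001, 0b010, 0b100  # one-hot encodings (illustrative)
TRANSITIONS = {IDLE: RUN, RUN: DONE, DONE: IDLE}


def next_state(state: int) -> int:
    """Advance the machine; any illegal encoding recovers to IDLE."""
    return TRANSITIONS.get(state, IDLE)


if __name__ == "__main__":
    packet = bytearray(make_packet(b"hello"))
    packet[0] ^= 0x01                      # simulate a single bit flip
    print(packet_is_valid(bytes(packet)))  # False: request a resend
    print(next_state(0b011) == IDLE)       # upset state register recovers
```

The point of the default case in `next_state` is that a bit flip in the state register can never leave the machine in an encoding with no way out.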
Designing the Most Reliable and Available Systems
Currently, your only choice is to start designing your next system right now with Xilinx FPGA devices. There really is no other alternative; ASIC and ASSP devices are massively ‘broken’ and cannot be fixed without a significant engineering effort--we know, because we have invested in that effort for ten years now. It is true that competitors’ FPGA devices are ‘following’ us, but because they are not in the lead, why take the risk?