SEU, Not Again!

 

I know, I know, I know: Single event upsets (SEU) are old news. Why do I continue to bring this subject up? Only because Xilinx FPGA devices are getting better and better.

 

It Has Been Six Years

 

Lots and lots of counting (of upsets): 150 nm for 172.21 Gigabit-Years (Gb-Yr), 130 nm for 98.88 Gb-Yr, 90 nm for 63.8 Gb-Yr, 65 nm for 41.77 Gb-Yr. To what end?

 

The original purpose was to find out what our soft error rate really was, in real life. Quickly, the program became the following: How do we improve this at the next technology node? How do we control alpha-emitting contamination? How do we modify our circuit design? What new features are needed by the customer?

 

What is the Result, So Far?

 

Configuration Memory (per JEDEC JESD89A)

150 nm:  396 FIT/Mb

130 nm:  375 FIT/Mb

  90 nm:  240 FIT/Mb (V4)

  65 nm:  138 FIT/Mb (V5)

 

The goal was to keep the soft failure rate of the largest device in our product line constant, or even decrease it. How are we doing?

 

Configuration Memory Size of Largest Family Member

150 nm:  20.9 Mb

130 nm:  21.9 Mb

  90 nm:  40.8 Mb (V4)

  65 nm:  63.9 Mb (V5)

 

 

So, how is the overall soft failure rate (in FIT) doing for the largest part?

150 nm:  396 * 20.9 = 8,276

130 nm:  375 * 21.9 = 8,213

  90 nm:  240 * 40.8 = 9,792

  65 nm:  138 * 63.9 = 8,694

 

We have roughly doubled the capabilities of the largest device each generation. For approximately ten times the capability from 150 nm to 65 nm, the failure rate is only 5% higher.
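The arithmetic above is easy to reproduce. The following is a minimal sketch in Python that multiplies the listed FIT/Mb figures by the listed configuration memory sizes; the names `NODES`, `device_fit`, `rates`, and `change` are illustrative only and are not part of any Xilinx tool. Worked from the listed figures, the 65 nm product comes to about 8,818 rather than the 8,694 quoted above, so the computed increase over 150 nm is nearer 6.5% than 5%; the difference may reflect rounding or a slightly different size figure in the inputs.

```python
# FIT = failures per 10^9 device-hours.
# Each entry: (FIT per Mb, configuration memory size in Mb), as listed in this post.
NODES = {
    "150 nm": (396, 20.9),
    "130 nm": (375, 21.9),
    "90 nm": (240, 40.8),
    "65 nm": (138, 63.9),
}


def device_fit(fit_per_mb, size_mb):
    """Configuration-memory failure rate of a whole device, in FIT."""
    return fit_per_mb * size_mb


rates = {node: device_fit(*figures) for node, figures in NODES.items()}

for node, fit in rates.items():
    print(f"{node:>6}: {fit:8,.0f} FIT")

# Relative change in device-level FIT from the oldest node to the newest.
change = rates["65 nm"] / rates["150 nm"] - 1
print(f"150 nm -> 65 nm: {change:+.1%}")
```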

 

One side note: We still have NO soft logic errors. You can’t say that about a 65 nm ASIC or ASSP!

 

If you only needed the capabilities of the largest part in 150 nm, a part with the same functionality in 65 nm is far more reliable: each bit is more robust, and there are fewer bits.

 

But, Here Is the Sad Part

 

We are alone. All alone. No one else has this track record. No one else can show these results. No one has succeeded in even keeping the failure rate constant.  That is why it is so quiet: Be concerned, be very very concerned.

 

Match our published results against anyone. I really don’t care: ASIC, ASSP, uP, FPGA. You will find (after innumerable delays, NDAs, and marketing ‘spin’) that Xilinx is alone in actually creating, selling, and supporting devices and proving that the soft error problem is actually shrinking with each generation.

 

The newest devices offer built-in detection and correction features for both configuration memory and user block RAM. There is a software tool to determine the de-rating factor for your design: Typically, only 1 in every 10 bit flips causes a functional failure (for some designs, as few as 1 in 40).
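To see what a de-rating factor means in practice, here is a rough sketch in Python. It takes the 8,694 FIT quoted above for the largest 65 nm part, applies the 1-in-10 and 1-in-40 de-rating factors, and converts the result to a mean time between functional failures. FIT is failures per 10^9 device-hours; the function names and the 8,766 hours-per-year constant (365.25 days) are my own choices for illustration, and the software tool mentioned above is not modeled here.

```python
HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours (assumed convention)


def effective_fit(raw_fit, derating):
    """Functional failure rate when only 1 in `derating` upsets matters."""
    return raw_fit / derating


def mtbf_years(fit):
    """Mean time between failures in years; FIT = failures per 1e9 hours."""
    return 1e9 / fit / HOURS_PER_YEAR


raw_fit = 8694  # largest 65 nm part, figure quoted in this post

for derating in (1, 10, 40):
    fit = effective_fit(raw_fit, derating)
    print(f"de-rating 1 in {derating:>2}: {fit:8.1f} FIT, "
          f"MTBF about {mtbf_years(fit):6.1f} years")
```

The de-rating factor divides the raw upset rate directly, so a design where only 1 in 10 configuration bit flips matters sees ten times the mean time between functional failures of the raw figure.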

 

The system architect may now take advantage of these new features to achieve the reliability and availability they require by making straightforward engineering choices: The resulting reliability and availability can all be predicted before the system hits the field.

 

Austin
Message Edited by austin.lesea on 11-04-2008 04:56 PM