I think everyone has heard of the “bathtub” curve used by reliability engineers to explain failures in electronics.

http://en.wikipedia.org/wiki/Bathtub_curve

Generally, there is an early phase in which the failure rate starts out high and then decreases, followed by a long period of relatively low failure rate lasting perhaps ten to twenty years, and finally a phase in which the failure rate rises.

The early phase is dominated by latent defects that are not caught by testing, or even by burning in the components under accelerated conditions for a few tens of hours. The middle phase is dominated by random failures. The final phase, called “wear-out”, occurs when the accumulated stresses of operation finally cause something critical to fail.
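
One common way to sketch the bathtub shape is to add three Weibull hazard rates: one with shape parameter below 1 (decreasing, like latent defects), one equal to 1 (constant, like random failures), and one above 1 (increasing, like wear-out). The sketch below assumes that model; the `(beta, eta)` pairs are hypothetical values picked only to produce the shape, not Xilinx data, and the function names are illustrative.

```python
"""Toy bathtub curve built from three Weibull hazard rates (illustrative only)."""

HOURS_PER_YEAR = 8760.0


def weibull_hazard(t_hours: float, beta: float, eta_hours: float) -> float:
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1), in failures per device-hour.

    t_hours must be > 0 (the hazard diverges at t = 0 when beta < 1).
    """
    return (beta / eta_hours) * (t_hours / eta_hours) ** (beta - 1.0)


# Hypothetical (beta, eta_hours) pairs chosen only to produce the bathtub shape.
INFANT = (0.5, 1e12)   # beta < 1: decreasing rate, like latent defects
RANDOM = (1.0, 1e8)    # beta = 1: constant rate of 1e-8 per hour, i.e. 10 FIT
WEAROUT = (6.0, 4e5)   # beta > 1: increasing rate, eta is roughly 46 years


def bathtub_hazard(t_hours: float) -> float:
    """Competing mechanisms: the total hazard is the sum of the individual hazards."""
    return sum(weibull_hazard(t_hours, b, e) for b, e in (INFANT, RANDOM, WEAROUT))


def to_fit(rate_per_hour: float) -> float:
    """Convert failures per device-hour to FIT (failures per 1e9 device-hours)."""
    return rate_per_hour * 1e9


if __name__ == "__main__":
    for years in (0.01, 0.1, 1, 5, 10, 20, 30):
        fit = to_fit(bathtub_hazard(years * HOURS_PER_YEAR))
        print(f"{years:>5} yr: {fit:10.1f} FIT")
```

Printing the total in FIT over time shows the high early rate, the flat bottom near the constant term, and the steep rise once the wear-out term takes over.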

Design for Reliability

In addition to design for manufacturability, there is the need to design for reliability. If a copper interconnect wire is estimated to be able to carry 1 milliampere for 15 years at 125 degrees C, you might decide that is acceptable. Or you might decide that, since the integrated circuit is for a laptop computer, a three-year life at 85 degrees C is fine, and so use thinner and narrower traces in your layout.
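
The trade-off above is usually estimated for electromigration with Black's equation, MTTF = A · J^(-n) · exp(Ea / kT), where J is current density and T is absolute temperature. The sketch below compares two operating points as a ratio so the process constant A cancels. The exponent `n = 2` and activation energy `Ea = 0.9 eV` are assumed placeholder values, not data for any real process, and `black_mttf_ratio` is a hypothetical helper name.

```python
"""Relative electromigration lifetime from Black's equation (illustrative sketch)."""
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf_ratio(j_scale: float, temp_ref_c: float, temp_new_c: float,
                     n: float = 2.0, ea_ev: float = 0.9) -> float:
    """Return MTTF_new / MTTF_ref using MTTF = A * J**-n * exp(Ea / (k*T)).

    j_scale is J_new / J_ref; the prefactor A cancels in the ratio.
    n = 2 and Ea = 0.9 eV are assumed placeholder values, not process data.
    """
    t_ref_k = temp_ref_c + 273.15
    t_new_k = temp_new_c + 273.15
    thermal = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_new_k - 1.0 / t_ref_k))
    return j_scale ** (-n) * thermal


if __name__ == "__main__":
    # Same current and same trace: lifetime multiplier for 85 C versus 125 C.
    print(f"85 C vs 125 C, same trace:       x{black_mttf_ratio(1.0, 125, 85):.1f}")
    # Halve the cross-section (same current, so current density doubles) at 85 C.
    print(f"85 C, half cross-section trace:  x{black_mttf_ratio(2.0, 125, 85):.1f}")
```

Because lifetime falls as a power of current density but rises exponentially as temperature drops, a lower maximum junction temperature can pay for a denser, narrower layout.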

Xilinx has reliability objectives for the commercial, industrial, and military classifications of our products. For reliability purposes, these environments are defined by their maximum junction temperatures: 85 degrees C for commercial, 100 degrees C for industrial, and 125 degrees C for military.

The goal is 15, 10, and 3 years of life, respectively, in these three environments, at the most extreme temperature and the most extreme voltage. End of life is defined as the point at which the population has more than a 0.1% chance of failure, that is, when failures begin to reach one in 1,000.
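
To see how a "0.1% failed" definition turns into a lifetime in years, one can assume the wear-out mechanism follows a Weibull distribution and invert its cumulative distribution function. This is a sketch under that assumption; the shape and scale values and the 15-year goal check are hypothetical illustration, not Xilinx qualification data.

```python
"""Time to 0.1% cumulative failures for a Weibull wear-out model (sketch)."""
import math

HOURS_PER_YEAR = 8760.0


def weibull_cdf(t_hours: float, beta: float, eta_hours: float) -> float:
    """Fraction of the population failed by time t: F(t) = 1 - exp(-(t/eta)**beta)."""
    return -math.expm1(-((t_hours / eta_hours) ** beta))


def time_to_fraction_failed(p: float, beta: float, eta_hours: float) -> float:
    """Invert the CDF: t_p = eta * (-ln(1 - p))**(1/beta)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return eta_hours * (-math.log1p(-p)) ** (1.0 / beta)


if __name__ == "__main__":
    # Hypothetical wear-out parameters at worst-case temperature and voltage.
    beta, eta = 6.0, 4e5
    goal_years = 15.0
    years = time_to_fraction_failed(0.001, beta, eta) / HOURS_PER_YEAR
    verdict = "met" if years >= goal_years else "missed"
    print(f"0.1% failed after {years:.1f} years (goal {goal_years:.0f} years: {verdict})")
```

`log1p` and `expm1` are used because p = 0.001 is small enough that computing `1 - p` and `1 - exp(...)` directly loses precision.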

Before the “wear-out” phase, the failure rate might be as low as 10 failures per billion device-hours, or 10 FIT (failures in time).
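
Because FIT counts failures per 10^9 device-hours, a FIT figure can be turned into more intuitive numbers, provided you assume a constant failure rate (the flat bottom of the bathtub). The helper names below are illustrative, and the fleet size is a hypothetical example.

```python
"""Converting a FIT rate into more intuitive numbers (constant-rate assumption)."""
import math

HOURS_PER_YEAR = 8760.0


def fit_to_rate_per_hour(fit: float) -> float:
    """FIT is failures per 1e9 device-hours."""
    return fit * 1e-9


def mtbf_hours(fit: float) -> float:
    """Mean time between failures for a constant failure rate: 1 / lambda."""
    return 1.0 / fit_to_rate_per_hour(fit)


def expected_failures(fit: float, devices: int, years: float) -> float:
    """Expected failures in a fleet over a period, assuming the rate stays constant."""
    return fit_to_rate_per_hour(fit) * devices * years * HOURS_PER_YEAR


def prob_single_device_fails(fit: float, years: float) -> float:
    """Exponential model: P(fail by t) = 1 - exp(-lambda * t)."""
    return -math.expm1(-fit_to_rate_per_hour(fit) * years * HOURS_PER_YEAR)


if __name__ == "__main__":
    fit = 10.0
    print(f"MTBF: {mtbf_hours(fit):.3g} hours ({mtbf_hours(fit) / HOURS_PER_YEAR:,.0f} years)")
    print(f"Failures per year in 1,000,000 devices: {expected_failures(fit, 1_000_000, 1):.1f}")
    print(f"P(one device fails within 10 years): {prob_single_device_fails(fit, 10):.4%}")
```

An MTBF of roughly 11,000 years at 10 FIT does not mean any single part lasts that long; it only describes the random-failure region, and wear-out takes over long before then.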

The quarterly quality report documents all of this for each product after the fact: did the design meet its goals, and by how much is it exceeding them?

http://www.xilinx.com/support/documentation/user_guides/ug116.pdf

Note the results on page 18.

Wearing Out

Note that Xilinx® FPGA devices do not sacrifice operating life for performance. A commercial microprocessor for a laptop or desktop computer might have a three- or five-year “lifetime”. Because our components are ubiquitous in infrastructure (wired, wireless, and networking equipment), we do not have the option of designing for performance at the expense of long-term reliability.

The Mars rovers may have lost wheel motors and developed all kinds of mechanical problems in their old age, but their Virtex®1000 FPGA devices are still working just fine, controlling those cranky motors and servos. It may be cold on Mars, but the thin atmosphere provides little convective cooling, and since they are buried inside the rovers, these devices probably get quite warm when the sun is directly overhead.

http://en.wikipedia.org/wiki/Mars_Exploration_Rover

The system is designed to keep the ambient temperature around the electronics below 40 degrees C, which means the junction temperatures are probably no more than ten degrees C higher, or about 50 degrees C at most.
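
The ambient-to-junction step above is the usual steady-state estimate T_J = T_A + θ_JA · P, and the benefit of a cooler junction is often expressed as an Arrhenius acceleration factor. In this sketch the thermal resistance, power, and the 0.7 eV activation energy are assumed placeholder values, not rover or Xilinx data, and a single Arrhenius factor only applies to one thermally activated mechanism.

```python
"""Junction temperature estimate plus an Arrhenius acceleration factor (sketch)."""
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def junction_temp_c(ambient_c: float, theta_ja_c_per_w: float, power_w: float) -> float:
    """T_J = T_A + theta_JA * P (simple steady-state thermal resistance model)."""
    return ambient_c + theta_ja_c_per_w * power_w


def arrhenius_af(t_use_c: float, t_ref_c: float, ea_ev: float = 0.7) -> float:
    """Factor by which a thermally activated lifetime at t_use_c exceeds the one at t_ref_c.

    ea_ev = 0.7 eV is an assumed placeholder activation energy.
    """
    t_use_k = t_use_c + 273.15
    t_ref_k = t_ref_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_ref_k))


if __name__ == "__main__":
    # Hypothetical thermal numbers: 5 C/W and 2 W give a 10 C rise over a 40 C ambient.
    tj = junction_temp_c(ambient_c=40.0, theta_ja_c_per_w=5.0, power_w=2.0)
    print(f"Estimated junction temperature: {tj:.1f} C")
    print(f"Life multiplier at {tj:.0f} C vs 125 C: x{arrhenius_af(tj, 125.0):.0f}")
```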

Staying Cool

So, in addition to staying cool, remember that reliability is also affected by the maximum voltage: keeping your 1.00 volt supply at 1.00 volt is best. Going to 1.05 volts may not seem like much of a difference, but that 5% is not recommended. Not only is life shortened by a small amount, but operation is guaranteed only over a range of +/-5% for the critical core supplies, so a supply already sitting at 1.05 volts has no margin left for ripple or noise. We test to be sure the devices operate over a wider range, but characterization concentrates on getting the best performance and lifetime within this narrow +/-5% range on the voltage supplies.
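
A quick way to see why a 1.05 volt setpoint is risky is to add the rail's ripple and noise to the setpoint and compare the result against the +/-5% window. The function names and the +/-20 mV ripple figure below are hypothetical; only the 1.00 volt nominal and the +/-5% window come from the text above.

```python
"""Checking a supply setpoint plus ripple against a +/-5% operating window (sketch)."""


def supply_window(nominal_v: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Return the (min, max) voltage of the guaranteed operating range."""
    return nominal_v * (1.0 - tolerance), nominal_v * (1.0 + tolerance)


def worst_case_headroom_v(setpoint_v: float, ripple_v: float, nominal_v: float = 1.00,
                          tolerance: float = 0.05) -> float:
    """Smallest distance from setpoint +/- ripple to either window edge; negative means out of range."""
    v_min, v_max = supply_window(nominal_v, tolerance)
    return min((setpoint_v - ripple_v) - v_min, v_max - (setpoint_v + ripple_v))


if __name__ == "__main__":
    ripple = 0.020  # hypothetical +/-20 mV of ripple and noise on the rail
    for setpoint in (1.00, 1.02, 1.05):
        headroom = worst_case_headroom_v(setpoint, ripple)
        status = "OK" if headroom >= 0 else "OUT OF RANGE"
        print(f"setpoint {setpoint:.2f} V: headroom {headroom * 1000:+.0f} mV -> {status}")
```

Centering the setpoint on the nominal value leaves equal headroom on both sides of the window, which is exactly what a 1.05 volt setpoint gives away.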

Austin Lesea