Visitor cindyott
Registered: ‎02-12-2019

Kintex 7 spontaneous reconfiguration as temperatures rise

Hello -

I am trying to operate a Kintex 7 industrial-temperature part near its maximum operating temperature.

To do that, I need to know the junction temperature of the device. I am therefore using Vivado and sysmon to monitor the current temperature and VCC values. This procedure works fine at low operating temperatures. The chip has a heat sink on top of it, and at room temperature (20C), the heat sink reads 39C and the chip's current temperature reads 38.74C (VCCInt = 1.017-1.020, VCCAux = 1.183). We scanned the FPGA installed on this board using the XilinxGO App to verify that this was an industrial part.

 

I then heat the back of the PCB, through a steel plate and a heat pad, using a heat gun, and read the current temperature and voltages about once a second. The JTAG pod is at room temperature. According to sysmon, the voltages stay constant (VCCInt = 1.017-1.022, VCCAux = 1.183). The temperature rises gradually (about 1C per minute) until about 85C, at which point either Vivado says it lost contact with the board, or the temperature reading becomes erratic. Our board stops operating properly at the same time.

This sometimes happens several times within a minute (maybe once every ten seconds), especially if we had been heating the board quickly in order to get to a high temperature, and apparently continues until the internal temperature of the FPGA stops rising, at which point the FPGA stabilizes and starts operating again at a higher temperature than before we started heating. I have never seen this behaviour as the FPGA cools, but I can cause it to happen at almost any temperature above about 60C (as reported by Vivado) as I heat the board. Once I get the chip up to 89C, I cannot raise the temperature further without this behavior happening, no matter how slowly I try to raise the temperature.

 

All the voltage regulators on the part are spec'ed up to 125C junction temperature.

 

We find that this temperature-sensitive behavior is because signals INIT_B and Done out of the chip both go low. This is part of a reboot (a reconfiguration): the chip starts operating again, but it has returned to its default state after power-up (as expected from a reboot), which is not good in our system.

 

To locate the source of this loss of Done, we disconnected the JTAG cable (so JTAG could not cause a reboot), made a version of the FPGA program which did not include ICAP (which might have caused a reboot if it was misbehaving), and verified using a logic analyser that the /Program input to the FPGA was solidly high. None of this mattered: the FPGA still deasserts Done and then reconfigures as we raise the temperature. So the FPGA itself is causing the reboot. (The FPGA decides on its own to reconfigure itself.)

 

Before a reboot starts, the voltages VCCInt and VCCAux are always in range, according to Sysmon: 1.019-1.020 for VCCInt, and 1.819-1.809 for VCCAux. During a recovery, the temperature readings are erratic, varying by +-2 degrees between readings, and the VCCAux reads higher (1.846) while the temperature reads low. Could these VCCAux readings be accurate, and could that be causing the unrealistic temperature readings which we are seeing? Could it be related in any way to the first reboot, where we saw nothing strange in VCCAux before a simultaneous rise in VCCAux and fall in the reported temperature? In other words, is the reported change in VCCAux a cause or an effect?
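
As a quick sanity check on the readings above, here is a minimal Python sketch that compares logged sysmon values against supply limits. The limits used (0.97-1.03 V for VCCINT, 1.71-1.89 V for VCCAUX) are an assumption about the recommended operating range for standard-voltage Kintex-7 parts and should be verified against the DS182 data sheet. The names `LIMITS_V`, `out_of_range`, and `post_readings` are hypothetical, introduced only for illustration.

```python
# Assumed recommended operating ranges in volts (verify against DS182).
LIMITS_V = {
    "VCCINT": (0.97, 1.03),
    "VCCAUX": (1.71, 1.89),
}


def out_of_range(samples, limits=LIMITS_V):
    """Return (sample_index, rail, volts) for every reading outside its limits."""
    violations = []
    for index, sample in enumerate(samples):
        for rail, (low, high) in limits.items():
            volts = sample.get(rail)  # a rail may be missing from a sample
            if volts is not None and not (low <= volts <= high):
                violations.append((index, rail, volts))
    return violations


# Readings quoted in this post: two before a reboot, one during recovery.
post_readings = [
    {"VCCINT": 1.019, "VCCAUX": 1.819},
    {"VCCINT": 1.020, "VCCAUX": 1.809},
    {"VCCAUX": 1.846},
]

print(out_of_range(post_readings))  # prints []
```

Under these assumed limits, even the 1.846 V reading during recovery is inside the range, so the logged values alone do not point to a supply fault. Keep in mind that polling about once a second may miss a brief excursion on a rail.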

 

In summary, it seems like the reboot is happening because the temperature is rising (not because it is at a specific value). How could this happen? (I see that there might be continuous CRC checks, but that kind of failure ought to leave Done high.) Please suggest what the FPGA might be sensitive to which is causing this reboot.

 

Thanks.

Cindy Ott

Design Engineer

Picture Elements, Inc, for Ametek, Inc.

Accepted Solutions
Registered: ‎09-17-2018

Re: Kintex 7 spontaneous reconfiguration as temperatures rise

c,

As die temperature increases, the current goes up exponentially. I suspect your power supply(ies) fold back, tripping POR. Large devices will easily thermally run away in your test. Use a thermal forcing system (one that regulates by forcing heat or cold). Even a temperature oven is inadequate to control and evaluate a design. Clearly your heatsinking and power solution is not preventing thermal runaway and overcurrent shutdown.

Operating near 100 C die temperature means static current is quite large (see that in the power estimator spreadsheet by setting it to 100 C, worst-case process, your heatsink and airflow, and adding logic until the die is at 100 C). You will then see how much Iccint you need, how much airflow you need, etc. If it does not work there, it will never work in reality. First use the tools provided, then create that design before you test...
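
A rough Python sketch of the runaway mechanism described above: junction temperature is iterated as T_j = T_amb + theta_JA * P(T_j), where the static part of P grows exponentially with T_j. Every number here is an illustrative assumption (1 A of static current at 25 C, doubling every 25 C, 2 W dynamic power, 5 C/W thermal resistance), not Kintex-7 data; use the power estimator spreadsheet for real values. The function names are hypothetical.

```python
def static_current_a(temp_c, i_ref_a=1.0, t_ref_c=25.0, doubling_c=25.0):
    """Assumed leakage model: static current doubles every `doubling_c` degrees."""
    return i_ref_a * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def settle_junction_temp(t_amb_c, theta_ja_c_per_w=5.0, vccint_v=1.0,
                         dynamic_w=2.0, max_c=150.0, steps=500):
    """Iterate T_j = T_amb + theta_JA * P(T_j).

    Returns the settled junction temperature in C, or None if the
    iteration exceeds `max_c` (runaway) or fails to settle.
    """
    t_j = t_amb_c
    for _ in range(steps):
        power_w = dynamic_w + vccint_v * static_current_a(t_j)
        t_next = t_amb_c + theta_ja_c_per_w * power_w
        if t_next > max_c:
            return None  # heating outpaces the heat sink: thermal runaway
        if abs(t_next - t_j) < 1e-3:
            return t_next  # self-heating and cooling balance here
        t_j = t_next
    return None


for ambient_c in (20.0, 70.0):
    print(ambient_c, settle_junction_temp(ambient_c))
```

With these made-up parameters the 20 C case settles at roughly 37 C, while the 70 C case never finds a balance point and returns None. In a real system the supply would hit its current limit and fold back long before the model's temperature cap.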

l.e.o.

Registered: ‎01-22-2015

Re: Kintex 7 spontaneous reconfiguration as temperatures rise

Hi Cindy,

You’ve done an awesome job of documenting a very interesting problem, although, after all your struggles, I suspect you are using other adjectives besides “interesting” 😊

I have limited experience with what you are doing.  However, here are a few thoughts.

     I see that there might be continuous CRC checks, but that kind of failure ought to leave Done high
Please verify whether or not “continuous CRC checks” are being done.  I assume you are referring to “Readback CRC” described in Chapter 8 of UG470.  If so, then Readback CRC can cause some of the things you are seeing (e.g. pull INIT_B low).  It might be helpful to review the POST_CRC constraints you are using (ref UG912 pages 290-298).  Also, can you turn off Readback CRC:

set_property POST_CRC DISABLE [current_design];

and still get meaningful results from your tests?  Even if you can’t, try it anyway and see what occurs.  Perhaps there is some interaction between Readback CRC and the JTAG-SYSMON you are using to view the FPGA temperatures and voltages.

Cheers,
Mark

Visitor cindyott
Registered: ‎02-12-2019

Re: Kintex 7 spontaneous reconfiguration as temperatures rise

Thank you for the insight. We checked our power inputs using a scope, and the VCCAux does indeed drop shortly before we lose Done. It looks like it is probably thermal overload in the regulator.