02-15-2016 10:03 AM
We came across some weird behavior on a Kintex UltraScale design last night, and I'm looking for some insight.
Our design has several 10G Ethernet interfaces connected to network switches. It is deployed on several units in the field, and all of them have been stable for several months. We loaded a new bitstream onto all of them, and a day later one of the switches connected to one of the FPGAs started reporting that it was regularly receiving runt Ethernet packets on just one interface. We tried several things, including resetting the switch, reprogramming the bitstream, and reverting to older bitstreams. The behavior was always the same: runt packets from the same interface of just one FPGA. The other FPGAs running the same bitstream didn't show this behavior. We are not using a PROM; we only program and debug over JTAG.
We finally cycled power on the faulty unit and tried loading a bitstream, but the programming tool stalled. I was able to connect to the device using the Hardware Manager, but I couldn't program it (it stalled at 1% progress). When I examined the system monitor, all voltages read 0.0 V and the temperature read -273 degC (probably the monitor was returning all zeros).
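The "reading zeros" interpretation can be sanity-checked against the system monitor's ADC transfer functions. The sketch below converts a raw 10-bit ADC code to temperature and supply voltage, assuming the 7-series-style coefficients (503.975 / 1024, offset 273.15 for temperature; 3.0 V full scale for supply channels). The exact UltraScale SYSMONE1 coefficients should be checked in the user guide, but any variant of this formula gives roughly -273 degC and 0 V for a code of 0. The function names are hypothetical, for illustration only.

```python
# Hypothetical helpers: convert raw 10-bit system monitor ADC codes to
# engineering units. Coefficients are ASSUMED (7-series-style transfer
# functions); verify the UltraScale SYSMONE1 values in the user guide.

ADC_STEPS = 1024          # 10-bit conversion result
TEMP_GAIN = 503.975       # assumed temperature gain term
TEMP_OFFSET = 273.15      # assumed offset (Kelvin to Celsius)
SUPPLY_FULL_SCALE = 3.0   # assumed full-scale voltage for supply channels


def code_to_temp_c(code: int) -> float:
    """Temperature in degC for a raw 10-bit ADC code."""
    return code * TEMP_GAIN / ADC_STEPS - TEMP_OFFSET


def code_to_supply_v(code: int) -> float:
    """Supply voltage in volts for a raw 10-bit ADC code."""
    return code * SUPPLY_FULL_SCALE / ADC_STEPS


if __name__ == "__main__":
    # A code of 0 on every channel reproduces the symptom in the post.
    print(code_to_temp_c(0))    # -273.15
    print(code_to_supply_v(0))  # 0.0
```

In other words, the readings above are what you would expect if the ADC results were never updated (or the registers read back as zero), rather than genuine measurements.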
We did another cold reboot and everything returned to normal. We could program the device with the original bitstream, system monitor showed correct voltages and temperature, and the switch never saw any more runt packets.
I can't think of anything that could explain the behavior we saw. What in the FPGA would only get reset with a couple of cold reboots and not through regular JTAG programming? What conditions would allow me to connect to the device over JTAG but return all zeros from the system monitor? Any insight is greatly appreciated.
02-15-2016 11:35 AM
I can't really say how hot it was after the first power cycle, when it wouldn't program, because the system monitor was showing a temperature of absolute zero (and 0 volts on all power rails). Are you thinking that the chip wouldn't program if it was too hot after being powered up?
We were able to program it several times before the first power cycle, when it would have been at its hottest. It sat idle (without a bitstream) for 10-15 minutes after the first power cycle while we tried several times, unsuccessfully, to program it before cycling power again. It was after this second power cycle that everything returned to normal.