Observer mbianconi
Registered: ‎06-07-2010

Failsafe FPGA update

In our Spartan-6 FPGA the fallback multiboot mechanism described in UG380 has been successfully implemented.

However, we have found a way to defeat it by writing a particular corrupt image file to the flash memory's multiboot section: during the subsequent power cycle, the FPGA configuration logic apparently hangs and the golden image is not loaded into the FPGA.

 

I have attached the image bin file in question; the SPI flash is an N25Q64. Maybe somebody has an idea of what is wrong with it.

Note that I came across this situation almost by chance: all the other corrupt image files have led to the FPGA being loaded from the golden image.

 

The status register shown afterwards looks like this:

[0] CRC ERROR: 0
[1] IDCODE ERROR: 0
[2] DCM LOCK STATUS: 1
[3] GTS_CFG_B STATUS: 0 (correct = 1)
[4] GWE STATUS: 0 (correct = 1)
[5] GHIGH STATUS: 0 (correct = 1)
[6] DECRYPTION ERROR: 0
[7] DECRYPTOR ENABLE: 0
[8] HSWAPEN PIN: 1
[9] MODE PIN M0: 1
[10] MODE PIN M1: 0 (correct = 1)
[11] RESERVED: 0
[12] INIT_B PIN: 1
[13] DONE PIN: 0 (correct = 1)
[14] SUSPEND STATUS: 0
[15] FALLBACK STATUS: 0
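
When comparing readouts from several boards, it can help to treat the listing above as a 16-bit word. The following is only a minimal Python sketch: it assumes the bit order is exactly as listed (bit 0 = CRC ERROR), and the names decode_status and encode_status are made up for illustration, not part of any Xilinx tool.

```python
# Bit names copied from the status listing above; list index = bit position.
STATUS_BITS = [
    "CRC ERROR",
    "IDCODE ERROR",
    "DCM LOCK STATUS",
    "GTS_CFG_B STATUS",
    "GWE STATUS",
    "GHIGH STATUS",
    "DECRYPTION ERROR",
    "DECRYPTOR ENABLE",
    "HSWAPEN PIN",
    "MODE PIN M0",
    "MODE PIN M1",
    "RESERVED",
    "INIT_B PIN",
    "DONE PIN",
    "SUSPEND STATUS",
    "FALLBACK STATUS",
]


def decode_status(word):
    """Return {bit name: 0 or 1} for a 16-bit status word."""
    return {name: (word >> bit) & 1 for bit, name in enumerate(STATUS_BITS)}


def encode_status(fields):
    """Inverse of decode_status: pack the named bits back into an integer."""
    return sum(fields.get(name, 0) << bit for bit, name in enumerate(STATUS_BITS))


# The listing above has bits 2, 8, 9 and 12 set, i.e. the word 0x1304.
hung = decode_status(0x1304)
```

With this, two readouts can be diffed field by field, e.g. to list which bits differ between the hung state and a successful boot.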

 

AR# 58249 shows two solutions. However, I reckon they address only cases where the update process is disrupted, e.g. by a power failure, so that the sync word is not written (Method 1) or the quickboot switch is not turned to "ON" (Method 2).

It turns out that our system is resilient to these pitfalls, so we needn't worry about them.

 

It seems to me that our case can be handled by extra FPGA logic checking data integrity (e.g. CRC) before updating the flash, and reacting accordingly (e.g. by erasing the sector containing the sync word) in case something is wrong with the file itself.
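
The decision flow described above could look roughly like the following. This is a sketch in Python rather than FPGA logic, and it rests on assumptions: the update file is assumed to ship with a CRC-32 of the whole image (the post does not say which checksum would be used), and write_image and erase_sync_sector are hypothetical callbacks standing in for the real flash access.

```python
import zlib


def image_crc_ok(image, expected_crc32):
    """True if the CRC-32 of the received image matches the value shipped with it."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc32


def update_multiboot(image, expected_crc32, write_image, erase_sync_sector):
    """Write the image only if its CRC matches; otherwise erase the sync-word sector.

    write_image and erase_sync_sector are hypothetical callbacks that stand in
    for the real flash access logic.
    Returns True if the image was written, False if it was rejected.
    """
    if image_crc_ok(image, expected_crc32):
        write_image(image)
        return True
    # Reaction suggested in the post: without a sync word in the multiboot
    # section, the configuration logic should fall back to the golden image.
    erase_sync_sector()
    return False
```

Note that this check covers only the integrity of the transferred file; it is independent of the CRC embedded in the bitstream itself.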

 

Does anybody have a better idea?

The best thing would be to make the fallback dependent on the DONE status (which would work in our case) or on the detection of bad packets (which is not our case), wouldn't it?

 

 

 

2 Replies
Xilinx Employee
Registered: ‎07-23-2012

Re: Failsafe FPGA update

Why is the mode pin setting incorrect here? Both a watchdog timeout error and a CRC error trigger fallback.

Your statement "The best thing would be to make the fallback dependent on the DONE status (which would work in our case) or on the detection of bad packets (which is not our case), wouldn't it?" is covered by the CRC check, which compares the expected CRC value with the computed CRC value once the bitstream data transfer is complete. This is the final check before the DONE pin is asserted.
Observer mbianconi
Registered: ‎06-07-2010

Re: Failsafe FPGA update

The state of the mode pins as reported by the iMPACT command "Read Device Status" has stumped me as well.

When the FPGA boots up correctly, the mode pins are reported as M1=1 and M0=1. That is why I wrote "M1=0 (correct=1)": "correct" means the value seen in the favorable case; sorry if that was confusing. When neither the golden nor the multiboot image can be loaded, however, the mode pins are shown as M1=0 and M0=1, which is how they are connected in HW (please see attachment). Peculiar, isn't it?

I have tested the behaviour on three different boards (where we use different kinds of Spartan-6 FPGAs) and measured the voltage on the relevant nets just to rule out a potential HW mess-up.

I reckon the issue has been reported here:

https://forums.xilinx.com/t5/Spartan-Family-FPGAs/IiMpact-Read-Device-Status/m-p/318315/highlight/true#M20968

Unfortunately there is no answer to it. However, I feel that the problem with the mode pins has nothing to do with the fallback failure I have experienced.

 

After reading AR# 58249 once more, I wonder why the CRC check should not trigger fallback.

In fact, the corrupt file was generated by opening a valid image bin file in a text editor and removing a couple of lines at random from the middle. The sync word watchdog error may not be triggered, since a valid sync word is present, but I can see no reason why the configuration logic should get stuck instead of running to the end of the file and checking the CRC, since no power outage takes place.

Moreover, the golden image can be loaded if I use a different corrupt multiboot image file (like test.bin in the attachment), which proves that the golden image is itself OK.
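
To reproduce this kind of corruption in a repeatable way (a text editor may also alter line endings or encoding along the way), the cut can be made on the raw bytes. The following is a rough sketch only: drop_middle and has_sync_word are illustrative names, and the sync word value AA 99 55 66 is an assumption taken from the Spartan-6 configuration documentation rather than from the attached file.

```python
# Assumption: Spartan-6 sync word as documented in UG380, not read from the attachment.
SYNC_WORD = bytes.fromhex("aa995566")


def drop_middle(image, start, length):
    """Return a copy of image with `length` bytes removed beginning at offset `start`."""
    if start < 0 or length < 0 or start + length > len(image):
        raise ValueError("cut lies outside the image")
    return image[:start] + image[start + length:]


def has_sync_word(image, sync_word=SYNC_WORD):
    """True if the sync word still appears somewhere in the image."""
    return sync_word in image
```

Keeping the cut well after the header leaves the sync word intact, which matches the case described above: the sync word watchdog has nothing to complain about, yet the packet stream after the cut is no longer aligned.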

 

At this stage, my questions are:

1. Can the "bad packet error" scenario described in AR# 58249 and AR# 38077 be caused by a particular corrupt file as well as by a power outage?

2. Can loading of the golden image be forced by an external physical signal (preferably, but not necessarily, static; in any case "derived" from the status of a HW switch), even in a multiboot scenario (i.e. a design whose header specifies booting first from an application image and then, if that fails, from a golden image)?

 

The goal is to avoid a component change in the field in case an FPGA update fails due to a corrupt file.

 
