Visitor
3,971 Views
Registered: ‎06-30-2011

Regarding SEU detection and correction

Hi,

 

I have a question about the SEU detection and correction scheme used in Virtex FPGAs.

I see that both Readback CRC and FRAME_ECC are used in Virtex FPGAs. FRAME_ECC can detect and correct single-bit errors in the configuration memory frames, so why is Readback CRC provided alongside it?

My confusion is this: Readback CRC can only detect errors in the frames, while ECC can both detect and correct them. Why use Readback CRC at all?

 

Please guide me with your valuable input.

 

 

Thanks

3 Replies
Scholar
3,968 Views
Registered: ‎02-27-2008

The 32-bit CRC can tell if any bits, anywhere, are corrupted, up to 31 of them. Beyond 31 bits, the coverage is still extremely good, so effectively all upsets, single or multiple, are covered by the CRC. Many markets require this level of assurance that nothing is wrong (safety-critical systems, for example). It is a feature that is unavailable in other technologies (ASIC, ASSP), and that degree of assurance of proper operation is very important and useful.
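To make the detect-only role of the CRC concrete, here is a minimal Python sketch. It assumes zlib's standard CRC-32 as a stand-in (the polynomial the device actually uses may differ), and `frame_data`, `flip_bits`, and `upset_detected` are names invented for illustration. The idea is simply that a CRC computed from known-good data is compared with a CRC computed over the data read back later.

```python
import zlib

def flip_bits(data: bytes, positions) -> bytes:
    """Return a copy of `data` with the given bit positions inverted."""
    buf = bytearray(data)
    for pos in positions:
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

# Hypothetical stand-in for readback data; the contents are arbitrary.
frame_data = bytes(range(256)) * 16      # 4096 bytes
golden_crc = zlib.crc32(frame_data)      # computed once from known-good data

def upset_detected(positions) -> bool:
    """True if the CRC of the corrupted data differs from the golden CRC."""
    return zlib.crc32(flip_bits(frame_data, positions)) != golden_crc

print(upset_detected([12345]))           # one flipped bit
print(upset_detected([7, 20000]))        # two flipped bits, far apart
print(upset_detected(range(800, 832)))   # 32 adjacent flipped bits
```

Note that the check only says that something changed. It gives no indication of which bit or which frame is affected, which is why it is paired with a code that can locate errors.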

The FRAME_ECC can find and fix a single-bit upset in a frame. It can also detect a 2-bit error. Errors of 4 or more bits can be masked (i.e. they may not be detected by the 12-bit syndrome).
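The behaviour described above can be reproduced with a toy extended Hamming (SECDED) code in Python. This is a sketch under stated assumptions, not the actual frame layout: the 2047 data positions plus one overall parity bit are chosen only so that the check information is 12 bits wide, and `corrupt` and `decoder_view` are hypothetical helpers. Because the code is linear, the decoder's response depends only on the error pattern, so the all-zero codeword is used as the "good" frame.

```python
from functools import reduce
from operator import xor

N = 2047  # toy frame: Hamming positions 1..2047 (11 syndrome bits) + 1 overall parity bit

def corrupt(positions):
    """All-zero codeword with the given positions flipped (index 0 = overall parity bit)."""
    word = [0] * (N + 1)
    for p in positions:
        word[p] ^= 1
    return word

def decoder_view(word):
    """Classify a received word the way a SECDED decoder would see it."""
    # Syndrome = XOR of the indices of all set bits; zero for a valid codeword.
    syndrome = reduce(xor, (i for i, bit in enumerate(word) if bit), 0)
    odd_parity = sum(word) % 2 == 1
    if syndrome == 0 and not odd_parity:
        return "clean"
    if odd_parity:
        return f"single-bit error at {syndrome}"
    return "double-bit error, uncorrectable"

print(decoder_view(corrupt([5])))           # one upset: located and correctable
print(decoder_view(corrupt([5, 9])))        # two upsets: detected, not correctable
print(decoder_view(corrupt([1, 2, 4])))     # three upsets: looks like one error at bit 7
print(decoder_view(corrupt([1, 2, 4, 7])))  # four upsets: syndrome cancels, looks clean
```

The last two cases are the ones that motivate a second, independent check: three upsets can be mistaken for a single correctable error, and four upsets can cancel out entirely.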

The SEU Monitor IP core uses both checks. The FRAME_ERROR is used to find and fix upsets (and to notify the system that an upset has been found and fixed), and also to find and replace up to an entire frame of bits (the essential bits feature with the frame replacement option). If the frame containing an even number of bit errors (4, 6, 8, ...) cannot be located, the CRC is the final check: it tells the user that, after everything was done, either correction was not possible or errors persist that cannot be found. That means it is time to reconfigure and start over from the beginning.
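The layering described above can be summarised as a small decision function. This Python sketch only restates the order of the checks from this post; it is not the SEU Monitor core's interface, and `Action`, `scrub_action`, and the status strings are hypothetical names.

```python
from enum import Enum

class Action(Enum):
    NONE = "nothing to do"
    CORRECT_BIT = "fix the bit and notify the system"
    REPLACE_FRAME = "replace the whole frame"
    RECONFIGURE = "reconfigure and start over"

def scrub_action(ecc_status: str, crc_ok: bool) -> Action:
    """Pick a response from the per-frame ECC result and the device-wide CRC result.

    ecc_status is assumed to be one of "clean", "single", or "multi".
    """
    if ecc_status == "single":
        return Action.CORRECT_BIT      # ECC located the upset
    if ecc_status == "multi":
        return Action.REPLACE_FRAME    # ECC flagged the frame but cannot fix it
    if not crc_ok:
        return Action.RECONFIGURE      # ECC saw nothing, yet the CRC disagrees
    return Action.NONE

print(scrub_action("single", crc_ok=False).value)
print(scrub_action("clean", crc_ok=False).value)
```

The third branch is the case the CRC exists for: the ECC reports nothing wrong, but the contents no longer match the expected CRC.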

These multiple levels of hardware and software provide the most powerful mitigation of soft errors of any product, anywhere, today.

This use of multiple codes (a code within a code) is a classic technique from communications and coding theory (similar to the now famous "turbo codes"), and this particular one is patented by Xilinx for use in FPGA devices.

Austin Lesea
Principal Engineer
Xilinx San Jose
Scholar
3,967 Views
Registered: ‎02-27-2008

The probability of not detecting an even number of bit errors beyond four with a 12-bit Hamming code is ~1 in 4096. This probability is not low enough for many markets, hence the need for another, more powerful code to check the checkers.
Austin Lesea
Principal Engineer
Xilinx San Jose
Visitor
3,954 Views
Registered: ‎06-30-2011

Thank you very much Mr. Austin.

The explanation above has cleared up my confusion.

Now that I see Xilinx has chosen to embed FRAME_ECC (i.e. a Hamming code technique) for use in conjunction with Readback CRC, can it be assumed that there is no other ECC that can correct errors in 2, 4, or more bits of a frame?

I look forward to your input.

Thanks
