Registered: ‎09-08-2020

SEM IP - Uncorrectable error after fault injection

Hello,

I'm currently running a fault injection campaign using the SEM IP on Spartan-7 and Artix-7 devices.

My SEM IP is configured in "enhanced repair" mode.

I found that some configuration bits are not correctable when I inject a fault into them (around 5% of the injected bits, taken among the essential bits of my design).

An example:

X7_SEM_V4_1
SC 01
FS 0B
ICAP OK
RDBK OK
INIT OK
SC 02
O> I
SC 00
I> N C000E7A7C3
SC 10
SC 00
I> O
SC 02
O>
SC 04
SED NG
PA 02000C02
LA 00000808
COR
END
FC 20
SC 08
FC 60
SC 00
I>
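
For readers who want to post-process logs like the one above, here is a minimal sketch of a monitor-log parser. The state names attached to the `SC` codes and the meaning of the `FC` flag bits (0x20 uncorrectable, 0x40 essential) are my reading of the 7 series SEM product guide and should be checked against it; the function names are hypothetical.

```python
# Hypothetical helper names; code-to-label mappings are assumptions to verify
# against the SEM controller product guide for your IP version.
SC_STATES = {
    "00": "idle",
    "01": "initialization",
    "02": "observation",
    "04": "correction",
    "08": "classification",
    "10": "injection",
    "1F": "fatal error",
}


def decode_fc(value):
    """Split an FC report byte into the two flags assumed above."""
    flags = int(value, 16)
    return {"uncorrectable": bool(flags & 0x20), "essential": bool(flags & 0x40)}


def parse_monitor_log(text):
    """Turn raw monitor output into a list of (kind, payload) tuples."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        if tag == "SC":
            events.append(("state", SC_STATES.get(value, "unknown")))
        elif tag == "FC":
            events.append(("flags", decode_fc(value)))
        elif tag in ("PA", "LA"):
            # Physical and linear frame addresses are reported in hex.
            events.append((tag.lower(), int(value, 16)))
        else:
            # Prompts, command echoes, and reports not decoded here.
            events.append(("other", line))
    return events


if __name__ == "__main__":
    excerpt = "SC 04\nSED NG\nPA 02000C02\nLA 00000808\nCOR\nEND\nFC 20\nSC 08\nFC 60\nSC 00"
    for event in parse_monitor_log(excerpt):
        print(event)
```

Under these assumptions, the excerpt reads as: correction attempted, error flagged uncorrectable (`FC 20`), then classified as uncorrectable and essential (`FC 60`), after which the controller drops to idle.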

Some injections also produce a double-error report at detection, leading to the same result.

I can provide the list of addresses of all the configuration bits that show this type of behaviour on my current device (xc7s50csga324).

My questions are the following:

How can the injection of a single error result in an uncorrectable error?
Why is the detected error not at the same address as the injected error?
Is this phenomenon only due to fault injection, or can it also be caused by an SEU?
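
To compare the injected and reported addresses, the injection command can be unpacked into frame, word, and bit fields. The sketch below assumes the linear-frame-address layout I recall from the 7 series SEM product guide: a `1100` prefix in bits 39:36, a 17-bit linear frame address in bits 28:12, a 7-bit word address in bits 11:5, and a 5-bit bit address in bits 4:0. That layout is an assumption to verify, and `decode_injection_command` is a hypothetical name.

```python
def decode_injection_command(cmd_hex):
    """Unpack a 40-bit injection command under the ASSUMED field layout."""
    value = int(cmd_hex, 16)
    if value >> 36 != 0xC:
        raise ValueError("not a linear-frame-address command under the assumed layout")
    return {
        "frame": (value >> 12) & 0x1FFFF,  # assumed 17-bit linear frame address
        "word": (value >> 5) & 0x7F,       # assumed 7-bit word index in the frame
        "bit": value & 0x1F,               # assumed 5-bit bit index in the word
    }


if __name__ == "__main__":
    fields = decode_injection_command("C000E7A7C3")
    print({name: hex(val) for name, val in fields.items()})
```

If the assumed layout holds, `C000E7A7C3` targets linear frame 0xE7A, while the log reports `LA 00000808`, which is the mismatch raised in the second question.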

As we have very little information about the link between configuration memory and the FPGA resources, any help in understanding the origin of the problem or how to avoid it would be appreciated.

Thank you in advance.

Gaetan

 

 

Registered: ‎09-17-2018

A comment,

If you flip two non-adjacent bits in a frame, the enhanced repair will fail.
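
As a toy illustration of why a second flipped bit defeats ECC-based repair, here is a sketch of an extended Hamming(8,4) SECDED code. It is not the frame ECC used by the device, which covers a much wider word, and it does not model the CRC-assisted part of enhanced repair; it only shows that one flip yields a unique location while two flips can be detected but not located.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # overall parity makes the total even
    return code


def check(code):
    """Classify a received word as clean, single error (with position), or double error."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return ("clean", None)
    if overall == 1:
        # Odd overall parity: exactly one flip; syndrome 0 means the parity bit itself.
        return ("single", syndrome)
    # Even overall parity with a nonzero syndrome: two flips, location unknown.
    return ("double", None)


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1
    print(check(word))  # one flip: located
    word[2] ^= 1
    print(check(word))  # two flips: detected only
```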

Some configuration RAM bits affect more than one signal (bit). The most common one is the GLUT_MASK bit for a CLB. It changes the readback and operation of a LUT to that of a LUTRAM or SRL, so all bits in the LUT read back as '1' (LUTRAM/SRL operation depends on other CRAM bits as well). If you really need to dig into this, you should request support from your distributor FAE or, if you have one, your Xilinx FAE.

lowearthorbit

 

Registered: ‎09-08-2020

Thank you very much for your answer!

I'm actually trying to understand why the SEM IP gets stuck so many times during my neutron beam tests, and eventually to see whether there is any way to mitigate this phenomenon.

As there is a rather large number of these single points of failure (according to my error injection campaign), they seem to be the main source of failure of the scrubbing system (more probable than multiple errors in the same frame).

But the behavior can be very different from one bit to another:
- Double error in the same frame
- CRC error
- Single errors that are continuously corrected
- SEM IP stuck

SEUs on the GLUT_MASK bit seem to be a good explanation for CRC errors. Can we do something about it?
Do you have other examples of configuration bits that could cause the types of behaviors listed above?

Thank you again for your answer!

Gaetan

Registered: ‎09-17-2018

As I noted,

If you desire details, you will have to engage with your distributor, or direct Xilinx FAE.

You must have one amazing neutron source!  The SEM IP itself has a cross-section on the same order as the device SEFI cross-section, which is basically the probability that the device itself goes completely nuts, stops altogether, or restarts by itself.  So, all in all, if you worry about SEE, the SEM IP is useful for mitigating upsets in CRAM for a running design (recovering from functional failures in your design).

If the basic device SEFI cross-section is too high for your needs, you have the wrong architecture for your system. You either need to go to a more robust device (UltraScale+ has much lower cross-sections), or you need some form of redundancy (or both).
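
As a rough aid for comparing cross-sections against beam-test counts, the expected number of events is the cross-section multiplied by the fluence, and a Poisson model gives the chance of seeing at least one event. The sketch below uses placeholder numbers that are purely illustrative and not measured data for any device; the function names are hypothetical.

```python
import math


def expected_events(cross_section_cm2, fluence_per_cm2):
    """Mean event count: cross-section (cm^2) times fluence (particles/cm^2)."""
    return cross_section_cm2 * fluence_per_cm2


def prob_at_least_one(cross_section_cm2, fluence_per_cm2):
    """Poisson probability of observing one or more events."""
    return 1.0 - math.exp(-expected_events(cross_section_cm2, fluence_per_cm2))


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own measurements.
    fluence = 1e10
    for label, sigma in (("mechanism A", 1e-10), ("mechanism B", 2e-10)):
        mean = expected_events(sigma, fluence)
        print(label, "mean events:", mean, "P(>=1):", prob_at_least_one(sigma, fluence))
```

If two failure mechanisms have cross-sections of the same order, their expected counts over the same fluence are of the same order too, so a large excess of one kind of event points to a different mechanism.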

lowearthorbit

Registered: ‎09-08-2020

This is very interesting information!

It means that I probably misinterpreted the phenomena observed during my irradiation campaign. Not having been able to record the monitor output of the SEM IP, I assumed that the persistent errors (no recovery before complete reconfiguration) in my design, which contains neither feedback loops nor SRL/LUTRAM, came mostly from SEUs in the configuration memory that were no longer being corrected by the SEM IP.

This is why I wanted to study these "blocking" phenomena of the memory scrubbing system. This interpretation must surely be wrong given the number of such events observed and almost no SEFI over the whole test campaign.

I'll have to study in more depth the mechanisms that can lead to this kind of persistent error.

Unfortunately, as we are academic workers, we don't have any FAE to help us.


Thank you for your precious help!

Gaëtan

Registered: ‎09-17-2018

Study is encouraged,

Are you part of the Xilinx University Program?  If so, your professor who is registered in the program is able to get help.  While at Xilinx (for 20 years), I helped many students with their degrees.  The SEM IP was developed by myself and Ken Chapman for Virtex 4, where it was known as App Note 864.  It has been fully supported IP since Virtex 6.  It gets beam tested, and verified for proper operation in all modes for each new technology node.  It is a fundamental building block in safety critical systems, security, and space markets.  While only able to address type 2 blocks (configuration RAM), it is the first step in improving availability and reliability, and preventing fail-safe systems from failing unsafely.  I teach its use in my embedded class at UCSC Extension (among other things).

The longest beam test campaign was by CERN:  more than 6 months of data on an Artix series device.  Many hundreds of thousands of upsets, and nothing new was discovered.

lowearthorbit
