Visitor
gabrielnazar
Posts: 6
Registered: ‎05-02-2012
0
Accepted Solution

Writing to one frame modifies other frames

Hi,

 

I am attempting to randomly toggle some configuration bits in a Virtex-5 device (XC5VLX110T), but I keep running into a strange problem. When I request a toggle on certain frames, several bits in some of the following frames within the same column are also modified. I am using XAPP864 to perform the tests. Here's an example:

 

>r
FAddr = 018620
2AAA2A2A B1B17272 0A220A22 313100B3 0400A000 
15000000 20A00022 20A0000A 77FF020A BBBF5A5A 
A200A200 1F3F0C0C 00000002 00000000 00000000 
02000003 00000000 880C0000 0B0F0101 0101084C 
0000019D
00005457 000003CF 00001B1B 00000000 05550000 
00000000 00000000 00000000 00000000 00000000 
00000040 00000000 00000011 00000000 11041010 
00001188 00000000 00000000 00000404 00000000 

>t
FAddr = 01861f
Bit = 260

>
SBE 1D12 260 01861F D40

>
MBE 1D13 FFF 018620 004

>r
FAddr = 018620
2AAA2A2A B1B17272 0A220A22 313100B3 0400A000 
15000000 20A00022 20A0000A 77FF020A BBBF5A5A 
A200A200 1F3F0C0C 00000002 00000000 00000000 
02000003 00000000 880C0000 0B0F0101 01011919 
0000019D
00005457 000003CF 00001B1B 00000000 05550000 
00000000 00000000 00000000 00000000 00000000 
00000040 00000000 00000011 00000000 11041010 
00001188 00000000 00000000 00000404 00000000

 

The bit which I toggled is in a frame of row 3, top half (which is the top row of this device). Also, the minor address is 31, so this is not an interconnect frame. And, most importantly, the SEU controller is placed close to the ICAP (somewhere around the center of the device), through RANGE constraints, far from the bit being toggled. I have also inspected the design with FPGA editor to make sure that no net of the SEU goes anywhere near the top row. Therefore, I expect that this bit is not affecting the controller itself.
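
The row, half, and minor values quoted above can be cross-checked by splitting the frame address into fields. This is a minimal sketch that assumes the Virtex-5 frame address layout as documented in the configuration user guide (block type in bits 23:21, top/bottom in bit 20, row in bits 19:15, column in bits 14:7, minor in bits 6:0). The helper name `decode_far` is hypothetical and not part of XAPP864.

```python
def decode_far(far: int) -> dict:
    """Split a 24-bit frame address into fields (assumed Virtex-5 layout)."""
    return {
        "block_type": (far >> 21) & 0x7,
        "bottom_half": (far >> 20) & 0x1,  # assumed: 0 = top half, 1 = bottom half
        "row": (far >> 15) & 0x1F,
        "column": (far >> 7) & 0xFF,
        "minor": far & 0x7F,
    }


for address in (0x01861F, 0x018620):
    print(f"{address:06X}", decode_far(address))
```

Under that assumed layout, 01861F decodes to block type 0, top half, row 3, column 12, minor 31, and 018620 is minor 32 of the same column, which agrees with the description in the post.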

When I request the toggle, the controller reports both a single bit error in frame 01861F (which is expected) and a multiple bit error in frame 018620, the following one. Note that the last word of the fourth line had several bits changed, according to the readings performed before and after the toggle.
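
To see exactly which bits changed between the two readbacks of frame 018620, the dumps can be compared word by word. The sketch below copies the words from the readback above; the function names are illustrative only.

```python
BEFORE = """
2AAA2A2A B1B17272 0A220A22 313100B3 0400A000
15000000 20A00022 20A0000A 77FF020A BBBF5A5A
A200A200 1F3F0C0C 00000002 00000000 00000000
02000003 00000000 880C0000 0B0F0101 0101084C
0000019D
00005457 000003CF 00001B1B 00000000 05550000
00000000 00000000 00000000 00000000 00000000
00000040 00000000 00000011 00000000 11041010
00001188 00000000 00000000 00000404 00000000
"""
# In the second readback only the last word of the fourth line is different.
AFTER = BEFORE.replace("0101084C", "01011919")


def parse_words(dump: str) -> list:
    """Turn a whitespace-separated hex dump into a list of 32-bit integers."""
    return [int(token, 16) for token in dump.split()]


def diff_words(before: list, after: list) -> list:
    """Return (word index, XOR mask) for every word that differs."""
    return [(i, b ^ a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


for index, mask in diff_words(parse_words(BEFORE), parse_words(AFTER)):
    print(f"word {index}: xor {mask:08X}, {bin(mask).count('1')} bit(s) changed")
# prints: word 19: xor 00001155, 6 bit(s) changed
```

Counting words from 0, the only difference is in word 19, where six bits changed.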

This phenomenon occurs only if that frame is occupied. If I force the circuit in that region to be placed somewhere else, I can toggle that bit without problems, but the same problem then appears in the other region where the circuit was placed.


The occupancy of the frame, according to the .ll file, is:

 

Bit 9762595 0x0001861f 3 Block=SLICE_X20Y140 Latch=AQ Net=EI_instr<4>
Bit 9762620 0x0001861f 28 Block=SLICE_X20Y140 Latch=BQ Net=EI_instr<5>
Bit 9762632 0x0001861f 40 Block=SLICE_X20Y140 Latch=CQ Net=EI_instr<6>
Bit 9762654 0x0001861f 62 Block=SLICE_X20Y140 Latch=DQ Net=EI_instr<7>
Bit 9762658 0x0001861f 66 Block=SLICE_X21Y141 Latch=AQ Net=DI_level<1>
Bit 9762659 0x0001861f 67 Block=SLICE_X20Y141 Latch=AQ Net=EI_instr<16>
Bit 9762684 0x0001861f 92 Block=SLICE_X20Y141 Latch=BQ Net=EI_instr<17>
Bit 9762695 0x0001861f 103 Block=SLICE_X21Y141 Latch=CQ Net=DI_level<0>
Bit 9762696 0x0001861f 104 Block=SLICE_X20Y141 Latch=CQ Net=EI_instr<18>
Bit 9762718 0x0001861f 126 Block=SLICE_X20Y141 Latch=DQ Net=EI_instr<19>
Bit 9762748 0x0001861f 156 Block=SLICE_X20Y142 Latch=BQ Net=DI_sp_mem<0>
Bit 9762759 0x0001861f 167 Block=SLICE_X21Y142 Latch=CQ Net=DI_sp_mem<2>
Bit 9762782 0x0001861f 190 Block=SLICE_X20Y142 Latch=DQ Net=DI_sp_mem<1>
Bit 9762787 0x0001861f 195 Block=SLICE_X20Y143 Latch=AQ Net=registres_2_0
Bit 9762812 0x0001861f 220 Block=SLICE_X20Y143 Latch=BQ Net=registres_2_1
Bit 9762824 0x0001861f 232 Block=SLICE_X20Y143 Latch=CQ Net=registres_2_2
Bit 9762846 0x0001861f 254 Block=SLICE_X20Y143 Latch=DQ Net=registres_2_3
Bit 9762914 0x0001861f 322 Block=SLICE_X21Y145 Latch=AQ Net=registres_22_0
Bit 9762951 0x0001861f 359 Block=SLICE_X21Y145 Latch=CQ Net=registres_22_2
Bit 9762973 0x0001861f 381 Block=SLICE_X21Y145 Latch=DQ Net=registres_22_3
Bit 9762978 0x0001861f 386 Block=SLICE_X21Y146 Latch=AQ Net=registres_10_0
Bit 9762979 0x0001861f 387 Block=SLICE_X20Y146 Latch=AQ Net=registres_18_0
Bit 9763004 0x0001861f 412 Block=SLICE_X20Y146 Latch=BQ Net=registres_18_1
Bit 9763015 0x0001861f 423 Block=SLICE_X21Y146 Latch=CQ Net=registres_10_2
Bit 9763016 0x0001861f 424 Block=SLICE_X20Y146 Latch=CQ Net=registres_18_2
Bit 9763037 0x0001861f 445 Block=SLICE_X21Y146 Latch=DQ Net=registres_10_3
Bit 9763038 0x0001861f 446 Block=SLICE_X20Y146 Latch=DQ Net=registres_18_3
Bit 9763042 0x0001861f 450 Block=SLICE_X21Y147 Latch=AQ Net=DI_op2<19>
Bit 9763043 0x0001861f 451 Block=SLICE_X20Y147 Latch=AQ Net=registres_11_16
Bit 9763068 0x0001861f 476 Block=SLICE_X20Y147 Latch=BQ Net=registres_11_17
Bit 9763079 0x0001861f 487 Block=SLICE_X21Y147 Latch=CQ Net=DI_op2<21>
Bit 9763080 0x0001861f 488 Block=SLICE_X20Y147 Latch=CQ Net=registres_11_18
Bit 9763102 0x0001861f 510 Block=SLICE_X20Y147 Latch=DQ Net=registres_11_19
Bit 9763106 0x0001861f 514 Block=SLICE_X21Y148 Latch=AQ Net=registres_3_16
Bit 9763107 0x0001861f 515 Block=SLICE_X20Y148 Latch=AQ Net=registres_19_16
Bit 9763132 0x0001861f 540 Block=SLICE_X20Y148 Latch=BQ Net=registres_19_17
Bit 9763143 0x0001861f 551 Block=SLICE_X21Y148 Latch=CQ Net=registres_3_18
Bit 9763144 0x0001861f 552 Block=SLICE_X20Y148 Latch=CQ Net=registres_19_18
Bit 9763165 0x0001861f 573 Block=SLICE_X21Y148 Latch=DQ Net=registres_3_19
Bit 9763166 0x0001861f 574 Block=SLICE_X20Y148 Latch=DQ Net=registres_19_19
Bit 9763171 0x0001861f 579 Block=SLICE_X20Y149 Latch=AQ Net=registres_23_16
Bit 9763196 0x0001861f 604 Block=SLICE_X20Y149 Latch=BQ Net=registres_23_17
Bit 9763208 0x0001861f 616 Block=SLICE_X20Y149 Latch=CQ Net=registres_23_18
Bit 9763230 0x0001861f 638 Block=SLICE_X20Y149 Latch=DQ Net=registres_23_19
Bit 9763330 0x0001861f 738 Block=SLICE_X21Y151 Latch=AQ Net=registres_7_0
Bit 9763367 0x0001861f 775 Block=SLICE_X21Y151 Latch=CQ Net=registres_7_2
Bit 9763389 0x0001861f 797 Block=SLICE_X21Y151 Latch=DQ Net=registres_7_3
Bit 9763586 0x0001861f 994 Block=SLICE_X21Y155 Latch=AQ Net=registres_16_24
Bit 9763623 0x0001861f 1031 Block=SLICE_X21Y155 Latch=CQ Net=registres_16_26
Bit 9763645 0x0001861f 1053 Block=SLICE_X21Y155 Latch=DQ Net=registres_16_27
Bit 9763651 0x0001861f 1059 Block=SLICE_X20Y156 Latch=AQ Net=registres_4_24
Bit 9763676 0x0001861f 1084 Block=SLICE_X20Y156 Latch=BQ Net=registres_4_25
Bit 9763688 0x0001861f 1096 Block=SLICE_X20Y156 Latch=CQ Net=registres_4_26
Bit 9763710 0x0001861f 1118 Block=SLICE_X20Y156 Latch=DQ Net=registres_4_27
Bit 9763778 0x0001861f 1186 Block=SLICE_X21Y158 Latch=AQ Net=registres_19_4
Bit 9763779 0x0001861f 1187 Block=SLICE_X20Y158 Latch=AQ Net=registres_8_24
Bit 9763804 0x0001861f 1212 Block=SLICE_X20Y158 Latch=BQ Net=registres_8_25
Bit 9763815 0x0001861f 1223 Block=SLICE_X21Y158 Latch=CQ Net=registres_19_6
Bit 9763816 0x0001861f 1224 Block=SLICE_X20Y158 Latch=CQ Net=registres_8_26
Bit 9763837 0x0001861f 1245 Block=SLICE_X21Y158 Latch=DQ Net=registres_19_7
Bit 9763838 0x0001861f 1246 Block=SLICE_X20Y158 Latch=DQ Net=registres_8_27
Bit 9763842 0x0001861f 1250 Block=SLICE_X21Y159 Latch=AQ Net=registres_3_4
Bit 9763843 0x0001861f 1251 Block=SLICE_X20Y159 Latch=AQ Net=registres_11_4
Bit 9763868 0x0001861f 1276 Block=SLICE_X20Y159 Latch=BQ Net=registres_11_5
Bit 9763879 0x0001861f 1287 Block=SLICE_X21Y159 Latch=CQ Net=registres_3_6
Bit 9763880 0x0001861f 1288 Block=SLICE_X20Y159 Latch=CQ Net=registres_11_6
Bit 9763901 0x0001861f 1309 Block=SLICE_X21Y159 Latch=DQ Net=registres_3_7
Bit 9763902 0x0001861f 1310 Block=SLICE_X20Y159 Latch=DQ Net=registres_11_7

 

These are some registers of the pipeline and some of the register file of a MIPS-compatible softcore processor.
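
The listing above can be checked programmatically to see whether the toggled bit (0x260, i.e. 608) is one of the listed flip-flop bits. This is a sketch: the regular expression only covers the line shape shown in this post, and the sample holds just a handful of the entries.

```python
import re

LL_LINE = re.compile(
    r"Bit\s+(?P<offset>\d+)\s+(?P<frame>0x[0-9a-fA-F]+)\s+(?P<bit>\d+)\s+"
    r"Block=(?P<block>\S+)\s+Latch=(?P<latch>\S+)\s+Net=(?P<net>\S+)"
)

SAMPLE = """
Bit 9763166 0x0001861f 574 Block=SLICE_X20Y148 Latch=DQ Net=registres_19_19
Bit 9763171 0x0001861f 579 Block=SLICE_X20Y149 Latch=AQ Net=registres_23_16
Bit 9763196 0x0001861f 604 Block=SLICE_X20Y149 Latch=BQ Net=registres_23_17
Bit 9763208 0x0001861f 616 Block=SLICE_X20Y149 Latch=CQ Net=registres_23_18
Bit 9763230 0x0001861f 638 Block=SLICE_X20Y149 Latch=DQ Net=registres_23_19
"""


def parse_ll(text: str) -> list:
    """Parse the 'Bit ...' lines of a logic allocation listing into dicts."""
    entries = []
    for line in text.splitlines():
        match = LL_LINE.match(line.strip())
        if match:
            entries.append({
                "offset": int(match["offset"]),
                "frame": int(match["frame"], 16),
                "bit": int(match["bit"]),
                "block": match["block"],
                "latch": match["latch"],
                "net": match["net"],
            })
    return entries


frame_bits = {e["bit"]: e for e in parse_ll(SAMPLE) if e["frame"] == 0x01861F}
target = 0x260
print("listed as a latch bit:", target in frame_bits)
for bit in sorted(frame_bits, key=lambda b: abs(b - target))[:2]:
    entry = frame_bits[bit]
    print(bit, entry["block"], entry["latch"], entry["net"])
```

In the full listing, bit 608 does not appear; the nearest entries are bits 604 and 616, both in SLICE_X20Y149.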

I have also created my own controller to interact with ICAP and replace XAPP864 and possibly avoid this problem, but the same thing happened with the new design. This reinforces the hypothesis that the bit is not affecting the controller, since a completely different design presented the same problem.

 

So, is there a reasonable explanation for why this is happening?

 

Regards,

Gabriel Nazar

Xilinx Employee
austin
Posts: 3,625
Registered: ‎02-27-2008
0

Re: Writing to one frame modifies other frames

Yes,

 

I mentioned it in my reply to your other question. Some frames have incomplete decoding (the same frame maps to more than one address).

 

We know what we are doing, so we don't care to decode addresses completely.....

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
Visitor
gabrielnazar
Posts: 6
Registered: ‎05-02-2012
0

Re: Writing to one frame modifies other frames

Austin,

 

Again, thank you for replying. Does this mean that writes to bit 0x260 in frame 0x01861F will always map also to bits in frame 0x018620? But why doesn't this happen when the frame is not occupied? The example below was done also using XAPP864 with a design that does not use frame 01861F.

 

Note that in this case, the write maps exclusively to frame 01861F, leaving the following frame untouched. It seems to me that what happened in my earlier post is related to the frame being in use and not only to how addresses are mapped to the actual memory cells.

 

Also, note that the SBE is corrected by XAPP864, but the MBE remained, meaning that there are actually separate cells and that writing on one will only write on the others under specific circumstances.

 

Thank you very much for your help.

 

Regards,

Gabriel

 

>r
FAddr = 018620


00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000

00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 


>t
FAddr = 01861f
Bit = 260

>
SBE 1D12 260 01861F D40

>r
FAddr = 018620


00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000

00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000

Xilinx Employee
austin
Posts: 3,625
Registered: ‎02-27-2008
0

Re: Writing to one frame modifies other frames

OK,

 

That is puzzling.


I am aware of multiply mapped frames (incomplete address decoding).

 

I am aware of bits that do not exist at all (XAPP864 will not find, fix, or flip these, and the CRC ignores them as well).

 

LUTRAM and SRL are masked so that the FRAME_ECC and CRC all work (do not detect a dynamic change in SRL nor in LUTRAM).

 

A 1, 3, 5, 7 etc. bit error in a frame appears as a 1 bit error, and after correction it will be apparent that it did not correct (if the syndrome is pointing to a valid location, otherwise you know immediately that it is uncorrectable).


A 2, 4, 6, 8, etc. bit flip will appear as a 2-bit error and be marked uncorrectable.  The SEU Monitor IP knows that it has been unsuccessful if it tries to correct and the CRC is still wrong (a 32-bit CRC catches 100% of errors of 31 bits or fewer, and has a 1/2^32 probability of misidentifying an error, that is, not finding it, with 32 or more errors).
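
The odd/even behavior described here is what a single-error-correct, double-error-detect (SECDED) extended Hamming code produces. The sketch below is a simplified textbook model, not the actual FRAME_ECC logic: each bit has a distinct index, index 0 stands for the overall parity bit, the syndrome is the XOR of the flipped indices, and the overall parity tells odd from even.

```python
from functools import reduce
from operator import xor


def classify_flips(positions) -> str:
    """Classify a set of flipped bit indices the way a SECDED decoder would."""
    syndrome = reduce(xor, positions, 0)
    odd_count = len(positions) % 2 == 1
    if odd_count:
        return "SBE"         # overall parity wrong: looks like one bad bit
    if syndrome != 0:
        return "MBE"         # parity fine but syndrome set: uncorrectable
    return "undetected"      # invisible to the ECC; only a CRC can catch it


def attempt_correction(positions) -> set:
    """Flip the bit the syndrome points at and return the bits still wrong."""
    syndrome = reduce(xor, positions, 0)
    return set(positions) ^ {syndrome}


print(classify_flips([7]), attempt_correction([7]))
print(classify_flips([5, 9]))
print(classify_flips([5, 9, 3]), attempt_correction([5, 9, 3]))
print(classify_flips(sorted(attempt_correction([5, 9, 3]))))
```

In this model a three-bit upset is reported as a single-bit error, and the attempted correction flips a fourth bit, leaving a pattern the ECC alone no longer flags. This is why the CRC check after correction matters.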

 

The largest multiple bit upset (but extremely rare) in V5 is ~7 bits (less than the probability the device itself has a hardware failure).

 

There are bits that tell the LUT that it is now a SRL, so if you flip that bit, you may suddenly get 32 bits in error (as you shifted a LUT by one as a shift register).


There are bits that control the BRAM, that if you flip one bit, you read back every bit as its inverse, or you substitute a spare column for an existing column, causing 256 errors.

 

In any multiple-error case, the data pattern in the BRAM or LUT may cause fewer than 32 (or 256) errors to appear.  If the unused column and the replacement column are both all zeros, you see no errors other than the control bit itself being wrong.

 

It is a very complex behavior, with a myriad of details.


What is it you are trying to do?  Why?  What do you want to know, versus what problem are you trying to solve?  Xilinx supports the commercial use of our parts, but we do not have the time nor the resources to answer every academic question, and we have no interest in helping anyone reverse engineer the part.

 

If you have a commercial application, and you are doing a fault analysis, email me directly with your good bitstream .bin file, and the readback .bin file with the error, and I will analyze them, and tell you exactly what is different, and what that bit (those bits) did exactly (I perform this service two or three times a year for those who absolutely need to know what happened).

 

Such an analysis may help the designer change the design so that when failures occur, the system fails safely, or fails in a recognizable manner.  Ultimately, just using the CRC error indication (entering its occurrence into the log), or finding and fixing transient SEUs, is sufficient for 95% of users to easily achieve their reliability and availability goals.  In the cases where more serious mitigation methods are needed, we will describe how to get to whatever levels of reliability and availability are required.  This is unique to Xilinx FPGA technology, and cannot be matched by any other technology (or supplier) whatsoever.

 

If you are learning, then please continue to learn.  And I will respond as I can with what I am aware of.  Your description sounds a bit odd, and other than what I have already said, I don't know what is happening...

 

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
Visitor
gabrielnazar
Posts: 6
Registered: ‎05-02-2012
0

Re: Writing to one frame modifies other frames

Austin,

 

First of all, thank you for taking the time to explain these details to me. It's nice to see how dedicated you are to the user community forums.

 

Now regarding the problem at hand. I am not developing a commercial application, I am a PhD candidate. I am performing fault injection on some designs and I got this problem where I inject one fault and several faults appear.

The first thing that came to mind was that the fault injector was corrupting itself and messing around with other frames, but I have eliminated this possibility by inspecting the routed design with FPGA Editor.

I have noticed that there seems to be a pattern on how this unexpected behavior appears. It is always related to bits on used frames of CLB columns, with minor address 31. On the following frames of that column, some bits are flipped near the position of the originally toggled bit (0x260 in the example). Furthermore, it only happens when the user flip-flops are in use at that specific area.
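
The "near the position" observation can be made concrete for the example. This sketch assumes that bit offsets within a frame count sequentially through the 32-bit words in readback order, which is not confirmed in this thread.

```python
WORD_BITS = 32     # each readback word is 32 bits
FRAME_WORDS = 41   # the readbacks above show 41 words per frame


def locate_bit(bit_offset: int) -> tuple:
    """Return (word index, bit within word) for a frame bit offset."""
    return divmod(bit_offset, WORD_BITS)


word, bit_in_word = locate_bit(0x260)
print(word, bit_in_word, word < FRAME_WORDS)
```

Under that assumption, bit 0x260 (608) falls in word 19 of frame 01861F, the same word index as the word that changed in frame 018620 (the last word of the fourth readback line, counting words from 0).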

You mentioned that this could be related to changing a LUT into an SRL or modifying BRAM contents. As these cells store user content, wouldn't they be ignored by the scanning performed by XAPP864? Also, the block type on the modified frame is 000, so it is certainly not BRAM content, though I guess it could be a LUT behaving as a shifter (although the pattern of the changes in the other frames does not match a shift).

 

Finally, since I am trying to emulate the effects of radiation on an FPGA, perhaps the most important question is: if this first bit flip were caused by a radiation-induced SEU, would the other bits in the other frames flip as well, or does this happen strictly because the bit is being flipped via ICAP? As far as I understood your previous answer, the other bits would flip along with it, right?

 

Regards,

Gabriel

Xilinx Employee
austin
Posts: 3,625
Registered: ‎02-27-2008
0

Re: Writing to one frame modifies other frames

Gabriel,


Radiation, or XAPP864, they will do the same thing.


I suggest you do a verify and copy image.bin (the good image) to good.bin, then create the bad condition, do another verify, copy the impact.bin file to bad.bin, and email both files to me at austin@xilinx.com.


I am presuming this is an XUPV5 board, which I use, so I don't need anything else.  I will then get back to you with exactly what our tools (internal database of the schematics, etc.) say is happening.
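
For a first look before sending the files, the two images can be compared locally. This is a rough sketch under stated assumptions: both files have the same length, the data is a stream of big-endian 32-bit words, and grouping into 41-word chunks starts at offset 0. Any header or padding in a real readback file would shift the grouping, so the chunk number is not a frame address.

```python
import struct

FRAME_WORDS = 41  # words per Virtex-5 configuration frame


def diff_images(good: bytes, bad: bytes) -> dict:
    """Map chunk number -> [(word in chunk, XOR mask)] for differing words."""
    if len(good) != len(bad):
        raise ValueError("images differ in length")
    count = len(good) // 4
    good_words = struct.unpack(f">{count}I", good[:count * 4])
    bad_words = struct.unpack(f">{count}I", bad[:count * 4])
    changes = {}
    for index, (g, b) in enumerate(zip(good_words, bad_words)):
        if g != b:
            chunk, word = divmod(index, FRAME_WORDS)
            changes.setdefault(chunk, []).append((word, g ^ b))
    return changes


# Synthetic demonstration: two chunks of zeros with one corrupted byte.
good = bytes(FRAME_WORDS * 4 * 2)
bad = bytearray(good)
bad[(FRAME_WORDS + 19) * 4 + 3] ^= 0x55
print(diff_images(good, bytes(bad)))
```

With real files, the inputs would come from something like `open("good.bin", "rb").read()` and `open("bad.bin", "rb").read()`.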


I am happy to further that sort of research, as it helps all of the Xilinx customers understand some of the "unusual" things that may occur.

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose
Visitor
gabrielnazar
Posts: 6
Registered: ‎05-02-2012
0

Re: Writing to one frame modifies other frames

Austin kindly examined the binary files I sent him and it turns out that one of the possibilities he had raised was true. Indeed the bit flip was turning a regular logic-implementing LUT into a shift register, which naturally corrupts its configuration bits. As these bits are located in other frames, those frames presented MBEs.

I guess the problem didn't show up when the configuration was empty because shifting a bunch of 0's yields the same bunch of 0's.
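
A toy model of that reasoning: treat the LUT contents as a 32-bit value (the figure quoted earlier in the thread), shift it by one position as a shift register would, and count how many bits differ from the original. The width, shift direction, and function names are assumptions for illustration and do not describe the real device.

```python
def shift_once(contents: int, data_in: int, width: int = 32) -> int:
    """Shift the stored bits by one position, inserting data_in at bit 0."""
    mask = (1 << width) - 1
    return ((contents << 1) | (data_in & 1)) & mask


def bits_changed(before: int, after: int) -> int:
    """Number of bit positions that differ between two values."""
    return bin(before ^ after).count("1")


for contents, data_in in ((0x00000000, 0), (0xAAAAAAAA, 1), (0xFFFF0000, 0)):
    shifted = shift_once(contents, data_in)
    print(f"{contents:08X} -> {shifted:08X}: {bits_changed(contents, shifted)} bit(s) differ")
```

An all-zero table shifted with a zero input is unchanged, an alternating pattern can differ in every bit, and most patterns land somewhere in between, which fits the data-dependent error counts discussed above.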

 

So, problem solved!

 

Gabriel Nazar

Xilinx Employee
austin
Posts: 3,625
Registered: ‎02-27-2008
0

Re: Writing to one frame modifies other frames

Gabriel,

 

Thank you for sending me the files.  It is great that folks experiment, and find "strange" behaviors, and wish to understand them better.  The good news is that this is one of those "strange" behaviors we already understand.

 

The basic premise in commercial products is that if the soft reliability can be brought to the same level as the hardware failure reliability, we are "done" (it makes no sense to be any better than the hardware probability of failure).  If a system can not tolerate a hardware failure, then the system must have redundancy, which implies that the system gets architected quite differently (active-standby, 1:N, hot spare, etc...).  Soft errors are not your problem.

 

So, one can not get too excited about a "really bad bit" as there are so few of these.  But, if you really want to continue operating without stopping, and fix the errors, the new SEU Monitor IP does allow replacement of entire frames, which means that even these "bad bits" and the resultant bit flips may be automatically corrected.

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose