Visitor mfaero
Registered: 05-30-2019

Xilinx MicroBlaze TMR Fault Injection Help

I am trying to get working the example design referenced in the only answer to the following question:

https://forums.xilinx.com/t5/Embedded-Processor-System-Design/TMR-subsystem-comparator-test/td-p/912178

 

Unfortunately, I do not have a KC705 board handy to be able to test the provided example directly. Instead, the design was re-targeted to a Nexys Video board. All that really needed to be changed was the polarity of the reset signal and the location of the LED pins. I also added an ILA debug core to monitor the break signals and other various signals.

In XSDK, I copied the code into a new project on top of the hardware design and made sure that the optimization level was set to -O1 in the build settings, as prescribed in the comments of the provided code. I also added a break handler function using __attribute__((break_handler)) that saves the registers and generates a "suspend" signal, as instructed by PG268 p.11, which describes the MicroBlaze TMR subsystem.

When the error is injected, it is immediately detected by the TMR Manager. The Comparator output reflects the error, and on the next cycle the TMR Manager's "status" signal changes to 0x1403, indicating a fault in processor 1. This all works as expected.

The trouble is, the break signal output is never generated by the TMR Manager. When the MicroBlaze is re-programmed, the break handler runs for some reason (which I can tell because the "suspend" signal goes high, something that only happens in the break handler I wrote), but the ILA debug cores never show the break signal going high. In another, very similar project, the break signal does eventually go high, but it takes over 40 seconds after error detection for it to happen. This is an unacceptable delay for my use case.

So my questions are as follows:

1. How is the MicroBlaze TMR subsystem supposed to work here? It's my understanding from PG268 that the break signal should be generated immediately. Why isn't it? Is this some design choice or is something not working the way it's supposed to be?

2. Why is the break handler running when I reprogram the device? Shouldn't the code start from the "reset" vector, not the break signal vector?

3. For some reason, the ELF file shows the break handler vector existing at address 0x10, but according to the MicroBlaze Processor Reference Guide (UG984), the break handler vector should be located at 0x18. Why is this in the wrong place? __attribute__((break_handler)), available through the MicroBlaze compiler, should be placing this break handler into the right location.

My team and I have been struggling to get this system to work properly for over a month now. Any help would be very much appreciated as we don't want to give up on using this IP.

Accepted Solutions
Visitor mfaero

Re: Xilinx MicroBlaze TMR Fault Injection Help

My team has discovered the answers to these questions. For the sake of anyone else getting started with the TMR IP suite, I am going to update this post with some details and resources.

1. How is the MicroBlaze TMR subsystem supposed to work here? It's my understanding from PG268 that the break signal should be generated immediately. Why isn't it? Is this some design choice or is something not working the way it's supposed to be?

     A: In our case, the break handler was not operating as expected because of two configuration parameters in the TMR Manager IP: C_BRK_DELAY_WIDTH and C_BRK_DELAY_RST_VALUE. If C_BRK_DELAY_WIDTH > 0, the TMR Manager can delay the break signal by the number of clock cycles held in the Break Delay Initialization Register (BDIR), whose reset value is determined by C_BRK_DELAY_RST_VALUE. If a delay is never desired, set C_BRK_DELAY_WIDTH to 0. If a delay is desired, set C_BRK_DELAY_WIDTH to an appropriate number of bits and either (i) set C_BRK_DELAY_RST_VALUE to the desired delay, or (ii) write the value into the TMR Manager's BDIR at runtime and after each reset.

     One minor issue with the way this register is described in PG268 (p. 46) is that this does not seem to only apply to fatal signals as implied. I injected an error in a single processor (which is not a fatal error), and this value still determined the delay of the break signal being generated by the TMR Manager.

     An additional caveat here is to ensure that the Recover is Reset bit in the TMR Manager's Control Register (CR) is set to 1. Otherwise, the break signal won't be generated at all after the MicroBlaze issues its "suspend" instruction.
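     As a rough sketch of option (ii) and the Recover is Reset caveat, the two register writes could look like the code below. The register offsets and the bit position are assumptions on my part and must be checked against the register map in PG268; the function names are hypothetical and not part of any Xilinx driver. The sketch keeps a software shadow copy of CR rather than reading it back, in case the register is not readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED values: confirm every offset and bit position in PG268. */
#define TMR_CR_OFFSET    0x00u        /* Control Register (assumed offset) */
#define TMR_BDIR_OFFSET  0x10u        /* Break Delay Initialization Register (assumed offset) */
#define TMR_CR_RIR_MASK  (1u << 16)   /* "Recover is Reset" bit (assumed position) */

/* Write one 32-bit register, addressed by byte offset from the base. */
void tmr_write_reg(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    assert(base != NULL);
    base[offset / sizeof(uint32_t)] = value;
}

/* Set the Recover is Reset bit while preserving the other CR fields held
 * in the caller's shadow copy. Returns the updated shadow value. */
uint32_t tmr_set_recover_is_reset(volatile uint32_t *base, uint32_t cr_shadow)
{
    cr_shadow |= TMR_CR_RIR_MASK;
    tmr_write_reg(base, TMR_CR_OFFSET, cr_shadow);
    return cr_shadow;
}

/* Program the break delay in clock cycles. The value must fit in
 * C_BRK_DELAY_WIDTH bits; this sketch does not check that. */
void tmr_set_break_delay(volatile uint32_t *base, uint32_t cycles)
{
    tmr_write_reg(base, TMR_BDIR_OFFSET, cycles);
}

/* Convert a cycle count into seconds for a given clock frequency. */
double tmr_break_delay_seconds(uint32_t cycles, double clk_hz)
{
    return (double)cycles / clk_hz;
}
```

     On hardware, base would be the TMR Manager's base address from xparameters.h cast to a volatile uint32_t pointer. The last helper is only there to sanity-check a chosen delay against the clock frequency before building the design.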

2. Why is the break handler running when I reprogram the device? Shouldn't the code start from the "reset" vector, not the break signal vector?

     A: This one, I don't have a good answer for. It may have just been a bug. However, I'll take the opportunity here to note that you can change the code at the reset vector (as encouraged by the "Recovery of the MicroBlaze Subsystem" instructions in PG268, p.11) by overriding the default startup files.

     This process is described in UG1043, the Embedded System Tools Reference Manual, pp. 39-43. It will tell you where to find the assembly files that need to be added to your project and modified. One thing those instructions are unclear about is exactly where to find the object files that you need to pull in. It lists crti.o, crtbegin.o, crtend.o, and crtn.o as necessary files, but doesn't give you any additional hints. A search will show many different copies of these files, which can be narrowed down by architecture, but given the configurability of MicroBlaze, it may be unclear which versions you need. My tip is to go to Project → Properties → C/C++ Build → Settings → MicroBlaze gcc linker and add "-v" at the end of the "command" entry. Then, when your project is built, you'll see the paths to the object files being used for your project. Copy those into your project as well. At that point, you can create a custom start function that checks the TMR Manager's First Failing Register (FFR) and performs whatever recovery steps you wish.
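     A minimal sketch of such a start-up check is below. It assumes that a nonzero FFR at reset means a fault was recorded before the reset, and the FFR offset is likewise an assumption to verify in PG268. The names boot_kind_t and tmr_classify_boot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED offset of the First Failing Register; confirm in PG268. */
#define TMR_FFR_OFFSET 0x04u

typedef enum {
    BOOT_COLD,      /* no fault recorded: normal power-up or reprogram */
    BOOT_RECOVERY   /* fault recorded: reset was part of TMR recovery */
} boot_kind_t;

/* Decide which start-up path to take from the FFR contents.
 * ASSUMPTION: any nonzero FFR value indicates a recorded fault. */
boot_kind_t tmr_classify_boot(const volatile uint32_t *base)
{
    assert(base != NULL);
    uint32_t ffr = base[TMR_FFR_OFFSET / sizeof(uint32_t)];
    return (ffr != 0u) ? BOOT_RECOVERY : BOOT_COLD;
}
```

     The custom start code would call this early and branch to either the normal initialization or the register-restore path described in PG268.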

3. For some reason, the ELF file shows the break handler vector existing at address 0x10, but according to the MicroBlaze Processor Reference Guide (UG984), the break handler vector should be located at 0x18. Why is this in the wrong place? __attribute__((break_handler)), available through the MicroBlaze compiler, should be placing this break handler into the right location.

     A: This is a really tricky one. I don't know why __attribute__((break_handler)) puts the code in the wrong place, but it definitely does. This is something that really needs to be fixed. I don't know if any Xilinx employees will see this (as this post has been entirely ignored so far), but this is one I'd really like to know the answer to. The workaround is fairly similar to the process described in the answer to question 2 above. Once you've pulled the startup files in and informed the linker not to use the standard startup files, you can manually place the break handler where it's supposed to go. You'll likely need to add the following to the linker script:

     .vectors.break 0x18 : {
          KEEP (*(.vectors.break))
     }

     Then, add the following to your crt0.S (or equivalent) startup file (assuming you've called your break handler "_break_handler"):

_vector_interrupt:
    brai _interrupt_handler

    .section .vectors.break, "ax"  // Lines above are for context
    .align 2

_vector_break:
    brai _break_handler  // Lines below are for context

    .section .vectors.hw_exception, "ax"
    .align 2

     Then, your break handler will actually be called when the MicroBlaze receives a break signal.
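     One way to confirm that the break vector really points at the handler is to read the two instruction words at 0x18 from the ELF (or from memory) and decode the branch target. The sketch below assumes the usual "imm" + "brai" pair, or a lone "brai" for targets that fit in 16 bits. The opcode constants are my reading of the encodings in UG984 and should be compared against an objdump disassembly. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MicroBlaze vector addresses as listed in UG984. */
#define MB_VECTOR_RESET         0x00u
#define MB_VECTOR_USER          0x08u
#define MB_VECTOR_INTERRUPT     0x10u
#define MB_VECTOR_BREAK         0x18u
#define MB_VECTOR_HW_EXCEPTION  0x20u

/* Instruction encodings with the 16-bit immediate masked off
 * (assumed from UG984; verify against a disassembly). */
#define MB_OPC_MASK  0xFFFF0000u
#define MB_OPC_IMM   0xB0000000u  /* imm  hi16 */
#define MB_OPC_BRAI  0xB8080000u  /* brai lo16 */

/* Decode the absolute branch target stored in a vector slot.
 * Returns false if the words are not a recognised imm/brai pattern. */
bool mb_decode_vector(uint32_t word0, uint32_t word1, uint32_t *target)
{
    assert(target != NULL);
    if ((word0 & MB_OPC_MASK) == MB_OPC_BRAI) {
        /* Lone brai: the 16-bit immediate is sign-extended. */
        uint32_t lo = word0 & 0xFFFFu;
        *target = (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
        return true;
    }
    if ((word0 & MB_OPC_MASK) == MB_OPC_IMM &&
        (word1 & MB_OPC_MASK) == MB_OPC_BRAI) {
        /* imm supplies the upper half, brai the lower half. */
        *target = ((word0 & 0xFFFFu) << 16) | (word1 & 0xFFFFu);
        return true;
    }
    return false;
}
```

     If the decoded target at 0x18 matches the address of _break_handler in the symbol table, the vector is wired up correctly. Running the same check on the words at 0x10 shows whether the handler has also been attached to the interrupt vector.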

     If you don't want the break handler to be put into the interrupt handler section as well (which is where __attribute__((break_handler)) puts it), you'll need to remove the attribute. In this case, you'll need to add the following epilogue to your code:

// Return from break.
// NOTE: The stack adjustment (32 here) must match the automatically generated
// prologue of this function. If more local variables are added, do the following:
//   1. Compile the program.
//   2. Locate this function in the ELF file (hint: check location 0x18, i.e., the break vector, for the address).
//   3. Find the prologue instruction that adjusts the stack pointer (r1), e.g. addik r1, r1, -32.
//   4. Set the last argument of the addik instruction below to the size of that adjustment.
//   5. Re-compile the program.
    asm volatile("lwi\tr15, r1, 0\n"
                 "rtbd\tr16, 8\n"
                 "addik\tr1, r1, 32\n"); // executes in the rtbd delay slot

     Obviously this is pretty fragile, considering that the value to be added to the stack pointer will change if local variables are added to this function, but I don't know how to make the break handler attribute stop putting the break handler in the wrong place. Again, input from a Xilinx employee would be very useful here.
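     To make the manual check a little less error-prone, the frame size can be pulled out of the prologue instruction mechanically. The sketch below assumes the prologue adjusts the stack with a single "addik r1, r1, -N" and that this instruction encodes as 0x3021 in the upper half-word, which is my reading of UG984 and should be confirmed against a disassembly. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half-word of "addik r1, r1, imm16"
 * (assumed from UG984; verify against a disassembly). */
#define MB_ADDIK_R1_R1  0x30210000u

/* Return the stack frame size N if insn is "addik r1, r1, -N",
 * or -1 if it is any other instruction (including a positive adjustment). */
int mb_frame_size_from_prologue(uint32_t insn)
{
    if ((insn & 0xFFFF0000u) != MB_ADDIK_R1_R1) {
        return -1;
    }
    uint32_t imm = insn & 0xFFFFu;
    if ((imm & 0x8000u) == 0u) {
        return -1;  /* immediate is not negative, so this is not a prologue */
    }
    int size = (int)(0x10000u - imm);  /* magnitude of the negative immediate */
    assert(size > 0 && size <= 0x8000);
    return size;
}
```

     For example, the word 0x3021FFE0 yields 32, which is the value used in the epilogue above. A small host-side script built around this could fail the build whenever the hand-written epilogue and the compiler-generated prologue disagree.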

 

I hope this can help somebody else. The TMR IP seems to have a pretty large learning curve, so hopefully this can help alleviate some of that.

Resources (current at the time of this post):

PG268 (MicroBlaze Triple Modular Redundancy IP Product Guide) v1.0 (https://www.xilinx.com/support/documentation/ip_documentation/tmr/v1_0/pg268-tmr.pdf)
UG1043 (Embedded System Tools Reference Manual) v2018.3 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug1043-embedded-system-tools.pdf)
UG984 (MicroBlaze Processor Reference Guide) v2018.3 (https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug984-vivado-microblaze-ref.pdf)
